Nouman Khalid, Developer in Lahore, Punjab, Pakistan

Nouman Khalid

Verified Expert in Engineering

Data Engineer and Developer

Location
Lahore, Punjab, Pakistan
Toptal Member Since
November 3, 2022

Nouman is a senior data engineer with over seven years of experience building data-intensive applications, tackling challenging architectural and scalability problems, and collecting and curating data for data-centric companies. He helped a news publishing company become the first in its industry to fully understand user behavior while making its infrastructure more robust, reusable, and scalable. With this solid background, Nouman is eager to take on new challenges and deliver outstanding results.

Portfolio

Data Kitchens
Amazon Web Services (AWS), Data Build Tool (dbt), SQL, Python 3, Dagster...
Axel Springer
Python 3, Apache Airflow, Snowflake, Spark, Amazon Web Services (AWS), ETL...
NorthBay Solutions
Apache Spark, Python 3, Node.js, Amazon Web Services (AWS), Data Lakes, ETL...

Experience

Availability

Full-time

Preferred Environment

Python 3, Amazon Web Services (AWS), SQL, Data Build Tool (dbt), Snowflake

The most amazing...

...design I've built is a reusable data ingestion framework using AWS.

Work Experience

Senior Data Engineer

2022 - PRESENT
Data Kitchens
  • Engineered a robust data orchestration pipeline framework from scratch, utilizing Airflow as the orchestrator, ensuring seamless data flow, monitoring, and error handling.
  • Leveraged dbt to create optimized transformation processes, enhancing data quality and reliability while reducing processing time by 55% on average.
  • Set up and managed multiple data ingestion pipelines for disparate sources, including RDS, NetSuite, HubSpot, and Salesforce, resulting in a 45% reduction in data acquisition time.
  • Successfully integrated Snowflake as the central data warehouse, enabling high-performance storage, querying, and scalability, resulting in 30% faster analytics.
  • Developed a comprehensive mart layer for BI using Lightdash, creating a user-friendly interface for business analysts to access and analyze data insights promptly.
  • Implemented data governance practices to ensure data accuracy, consistency, and compliance, leading to a significant reduction in data-related errors.
  • Worked closely with cross-functional teams to define data requirements, troubleshoot issues, and optimize data delivery, resulting in faster project delivery.
Technologies: Amazon Web Services (AWS), Data Build Tool (dbt), SQL, Python 3, Dagster, Snowflake, Amazon RDS, Stitch Data, REST APIs, Automation, Machine Learning, English, Query Optimization, Fivetran
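A minimal pure-Python sketch of the connector-registry pattern a reusable ingestion framework like this might use. All names are hypothetical; the production framework orchestrated real connectors (RDS, NetSuite, HubSpot, Salesforce) through Airflow.

```python
# Sketch of a pluggable ingestion framework: each source registers an
# extractor, and a shared runner applies common monitoring logic.
# All names are hypothetical stand-ins for real connectors.
from typing import Callable, Dict, List

EXTRACTORS: Dict[str, Callable[[], List[dict]]] = {}

def register(source_name: str):
    """Decorator that adds an extractor to the shared registry."""
    def wrap(fn: Callable[[], List[dict]]):
        EXTRACTORS[source_name] = fn
        return fn
    return wrap

@register("hubspot")
def extract_hubspot() -> List[dict]:
    # Stand-in for a real API call.
    return [{"contact_id": 1, "email": "a@example.com"}]

@register("rds")
def extract_rds() -> List[dict]:
    # Stand-in for a real SQL query.
    return [{"order_id": 10, "total": 99.5}]

def run_all() -> Dict[str, int]:
    """Run every registered extractor; return row counts for monitoring."""
    counts = {}
    for name, extractor in EXTRACTORS.items():
        rows = extractor()
        # In production: land rows in S3/Snowflake, emit metrics, retry on error.
        counts[name] = len(rows)
    return counts

print(run_all())  # e.g. {'hubspot': 1, 'rds': 1}
```

New sources then become one registered function each, which is what makes a framework like this reusable rather than a set of one-off pipelines.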

Senior Data Engineer

2021 - 2022
Axel Springer
  • Designed and implemented real-time streaming solutions that track user engagement and feed the reporting dashboard.
  • Created structured dbt models to encapsulate complex data transformations, simplifying code maintenance and contributing to a significant decrease in error rates.
  • Orchestrated a seamless migration of custom Python data transformation processes from Apache Airflow to dbt, ensuring consistent and accurate data processing.
  • Maintained data quality and integrity, ensuring the data was complete, accurate, consistent, and valuable.
  • Managed the real-time dashboard's complete extract, transform, and load (ETL) infrastructure.
  • Planned, designed, and supervised projects end to end.
  • Integrated third-party application programming interfaces (APIs) to collect advertisement reports.
Technologies: Python 3, Apache Airflow, Snowflake, Spark, Amazon Web Services (AWS), ETL, Serverless Framework, Data Warehousing, Apache Spark, SQL, Data Build Tool (dbt), Pandas, Data Pipelines, Python, RDBMS, Database Architecture, Data Architecture, Databases, PostgreSQL, Dagster, Data Science, Data Governance, Visual Studio Code (VS Code), APIs, Bash, Big Data, Amazon Marketing Services (AMS), Data Analysis, Big Data Architecture, Message Queues, Relational Databases, Data Transformation, Data Modeling, ELT, Hadoop, Microservices, REST APIs, Databricks, Looker, Analytics, Automation, Machine Learning, English, Query Optimization, Redshift, AWS SAM, Amazon Redshift Spectrum, Azure Databricks, Amazon QuickSight

Principal Data Engineer

2020 - 2021
NorthBay Solutions
  • Participated in developing a product for ingestion, transformation, data lake formation, and dataset visualization.
  • Worked on connectors of Amazon S3, Amazon Redshift, file transfer protocol (FTP) source, and flat files for ingestion.
  • Transformed and modeled the extracted data using dbt to create structured and optimized datasets for downstream analytics and reporting, enhancing data quality and enabling faster insights.
  • Created scalable architecture for transcript generator to handle unpredictable loads.
  • Migrated a large Oracle server to an Amazon Aurora PostgreSQL database, eliminating the Oracle licensing costs.
Technologies: Apache Spark, Python 3, Node.js, Amazon Web Services (AWS), Data Lakes, ETL, Serverless Architecture, Data Engineering, Data Warehousing, Spark, SQL, Data Build Tool (dbt), Pandas, Data Pipelines, Python, RDBMS, Database Architecture, SQL Server DBA, MySQL, Data Architecture, Databases, AWS Lambda, Healthcare, PostgreSQL, Retool, Lambda Functions, Data Science, Data Governance, Visual Studio Code (VS Code), APIs, Bash, Scala, NoSQL, Big Data, Data Analytics, BI Reporting, Data Analysis, Big Data Architecture, Message Queues, Relational Databases, Amazon Elastic MapReduce (EMR), Data Transformation, Data Modeling, JavaScript, ELT, Azure, Hadoop, Docker, Microservices, REST APIs, Google Cloud, Automation, Machine Learning, English, Query Optimization, Redshift, AWS SAM, Amazon Redshift Spectrum, Azure Databricks, Amazon QuickSight
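When landing ingested files in an S3 data lake, a common convention is Hive-style date partitioning so that Athena and Glue can prune partitions. A small sketch of such a key builder (bucket prefix and dataset names are hypothetical):

```python
# Build Hive-style partitioned S3 object keys for a data lake landing
# zone. The "raw/" prefix and dataset names are illustrative.
from datetime import date

def lake_key(dataset: str, load_date: date, filename: str) -> str:
    """Build an S3 object key partitioned by year/month/day."""
    return (
        f"raw/{dataset}/"
        f"year={load_date.year}/month={load_date.month:02d}/day={load_date.day:02d}/"
        f"{filename}"
    )

print(lake_key("orders", date(2021, 3, 7), "part-0001.parquet"))
# raw/orders/year=2021/month=03/day=07/part-0001.parquet
```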

Senior Data Engineer

2019 - 2020
NorthBay Solutions
  • Created API services on a serverless framework for a cloud-based web app using Node.js 10.x.
  • Worked on the database migration from an on-premises Oracle server to Amazon RDS Aurora PostgreSQL using AWS Database Migration Service (DMS).
  • Executed the ingestion processes on flat files using Amazon Athena, AWS Glue catalog, and AWS Glue crawlers.
  • Developed a custom Tableau dashboard as per management requirements.
  • Extracted data from tables and flat files in mixed systems, such as Oracle E-Business Suite (EBS) and Amazon S3, using Amazon EMR and Amazon Data Pipeline, and loaded it into an Amazon S3 bucket.
  • Created ETL jobs with Talend and migrated data from the Microsoft SQL Server and MySQL server to Amazon Redshift.
Technologies: Node.js, Python 3, AWS Lambda, AWS Step Functions, Amazon CloudWatch, Redshift, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (SQS), API Gateways, AWS Glue, Amazon Athena, Amazon Aurora, Amazon RDS, Amazon Cognito, CI/CD Pipelines, Amazon S3 (AWS S3), ETL, Serverless Framework, Data Engineering, Data Warehousing, Serverless Architecture, SQL, Data Build Tool (dbt), Pandas, Data Pipelines, Python, RDBMS, Database Architecture, MySQL, Data Architecture, Databases, Healthcare, PostgreSQL, Lambda Functions, MongoDB, Apache Kafka, Visual Studio Code (VS Code), APIs, Bash, NoSQL, Big Data, Data Analytics, BI Reporting, Data Analysis, Big Data Architecture, Message Queues, Relational Databases, Amazon Elastic MapReduce (EMR), Data Transformation, Data Modeling, JavaScript, ELT, Docker, Microservices, REST APIs, Automation, Machine Learning, English, AWS SAM, Amazon Redshift Spectrum, Azure Databricks, Amazon QuickSight
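Cataloging flat files for Athena boils down to deriving an external-table definition from the file's schema, which Glue crawlers automate. A hedged sketch of that idea (table and bucket names are hypothetical, and all columns are typed as string for simplicity; crawlers infer richer types):

```python
# Derive a CREATE EXTERNAL TABLE statement for Athena from a CSV
# header. Names are illustrative; this mimics what a Glue crawler does.
import csv
import io

def athena_ddl(table: str, s3_prefix: str, csv_text: str) -> str:
    """Generate DDL with every column typed as string."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = ",\n  ".join(f"`{c}` string" for c in header)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_prefix}'"
    )

ddl = athena_ddl("raw_orders", "s3://example-bucket/raw/orders/",
                 "order_id,customer_id,total\n10,7,99.5\n")
print(ddl)
```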

Senior Data Engineer

2018 - 2019
Starzplay
  • Designed and created the specifications for a linear over-the-top (OTT) streaming network using AWS media services.
  • Performed the ingestion and transformation of on-premise data into Amazon S3 using AWS Glue PySpark and AWS Glue Python shell jobs.
  • Defined a data warehouse (DWH) architecture, including dimensional modeling.
Technologies: Node.js, Python 3, AWS Lambda, AWS Step Functions, Amazon CloudWatch, Redshift, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (SQS), API Gateways, AWS Glue, Amazon Athena, Amazon Aurora, Amazon RDS, Amazon EC2, Data Engineering, Data Warehousing, SQL, Pandas, Data Pipelines, Python, ETL, RDBMS, Database Architecture, MySQL, Data Architecture, Databases, PostgreSQL, Lambda Functions, Django, Visual Studio Code (VS Code), APIs, Data Analytics, Relational Databases, Amazon Elastic MapReduce (EMR), Data Transformation, Data Modeling, JavaScript, ELT, Microservices, Automation, Machine Learning, English, AWS SAM, Amazon Redshift Spectrum, Amazon QuickSight
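The dimensional-modeling step above amounts to splitting denormalized events into dimension and fact tables joined by surrogate keys. A minimal sketch, with hypothetical field names for a viewing-events source:

```python
# Split denormalized viewing events into a title dimension and a
# viewing fact table keyed by a surrogate ID. Schema is illustrative.
def build_star(events):
    dim_title = {}          # natural key -> surrogate key
    dim_rows, fact_rows = [], []
    for e in events:
        key = e["title"]
        if key not in dim_title:
            dim_title[key] = len(dim_title) + 1
            dim_rows.append({"title_sk": dim_title[key], "title": key})
        fact_rows.append({
            "title_sk": dim_title[key],
            "user_id": e["user_id"],
            "seconds_watched": e["seconds_watched"],
        })
    return dim_rows, fact_rows

dims, facts = build_star([
    {"title": "Movie A", "user_id": 1, "seconds_watched": 120},
    {"title": "Movie A", "user_id": 2, "seconds_watched": 300},
    {"title": "Movie B", "user_id": 1, "seconds_watched": 45},
])
print(len(dims), len(facts))  # 2 3
```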

Software Engineer

2015 - 2018
NorthBay Solutions
  • Sourced tables and flat files from heterogeneous systems, such as Oracle EBS and Amazon S3, using Amazon EMR and Amazon Data Pipeline, and loaded them into a staging area in Amazon Redshift.
  • Performed transformations on source tables and built dimensions and facts.
  • Created and maintained a serverless architecture using AWS services, including Amazon API Gateway, AWS Lambda, and Amazon RDS.
  • Used the AWS Kinesis stream for every event in the system and Amazon Athena to fetch data for the reporting layer.
  • Built a 4-tier QlikView Data (QVD) architecture in Qlik Sense to optimize the query performance and minimize the database workload.
Technologies: Redshift, Amazon RDS, AWS Step Functions, AWS Lambda, Amazon S3 (AWS S3), Amazon Cognito, API Gateways, Amazon DynamoDB, Data Engineering, Data Warehousing, Serverless Architecture, SQL, Data Pipelines, Python, ETL, Microsoft Power BI, RDBMS, Database Architecture, Data Architecture, Databases, Healthcare, PostgreSQL, Visual Studio Code (VS Code), Relational Databases, ELT, Microservices, English, Amazon Redshift Spectrum
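Building and maintaining dimensions from a staging area often involves a slowly-changing-dimension (type 2) merge: changed rows are closed out and a new current version is appended. A hedged sketch with a hypothetical customer schema:

```python
# Slowly-changing-dimension (type 2) merge: keep full history, mark
# only the latest version of each row as current. Schema is illustrative.
def scd2_merge(dim, incoming, key, tracked):
    """Close out changed rows and append new current versions."""
    current = {r[key]: r for r in dim if r["is_current"]}
    for row in incoming:
        old = current.get(row[key])
        if old is None or any(old[c] != row[c] for c in tracked):
            if old is not None:
                old["is_current"] = False
            dim.append({**row, "is_current": True})
    return dim

dim = [{"customer_id": 1, "city": "Lahore", "is_current": True}]
dim = scd2_merge(dim, [{"customer_id": 1, "city": "Dubai"}],
                 key="customer_id", tracked=["city"])
print([(r["city"], r["is_current"]) for r in dim])
# [('Lahore', False), ('Dubai', True)]
```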

Centralized Educational Platform

The company wanted to help higher education organizations work better. The goal was to ease the administrative load, enabling users to return to learning and discovery. They wanted a web application to ingest data from multiple sources into a centralized Amazon S3 location. The service acts as a data lake and access layer for consuming data from Amazon S3.

Data Warehouse for a Video-on-demand Company

The company utilizes advanced technologies to provide a premium viewing experience with full HD and 4K content sourced from some of the most important studios in the entertainment business, including 20th Century Studios, CBS, Disney, Lionsgate, Paramount, Showtime, Sony, Starz, Universal, and Warner Bros. They wanted to ingest, transform, and validate the data consumed by reporting and machine learning.

Quantflare

The project's ultimate goal was time series analysis, predictive modeling, and machine learning on Fortune 500 stocks and cryptocurrencies, fetching the data from premium APIs and storing it in a relational database management system (RDBMS). The other milestone was building a REST API on top of the collected data for potential customers.
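A simple moving average is one example of the time-series features a pipeline like Quantflare could compute before predictive modeling. A minimal sketch; the window size and prices are illustrative:

```python
# Simple moving average over a series of daily closing prices, a
# common precursor feature for time-series models. Data is illustrative.
def moving_average(prices, window):
    out = []
    for i in range(len(prices) - window + 1):
        out.append(round(sum(prices[i:i + window]) / window, 2))
    return out

closes = [10.0, 11.0, 12.0, 13.0, 14.0]
print(moving_average(closes, 3))  # [11.0, 12.0, 13.0]
```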
2015 - 2017

Master's Degree in Computer Science

LUMS - Lahore University of Management Sciences - Lahore, Pakistan

2011 - 2015

Bachelor's Degree in Computer Science

University of Engineering and Technology, Lahore - Lahore, Punjab, Pakistan

AUGUST 2023 - AUGUST 2026

AWS Certified Solutions Architect

Amazon Web Services

SEPTEMBER 2020 - SEPTEMBER 2023

AWS Certified Solutions Architect Associate

AWS

Languages

SQL, Python, Snowflake, Bash, Scala, JavaScript, Python 3

Frameworks

Apache Spark, Spark, Hadoop, Serverless Framework, Django

Libraries/APIs

Pandas, REST APIs, Node.js, Amazon EC2 API

Tools

Amazon Simple Queue Service (SQS), AWS Glue, Amazon Athena, Apache Airflow, Amazon Redshift Spectrum, Retool, DataGrip, Jupyter, AWS Step Functions, Amazon Cognito, Amazon CloudWatch, Amazon Simple Notification Service (Amazon SNS), Jenkins, AWS CloudFormation, Tableau, Microsoft Power BI, Amazon Elastic MapReduce (EMR), Stitch Data, Looker, Amazon QuickSight

Paradigms

Serverless Architecture, ETL, Microservices, Automation, Data Science

Platforms

AWS Lambda, Amazon EC2, Amazon Web Services (AWS), Azure, Docker, Databricks, Visual Studio Code (VS Code), Talend, Apache Kafka

Storage

Redshift, Data Pipelines, RDBMS, Database Architecture, MySQL, Databases, PostgreSQL, NoSQL, Relational Databases, SQL Server DBA, MongoDB, Amazon S3 (AWS S3), Amazon DynamoDB, Amazon Aurora, Data Lakes, Google Cloud

Industry Expertise

Healthcare

Other

Data Engineering, Data Warehousing, Amazon RDS, Data Build Tool (dbt), Data Architecture, Lambda Functions, APIs, Big Data, Data Analytics, Data Analysis, Big Data Architecture, Message Queues, Data Transformation, Data Modeling, ELT, English, Query Optimization, AWS SAM, Azure Databricks, Data Governance, BI Reporting, Machine Learning, Analytics, Fivetran, API Gateways, CI/CD Pipelines, Amazon API Gateway, Dagster, Amazon Marketing Services (AMS)
