Nouman Khalid

Verified Expert in Engineering

Data Engineer and Developer

Lahore, Punjab, Pakistan

Toptal member since November 3, 2022

Bio

Nouman is a senior data engineer with over seven years of experience building data-intensive applications, tackling challenging architecture and scalability problems, and collecting and curating data at data-centric companies. He helped a news publishing company become the first to fully understand its user behavior while making its infrastructure more robust, reusable, and scalable. With this solid background, Nouman is eager to take on new challenges and deliver outstanding results.

Portfolio

WeGift
Amazon API Gateway, Snowflake, Data Build Tool (dbt), AWS Glue, Amazon Athena...
Data Kitchens
Amazon Web Services (AWS), Data Build Tool (dbt), SQL, Python 3, Dagster...
Axel Springer
Python 3, Apache Airflow, Snowflake, Spark, Amazon Web Services (AWS), ETL...

Experience

  • Data Pipelines - 8 years
  • Data Engineering - 8 years
  • Amazon Web Services (AWS) - 8 years
  • Snowflake - 6 years
  • Data Warehousing - 5 years
  • Redshift - 5 years
  • Pandas - 5 years
  • Apache Spark - 4 years

Availability

Part-time

Preferred Environment

Python 3, Amazon Web Services (AWS), SQL, Data Build Tool (dbt), Snowflake

The most amazing...

...design I've built is a reusable data ingestion framework using AWS.

Work Experience

Senior Data Engineer

2023 - 2024
WeGift
  • Established a framework for translating existing mappings into PySpark jobs, enabling a smooth migration process.
  • Designed and implemented data ingestion pipelines using PySpark, integrating structured and semi-structured data from diverse sources into Snowflake.
  • Built an end-to-end data pipeline at Runa, integrating Dagster, dbt, PySpark, and Snowflake and significantly reducing data processing time.
  • Collaborated with data analysts and engineers to design and implement data sharing across teams using PySpark, enhancing data accessibility and usability.
  • Conducted performance tuning of PySpark jobs, leveraging partitioning and caching techniques to handle large datasets efficiently (see the sketch below).
Technologies: Amazon API Gateway, Snowflake, Data Build Tool (dbt), AWS Glue, Amazon Athena, Amazon DynamoDB, Fivetran, Tableau, Dagster, Dimensional Modeling, PySpark, Technical Leadership, Git, Linux, Flask, Software Engineering, Distributed Systems, Delta Live Tables (DLT), Reports, dbt Cloud, Amazon S3 (AWS S3), JSON, ETL Tools
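
A minimal sketch of the partitioning and caching techniques named in the tuning bullet above; the dataset, S3 paths, and column names are hypothetical placeholders:

```python
# Hedged sketch: dataset, paths, and columns are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-tuning-sketch").getOrCreate()

# Read a large semi-structured dataset (hypothetical S3 path).
orders = spark.read.json("s3a://example-bucket/raw/orders/")

# Repartition on the aggregation key so related rows are colocated,
# avoiding a fresh shuffle for each downstream aggregation.
orders = orders.repartition(200, "customer_id")

# Cache the filtered dataset because several aggregations reuse it.
completed = orders.filter(F.col("status") == "COMPLETED").cache()

daily_spend = completed.groupBy(
    "customer_id", F.to_date("created_at").alias("day")
).agg(F.sum("amount").alias("daily_spend"))

order_counts = completed.groupBy("customer_id").agg(
    F.count("*").alias("order_count")
)

# Write partitioned output so downstream readers can prune by date.
daily_spend.write.mode("overwrite").partitionBy("day").parquet(
    "s3a://example-bucket/marts/daily_spend/"
)
order_counts.write.mode("overwrite").parquet(
    "s3a://example-bucket/marts/order_counts/"
)
```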

Senior Data Engineer

2022 - 2023
Data Kitchens
  • Built and automated end-to-end data pipelines using PySpark for scalable, distributed processing of real-time analytics and reporting datasets (see the orchestration sketch below).
  • Leveraged dbt to create optimized transformation processes, enhancing data quality and reliability while reducing processing time by 55% on average.
  • Set up and managed multiple data ingestion pipelines for disparate sources, including Amazon RDS, NetSuite, HubSpot, and Salesforce, cutting data acquisition time by 45%.
  • Developed and optimized data partitioning strategies in PySpark to speed up marketing campaign analysis across terabytes of data.
  • Developed a comprehensive mart layer for BI using Lightdash, creating a user-friendly interface for business analysts to access and analyze data insights promptly.
  • Implemented data governance practices to ensure data accuracy, consistency, and compliance, significantly reducing data-related errors.
  • Partnered with cross-functional teams to build distributed systems for data cleansing, data enrichment, and schema normalization, critical for consistent reporting and analytics.
Technologies: Amazon Web Services (AWS), Data Build Tool (dbt), SQL, Python 3, Dagster, Snowflake, Amazon RDS, Stitch Data, REST APIs, Automation, English, Query Optimization, Fivetran, Orchestration, Dimensional Modeling, PySpark, Technical Leadership, Git, Linux, Flask, GeoPandas, Software Engineering, Distributed Systems, Delta Live Tables (DLT), Reports, dbt Cloud, Amazon S3 (AWS S3), JSON, Amazon DynamoDB, Amazon Redshift, ETL Tools
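
A minimal sketch of how a Dagster-orchestrated extract-then-dbt flow like the one above can be wired up. The asset names, the CSV source, and the dbt selector are hypothetical; the dbt step is shelled out via subprocess rather than using the dagster-dbt integration, to keep the example self-contained:

```python
# Hedged sketch: asset names, the CSV source, and the dbt selector are
# hypothetical; production used PySpark and Snowflake.
import subprocess

import pandas as pd
from dagster import Definitions, asset


@asset
def raw_contacts() -> pd.DataFrame:
    # Hypothetical extraction step; production pulled from sources like HubSpot.
    return pd.read_csv("contacts_export.csv")


@asset
def contact_marts(raw_contacts: pd.DataFrame) -> None:
    # The parameter name wires the dependency on the raw_contacts asset.
    # Hand off to dbt once raw data has landed; dbt owns the SQL transforms.
    subprocess.run(["dbt", "run", "--select", "marts"], check=True)


defs = Definitions(assets=[raw_contacts, contact_marts])
```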

Senior Data Engineer

2021 - 2022
Axel Springer
  • Designed and implemented real-time streaming solutions that feed user engagement data into the reporting dashboard.
  • Created structured dbt models to encapsulate complex data transformations, simplifying code maintenance and contributing to a significant decrease in error rates.
  • Orchestrated a seamless migration of custom Python data transformation processes from Apache Airflow to dbt, ensuring consistent and accurate data processing (see the DAG sketch below).
  • Maintained data quality and integrity, ensuring data was complete, accurate, consistent, and valuable.
  • Developed distributed data pipelines using PySpark, optimizing the processing of large-scale data from diverse sources for analytics and reporting.
  • Planned, designed, and supervised projects end to end.
  • Integrated third-party application programming interfaces (APIs) to collect advertisement reports.
Technologies: Python 3, Apache Airflow, Snowflake, Spark, Amazon Web Services (AWS), ETL, Serverless Framework, Data Warehousing, Apache Spark, SQL, Data Build Tool (dbt), Pandas, Data Pipelines, Python, RDBMS, Database Architecture, Data Architecture, Databases, PostgreSQL, Dagster, Data Science, Data Governance, Visual Studio Code (VS Code), APIs, Bash, Big Data, Amazon Marketing Services (AMS), Data Analysis, Big Data Architecture, Message Queues, Relational Databases, Data Transformation, Data Modeling, ELT, Hadoop, Microservices, REST APIs, Databricks, Looker, Analytics, Automation, Machine Learning, English, Query Optimization, Redshift, AWS Serverless Application Model (SAM), Amazon Redshift Spectrum, Azure Databricks, Amazon QuickSight, Dimensional Modeling, PySpark, Technical Leadership, Git, Linux, Flask, GeoPandas, Software Engineering, Distributed Systems, Reports, dbt Cloud, Amazon S3 (AWS S3), JSON, Amazon DynamoDB, Amazon Redshift, ETL Tools
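
A sketch of the end state of an Airflow-to-dbt migration like the one above, assuming Airflow 2.x: the DAG no longer holds Python transformation logic and only schedules dbt runs and tests. The DAG ID, project path, and schedule are illustrative:

```python
# Hedged sketch, assuming Airflow 2.x; DAG id, paths, and schedule are
# illustrative. Transformation logic now lives in dbt models, not here.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="dbt_transformations",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    # dbt replaces the custom Python transforms the DAG used to contain.
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt --target prod",
    )
    dbt_test = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt --target prod",
    )
    dbt_run >> dbt_test
```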

Principal Data Engineer

2020 - 2021
NorthBay Solutions
  • Participated in developing a product for ingestion, transformation, data lake formation, and dataset visualization.
  • Built ingestion connectors for Amazon S3, Amazon Redshift, file transfer protocol (FTP) sources, and flat files (see the sketch below).
  • Transformed and modeled the extracted data using dbt to create structured and optimized datasets for downstream analytics and reporting, enhancing data quality and enabling faster insights.
  • Created a scalable architecture for a transcript generator to handle unpredictable loads.
  • Migrated a large Oracle server to an Amazon Aurora PostgreSQL database, eliminating Oracle licensing costs.
Technologies: Apache Spark, Python 3, Node.js, Amazon Web Services (AWS), Data Lakes, ETL, Serverless Architecture, Data Engineering, Data Warehousing, Spark, SQL, Data Build Tool (dbt), Pandas, Data Pipelines, Python, RDBMS, Database Architecture, SQL Server DBA, MySQL, Data Architecture, Databases, AWS Lambda, Healthcare, PostgreSQL, Retool, Lambda Functions, Data Science, Data Governance, Visual Studio Code (VS Code), APIs, Bash, Scala, NoSQL, Big Data, Data Analytics, BI Reporting, Data Analysis, Big Data Architecture, Message Queues, Relational Databases, Amazon Elastic MapReduce (EMR), Data Transformation, Data Modeling, JavaScript, ELT, Azure, Hadoop, Docker, Microservices, REST APIs, Google Cloud, Automation, Machine Learning, English, Query Optimization, Redshift, AWS Serverless Application Model (SAM), Amazon Redshift Spectrum, Azure Databricks, Amazon QuickSight, Dimensional Modeling, PySpark, Databricks, Technical Leadership, Git, Linux, GeoPandas, Software Engineering, Distributed Systems, Reports, dbt Cloud, Amazon S3 (AWS S3), JSON, Amazon DynamoDB, Amazon Redshift, ETL Tools
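
A minimal sketch of an S3-to-Redshift ingestion connector along these lines, using the Redshift Data API through boto3 so no database driver is needed. The cluster, database, table, bucket, and IAM role names are hypothetical:

```python
# Hedged sketch: cluster, database, table, bucket, and IAM role are
# hypothetical. The Redshift Data API avoids managing a JDBC/ODBC driver.
import boto3

client = boto3.client("redshift-data", region_name="us-east-1")

copy_sql = """
    COPY analytics.raw_events
    FROM 's3://example-bucket/landing/events/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

# Submit the COPY asynchronously; the Data API returns a statement id.
response = client.execute_statement(
    ClusterIdentifier="analytics-cluster",
    Database="warehouse",
    DbUser="loader",
    Sql=copy_sql,
)
print("Statement id:", response["Id"])
```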

Senior Data Engineer

2019 - 2020
NorthBay Solutions
  • Created API services on a serverless framework for a cloud-based web app using Node.js 10.x.
  • Worked on the database migration from an on-premises Oracle server to Amazon RDS Aurora PostgreSQL using AWS Database Migration Service (DMS).
  • Executed the ingestion processes on flat files using Amazon Athena, the AWS Glue Data Catalog, and AWS Glue crawlers (see the sketch below).
  • Developed a custom Tableau dashboard as per management requirements.
  • Extracted data from tables and flat files from mixed systems, such as Oracle E-Business Suite (EBS) and Amazon S3, using Amazon EMR and Amazon Data Pipeline to load in an Amazon S3 bucket.
  • Created ETL jobs with Talend and migrated data from the Microsoft SQL Server and MySQL server to Amazon Redshift.
Technologies: Node.js, Python 3, AWS Lambda, AWS Step Functions, Amazon CloudWatch, Redshift, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), API Gateways, AWS Glue, Amazon Athena, Amazon Aurora, Amazon RDS, Amazon Cognito, CI/CD Pipelines, Amazon S3 (AWS S3), ETL, Serverless Framework, Data Engineering, Data Warehousing, Serverless Architecture, SQL, Data Build Tool (dbt), Pandas, Data Pipelines, Python, RDBMS, Database Architecture, MySQL, Data Architecture, Databases, Healthcare, PostgreSQL, Lambda Functions, MongoDB, Apache Kafka, Visual Studio Code (VS Code), APIs, Bash, NoSQL, Big Data, Data Analytics, BI Reporting, Data Analysis, Big Data Architecture, Message Queues, Relational Databases, Amazon Elastic MapReduce (EMR), Data Transformation, Data Modeling, JavaScript, ELT, Docker, Microservices, REST APIs, Automation, Machine Learning, English, AWS Serverless Application Model (SAM), Amazon Redshift Spectrum, Azure Databricks, Amazon QuickSight, Dimensional Modeling, Databricks, Technical Leadership, Git, Software Engineering, Distributed Systems, Reports, JSON, Amazon Redshift, ETL Tools
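
A sketch of that flat-file ingestion flow: a Glue crawler catalogs files landed in S3, and Athena then queries the resulting table. The crawler, database, table, and bucket names are hypothetical:

```python
# Hedged sketch: crawler, database, table, and bucket names are hypothetical.
import boto3

glue = boto3.client("glue", region_name="us-east-1")
athena = boto3.client("athena", region_name="us-east-1")

# Re-crawl the landing prefix so new flat files appear in the Glue Data Catalog.
glue.start_crawler(Name="landing-files-crawler")

# Once the crawler finishes (production code polls glue.get_crawler until the
# crawler state returns to READY), the table is queryable through Athena.
result = athena.start_query_execution(
    QueryString="SELECT count(*) FROM landing_db.events",
    QueryExecutionContext={"Database": "landing_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
print("Query execution id:", result["QueryExecutionId"])
```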

Senior Data Engineer

2018 - 2019
Starzplay
  • Designed and created the specifications for a linear over-the-top (OTT) streaming network using AWS media services.
  • Performed the ingestion and transformation of on-premises data into Amazon S3 using AWS Glue PySpark and AWS Glue Python shell jobs (see the sketch below).
  • Defined a data warehouse (DWH) architecture, including dimensional modeling.
  • Developed a custom Tableau dashboard as per management requirements.
  • Extracted data from tables and flat files from mixed systems, such as Oracle EBS and Amazon S3, using Amazon EMR and Amazon Data Pipeline to load in an Amazon S3 bucket.
  • Created ETL jobs with Talend and migrated data from the Microsoft SQL Server and MySQL server to Amazon Redshift.
Technologies: Node.js, Python 3, AWS Lambda, AWS Step Functions, Amazon CloudWatch, Redshift, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), API Gateways, AWS Glue, Amazon Athena, Amazon Aurora, Amazon RDS, Amazon EC2, Data Engineering, Data Warehousing, SQL, Pandas, Data Pipelines, Python, ETL, RDBMS, Database Architecture, MySQL, Data Architecture, Databases, PostgreSQL, Lambda Functions, Django, Visual Studio Code (VS Code), APIs, Data Analytics, Relational Databases, Amazon Elastic MapReduce (EMR), Data Transformation, Data Modeling, JavaScript, ELT, Microservices, Automation, Machine Learning, English, AWS Serverless Application Model (SAM), Amazon Redshift Spectrum, Amazon QuickSight, Git, Software Engineering, Distributed Systems, Amazon S3 (AWS S3), JSON, ETL Tools
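
A skeleton of a Glue PySpark ingestion job along the lines described above. It assumes the AWS Glue job runtime (the awsglue modules are not available locally), and the catalog database, table, and S3 output path are hypothetical:

```python
# Hedged sketch: runs only inside the AWS Glue job runtime; catalog names
# and the S3 output path are hypothetical.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the cataloged on-premises export as a DynamicFrame.
source = glue_context.create_dynamic_frame.from_catalog(
    database="onprem_exports", table_name="subscriptions"
)

# Drop a raw artifact column, then land Parquet in the S3 lake.
glue_context.write_dynamic_frame.from_options(
    frame=source.drop_fields(["_raw_line"]),
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/lake/subscriptions/"},
    format="parquet",
)
job.commit()
```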

Software Engineer

2015 - 2018
NorthBay Solutions
  • Sourced tables and flat files from heterogeneous systems, such as Oracle EBS and Amazon S3, using Amazon EMR and Amazon Data Pipeline and loaded them into a staging area in Amazon Redshift.
  • Performed transformations on source tables and built dimensions and facts.
  • Created and maintained a serverless architecture using AWS services, including Amazon API Gateway, AWS Lambda, and Amazon RDS.
  • Used Amazon Kinesis streams for every event in the system and Amazon Athena to fetch data for the reporting layer (see the sketch below).
  • Built a four-tier QlikView Data (QVD) architecture in Qlik Sense to optimize query performance and minimize the database workload.
Technologies: Redshift, Amazon RDS, AWS Step Functions, AWS Lambda, Amazon S3 (AWS S3), Amazon Cognito, API Gateways, Amazon DynamoDB, Data Engineering, Data Warehousing, Serverless Architecture, SQL, Data Pipelines, Python, ETL, Microsoft Power BI, RDBMS, Database Architecture, Data Architecture, Databases, Healthcare, PostgreSQL, Visual Studio Code (VS Code), Relational Databases, ELT, Microservices, English, Amazon Redshift Spectrum, Git, Software Engineering, Distributed Systems, JSON, ETL Tools
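
A minimal sketch of that event flow: each system event is put on a Kinesis stream, and the reporting layer reads the landed data through Athena. The stream, database, table, and bucket names are hypothetical, and the Athena table assumes a delivery stream has landed the events in S3:

```python
# Hedged sketch: stream, database, and bucket names are hypothetical, and the
# Athena table assumes a delivery stream has landed the events in S3.
import json

import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")
athena = boto3.client("athena", region_name="us-east-1")

# Producer side: publish one record per system event, partitioned by user.
kinesis.put_record(
    StreamName="system-events",
    Data=json.dumps({"event": "record_created", "user_id": "u-42"}).encode("utf-8"),
    PartitionKey="u-42",
)

# Reporting side: fetch aggregates over the landed events through Athena.
athena.start_query_execution(
    QueryString="SELECT event, count(*) FROM events_db.system_events GROUP BY event",
    QueryExecutionContext={"Database": "events_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
```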

Experience

Centralized Educational Platform

The company wanted to help higher education organizations work better. The goal was to ease the administrative load, enabling users to return to learning and discovery. They wanted a web application to handle ingesting data from multiple sources into a centralized Amazon S3 location, with a service acting as a data lake and access layer for consuming data from Amazon S3.

Data Warehouse for a Video-on-demand Company

The company utilizes advanced technologies to provide a premium viewing experience with full HD and 4K content sourced from some of the most important studios in the entertainment business, including 20th Century Studios, CBS, Disney, Lionsgate, Paramount, Showtime, Sony, Starz, Universal, and Warner Bros. They wanted to ingest, transform, and validate the data consumed by reporting and machine learning.
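
A minimal PySpark sketch of such an ingest-transform-validate flow; the playback schema, S3 paths, and validation rule are hypothetical placeholders:

```python
# Hedged sketch: the playback schema, S3 paths, and validation rule are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("vod-dwh-sketch").getOrCreate()

# Ingest raw playback events.
events = spark.read.json("s3a://example-bucket/raw/playback/")

# Transform: normalize types and derive a partition column.
curated = events.withColumn(
    "watched_seconds", F.col("watched_seconds").cast("long")
).withColumn("view_date", F.to_date("started_at"))

# Validate: abort the load if rows would corrupt reporting or ML features.
invalid_count = curated.filter(
    F.col("watched_seconds").isNull() | (F.col("watched_seconds") < 0)
).count()
if invalid_count > 0:
    raise ValueError(f"{invalid_count} invalid playback rows; aborting load")

curated.write.mode("append").partitionBy("view_date").parquet(
    "s3a://example-bucket/curated/playback/"
)
```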

Quantflare

The project revolved around several tasks with the ultimate goal of delivering time series analysis, predictive modeling, and machine learning for Fortune 500 stocks and cryptocurrencies, pulling the data from premium APIs and storing it in a relational database management system (RDBMS). The other milestone was to build a REST API on top of the obtained data for potential customers.
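
A minimal sketch of a REST API over stored time series data like this, using Flask (listed under skills) with SQLite standing in for the production RDBMS; the table and route are hypothetical:

```python
# Hedged sketch: SQLite stands in for the production RDBMS; the table and
# route are hypothetical.
import sqlite3

from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/tickers/<symbol>/prices")
def prices(symbol: str):
    # Serve the stored daily close series for one ticker as JSON.
    conn = sqlite3.connect("market.db")
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT trade_date, close FROM daily_prices "
        "WHERE symbol = ? ORDER BY trade_date",
        (symbol.upper(),),
    ).fetchall()
    conn.close()
    return jsonify([dict(row) for row in rows])


if __name__ == "__main__":
    app.run(port=8000)
```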

Education

2015 - 2017

Master's Degree in Computer Science

LUMS - Lahore University of Management Sciences - Lahore, Pakistan

2011 - 2015

Bachelor's Degree in Computer Science

University of Engineering and Technology, Lahore - Lahore, Punjab, Pakistan

Certifications

AUGUST 2023 - AUGUST 2026

AWS Certified Solutions Architect

Amazon Web Services

SEPTEMBER 2020 - SEPTEMBER 2023

AWS Certified Solutions Architect Associate

Amazon Web Services

Skills

Libraries/APIs

Pandas, REST APIs, PySpark, Node.js, Amazon EC2 API

Tools

Amazon Simple Queue Service (SQS), AWS Glue, Amazon Athena, Apache Airflow, Stitch Data, Amazon Redshift Spectrum, Git, dbt Cloud, Retool, DataGrip, Jupyter, AWS Step Functions, Amazon Cognito, Amazon CloudWatch, Amazon Simple Notification Service (SNS), Jenkins, AWS CloudFormation, Tableau, Microsoft Power BI, Amazon Elastic MapReduce (EMR), Looker, Amazon QuickSight

Languages

Snowflake, SQL, Python, Bash, Scala, JavaScript, Python 3

Frameworks

Apache Spark, Spark, AWS Serverless Application Model (SAM), Hadoop, Flask, Serverless Framework, Django, Delta Live Tables (DLT)

Paradigms

Serverless Architecture, ETL, Microservices, Automation, Dimensional Modeling

Platforms

AWS Lambda, Amazon EC2, Amazon Web Services (AWS), Azure, Databricks, Docker, Linux, Visual Studio Code (VS Code), Talend, Apache Kafka

Storage

Redshift, Amazon S3 (AWS S3), Amazon DynamoDB, Data Pipelines, RDBMS, Database Architecture, MySQL, Databases, PostgreSQL, NoSQL, Relational Databases, JSON, SQL Server DBA, MongoDB, Amazon Aurora, Data Lakes, Google Cloud

Industry Expertise

Healthcare

Other

Data Engineering, Data Warehousing, Amazon RDS, Data Build Tool (dbt), Data Architecture, Lambda Functions, APIs, Big Data, Data Analytics, Data Analysis, Big Data Architecture, Message Queues, Data Transformation, Data Modeling, ELT, English, Query Optimization, Azure Databricks, Orchestration, Azure Data Factory (ADF), Technical Leadership, Software Engineering, Distributed Systems, Reports, Amazon Redshift, ETL Tools, Data Science, Data Governance, BI Reporting, Machine Learning, Analytics, Fivetran, GeoPandas, API Gateways, CI/CD Pipelines, Amazon API Gateway, Dagster, Amazon Marketing Services (AMS)
