
Nouman Khalid
Verified Expert in Engineering
Data Engineer and Developer
Lahore, Punjab, Pakistan
Toptal member since November 3, 2022
Nouman is a senior data engineer with over seven years of experience building data-intensive applications, tackling challenging architectural and scalability problems, and collecting and sorting data in data-centric companies. He is helping a news publishing company become the first to fully understand user behavior and make infrastructure more robust, reusable, and scalable. With his solid background, Nouman is eager to take on new challenges and deliver outstanding results.
Experience
- Data Pipelines - 8 years
- Data Engineering - 8 years
- Amazon Web Services (AWS) - 8 years
- Snowflake - 6 years
- Data Warehousing - 5 years
- Redshift - 5 years
- Pandas - 5 years
- Apache Spark - 4 years
Preferred Environment
Python 3, Amazon Web Services (AWS), SQL, Data Build Tool (dbt), Snowflake
The most amazing design I've built is a reusable data ingestion framework using AWS.
Work Experience
Senior Data Engineer
WeGift
- Established a framework for translating existing mappings to PySpark jobs, facilitating a seamless migration process.
- Designed and implemented data ingestion pipelines using PySpark, enabling seamless integration of structured and semi-structured data from diverse sources into Snowflake.
- Established an end-to-end data pipeline at Runa, integrating Dagster, dbt, PySpark, and Snowflake, significantly reducing data processing time.
- Collaborated with data analysts and engineers to design and implement seamless data sharing across different teams using PySpark, enhancing data accessibility and usability.
- Conducted performance tuning of PySpark jobs, leveraging partitioning and caching techniques to handle large datasets efficiently.
Senior Data Engineer
Data Kitchens
- Built and automated end-to-end data pipelines using PySpark for scalable and distributed processing of real-time analytics and reporting datasets.
- Leveraged dbt to create optimized transformation processes, enhancing data quality and reliability while reducing processing time by 55% on average.
- Set up and managed multiple data ingestion pipelines for disparate sources, including RDS, NetSuite, HubSpot, and Salesforce, resulting in a 45% reduction in data acquisition time.
- Developed and optimized data partitioning strategies in PySpark to speed up marketing campaign performance analysis across terabytes of data.
- Developed a comprehensive mart layer for BI using Lightdash, creating a user-friendly interface for business analysts to access and analyze data insights promptly.
- Implemented data governance practices to ensure data accuracy, consistency, and compliance, significantly reducing data-related errors.
- Partnered with cross-functional teams to build distributed systems for data cleansing, data enrichment, and schema normalization, critical for consistent reporting and analytics.
Senior Data Engineer
Axel Springer
- Designed and implemented real-time streaming solutions for user engagement data and connected them to the reporting dashboard.
- Created structured dbt models to encapsulate complex data transformations, simplifying code maintenance and contributing to a significant decrease in error rates.
- Orchestrated a seamless migration of custom Python data transformation processes from Apache Airflow to dbt, ensuring consistent and accurate data processing.
- Maintained data quality and integrity, ensuring that data was complete, accurate, consistent, and valuable.
- Developed distributed data pipelines using PySpark, optimizing the processing of large-scale data from diverse sources for analytics and reporting.
- Planned, designed, and supervised projects end to end.
- Integrated third-party application programming interfaces (APIs) to collect advertisement reports.
Principal Data Engineer
NorthBay Solutions
- Participated in developing a product for ingestion, transformation, data lake formation, and dataset visualization.
- Worked on ingestion connectors for Amazon S3, Amazon Redshift, file transfer protocol (FTP) sources, and flat files.
- Transformed and modeled the extracted data using dbt to create structured and optimized datasets for downstream analytics and reporting, enhancing data quality and enabling faster insights.
- Created a scalable architecture for a transcript generator to handle unpredictable loads.
- Migrated a large Oracle server to an Amazon Aurora PostgreSQL database, saving licensing costs.
Senior Data Engineer
NorthBay Solutions
- Created API services on a serverless framework for a cloud-based web app using Node.js 10.x.
- Worked on the database migration from an on-premises Oracle server to an Amazon RDS Aurora PostgreSQL database using AWS Database Migration Service (DMS).
- Executed the ingestion processes on flat files using Amazon Athena, AWS Glue catalog, and AWS Glue crawlers.
- Developed a custom Tableau dashboard as per management requirements.
- Extracted data from tables and flat files in mixed systems, such as Oracle E-Business Suite (EBS) and Amazon S3, using Amazon EMR and Amazon Data Pipeline, and loaded it into an Amazon S3 bucket.
- Created ETL jobs with Talend and migrated data from Microsoft SQL Server and MySQL to Amazon Redshift.
Senior Data Engineer
Starzplay
- Designed and created the specifications for a linear over-the-top (OTT) streaming network using AWS media services.
- Performed the ingestion and transformation of on-premises data into Amazon S3 using AWS Glue PySpark and AWS Glue Python shell jobs.
- Defined a data warehouse (DWH) architecture, including dimensional modeling.
- Developed a custom Tableau dashboard as per management requirements.
- Extracted data from tables and flat files in mixed systems, such as Oracle EBS and Amazon S3, using Amazon EMR and Amazon Data Pipeline, and loaded it into an Amazon S3 bucket.
- Created ETL jobs with Talend and migrated data from Microsoft SQL Server and MySQL to Amazon Redshift.
Software Engineer
NorthBay Solutions
- Sourced tables and flat files from heterogeneous systems, such as Oracle EBS and Amazon S3, using Amazon EMR and Amazon Data Pipeline, and loaded them into the staging area in Amazon Redshift.
- Performed transformations on source tables and built dimensions and facts.
- Created and maintained a serverless architecture using AWS services, including Amazon API Gateway, AWS Lambda, and Amazon RDS.
- Used Amazon Kinesis streams to capture every event in the system and Amazon Athena to fetch data for the reporting layer.
- Built a 4-tier QlikView Data (QVD) architecture in Qlik Sense to optimize query performance and minimize the database workload.
Experience
Centralized Educational Platform
Data Warehouse for a Video-on-demand Company
Quantflare
Education
Master's Degree in Computer Science
LUMS - Lahore University of Management Sciences - Lahore, Pakistan
Bachelor's Degree in Computer Science
University of Engineering and Technology, Lahore - Lahore, Punjab, Pakistan
Certifications
AWS Certified Solutions Architect
Amazon Web Services
AWS Certified Solutions Architect Associate
AWS
Skills
Libraries/APIs
Pandas, REST APIs, PySpark, Node.js, Amazon EC2 API
Tools
Amazon Simple Queue Service (SQS), AWS Glue, Amazon Athena, Apache Airflow, Stitch Data, Amazon Redshift Spectrum, Git, dbt Cloud, Retool, DataGrip, Jupyter, AWS Step Functions, Amazon Cognito, Amazon CloudWatch, Amazon Simple Notification Service (SNS), Jenkins, AWS CloudFormation, Tableau, Microsoft Power BI, Amazon Elastic MapReduce (EMR), Looker, Amazon QuickSight
Languages
Snowflake, SQL, Python, Bash, Scala, JavaScript, Python 3
Frameworks
Apache Spark, Spark, AWS Serverless Application Model (SAM), Hadoop, Flask, Serverless Framework, Django, Delta Live Tables (DLT)
Paradigms
Serverless Architecture, ETL, Microservices, Automation, Dimensional Modeling
Platforms
AWS Lambda, Amazon EC2, Amazon Web Services (AWS), Azure, Databricks, Docker, Linux, Visual Studio Code (VS Code), Talend, Apache Kafka
Storage
Redshift, Amazon S3 (AWS S3), Amazon DynamoDB, Data Pipelines, RDBMS, Database Architecture, MySQL, Databases, PostgreSQL, NoSQL, Relational Databases, JSON, SQL Server DBA, MongoDB, Amazon Aurora, Data Lakes, Google Cloud
Industry Expertise
Healthcare
Other
Data Engineering, Data Warehousing, Amazon RDS, Data Build Tool (dbt), Data Architecture, Lambda Functions, APIs, Big Data, Data Analytics, Data Analysis, Big Data Architecture, Message Queues, Data Transformation, Data Modeling, ELT, English, Query Optimization, Azure Databricks, Orchestration, Azure Data Factory (ADF), Technical Leadership, Software Engineering, Distributed Systems, Reports, Amazon Redshift, ETL Tools, Data Science, Data Governance, BI Reporting, Machine Learning, Analytics, Fivetran, GeoPandas, API Gateways, CI/CD Pipelines, Amazon API Gateway, Dagster, Amazon Marketing Services (AMS)