
Azeem Khan
Verified Expert in Engineering
Data Engineer and Developer
Sydney, New South Wales, Australia
Toptal member since February 6, 2026
Azeem is a data engineer with 6+ years of experience across banking, finance, healthcare, pharma, and fintech. He designs, builds, and optimizes cloud-based batch and streaming data pipelines across AWS and Azure. An expert in Python, PySpark, SQL, Databricks, Airflow, and AWS Glue, Azeem has successfully executed cloud migrations, improved performance, reduced costs, and enhanced production reliability. He managed 30+ pipelines within an enterprise environment.
Experience
- SQL - 6 years
- Python - 6 years
- AWS Glue - 6 years
- SQL Stored Procedures - 6 years
- AWS IoT - 6 years
- Spark - 4 years
- PySpark - 4 years
- Data Warehousing - 4 years
Preferred Environment
AWS IoT, Python, Spark, SQL, PySpark, Streaming, Data Architecture
The most amazing...
...thing I've done is optimize Spark pipelines, which reduced job runtime and lowered the company's monthly Databricks bills.
Work Experience
Senior Data Engineer
Avrioc
- Designed and maintained end-to-end ETL pipelines that ingested batch sources (MySQL and PostgreSQL) and streaming platforms (Apache Kafka and Elasticsearch) to support real-time and analytical workloads for AI and fintech products.
- Optimized PySpark pipelines on Databricks using partitioning and execution tuning, reducing job runtime from 2.5 hours to 30 minutes and lowering cloud compute costs.
- Engineered real-time streaming ingestion frameworks using Kafka and Spark Structured Streaming to deliver near real-time analytics for sports and transaction platforms.
- Built and maintained cloud-native data workflows using AWS Glue, Databricks, and Airflow, ensuring production reliability and SLA compliance.
- Collaborated with cross-functional teams to design and deploy internal enterprise data platforms, supporting HR, finance, and leadership analytics.
- Implemented data validation, monitoring, and error-handling frameworks, improving pipeline reliability and reducing production failures.
- Mentored junior data engineers and reviewed code, ensuring scalable, maintainable, and production-ready data solutions.
Senior Data Engineer
AstraZeneca
- Designed and maintained end-to-end ETL pipelines that ingested batch sources (MySQL and PostgreSQL) and streaming platforms (Apache Kafka and Elasticsearch).
- Led cloud migration and ETL optimization initiatives, moving daily production pipelines from on-premises systems to AWS and Azure, improving execution performance by up to 70%, and reducing operational costs and manual intervention.
- Architected scalable data lakes and star-schema data warehouses, enabling high-volume analytics and reducing report turnaround time by 50% for enterprise healthcare and pharma clients.
- Automated ingestion and transformation pipelines using Apache Airflow, Fivetran, and AWS Glue, managing 32+ production workflows across regulated business domains.
- Implemented data quality, validation, masking, and governance frameworks, integrating Collibra and Starburst, improving compliance, metadata discoverability, and trust in enterprise data products.
- Built cross-cloud migration and SQL translation frameworks supporting AWS, Azure, GCP, and Snowflake, enabling standardized dashboard delivery through Power BI and Amazon QuickSight.
- Headed and mentored data engineering teams of up to four members, ensuring delivery of high-quality, SLA-compliant, and scalable data solutions for international stakeholders.
Senior Data Engineer
BASF
- Developed AWS Glue ETL pipelines using PySpark and SQL, processing 1+ TB of data daily across agricultural and product analytics platforms.
- Spearheaded the cloud migration of mission-critical data pipelines from on-premises systems to AWS S3, Redshift, and EMR, enabling near-real-time analytics for 100+ enterprise users.
- Translated legacy Talend ETL workflows into maintainable AWS Glue pipelines, improving long-term scalability and onboarding efficiency.
- Automated data cleansing, validation, and transformation frameworks, increasing data quality and trust in downstream BI and reporting systems.
- Designed and documented ETL frameworks and data models, reducing onboarding time for new engineers by 40% and improving knowledge transfer.
- Supported production monitoring, issue resolution, and SLA adherence for enterprise-scale data pipelines.
Data Engineer
Navikenz
- Designed and implemented secure enterprise data warehouses for banking and compliance-driven clients, ensuring the privacy and integrity of 5,000+ records processed daily.
- Optimized SQL procedures and ETL workflows, improving query performance by 22% and supporting faster operational reporting.
- Built cloud migration accelerators enabling movement of 15+ data sources across AWS, GCP, Snowflake, and Azure Databricks.
- Developed Python-based SQL translation tools, reducing manual migration effort and accelerating cross-platform cloud adoption.
- Delivered standardized analytics dashboards using Power BI and Amazon QuickSight, enabling real-time financial and operational insights for business stakeholders.
Data Engineer
Uber
- Developed and maintained PL/SQL procedures, functions, and ETL pipelines supporting large-scale enterprise applications with 100+ database tables.
- Migrated on-premises Oracle workloads to AWS, improving the scalability, reliability, and readiness of business systems.
- Implemented AWS Glue-based ETL pipelines to load structured and semi-structured data into Amazon Redshift.
- Automated data quality checks, monitoring, and error logging, reducing manual validation efforts by 50% and improving SLA compliance.
- Supported production systems through debugging, performance tuning, and incident resolution for global enterprise clients.
Experience
HR Software
MyWhoosh Virtual Cycling ETL Pipeline
Database Solution for Insurance Company
http://www.momentum.co.za
Data Governance ETL Pipeline
Call Center Insurance Repayment
Uber P2P Finance Project
Education
Master's Degree in Information Technology
University of Hyderabad - Hyderabad, India
Certifications
AWS Certified Data Engineer
Amazon Web Services
Analyzing and Visualizing Data with Microsoft Power BI
Microsoft
Amazon Web Services Data Analytics - Specialty
Amazon Web Services
Microsoft Azure Fundamentals AZ-900
Microsoft
Skills
Libraries/APIs
PySpark
Tools
AWS Glue, Microsoft Power BI, Terraform, Claude, BigQuery, Oracle ERP, Collibra, Apache Airflow, AWS IAM, Podio
Languages
Python, SQL, Snowflake
Platforms
AWS IoT, AWS Lambda, Amazon Web Services (AWS), Blockchain, Databricks, Apache Kafka, Docker, Oracle, Azure
Storage
PL/SQL, SQL Stored Procedures, Amazon S3 (AWS S3), Data Pipelines, Data Lakes, JSON, Google Cloud, Elasticsearch
Paradigms
ETL, REST
Frameworks
Spark, Alchemy
Other
Data Warehousing, Data Engineering, ETL Tools, ETL Pipelines, Data Modeling, Data Transformation, Document Parsing, Dashboards, Data Visualization, Data Analysis, Databricks, Technical Business Analysis, Delta Lake, Data Cleansing, Data Cleaning, CRM APIs, OpenAI, Skip Tracing, Web Scraping, AI Data Classification, Streaming, Data Architecture, Information Science, Big Data, Maps, Oracle Fusion ERP, KSQL, Starburst, Analytics