Azeem Khan, Developer in Sydney, New South Wales, Australia

Azeem Khan

Bio

Azeem is a data engineer with 6+ years of experience across banking, finance, healthcare, pharma, and fintech. He designs, builds, and optimizes cloud-based batch and streaming data pipelines across AWS and Azure. An expert in Python, PySpark, SQL, Databricks, Airflow, and AWS Glue, Azeem has successfully executed cloud migrations, improved performance, reduced costs, and enhanced production reliability. He managed 30+ pipelines within an enterprise environment.

Portfolio

Avrioc
Python, PySpark, Databricks, SQL, Elasticsearch, Apache Kafka, Data Engineering...
AstraZeneca
Python, AWS Lambda, AWS Glue, Data Warehousing, Data Lakes, PySpark, SQL...
BASF
Python, Spark, AWS IoT, AWS Lambda, ETL, Maps, SQL, Data Engineering...

Experience

  • SQL - 6 years
  • Python - 6 years
  • AWS Glue - 6 years
  • SQL Stored Procedures - 6 years
  • AWS IoT - 6 years
  • Spark - 4 years
  • PySpark - 4 years
  • Data Warehousing - 4 years

Preferred Environment

AWS IoT, Python, Spark, SQL, PySpark, Streaming, Data Architecture

The most amazing...

...thing I've done is optimize Spark pipelines, which saved the company money by reducing job runtime and lowering monthly Databricks bills.

Work Experience

Senior Data Engineer

2026 - 2026
Avrioc
  • Designed and maintained end-to-end ETL pipelines over batch sources (MySQL and PostgreSQL) and streaming platforms (Apache Kafka and Elasticsearch) to support real-time and analytical workloads for AI and fintech products.
  • Optimized PySpark pipelines on Databricks using partitioning and execution tuning, reducing job runtime from 2.5 hours to 30 minutes and lowering cloud compute costs.
  • Engineered real-time streaming ingestion frameworks using Kafka and Spark Structured Streaming to deliver near real-time analytics for sports and transaction platforms.
  • Built and maintained cloud-native data workflows using AWS Glue, Databricks, and Airflow, ensuring production reliability and SLA compliance.
  • Collaborated with cross-functional teams to design and deploy internal enterprise data platforms, supporting HR, finance, and leadership analytics.
  • Implemented data validation, monitoring, and error-handling frameworks, improving pipeline reliability and reducing production failures.
  • Mentored junior data engineers and reviewed code, ensuring scalable, maintainable, and production-ready data solutions.
Technologies: Python, PySpark, Databricks, SQL, Elasticsearch, Apache Kafka, Data Engineering, Amazon S3 (AWS S3), Delta Lake, ETL Tools, ETL Pipelines, Claude, Data Modeling, Data Pipelines, Data Transformation, Document Parsing, OpenAI, Skip Tracing, AI Data Classification, Microsoft Power BI, Dashboards, Data Visualization, Data Analysis, Technical Business Analysis, Streaming

Senior Data Engineer

2024 - 2025
AstraZeneca
  • Designed and maintained end-to-end ETL pipelines over batch sources (MySQL and PostgreSQL) and streaming platforms (Apache Kafka and Elasticsearch).
  • Led cloud migration and ETL optimization initiatives, moving daily production pipelines from on-premise to AWS and Azure, improving execution performance by up to 70%, and reducing operational cost and manual intervention.
  • Architected scalable data lakes and star-schema data warehouses, enabling high-volume analytics and reducing report turnaround time by 50% for enterprise healthcare and pharma clients.
  • Automated ingestion and transformation pipelines using Apache Airflow, Fivetran, and AWS Glue, managing 32+ production workflows across regulated business domains.
  • Implemented data quality, validation, masking, and governance frameworks, integrating Collibra and Starburst, improving compliance, metadata discoverability, and trust in enterprise data products.
  • Built cross-cloud migration and SQL translation frameworks supporting AWS, Azure, GCP, and Snowflake, enabling standardized dashboard delivery through Power BI and Amazon QuickSight.
  • Headed and mentored data engineering teams of up to four members, ensuring delivery of high-quality, SLA-compliant, and scalable data solutions for international stakeholders.
Technologies: Python, AWS Lambda, AWS Glue, Data Warehousing, Data Lakes, PySpark, SQL, Data Engineering, Amazon S3 (AWS S3), Delta Lake, Terraform, ETL Tools, ETL Pipelines, Amazon Web Services (AWS), Data Cleansing, Data Cleaning, CRM APIs, Claude, Data Modeling, Data Pipelines, Data Transformation, Document Parsing, Skip Tracing, Microsoft Power BI, Dashboards, Data Visualization, Data Analysis, Databricks, Technical Business Analysis

Senior Data Engineer

2023 - 2024
BASF
  • Developed AWS Glue ETL pipelines using PySpark and SQL, processing 1 TB+ daily data volumes across agricultural and product analytics platforms.
  • Spearheaded cloud migration of mission-critical data pipelines from on-premise systems to AWS S3, Redshift, and EMR, enabling near real-time analytics for 100+ enterprise users.
  • Translated legacy Talend ETL workflows into maintainable AWS Glue pipelines, improving long-term scalability and onboarding efficiency.
  • Automated data cleansing, validation, and transformation frameworks, increasing data quality and trust in downstream BI and reporting systems.
  • Designed and documented ETL frameworks and data models, reducing onboarding time for new engineers by 40% and improving knowledge transfer.
  • Supported production monitoring, issue resolution, and SLA adherence for enterprise-scale data pipelines.
Technologies: Python, Spark, AWS IoT, AWS Lambda, ETL, Maps, SQL, Data Engineering, Amazon S3 (AWS S3), Terraform, ETL Tools, ETL Pipelines, Amazon Web Services (AWS), Data Cleansing, Data Cleaning, Data Pipelines, Data Transformation, Document Parsing, Podio, Web Scraping, Microsoft Power BI, Dashboards, Data Visualization, Data Analysis, Databricks, Technical Business Analysis

Data Engineer

2022 - 2023
Navikenz
  • Designed and implemented secure enterprise data warehouses for banking and compliance-driven clients, ensuring privacy and integrity of 5,000+ daily records.
  • Optimized SQL procedures and ETL workflows, improving query performance by 22% and supporting faster operational reporting.
  • Built cloud migration accelerators enabling movement of 15+ data sources across AWS, GCP, Snowflake, and Azure Databricks.
  • Developed Python-based SQL translation tools, reducing manual migration effort and accelerating cross-platform cloud adoption.
  • Delivered standardized analytics dashboards using Power BI and Amazon QuickSight, enabling real-time financial and operational insights for business stakeholders.
Technologies: Python, AWS IoT, Databricks, AWS Glue, AWS Lambda, PySpark, JSON, Google Cloud, BigQuery, Snowflake, Data Engineering, Amazon S3 (AWS S3), ETL Tools, ETL Pipelines, Amazon Web Services (AWS), Data Cleansing, Data Cleaning, CRM APIs, Data Pipelines, Data Transformation, Document Parsing, Web Scraping, Microsoft Power BI, Dashboards, Data Analysis, Technical Business Analysis

Data Engineer

2019 - 2022
Uber
  • Developed and maintained PL/SQL procedures, functions, and ETL pipelines supporting large-scale enterprise applications with 100+ database tables.
  • Migrated on-premise Oracle workloads to AWS, improving scalability, reliability, and readiness of business systems.
  • Implemented AWS Glue-based ETL pipelines to load structured and semi-structured data into Amazon Redshift.
  • Automated data quality checks, monitoring, and error logging, reducing manual validation efforts by 50% and improving SLA compliance.
  • Supported production systems through debugging, performance tuning, and incident resolution for global enterprise clients.
Technologies: PL/SQL, Oracle ERP, Oracle Fusion ERP, SQL, Python, Data Engineering, Amazon S3 (AWS S3), ETL Pipelines, Amazon Web Services (AWS), Data Cleaning, CRM APIs, Data Pipelines, Data Transformation, Document Parsing, Web Scraping, Microsoft Power BI, Dashboards, Data Analysis, Technical Business Analysis

Experience

HR Software

A complete HR solution with a React front end and a PostgreSQL back-end database. It provides comprehensive HR activity management, including hiring, payroll, leaves, releases, and similar functionality, for different user types: admin, HR, candidates, and employees.

MyWhoosh Virtual Cycling ETL Pipeline

A complete batch ETL solution for MyWhoosh, a virtual cycling application. It relies heavily on Kafka and Elasticsearch to handle features such as on-the-go face recognition, health and power metrics, and more, to prevent cheating.

Database Solution for Insurance Company

http://www.momentum.co.za
On this project, I managed data cleaning and some transformations of user health data. I applied business logic to determine whether to offer additional benefits to an existing or potential client.

Data Governance ETL Pipeline

Created an end-to-end pipeline in Python, orchestrated with Apache Airflow. Data is sourced from multiple S3 storage locations and registered in the AWS Glue Data Catalog. This catalog is shared with Starburst, where the major transformations and matching occur, and the results are then pushed to Collibra for data governance.

Call Center Insurance Repayment

Back-end database design and implementation for a complete call center solution that a bank provides to its third-party call center service for loan repayment and recovery with bank clients. The software is required to include the appropriate wrappers and all necessary safety measures to protect sensitive data.

Uber P2P Finance Project

As a support team member for Uber's monthly financial close and all company payments, I managed the procure-to-pay lifecycle. The solution was built entirely on Oracle E-Business Suite and managed by our team.

Education

2021 - 2023

Master's Degree in Information Technology

University of Hyderabad - Hyderabad, India

Certifications

JULY 2025 - JULY 2028

AWS Certified Data Engineer

Amazon Web Services

AUGUST 2023 - PRESENT

Analyzing and Visualizing Data with Microsoft Power BI

Microsoft

MAY 2023 - PRESENT

Amazon Web Services Data Analytics - Specialty

Amazon Web Services

AUGUST 2020 - PRESENT

Microsoft Azure Fundamentals AZ-900

Microsoft

Skills

Libraries/APIs

PySpark

Tools

AWS Glue, Microsoft Power BI, Terraform, Claude, BigQuery, Oracle ERP, Collibra, Apache Airflow, AWS IAM, Podio

Languages

Python, SQL, Snowflake

Platforms

AWS IoT, AWS Lambda, Amazon Web Services (AWS), Blockchain, Databricks, Apache Kafka, Docker, Oracle, Azure

Storage

PL/SQL, SQL Stored Procedures, Amazon S3 (AWS S3), Data Pipelines, Data Lakes, JSON, Google Cloud, Elasticsearch

Paradigms

ETL, REST

Frameworks

Spark, Alchemy

Other

Data Warehousing, Data Engineering, ETL Tools, ETL Pipelines, Data Modeling, Data Transformation, Document Parsing, Dashboards, Data Visualization, Data Analysis, Databricks, Technical Business Analysis, Delta Lake, Data Cleansing, Data Cleaning, CRM APIs, OpenAI, Skip Tracing, Web Scraping, AI Data Classification, Streaming, Data Architecture, Information Science, Big Data, Maps, Oracle Fusion ERP, KSQL, Starburst, Analytics
