Shahban Riaz

Verified Expert in Engineering

Data Engineer and Developer

Melbourne, Victoria, Australia
Toptal Member Since
August 3, 2022

Shahban is a data engineer who specializes in architecting, designing, and developing data lakes, warehouses, and analytics solutions. For over 16 years in the technology industry, he has guided large organizations in establishing data governance frameworks, implementing batch and real-time data pipelines, and building data quality frameworks. Shahban has experience with test-driven development, CI/CD, and agile project execution.






Preferred Environment

Azure, Apache Airflow, Azure Synapse, Databricks, Terraform, Apache Kafka, Amazon Web Services (AWS), Redshift, Apache Spark, Agile

The most amazing project I've developed is a framework for configuration-driven data curation, transformation, and quality assurance using Apache Airflow and PySpark.

Work Experience

Data Engineer/Architect

2023 - PRESENT
Yahoo! - Yahoo Paranoids (Cybersecurity) - Australia
  • Prepared a data mesh architecture tailored for Yahoo's enterprise security division, using AWS infrastructure and Databricks Delta Lake to optimize security operations and ensure scalability and cost-effectiveness.
  • Developed a technical solution that automated the deployment of Databricks workspaces, facilitating a seamless producer-consumer data mesh architecture. The solution significantly enhanced operational efficiency and data delivery speed.
  • Engineered data pipelines that integrated data from multiple systems, streamlining the process of identifying, categorizing, and prioritizing vulnerabilities. This resulted in the effective resolution of security issues and enhanced system resilience.
Technologies: Data Engineering, Data Architecture, Amazon Web Services (AWS), Databricks, PySpark, Python, Splunk, DevOps, Security, CI/CD Pipelines, Machine Learning, Artificial Intelligence (AI), Terraform

Senior Data Engineer

2021 - 2023
  • Enhanced data ingestion and orchestration frameworks to run jobs in clustered Spark environments using AWS Batch. This reduced the execution time of data pipelines by more than half.
  • Integrated the self-service portal with Airflow and Talend to allow the execution of data processing pipelines across multiple systems with a single click.
  • Developed an automated data tagging solution for an enterprise data lake using Amazon SNS, Amazon SQS, and AWS Lambda functions.
  • Configured AWS Lake Formation to federate data across multiple data lakes. This enabled end users to access data in various data lakes from a single location.
  • Built a configuration-driven framework using Apache Airflow and Spark that allows business users to generate customized data objects from simple SQL queries.
  • Created and productionized data quality pipelines using Great Expectations.
  • Developed data pipelines to curate data from Salesforce using REST APIs.
Technologies: Amazon Web Services (AWS), Amazon S3 (AWS S3), AWS Batch, PySpark, Apache Airflow, Amazon Athena, Docker, AWS Glue, AWS Lake Formation, Git, Jira, SQL, Python, ETL, Data Engineering, Containerization, APIs, Spark, PostgreSQL, Data Governance, PL/SQL

Data Analytics Technical and Design Lead

2021 - 2021
AusNet Services
  • Prepared technical design for the full-stack monitoring solution of the corporate data analytics platform, which resulted in a 360-degree monitoring view of the platform. I used Azure Monitor, Kusto Query Language, and a Log Analytics workspace.
  • Drafted architecture and design patterns for data ingestion, transformation, and storage, using Azure Data Factory, PostgreSQL, Databricks, data lake storage, Azure Event Hubs, and a data vault.
  • Prepared data models for spatial and weather data sets using the data vault methodology.
  • Led a team of seven DataOps engineers in developing a data analytics and machine learning platform.
  • Reviewed and enhanced end-to-end architecture for a data lake and a data warehousing solution.
  • Oversaw the development and optimization of streaming data pipelines, utilizing Azure Data Factory (ADF), Azure Event Hubs, Apache Spark, Azure Databricks, and Azure SQL.
  • Designed patterns to curate data from various external systems using REST APIs and Apache Spark.
Technologies: Solution Design, Azure Data Factory, Databricks, PySpark, Azure Data Lake, Git, Jira, SQL, Python, Data Architecture, Azure, ETL, Data Engineering, Containerization, Data Warehouse Design, APIs, Spark, PostgreSQL, Data Governance, PL/SQL

Senior Data Engineer and Solution Designer

2020 - 2021
  • Developed a reusable data curation and processing framework using PySpark, AWS EMR, Glue, S3, DynamoDB, and Amazon SQS.
  • Built a configurable pipeline orchestration framework using Python and Apache Airflow.
  • Created continuous deployment pipelines for automated testing and deployment of infrastructure and data pipelines using AWS CodeCommit, CodePipeline, CodeBuild, and Cloud Development Kit (CDK).
  • Drafted end-to-end data architecture for an AWS-based lakehouse solution utilizing native services and open-source Delta Lake.
Technologies: PySpark, Amazon Elastic MapReduce (EMR), AWS Glue, Amazon Athena, Apache Airflow, Docker, AWS Cloud Development, Python, Jira, Data Architecture, Agile, Amazon Web Services (AWS), ETL, Data Engineering, Containerization, Spark, SQL, Data Governance, PL/SQL

Senior Data Engineer

2019 - 2020
  • Developed and productionized data ingestion, transformation, and modeling frameworks, using Confluent Kafka; Spark in Scala; AWS DynamoDB, Lambda, and ECS; and Amazon EMR, EKS, SNS, and SQS.
  • Built a scalable pipeline scheduling framework using Python and Apache Airflow.
  • Designed and developed data consumption patterns using AWS Glue, Athena, Redshift Spectrum, and Tableau.
  • Created a data tagging solution for ensuring data security and traceability.
  • Enhanced infrastructure deployment pipelines, using Jenkins and Terraform, for Apache Kafka, ZooKeeper, and Airflow; AWS EMR, S3, ECS, Glue, and DynamoDB; and Amazon SNS and SQS.
  • Designed and assisted in implementing CI/CD processes to deploy canary releases for data ingestion and processing.
  • Worked on optimizing the performance of existing Kafka-based data pipelines.
Technologies: Apache Spark, Apache Airflow, Apache Kafka, Amazon Elastic MapReduce (EMR), Docker, Amazon Elastic Container Service (Amazon ECS), Amazon Athena, AWS Glue, Jenkins, Terraform, Git, Scala, Python, Jira, SQL, Containerization, Amazon Web Services (AWS), PL/SQL

Senior Consultant – Big Data

2018 - 2019
  • Designed hybrid data movement, organization, processing, and notification solutions for on-premise data lakes in Cloudera and on-cloud data lakes using the Google Cloud Platform.
  • Developed data pipelines for batch and stream processing using StreamSets, Apache Kafka, Pub/Sub, Dataflow, BigQuery, Google's machine learning API, and Twilio.
  • Prepared a big data lake and data warehousing architecture using Azure services, including Azure Data Lake Storage Gen2, ADF, Databricks, PolyBase, Cosmos DB, SQL Database, and Azure SQL Data Warehouse.
  • Built data ingestion and processing pipelines using ADF and Spark.
  • Conducted performance tests on end-to-end data pipelines to establish the suitability of PaaS services for production loads.
  • Designed and developed Type 2 SCD data sync pipelines using Spark and Spark SQL.
  • Created data quality assurance and reconciliation frameworks.
Technologies: Apache Spark, Databricks, Azure Data Lake, Azure Data Factory, Data Architecture, Solution Design, Git, Jira, Azure, Python, PySpark, SQL, Data Engineering, Containerization, PL/SQL

System Implementation Consultant

2007 - 2018
  • Worked in various roles, including technical consultant, functional consultant, and team leader for the implementation of PeopleSoft Campus Solutions.
  • Created operational and statutory reports using Oracle Business Intelligence tools.
  • Engineered numerous integrations between PeopleSoft systems and third-party products.
  • Designed and automated data migration from legacy systems to PeopleSoft systems.
Technologies: PL/SQL, SQL, Oracle, Microsoft SQL Server, PeopleSoft, Business Analysis, Requirements Analysis, Business Process Analysis, Data Migration, REST APIs, APIs, API Integration, PeopleCode, Oracle BI Publisher, Java, Stakeholder Management, IT Project Management, System Implementation

Enterprise Data Analytics Platform

I built an enterprise data analytics platform for one of the leading telecom companies to curate, process, and store data from multiple corporate systems into an AWS data lake.

I was part of the team assigned to establish architectural patterns to process data from relational and non-relational sources and establish data governance frameworks, including data security, classification, ownership, discoverability, and consumption patterns.

During the later stages of the project, I worked on implementing the platform using AWS IAM, Glue, Athena, Lake Formation, ECS, and EMR, along with PySpark, Apache Airflow, and containerization technology.

Information Management Lakehouse

I was the technical and design lead on a project that curated data from IoT devices in the electricity and gas network, geospatial networks, and relational systems. We modeled that data using Data Vault and stored it in Azure Data Lake and data warehouses, following a lakehouse architecture, to support business intelligence and machine learning use cases.

My responsibilities included creating detailed architectural and design documents for the data platform and reviewing design documents prepared by other team members. I also provided technical guidance to the team for building data pipelines and reviewed other engineers' work to ensure that best practices were followed.

Corporate Data Hub

I created a corporate data hub for a leading online employment marketplace that curates data from various enterprise systems into an AWS-based data warehouse. This data warehouse was built using lakehouse architecture.

I developed and enhanced the pipeline scheduling, data curation, and processing framework using Apache Airflow and Spark; PostgreSQL; REST APIs; Delta Lake; AWS S3, Glue, Athena, and Lake Formation; CDC; and Type 2 SCD. I also built data marts for business consumption based on dimensional modeling.
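
For readers unfamiliar with the term, Type 2 SCD (slowly changing dimension) keeps history by closing the current row for a business key when its tracked attributes change and appending a new current row. The sketch below illustrates that idea in plain Python only; it is not the project's Spark implementation. The names `DimRow` and `apply_scd2` are hypothetical, and it assumes that keys missing from a snapshot are left untouched rather than treated as deletions.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class DimRow:
    """One version of a dimension record (hypothetical structure)."""
    key: str                          # business key from the source system
    attrs: Tuple                      # tracked attributes, e.g. (name, city)
    valid_from: date
    valid_to: Optional[date] = None   # None marks the current version

    @property
    def is_current(self) -> bool:
        return self.valid_to is None


def apply_scd2(dim: List[DimRow], snapshot: Dict[str, Tuple], as_of: date) -> List[DimRow]:
    """Return a new dimension with Type 2 history applied for one snapshot."""
    result: List[DimRow] = []
    current_keys = set()
    for row in dim:
        if not row.is_current:
            result.append(row)        # historical rows are never modified
            continue
        current_keys.add(row.key)
        new_attrs = snapshot.get(row.key)
        if new_attrs is None or new_attrs == row.attrs:
            result.append(row)        # absent from snapshot or unchanged
        else:
            # Close the old version and open a new current one.
            result.append(replace(row, valid_to=as_of))
            result.append(DimRow(row.key, new_attrs, as_of))
    for key, attrs in snapshot.items():
        if key not in current_keys:
            result.append(DimRow(key, attrs, as_of))  # brand-new key
    return result
```

Calling `apply_scd2` repeatedly with successive snapshots builds up the version history; re-applying an identical snapshot leaves the dimension unchanged. In a Spark and Delta Lake setting, the same logic is typically expressed as a merge against the dimension table rather than an in-memory loop.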

Education

2003 - 2007

Bachelor's Degree in Computer Science

University of Sargodha - Sargodha, Pakistan


AWS Certified Solutions Architect – Professional



Databricks Certified Data Engineer Professional


JULY 2020 - JULY 2023

AWS Certified Data Analytics Specialty



DP-200: Implementing an Azure Data Solution



AgilePM Practitioner

APMG International


PySpark, REST APIs


Apache Airflow, Amazon Elastic Container Registry (ECR), Git, Jira, Amazon Athena, AWS Glue, Terraform, Amazon Elastic MapReduce (EMR), Amazon Elastic Container Service (Amazon ECS), Jenkins, Azure Monitor, AWS Batch, Oracle BI Publisher, Splunk


Apache Spark, Spark, Data Lakehouse


SQL, Python, Scala, PeopleCode, Java


ETL, Agile, Testing, Requirements Analysis, DevOps


Docker, Amazon Web Services (AWS), Azure Event Hubs, Apache Kafka, Azure, Azure Synapse Analytics, Databricks, Azure Functions, Oracle


Data Pipelines, PL/SQL, Redshift, Amazon S3 (AWS S3), Data Lake Design, PostgreSQL, Microsoft SQL Server


Solution Design, Data Architecture, Data Engineering, AWS Cloud Development, Software Engineering, Delta Lake, Deployment, Azure Data Lake, Data Governance, Containerization, IT Project Management, Azure Databricks, Azure Data Factory, Data Modeling, Azure Blob Storage, Network Data Storage, Data Processing, Team Leadership, AWS Lake Formation, Data Warehouse Design, APIs, PeopleSoft, Business Analysis, Business Process Analysis, Data Migration, API Integration, Stakeholder Management, AWS Cloud Architecture, Cloud Infrastructure, Cloud Migration, System Implementation, Security, CI/CD Pipelines, Machine Learning, Artificial Intelligence (AI)
