Shahban Riaz, Developer in Melbourne, Victoria, Australia

Shahban Riaz

Verified Expert in Engineering

Data Engineer and Developer

Location
Melbourne, Victoria, Australia
Toptal Member Since
August 3, 2022

Shahban is a data engineer who specializes in architecting, designing, and developing data lakes, data warehouses, and analytics solutions. Over his 14-plus years in the technology industry, he has guided large organizations in establishing data governance frameworks, implementing batch and real-time data pipelines, and building data quality frameworks. Shahban has experience with test-driven development, CI/CD, and agile project execution.

Portfolio

SEEK
Amazon Web Services (AWS), Amazon S3 (AWS S3), AWS Batch, PySpark...
AusNet Services
Solution Design, Azure Data Factory, Databricks, PySpark, Azure Data Lake, Git...
Jemena
PySpark, Amazon Elastic MapReduce (EMR), AWS Glue, Amazon Athena...

Experience

Availability

Part-time

Preferred Environment

Azure, Apache Airflow, Azure Synapse, Databricks, Terraform, Apache Kafka, Amazon Web Services (AWS), Redshift, Apache Spark, Agile

The most amazing...

...project I've developed is a framework for configuration-driven data curation, transformation, and quality assurance using Apache Airflow and PySpark.
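
To give a feel for what configuration-driven curation can look like, here is a minimal sketch of an Airflow DAG that generates one Spark task per dataset declared in a config file. The YAML schema, file paths, and the curate_dataset.py job are hypothetical illustrations, not the production framework.

```python
# Illustrative sketch: generate one Airflow task per dataset declared in a YAML config.
# The config schema, paths, and the curate_dataset.py PySpark job are hypothetical.
from datetime import datetime

import yaml
from airflow import DAG
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with open("/opt/airflow/config/datasets.yaml") as f:
    datasets = yaml.safe_load(f)["datasets"]  # e.g., [{"name": "orders", "sql": "SELECT ..."}]

with DAG(
    dag_id="config_driven_curation",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    for ds in datasets:
        # Each configured dataset becomes a Spark job that runs its SQL and writes the result.
        SparkSubmitOperator(
            task_id=f"curate_{ds['name']}",
            application="/opt/jobs/curate_dataset.py",
            application_args=["--name", ds["name"], "--sql", ds["sql"]],
        )
```

Because the DAG is generated from configuration, a new curated data object can be added by adding an entry to the config file rather than writing pipeline code.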

Work Experience

Senior Data Engineer

2021 - 2023
SEEK
  • Enhanced data ingestion and orchestration frameworks to run jobs in clustered Spark environments using AWS Batch. This reduced the execution time of data pipelines by more than half.
  • Integrated the self-service portal with Airflow and Talend to allow the execution of data processing pipelines across multiple systems with a single click.
  • Developed an automated data tagging solution for an enterprise data lake using Amazon SNS, Amazon SQS, and AWS Lambda functions (a rough sketch of this pattern follows this entry).
  • Configured AWS Lake Formation to federate data across multiple data lakes, enabling end users to access data in various data lakes from a single location.
  • Built a configuration-driven framework using Apache Airflow and Spark that allows business users to generate customized data objects from simple SQL queries.
  • Created and productionized data quality pipelines using Great Expectations.
  • Developed data pipelines to curate data from Salesforce using REST APIs.
Technologies: Amazon Web Services (AWS), Amazon S3 (AWS S3), AWS Batch, PySpark, Apache Airflow, Amazon Athena, Docker, AWS Glue, AWS Lake Formation, Git, Jira, SQL, Python, ETL, Data Engineering, Containerization, APIs, Spark, PostgreSQL, Data Governance, PL/SQL
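
As a rough illustration of the data tagging bullet above, here is what an SQS-triggered Lambda handler that applies S3 object tags might look like. The message shape, tag keys, and classification values are assumptions made for the sketch, not the actual implementation.

```python
# Illustrative sketch: an AWS Lambda handler triggered by SQS (fanned out from SNS)
# that tags newly landed S3 objects in the data lake. Message fields and tag values
# are hypothetical.
import json

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    for record in event["Records"]:        # one record per SQS message
        body = json.loads(record["body"])  # assumed to carry the new object's bucket and key
        s3.put_object_tagging(
            Bucket=body["bucket"],
            Key=body["key"],
            Tagging={
                "TagSet": [
                    {"Key": "data-domain", "Value": body.get("domain", "unknown")},
                    {"Key": "classification", "Value": body.get("classification", "internal")},
                ]
            },
        )
    return {"tagged": len(event["Records"])}
```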

Data Analytics Technical and Design Lead

2021 - 2021
AusNet Services
  • Prepared the technical design for the full-stack monitoring solution of the corporate data analytics platform, delivering a 360-degree monitoring view of the platform using Azure Monitor, Kusto Query Language, and a Log Analytics workspace.
  • Drafted architecture and design patterns for data ingestion, transformation, and storage, using Azure Data Factory, PostgreSQL, Databricks, data lake storage, EventHubs, and a data vault.
  • Prepared data models for spatial and weather data sets using the data vault methodology.
  • Led a team of seven DataOps engineers in developing a data analytics and machine learning platform.
  • Reviewed and enhanced end-to-end architecture for a data lake and a data warehousing solution.
  • Oversaw the development and optimization of streaming data pipelines utilizing Azure Data Factory, Azure Event Hubs, Apache Spark, Azure Databricks, and Azure SQL (a minimal streaming sketch follows this entry).
  • Designed patterns to curate data from various external systems using REST APIs and Apache Spark.
Technologies: Solution Design, Azure Data Factory, Databricks, PySpark, Azure Data Lake, Git, Jira, SQL, Python, Data Architecture, Azure, ETL, Data Engineering, Containerization, Data Warehouse Design, APIs, Spark, PostgreSQL, Data Governance, PL/SQL
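
To illustrate the streaming bullet above, here is a minimal PySpark Structured Streaming read from Azure Event Hubs over its Kafka-compatible endpoint, landing raw events as Delta files. The namespace, event hub name, and paths are placeholders, and the real pipelines included transformation and modeling steps not shown here.

```python
# Illustrative sketch: consume an Event Hub via its Kafka-compatible endpoint with
# Spark Structured Streaming on Databricks and land raw events in the data lake.
# Namespace, event hub name, secret handling, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("eventhub-ingest").getOrCreate()

namespace = "example-ns.servicebus.windows.net"        # placeholder Event Hubs namespace
connection_string = "<event-hubs-connection-string>"   # normally read from a secret scope

raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", f"{namespace}:9093")
    .option("subscribe", "meter-readings")              # placeholder event hub (topic) name
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        # Databricks ships a shaded Kafka client, hence the kafkashaded prefix.
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";',
    )
    .load()
)

(
    raw.select(col("value").cast("string").alias("payload"), col("timestamp"))
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/datalake/checkpoints/meter_readings")
    .start("/mnt/datalake/raw/meter_readings")
)
```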

Senior Data Engineer and Solution Designer

2020 - 2021
Jemena
  • Developed a reusable data curation and processing framework using PySpark, AWS EMR, Glue, S3, DynamoDB, and Amazon SQS.
  • Built a configurable pipeline orchestration framework using Python and Apache Airflow.
  • Created continuous deployment pipelines for automated testing and deployment of infrastructure and data pipelines using AWS CodeCommit, CodePipeline, CodeBuild, and Cloud Development Kit (CDK).
  • Drafted end-to-end data architecture for an AWS-based lakehouse solution utilizing native services and open-source Delta Lake (see the Delta Lake sketch after this entry).
Technologies: PySpark, Amazon Elastic MapReduce (EMR), AWS Glue, Amazon Athena, Apache Airflow, Docker, AWS Cloud Development, Python, Jira, Data Architecture, Agile, Amazon Web Services (AWS), ETL, Data Engineering, Containerization, Spark, SQL, Data Governance, PL/SQL
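
A minimal sketch of the Delta Lake piece of such a lakehouse follows, assuming curated records are already staged and using placeholder S3 paths and keys; the actual framework was configuration driven and considerably broader.

```python
# Illustrative sketch: upsert curated records into a Delta table on S3, e.g., from an
# EMR cluster with the delta-spark package available. Paths and keys are placeholders.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("curation-upsert")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog", "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

target_path = "s3://example-lakehouse/curated/customers"                    # placeholder
updates = spark.read.parquet("s3://example-lakehouse/staging/customers/")   # placeholder

if DeltaTable.isDeltaTable(spark, target_path):
    (
        DeltaTable.forPath(spark, target_path).alias("t")
        .merge(updates.alias("s"), "t.customer_id = s.customer_id")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute()
    )
else:
    updates.write.format("delta").save(target_path)
```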

Senior Data Engineer

2019 - 2020
nbn
  • Developed and productionized data ingestion, transformation, and modeling frameworks, using Confluent Kafka; Spark in Scala; AWS DynamoDB, Lambda, and ECS; and Amazon EMR, EKS, SNS, and SQS.
  • Built a scalable pipeline scheduling framework using Python and Apache Airflow.
  • Designed and developed data consumption patterns using AWS Glue, Athena, Redshift Spectrum, and Tableau (an Athena consumption sketch follows this entry).
  • Created a data tagging solution for ensuring data security and traceability.
  • Enhanced infrastructure deployment pipelines using Jenkins and Terraform for Apache Kafka, ZooKeeper, and Airflow; AWS EMR, S3, ECS, Glue, and DynamoDB; and Amazon SNS and SQS.
  • Designed and assisted in implementing CI/CD processes to deploy canary releases for data ingestion and processing.
  • Worked on optimizing the performance of existing Kafka-based data pipelines.
Technologies: Apache Spark, Apache Airflow, Apache Kafka, Amazon Elastic MapReduce (EMR), Docker, Amazon Elastic Container Service (Amazon ECS), Amazon Athena, AWS Glue, Jenkins, Terraform, Git, Scala, Python, Jira, SQL, Containerization, Amazon Web Services (AWS), PL/SQL
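
As an example of the Athena-based consumption pattern mentioned above, here is a small boto3 sketch that runs a query against a Glue catalog table and polls for completion; the database, table, and result bucket are placeholders.

```python
# Illustrative sketch: run an Athena query against a Glue catalog table and poll for
# completion. Database, table, and result bucket are placeholders.
import time

import boto3

athena = boto3.client("athena", region_name="ap-southeast-2")

execution = athena.start_query_execution(
    QueryString="SELECT event_type, COUNT(*) AS events FROM network_events GROUP BY event_type",
    QueryExecutionContext={"Database": "curated_zone"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

query_id = execution["QueryExecutionId"]
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```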

Senior Consultant – Big Data

2018 - 2019
Deloitte
  • Designed hybrid data movement, organization, processing, and notification solutions for on-premises data lakes in Cloudera and cloud data lakes on Google Cloud Platform.
  • Developed data pipelines for batch and stream processing using StreamSets, Apache Kafka, Pub/Sub, Dataflow, BigQuery, Google's machine learning API, and Twilio.
  • Prepared a big data lake and data warehousing architecture using Azure services, including Azure Data Lake Storage Gen2, ADF, Databricks, PolyBase, Cosmos DB, SQL Database, and Azure SQL Data Warehouse.
  • Built data ingestion and processing pipelines using ADF and Spark.
  • Conducted performance tests on end-to-end data pipelines to establish the suitability of PaaS services for production loads.
  • Designed and developed Type 2 slowly changing dimension (SCD) data sync pipelines using Spark and Spark SQL (see the sketch after this entry).
  • Created data quality assurance and reconciliation frameworks.
Technologies: Apache Spark, Databricks, Azure Data Lake, Azure Data Factory, Data Architecture, Solution Design, Git, Jira, Azure, Python, PySpark, SQL, Data Engineering, Containerization, PL/SQL
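
To make the Type 2 SCD bullet above concrete, here is a simplified two-step Spark SQL sync against a Delta dimension table. The dim_customer and stg_customer tables and their columns are invented for the example, and the real pipelines also handled late-arriving and deleted records.

```python
# Illustrative sketch of a Type 2 SCD sync in Spark SQL against Delta tables.
# dim_customer(customer_id, name, address, start_date, end_date, is_current) and
# stg_customer are invented for the example.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("scd2-sync").getOrCreate()

# Step 1: expire the current dimension rows whose tracked attributes changed.
spark.sql("""
    MERGE INTO dim_customer AS d
    USING stg_customer AS s
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHEN MATCHED AND d.address <> s.address THEN
      UPDATE SET d.is_current = false, d.end_date = current_date()
""")

# Step 2: insert a new current row for every changed or brand-new customer.
spark.sql("""
    INSERT INTO dim_customer
    SELECT s.customer_id, s.name, s.address,
           current_date() AS start_date,
           CAST(NULL AS DATE) AS end_date,
           true AS is_current
    FROM stg_customer s
    LEFT JOIN dim_customer d
      ON d.customer_id = s.customer_id AND d.is_current = true
    WHERE d.customer_id IS NULL
""")
```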

System Implementation Consultant

2007 - 2018
Techlogix
  • Worked in roles including technical consultant, functional consultant, and team leader for the implementation of PeopleSoft Campus Solutions.
  • Developed operational and statutory reports using Oracle business intelligence tools.
  • Developed dozens of integrations between PeopleSoft systems and third-party products.
  • Developed and automated data migration from legacy systems to PeopleSoft systems.
Technologies: PL/SQL, SQL, Oracle, Microsoft SQL Server, PeopleSoft, Business Analysis, Requirements Analysis, Business Process Analysis, Data Migration, REST APIs, APIs, API Integration, PeopleCode, Oracle BI Publisher, Java, Stakeholder Management, IT Project Management, System Implementation

Projects

Enterprise Data Analytics Platform

I built an enterprise data analytics platform for a leading telecom company to curate, process, and store data from multiple corporate systems in an AWS data lake.

I was part of the team assigned to establish architectural patterns for processing data from relational and non-relational sources and to define data governance frameworks covering data security, classification, ownership, discoverability, and consumption patterns.

During the later stages of the project, I worked on implementing the platform using AWS IAM, Glue, Athena, Lake Formation, ECS, and EMR, along with PySpark, Apache Airflow, and containerization.
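
As a small example of the Lake Formation side of such governance, the boto3 call below grants a role SELECT on a Glue Data Catalog table; the account ID, role, database, and table names are placeholders.

```python
# Illustrative sketch: grant an IAM role SELECT on a Glue Data Catalog table via
# AWS Lake Formation. All identifiers are placeholders.
import boto3

lakeformation = boto3.client("lakeformation", region_name="ap-southeast-2")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/analyst-role"},
    Resource={"Table": {"DatabaseName": "curated_zone", "Name": "customer_events"}},
    Permissions=["SELECT"],
)
```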

Information Management Lakehouse

I participated as the technical and design lead in a project that involved curating data from IoT devices in the electricity and gas network, geospatial systems, and relational systems. We modeled that data using the Data Vault methodology and stored it in Azure Data Lake, organized as a lakehouse spanning the data lake and warehouse to support business intelligence and machine learning use cases.

My responsibilities included creating detailed architectural and design documents for the data platform and reviewing design documents prepared by other team members. I also provided technical guidance to the team for building data pipelines and reviewed other engineers' work to ensure that best practices were followed.
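
To illustrate the Data Vault modeling described above, a minimal PySpark hub-load sketch follows; the meter business key, table paths, and record source are invented for the example.

```python
# Illustrative sketch: load a Data Vault hub on Delta by appending only business keys
# that are not yet present. Business key, paths, and record source are placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dv-hub-load").getOrCreate()

hub_path = "abfss://curated@examplelake.dfs.core.windows.net/raw_vault/hub_meter"       # placeholder
staged = spark.read.parquet("abfss://staging@examplelake.dfs.core.windows.net/meters/")  # placeholder

hub_candidates = (
    staged.select("meter_id").distinct()
    .withColumn("hub_meter_hk", F.sha2(F.col("meter_id").cast("string"), 256))
    .withColumn("load_date", F.current_timestamp())
    .withColumn("record_source", F.lit("scada"))
)

existing_keys = spark.read.format("delta").load(hub_path).select("hub_meter_hk")

(
    hub_candidates.join(existing_keys, "hub_meter_hk", "left_anti")
    .write.format("delta").mode("append").save(hub_path)
)
```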

Corporate Data Hub

I created a corporate data hub for a leading online employment marketplace, enabling data from various enterprise systems to be curated into an AWS-based data warehouse. This data warehouse was built using a lakehouse architecture.

I developed and enhanced the pipeline scheduling, data curation, and processing framework using Apache Airflow, Spark, PostgreSQL, REST APIs, Delta Lake, and AWS S3, Glue, Athena, and Lake Formation, applying change data capture (CDC) and Type 2 SCD patterns. I also built data marts for business consumption based on dimensional modeling.
Education

2003 - 2007

Bachelor's Degree in Computer Science

University of Sargodha - Sargodha, Pakistan

Certifications

MARCH 2023 - PRESENT

AWS Certified Solutions Architect – Professional

AWS

JUNE 2022 - PRESENT

Databricks Certified Data Engineer Professional

Databricks

JULY 2020 - JULY 2023

AWS Certified Data Analytics Specialty

AWS

FEBRUARY 2020 - PRESENT

DP-200: Implementing an Azure Data Solution

Microsoft

APRIL 2017 - PRESENT

AgilePM Practitioner

APMG International

Languages

SQL, Python, Scala, PeopleCode, Java

Frameworks

Apache Spark, Data Lakehouse

Libraries/APIs

PySpark, REST APIs

Tools

Apache Airflow, Amazon Elastic Container Registry (ECR), Git, Jira, Amazon Athena, AWS Glue, Terraform, Amazon Elastic MapReduce (EMR), Amazon Elastic Container Service (Amazon ECS), Jenkins, Azure Monitor, AWS Batch, Oracle BI Publisher

Paradigms

ETL, Agile, Testing, Requirements Analysis

Platforms

Docker, Amazon Web Services (AWS), Azure Event Hubs, Apache Kafka, Azure, Azure Synapse Analytics, Databricks, Azure Functions, Oracle

Storage

Data Pipelines, PL/SQL, Redshift, Amazon S3 (AWS S3), Data Lake Design, PostgreSQL, Microsoft SQL Server

Other

Solution Design, Data Architecture, Data Engineering, AWS Cloud Development, Software Engineering, Delta Lake, Deployment, Azure Data Lake, Data Governance, Containerization, IT Project Management, Azure Databricks, Azure Data Factory, Data Modeling, Azure Blob Storage, Network Data Storage, Data Processing, Team Leadership, AWS Lake Formation, Data Warehouse Design, APIs, PeopleSoft, Business Analysis, Business Process Analysis, Data Migration, API Integration, Stakeholder Management, AWS Cloud Architecture, Cloud Infrastructure, Cloud Migration, System Implementation
