
Shahban Riaz
Data Engineer and Developer
Shahban is a data engineer specializing in architecting, designing, and developing data lakes, warehouses, and analytics solutions. With over 14 years in the technology industry, he has guided large organizations in establishing data governance frameworks, implementing batch and real-time data pipelines, and building data quality frameworks. Shahban has experience with test-driven development, CI/CD, and agile project execution.
Experience
SQL - 11 years
Containerization - 6 years
Data Engineering - 6 years
Apache Airflow - 6 years
ETL - 6 years
Apache Spark - 6 years
Data Architecture - 4 years
Apache Kafka - 2 years
Preferred Environment
Azure, Apache Airflow, Azure Synapse, Databricks, Terraform, Apache Kafka, Amazon Web Services (AWS), Redshift, Apache Spark, Agile
The most amazing...
...project I've developed is a framework for configuration-driven data curation, transformation, and quality assurance using Apache Airflow and PySpark.
Work Experience
Senior Data Engineer
SEEK
- Enhanced data ingestion and orchestration frameworks to run jobs in clustered Spark environments using AWS Batch. This reduced the execution time of data pipelines by more than half.
- Integrated the self-service portal with Airflow and Talend to allow the execution of data processing pipelines across multiple systems with a single click.
- Developed an automated data tagging solution for an enterprise data lake using Amazon SNS, Amazon SQS, and AWS Lambda functions.
- Configured AWS Lake Formation to federate data across multiple data lakes. This enables end users to access data in various data lakes from a single location.
- Built a configuration-driven framework using Apache Airflow and Spark that allows business users to generate customized data objects from simple SQL queries.
- Created and productionized data quality pipelines using Great Expectations.
- Developed data pipelines to curate data from Salesforce using REST APIs.
Data Analytics Technical and Design Lead
AusNet Services
- Prepared the technical design for the full-stack monitoring solution of the corporate data analytics platform using Azure Monitor, Kusto Query Language, and a Log Analytics workspace, resulting in a 360-degree monitoring view of the platform.
- Drafted architecture and design patterns for data ingestion, transformation, and storage using Azure Data Factory, PostgreSQL, Databricks, Azure Data Lake Storage, Azure Event Hubs, and a data vault.
- Prepared data models for spatial and weather data sets using the data vault methodology.
- Led a team of seven DataOps engineers in developing a data analytics and machine learning platform.
- Reviewed and enhanced end-to-end architecture for a data lake and a data warehousing solution.
- Oversaw the development and optimization of streaming data pipelines utilizing Azure Data Factory, Azure Event Hubs, Apache Spark, Azure Databricks, and Azure SQL.
- Designed patterns to curate data from various external systems using REST APIs and Apache Spark.
Senior Data Engineer and Solution Designer
Jemena
- Developed a reusable data curation and processing framework using PySpark, AWS EMR, Glue, S3, DynamoDB, and Amazon SQS.
- Built a configurable pipeline orchestration framework using Python and Apache Airflow.
- Created continuous deployment pipelines for automated testing and deployment of infrastructure and data pipelines using AWS CodeCommit, CodePipeline, CodeBuild, and Cloud Development Kit (CDK).
- Drafted end-to-end data architecture for an AWS-based lakehouse solution utilizing native services and open-source Delta Lake.
Senior Data Engineer
nbn
- Developed and productionized data ingestion, transformation, and modeling frameworks, using Confluent Kafka; Spark in Scala; AWS DynamoDB, Lambda, and ECS; and Amazon EMR, EKS, SNS, and SQS.
- Built a scalable pipeline scheduling framework using Python and Apache Airflow.
- Designed and developed data consumption patterns using AWS Glue, Athena, Redshift Spectrum, and Tableau.
- Created a data tagging solution for ensuring data security and traceability.
- Enhanced infrastructure deployment pipelines using Jenkins and Terraform for Apache Kafka, ZooKeeper, and Airflow; AWS EMR, S3, ECS, Glue, and DynamoDB; and Amazon SNS and SQS.
- Designed and assisted in implementing CI/CD processes to deploy canary releases for data ingestion and processing.
- Worked on optimizing the performance of existing Kafka-based data pipelines.
Senior Consultant – Big Data
Deloitte
- Designed hybrid data movement, organization, processing, and notification solutions spanning on-premises data lakes in Cloudera and cloud data lakes on Google Cloud Platform.
- Developed data pipelines for batch and stream processing using StreamSets, Apache Kafka, Pub/Sub, Dataflow, BigQuery, Google's machine learning APIs, and Twilio.
- Prepared a big data lake and data warehousing architecture using Azure services, including Azure Data Lake Storage Gen2, ADF, Databricks, PolyBase, Cosmos DB, SQL Database, and Azure SQL Data Warehouse.
- Built data ingestion and processing pipelines using ADF and Spark.
- Conducted performance tests on end-to-end data pipelines to establish the suitability of PaaS services for production loads.
- Designed and developed Type 2 SCD data sync pipelines using Spark and Spark SQL.
- Created data quality assurance and reconciliation frameworks.
System Implementation Consultant
Techlogix
- Worked in roles including technical consultant, functional consultant, and team leader for the implementation of PeopleSoft Campus Solutions.
- Developed operational and statutory reports using Oracle business intelligence tools.
- Developed dozens of integrations between PeopleSoft systems and third-party products.
- Developed and automated data migration from legacy systems to PeopleSoft systems.
Experience
Enterprise Data Analytics Platform
I was part of the team assigned to establish architectural patterns for processing data from relational and non-relational sources and to define data governance frameworks covering data security, classification, ownership, discoverability, and consumption patterns.
During the later stages of the project, I worked on implementing the platform using AWS IAM, Glue, Athena, Lake Formation, ECS, and EMR, as well as PySpark, Apache Airflow, and containerization technology.
Information Management Lakehouse
My responsibilities included creating detailed architectural and design documents for the data platform and reviewing design documents prepared by other team members. I also provided technical guidance to the team for building data pipelines and reviewed other engineers' work to ensure that best practices were followed.
Corporate Data Hub
I developed and enhanced the pipeline scheduling, data curation, and processing framework using Apache Airflow and Spark; PostgreSQL; REST APIs; Delta Lake; AWS S3, Glue, Athena, and Lake Formation; CDC; and Type 2 SCD. I also built data marts for business consumption based on dimensional modeling.
Skills
Languages
SQL, Python, Scala, PeopleCode, Java
Frameworks
Apache Spark, AWS EMR
Libraries/APIs
PySpark, REST APIs
Tools
Apache Airflow, Amazon Elastic Container Registry (Amazon ECR), Git, Jira, Amazon Athena, AWS Glue, Terraform, Amazon Elastic MapReduce (EMR), Amazon Elastic Container Service (Amazon ECS), Jenkins, AWS Batch, Oracle BI Publisher
Paradigms
ETL, Agile, Testing, Requirements Analysis
Platforms
Docker, Amazon Web Services (AWS), Azure Event Hubs, Apache Kafka, Azure, Databricks, Azure Functions, Oracle
Storage
Data Pipelines, PL/SQL, Redshift, Amazon S3 (AWS S3), Data Lake Design, PostgreSQL, Microsoft SQL Server
Other
Solution Design, Data Architecture, Data Engineering, AWS Cloud Development, Software Engineering, Delta Lake, Deployment, Azure Data Lake, Data Governance, Containerization, IT Project Management, Azure Databricks, Azure Data Factory, Data Modeling, Azure Blob Storage, Azure Synapse Analytics, Network Data Storage, Data Processing, Data Lakehouse, Azure Monitor, Team Leadership, AWS Lake Formation, Data Warehouse Design, APIs, PeopleSoft, Business Analysis, Business Process Analysis, Data Migration, API Integration, Stakeholder Management, AWS Cloud Architecture, Cloud Infrastructure, Cloud Migration
Education
Bachelor's Degree in Computer Science
University of Sargodha - Sargodha, Pakistan
Certifications
AWS Certified Solutions Architect – Professional
AWS
Databricks Certified Data Engineer Professional
Databricks
AWS Certified Data Analytics Specialty
AWS
DP-200: Implementing an Azure Data Solution
Microsoft
AgilePM Practitioner
APMG International