Salman Azhar

Verified Expert in Engineering

Data Engineering Developer

Location

Lahore, Pakistan

Toptal Member Since

June 6, 2022

Salman is a challenge-driven data engineer with 7+ years of professional experience developing Big Data solutions and using various Amazon cloud services, including AWS Step Functions, Amazon Redshift, and AWS Data Pipeline. In addition, he has experience designing architectural flows and implementing data lakes, data ingestion workflows, and data warehouses on AWS. Salman is always looking for challenging opportunities and prides himself on going the extra mile on every project.

Data Analysis, Exploratory Data Analysis, Data Engineering, Big Data, Big Data Architecture, Analytics, Data Analytics, Data Warehouse Design, Data Warehousing, Python 3, SQL, Python, AWS Lambda, Git, AWS Step Functions, AWS Glue, Amazon Athena

Portfolio

Scoutbee

Python, Apache Kafka, Kafka Streams, Data Lakes, Data Lake Design, Delta Lake...

Zaavya

SQL, Python, AWS Step Functions, Amazon Simple Queue Service (SQS), AWS Lambda...

Systems Limited

SQL, AWS Step Functions, AWS Lambda, Python, Pentaho, Microsoft Power BI...

Experience

SQL - 7 years, Python 3 - 7 years, ETL - 6 years, Amazon Web Services (AWS) - 6 years, Data Modeling - 6 years, Data Engineering - 6 years, Data Pipelines - 5 years, PySpark - 5 years

Availability

Part-time

Preferred Environment

Slack, Amazon Web Services (AWS), Data Pipelines, PySpark, SQL, Data Warehousing, Pipelines, Python, Databases, Data Analytics, Database Design, Data Analysis

The most amazing...

...ingestion pipeline I've developed could consume data of any type, format, and structure in real time and supported both streaming and batch loads.

Work Experience

Senior Data Engineer

2020 - 2022

Scoutbee

Designed and developed a data lake on AWS following the single source of truth (SSOT) practice.
Developed a REST API microservice that serves all machine learning models, so linked services no longer need to be updated when model versions change.
Designed and implemented a streaming ingestion platform with multiple sources and sinks.

Technologies: Python, Apache Kafka, Kafka Streams, Data Lakes, Data Lake Design, Delta Lake, Databricks, Elasticsearch, Neo4j, Graph Databases, Spark, Data Engineering, Data Architecture, ETL, Snowflake, Programming, Analytics, Databases, Data Structures, Data Modeling, Python 3, Data Pipelines, PySpark, SQL, Data Warehousing, Data Analytics, Data, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Git, Redshift, Apache Spark, Amazon Athena, AWS Glue, Amazon Simple Queue Service (SQS), Database Design, Big Data, Big Data Architecture, Data Analysis, Data Matching, Spark Streaming, Spark Structured Streaming, Exploratory Data Analysis

Senior Data Engineer

2019 - 2020

Zaavya

Architected a cloud-native solution based on AWS serverless architecture.
Created configuration-driven data pipeline workflows that use multiple components to organize business processes and data flows.
Migrated data from on-premises databases to the company's AWS data hub.
Automated data cataloging for incoming data in AWS Glue.
Designed and implemented the transaction monitor and error handling flows.

Technologies: SQL, Python, AWS Step Functions, Amazon Simple Queue Service (SQS), AWS Lambda, AWS Glue, Elasticsearch, Graph Databases, Amazon Kinesis, Amazon S3 (AWS S3), Data Engineering, Data Architecture, ETL, Programming, Analytics, Databases, Data Structures, Data Modeling, Python 3, Data Pipelines, PySpark, Data Warehousing, Data Analytics, Data, Spark, Data Lakes, Data Lake Design, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Git, Apache Spark, Amazon Athena, Delta Lake, Database Design, Big Data, Big Data Architecture, Data Analysis, Data Matching, Spark Streaming, Spark Structured Streaming, Exploratory Data Analysis

Senior Data Engineer

2018 - 2019

Systems Limited

Built the data model for The Associated Press's US election data.
Designed the workflow and worked on implementations using Pentaho.
Created interactive dashboards on Power BI using SQL as a source.
Implemented a hybrid (cloud and on-premises) data ingestion platform using AWS.

Technologies: SQL, AWS Step Functions, AWS Lambda, Python, Pentaho, Microsoft Power BI, Amazon QuickSight, Amazon DynamoDB, Elasticsearch, Data Engineering, Data Architecture, ETL, Programming, Analytics, Databases, Data Structures, Data Modeling, Python 3, Data Pipelines, PySpark, Data Warehousing, Data Analytics, Data, Spark, Data Lakes, Data Lake Design, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Git, Redshift, Apache Spark, Delta Lake, Database Design, Big Data, Big Data Architecture, Data Analysis, Data Matching, Exploratory Data Analysis

Data Engineer

2016 - 2018

NorthBay Solutions

Developed data lakes on AWS S3 and worked on an Apache Spark framework that handled and processed terabytes of data.
Visualized and analyzed data using Amazon QuickSight, Amazon Athena, and Amazon Redshift and cataloged data in AWS Glue.
Participated in architecture discussions and developed a pipeline for migrating an on-premises data warehouse to Amazon Redshift.
Utilized Amazon Redshift for data modeling and developing data marts and views.

Technologies: SQL, Python, Spark, AWS Glue, AWS Lambda, Amazon EC2, Amazon RDS, Redshift, Git, Jira, Amazon S3 (AWS S3), Apache Spark, Amazon QuickSight, Amazon Athena, Data Engineering, ETL, Programming, Analytics, Databases, Data Modeling, Python 3, Data Pipelines, PySpark, Data Warehousing, Data Analytics, Data, Data Lakes, Data Lake Design, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Data Architecture, Database Design, Big Data, Big Data Architecture, Data Analysis, Exploratory Data Analysis

Software Engineer

2015 - 2016

Netsol

Wrote SQL scripts, including data definition language (DDL) and data manipulation language (DML) statements, and stored procedures for Netsol's financial suite.
Developed a product baseline for all of Netsol's financial products.
Collaborated with the team on automating the entire accounting system.
Developed data manipulation and handling scripts for all ongoing accounting events.

Technologies: SQL, Databases, Data, Jira, Scrum, Data Engineering, Programming, Analytics, Data Modeling, Python 3, Data Pipelines, Amazon Web Services (AWS), Data Warehousing, Data Analytics, Spark, Data Lakes, ETL, Data Quality, Data Cleaning, Git, Exploratory Data Analysis

Experience

Neiman Marcus – Smart Data Platform

Neiman Marcus, a retail company, wanted to create an operational data hub using multiple data storage technologies, which were chosen based on the enterprise's data structure and consumption. The data hub comprises numerous processing components that were stitched together using a data pipeline.

Medicare – Data Platform

Medicare is a US federal health insurance program with billions of insurance, pharmaceutical, and medical records. It needed an efficient platform to ingest, manage, and query its data. The final product needed to be in the FHIR format to comply with the US government's standards.

Associated Press – Election Campaign

I worked on a project for The Associated Press, a US nonprofit news agency. The project focused on building the data model and designing a pipeline for ingesting and storing election campaign data. I used Microsoft SQL Server and Amazon Redshift for storing data and connected them to Power BI to enable live reporting.

S&P Global Ratings – Data Lake

S&P Global Ratings is a US credit rating agency that publishes financial research and analysis. They had over 15 terabytes of data in Oracle, and the volume of data ingestion led to performance issues. I created a data lake on Amazon S3 into which multiple sources loaded their data. The project's ETL framework was designed and developed in PySpark.

Skills

Languages

Python 3, SQL, Python, Snowflake

Frameworks

Spark, Apache Spark, Spark Structured Streaming

Libraries/APIs

PySpark, Spark Streaming

Tools

AWS Glue, Git, AWS Step Functions, Jira, Amazon Simple Queue Service (SQS), Amazon Athena, Slack, Microsoft Power BI, Kafka Streams, Amazon QuickSight

Paradigms

ETL, Database Design, Scrum

Platforms

AWS Lambda, Amazon Web Services (AWS), Amazon EC2, Pentaho, Apache Kafka, Databricks

Storage

Databases, Data Pipelines, Redshift, Data Lakes, Data Lake Design, Elasticsearch, Amazon S3 (AWS S3), Amazon DynamoDB, Graph Databases, Neo4j

Other

Programming, Analytics, Data Modeling, Data Warehousing, Data Analytics, Data, Delta Lake, Data Engineering, Data Warehouse Design, Data Cleaning, Data Architecture, Big Data, Big Data Architecture, Data Analysis, Data Matching, Exploratory Data Analysis, Data Structures, Machine Learning, Amazon RDS, Data Quality, Computer Science, Pipelines, Amazon Kinesis, Streaming

Education

2018 - 2020

Master's Degree in Computer Science

Lahore University of Management Sciences - Lahore, Pakistan

2011 - 2015

Bachelor's Degree in Computer Engineering

National University of Computer and Emerging Sciences - Lahore, Pakistan

Certifications

July 2020 - July 2023

AWS Certified Solutions Architect

Amazon Web Services

August 2017 - August 2020

AWS Certified Developer

Amazon Web Services
