Salman Azhar, Developer in Lahore, Pakistan

Salman Azhar

Verified Expert in Engineering

Data Engineering Developer

Location
Lahore, Pakistan
Toptal Member Since
June 6, 2022

Salman is a challenge-driven data engineer with 7+ years of professional experience developing Big Data solutions and working with Amazon cloud services, including AWS Step Functions, Amazon Redshift, and AWS Data Pipeline. He also has experience designing architectural flows and implementing data lakes, data ingestion workflows, and data warehouses on AWS. Salman is always looking for challenging opportunities and prides himself on going the extra mile on every project.

Portfolio

Scoutbee
Python, Apache Kafka, Kafka Streams, Data Lakes, Data Lake Design, Delta Lake...
Zaavya
SQL, Python, AWS Step Functions, Amazon Simple Queue Service (SQS), AWS Lambda...
Systems Limited
SQL, AWS Step Functions, AWS Lambda, Python, Pentaho, Microsoft Power BI...

Availability

Part-time

Preferred Environment

Slack, Amazon Web Services (AWS), Data Pipelines, PySpark, SQL, Data Warehousing, Pipelines, Python, Databases, Data Analytics, Database Design, Data Analysis

The most amazing...

...ingestion pipeline I've developed could consume data of any type, format, or structure in real time and supported both streaming and batch loads.

Work Experience

Senior Data Engineer

2020 - 2022
Scoutbee
  • Designed and developed a data lake on AWS following the single source of truth (SSOT) practice.
  • Developed a REST API microservice that serves all machine learning models, so linked services no longer need to be updated when model versions change.
  • Designed and implemented a streaming ingestion platform with multiple sources and sinks.
Technologies: Python, Apache Kafka, Kafka Streams, Data Lakes, Data Lake Design, Delta Lake, Databricks, Elasticsearch, Neo4j, Graph Databases, Spark, Data Engineering, Data Architecture, ETL, Snowflake, Programming, Analytics, Databases, Data Structures, Data Modeling, Python 3, Data Pipelines, PySpark, SQL, Data Warehousing, Data Analytics, Data, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Git, Redshift, Apache Spark, Amazon Athena, AWS Glue, Amazon Simple Queue Service (SQS), Database Design, Big Data, Big Data Architecture, Data Analysis, Data Matching, Spark Streaming, Spark Structured Streaming, Exploratory Data Analysis

Senior Data Engineer

2019 - 2020
Zaavya
  • Architected a cloud-native solution based on AWS serverless architecture.
  • Created configuration-driven data pipeline workflows that use multiple components to organize business processes and data flows.
  • Migrated data from on-premises databases to the company's AWS data hub.
  • Automated data cataloging for incoming data in AWS Glue.
  • Designed and implemented the transaction monitor and error handling flows.
Technologies: SQL, Python, AWS Step Functions, Amazon Simple Queue Service (SQS), AWS Lambda, AWS Glue, Elasticsearch, Graph Databases, Amazon Kinesis, Amazon S3 (AWS S3), Data Engineering, Data Architecture, ETL, Programming, Analytics, Databases, Data Structures, Data Modeling, Python 3, Data Pipelines, PySpark, Data Warehousing, Data Analytics, Data, Spark, Data Lakes, Data Lake Design, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Git, Apache Spark, Amazon Athena, Delta Lake, Database Design, Big Data, Big Data Architecture, Data Analysis, Data Matching, Spark Streaming, Spark Structured Streaming, Exploratory Data Analysis

Senior Data Engineer

2018 - 2019
Systems Limited
  • Built the data model for the US elections Associated Press data.
  • Designed the workflow and worked on implementations using Pentaho.
  • Created interactive dashboards on Power BI using SQL as a source.
  • Implemented a hybrid (cloud and on-premises) data ingestion platform using AWS.
Technologies: SQL, AWS Step Functions, AWS Lambda, Python, Pentaho, Microsoft Power BI, Amazon QuickSight, Amazon DynamoDB, Elasticsearch, Data Engineering, Data Architecture, ETL, Programming, Analytics, Databases, Data Structures, Data Modeling, Python 3, Data Pipelines, PySpark, Data Warehousing, Data Analytics, Data, Spark, Data Lakes, Data Lake Design, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Git, Redshift, Apache Spark, Delta Lake, Database Design, Big Data, Big Data Architecture, Data Analysis, Data Matching, Exploratory Data Analysis

Data Engineer

2016 - 2018
NorthBay Solutions
  • Developed data lakes on AWS S3 and worked on an Apache Spark framework that handled and processed terabytes of data.
  • Visualized and analyzed data using Amazon QuickSight, Amazon Athena, and Amazon Redshift and cataloged data in AWS Glue.
  • Participated in architecture discussions and developed a pipeline for migrating an on-premises data warehouse to Amazon Redshift.
  • Utilized Amazon Redshift for data modeling and developing data marts and views.
Technologies: SQL, Python, Spark, AWS Glue, AWS Lambda, Amazon EC2, Amazon RDS, Redshift, Git, Jira, Amazon S3 (AWS S3), Apache Spark, Amazon QuickSight, Amazon Athena, Data Engineering, ETL, Programming, Analytics, Databases, Data Modeling, Python 3, Data Pipelines, PySpark, Data Warehousing, Data Analytics, Data, Data Lakes, Data Lake Design, Amazon Web Services (AWS), Data Warehouse Design, Data Quality, Data Cleaning, Data Architecture, Database Design, Big Data, Big Data Architecture, Data Analysis, Exploratory Data Analysis

Software Engineer

2015 - 2016
Netsol
  • Wrote SQL scripts, including data definition language (DDL) and data manipulation language (DML) statements, and stored procedures for Netsol's financial suite.
  • Developed a product baseline for all of Netsol's financial products.
  • Collaborated with the team on automating the entire accounting system.
  • Developed data manipulation and handling scripts for all ongoing accounting events.
Technologies: SQL, Databases, Data, Jira, Scrum, Data Engineering, Programming, Analytics, Data Modeling, Python 3, Data Pipelines, Amazon Web Services (AWS), Data Warehousing, Data Analytics, Spark, Data Lakes, ETL, Data Quality, Data Cleaning, Git, Exploratory Data Analysis

Neiman Marcus – Smart Data Platform

Neiman Marcus, a retail company, wanted to create an operational data hub using multiple data storage technologies, chosen based on the enterprise's data structure and consumption patterns. The data hub comprised numerous processing components that were stitched together using a data pipeline.

Medicare – Data Platform

Medicare is a US federal health insurance program with billions of insurance, pharmaceutical, and medical records. It needed an efficient platform to ingest, manage, and query its data. The final product needed to be in the FHIR format to comply with the US government's standards.

Associated Press – Election Campaign

I worked on a project for The Associated Press, a US nonprofit news agency. The project focused on building the data model and designing a pipeline for ingesting and storing election campaign data. I used Microsoft SQL Server and Amazon Redshift for storing data and connected them to Power BI to enable live reporting.

S&P Global Ratings – Data Lake

S&P Global Ratings is a US credit rating agency that publishes financial research and analysis. They had over 15 terabytes of data in Oracle, and the volume of data ingestion led to performance issues. I created a data lake on Amazon S3 into which multiple sources loaded their data. The project's ETL framework was designed and developed in PySpark.

Languages

Python 3, SQL, Python, Snowflake

Frameworks

Spark, Apache Spark, Spark Structured Streaming

Libraries/APIs

PySpark, Spark Streaming

Tools

AWS Glue, Git, AWS Step Functions, Jira, Amazon Simple Queue Service (SQS), Amazon Athena, Slack, Microsoft Power BI, Kafka Streams, Amazon QuickSight

Paradigms

ETL, Database Design, Scrum

Platforms

AWS Lambda, Amazon Web Services (AWS), Amazon EC2, Pentaho, Apache Kafka, Databricks

Storage

Databases, Data Pipelines, Redshift, Data Lakes, Data Lake Design, Elasticsearch, Amazon S3 (AWS S3), Amazon DynamoDB, Graph Databases, Neo4j

Other

Programming, Analytics, Data Modeling, Data Warehousing, Data Analytics, Data, Delta Lake, Data Engineering, Data Warehouse Design, Data Cleaning, Data Architecture, Big Data, Big Data Architecture, Data Analysis, Data Matching, Exploratory Data Analysis, Data Structures, Machine Learning, Amazon RDS, Data Quality, Computer Science, Pipelines, Amazon Kinesis, Streaming

Education

2018 - 2020

Master's Degree in Computer Science

Lahore University of Management Sciences - Lahore, Pakistan

2011 - 2015

Bachelor's Degree in Computer Engineering

National University of Computer and Emerging Sciences - Lahore, Pakistan

Certifications

July 2020 - July 2023

AWS Certified Solutions Architect

Amazon Web Services

August 2017 - August 2020

AWS Certified Developer

Amazon Web Services