Salman is available for hire

Salman Ahmed

Verified Expert in Engineering

Data Engineer and Developer

Location

Karachi, Sindh, Pakistan

Toptal Member Since

February 14, 2022

Salman is a GCP-certified professional data engineer specializing in building, maintaining, and optimizing data warehouses and data pipelines with cost efficiency in mind. He has 5+ years of experience in the fintech, eCommerce, and sourcing industries. Salman helps businesses store and process data efficiently and convert it into actionable insights and predictions.

Data Engineering, Data Queries, Database Development, Python, SQL, Data Pipelines, Databases, Relational Databases, RDBMS, Apache Hive, JSON, Impala, Big Data, Data Warehousing, Analytics

Portfolio

Heirloom

Google Cloud Platform (GCP), Google Cloud Functions, Apache Airflow, BigQuery...

Freelance

Big Data, Spark, Hadoop, Data Warehousing, Data Engineering, ETL...

Sanpei Ventures

Alibaba Cloud, Apache Airflow, PostgreSQL, SQL, Python, GitHub, JSON...

Experience

Python - 4 years
Data Engineering - 4 years
Spark - 3 years
ETL - 3 years
BigQuery - 2 years
Informatica ETL - 1 year
Databricks - 1 year
Google Cloud Platform (GCP) - 1 year

Availability

Full-time

Preferred Environment

Windows, Linux, Jupyter, Slack

The most amazing thing I've developed is a search engine for MOOCs, courses, and jobs from all around the web.

Work Experience

Data Engineer

2023 - PRESENT

Heirloom

Integrated multiple third-party sources into BigQuery for dashboarding and analysis.
Developed multiple data pipelines to clean and transform data according to business needs.
Set up alerting and monitoring for pipelines and infrastructure so that relevant stakeholders are notified whenever an issue arises.
Designed workflows in Airflow to automate various processes.
Sped up dashboards using techniques such as materialized views.
Upgraded and maintained a Cloud SQL for MySQL instance for optimal performance.

Technologies: Google Cloud Platform (GCP), Google Cloud Functions, Apache Airflow, BigQuery, Google Cloud SQL, MySQL, Python, SQL, Google Cloud Storage, Tableau

Big Data Engineer | Consultant

2021 - PRESENT

Freelance

Wrote Spark, MapReduce, Hive, and Python scripts for distributed data processing.
Extracted product mentions from eCommerce store reviews and predicted the sentiment toward them using Python, NLTK, spaCy, and Flair.
Helped produce reports and articles on big data, data engineering, and data science.
Developed BigQuery ETLs, scheduled them using Airflow, and built visualizations in Google Data Studio.
Integrated several third-party sources with BigQuery, including CRM, Asana, and Bevy, using Stitch Data, custom scripts, and APIs/webhooks.
Optimized existing ETLs, reducing GCP costs and retrieval time by approximately 40-60%.

Technologies: Big Data, Spark, Hadoop, Data Warehousing, Data Engineering, ETL, Machine Learning, Data Science, Google Cloud Platform (GCP), BigQuery, Apache Airflow, Python, Alibaba Cloud, PostgreSQL, Docker, Kubernetes, Tableau, Google Data Studio, Query Optimization, Data Warehouse Design, Data Lakes, Dashboards, Relational Data Mapping, Automation, Data Processing, Pub/Sub, Apache Kafka, Relational Databases, Apache Spark, PySpark, Amazon EC2, Amazon S3 (AWS S3), ELT, Jenkins, Confluence, Data Architecture, Automated Data Flows, Reports, T-SQL (Transact-SQL), Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, SQL Performance, Performance Tuning, Data Queries, SQL DML, Integration, API Integration, Azure, Stitch Data, ETL Tools, CI/CD Pipelines, Business Intelligence (BI) Platforms, Data Analytics, Google Cloud Storage, Azure SQL Databases, Azure SQL, Google BigQuery, Data Pipelines, Jupyter Notebook, Schemas, AWS Lambda, Amazon RDS, RDBMS

Data Engineer

2022 - 2023

Sanpei Ventures

Collaborated with a team to build pipelines that collect, clean, and analyze different eCommerce brands' data.
Managed the Airflow server for optimal performance and set up a process to keep it synchronized with the GitHub repository.
Optimized an existing ETL to reduce network and CPU usage.
Normalized and converted NoSQL (JSON) data into relational form.
Created an automated process for regex-based pattern matching of products from their warehouse.

Technologies: Alibaba Cloud, Apache Airflow, PostgreSQL, SQL, Python, GitHub, JSON, Data Pipelines, Databases, Database Administration (DBA), Dashboards, Automation, Data Processing, Relational Databases, ELT, Change Data Capture, Data Architecture, Automated Data Flows, T-SQL (Transact-SQL), SQL Performance, Performance Tuning, Data Queries, SQL DML, Integration, API Integration, ETL Tools, Data Analytics, Jupyter Notebook, Schemas, Amazon RDS, RDBMS

Data Engineer

2021 - 2022

Amigoals

Wrote multiple scrapers and parsers to fetch data and transform it into a clean, usable form.
Set up data pipelines on Databricks to scrape various public sites, then clean and ingest the data into the warehouse.
Wrote transformation pipelines on Spark and Spark SQL to clean and transform data for further analysis.

Technologies: Databricks, Python, SQL, Spark, Delta Lake, Selenium, Data Analysis, Relational Data Mapping, Data Processing, Relational Databases, Apache Spark, PySpark, ELT, Automated Data Flows, SQL Performance, Data Queries, SQL DML, API Integration, Data Analytics, Data Pipelines, Jupyter Notebook, Schemas, RDBMS

Big Data Consultant

2019 - 2022

Blutech Consulting

Developed and automated fault-tolerant ETL pipelines across multiple banking streams to load data into a dimension layer, which is then provided to the BI team and relevant departments for report generation and decision-making.
Worked with clients to translate business problems into quantitative queries and collect/clean the necessary data.
Optimized Informatica/SSIS workflows and improved the performance of ETL pipelines that process GBs of data daily.
Developed non-skewed data marts on Teradata so that daily ETLs run efficiently.
Built an app called Data Robot that uses fuzzy search to find accounts and entities in a data lake, supporting multiple departments in their ongoing operations. The manual process it replaced used to take weeks.
Automated regulatory returns through a data lake using big data and reporting tools, making the process fast, dynamic, and scheduled instead of a manual effort in Excel.
Built an app called ScoreCard to automate Spark reports on anomalous transactions, accounts, and agents. It helped the bank score its app agents in order to reward them and improve their services.
Developed processes to detect anomalous ETL behavior, with alerting, to ensure data is loaded correctly and on time.

Technologies: Spark, Data Warehousing, ETL, Big Data, Hadoop, Apache Hive, Impala, Informatica, SQL Server Integration Services (SSIS), Microsoft SQL Server, MySQL, Teradata, SQL Server Reporting Services (SSRS), Python, SQL, Data Engineering, Data Analysis, Data Science, Visualization, Analytics, Data Modeling, Business Intelligence (BI), Microsoft Power BI, Informatica ETL, MariaDB, Data Warehouse Design, Query Optimization, Databases, Database Administration (DBA), Data Lakes, Data Processing, Apache Kafka, Relational Databases, Apache Spark, PySpark, SSAS Tabular, SQL Server Analysis Services (SSAS), ELT, Change Data Capture, Data Architecture, Automated Data Flows, Reports, T-SQL (Transact-SQL), SQL Performance, Performance Tuning, Data Queries, SQL DML, Integration, ETL Tools, Business Intelligence (BI) Platforms, Data Analytics, Data Pipelines, Jupyter Notebook, Schemas, Consulting, RDBMS

Experience

CourseThread

A search engine to find courses, MOOCs, and other online learning material. I was the only back-end engineer, building the scrapers, ETL pipelines, and Django back end. I also worked with the front-end engineer to create the tables and APIs the app needed to run smoothly.

FoodCase

A restaurant analyzer app that uses NLP and machine learning to show statistics and useful information about restaurants, derived from user feedback and reviews. I completed the whole project myself, including back-end development, scraping, ETL, database development, and the front end.

Movie Review Sentiment Analysis (Kernels Only)

https://www.kaggle.com/code/pantherpanther/simple-1d-cnn-with-glove-twitter-embeddings?scriptVersionId=10076590

The Rotten Tomatoes movie review dataset is a corpus of movie reviews. As a data scientist, I analyzed the data and trained a sentiment analysis model to predict user sentiment from reviews. The competition was hosted on Kaggle.

Gift Assistant

https://www.giftassistant.io

Gift Assistant is an artificial intelligence technology that enables customers to find the perfect gift for any occasion. Gift AI combines machine learning and natural language processing (NLP) to analyze customer conversations and identify the best gift for each customer's unique needs. With Gift AI, customers can find the right gift quickly and accurately.

Assessments for Udemy Business Pro

As a subject matter expert for Google Professional Cloud Data Engineer certification, I created professional development assessments for Udemy Business Pro. These assessments help measure and evaluate learners' current knowledge to identify areas for improvement.

Skills

Languages

Python, SQL, SQL DML, T-SQL (Transact-SQL), JavaScript, Snowflake

Tools

Impala, BigQuery, Apache Airflow, GitHub, Informatica ETL, Stitch Data, Looker, Amazon Elastic MapReduce (EMR), Jupyter, Slack, Microsoft Power BI, Tableau, MySQL Workbench, Microsoft Excel, Confluence, Jenkins, Google Analytics, AWS Glue

Paradigms

Database Development, ETL, Automation, Data Science, Business Intelligence (BI)

Platforms

Jupyter Notebook, Databricks, Google Cloud Platform (GCP), AWS Lambda, Oracle, Windows, Linux, Docker, Kubernetes, Azure, Amazon Web Services (AWS), Azure SQL Data Warehouse, Amazon EC2, Apache Kafka, Dedicated SQL Pool (formerly SQL DW)

Storage

Apache Hive, Data Pipelines, JSON, Databases, Relational Databases, RDBMS, MySQL, Teradata, Google Cloud Storage, PostgreSQL, MariaDB, SQL Performance, Database Administration (DBA), SQL Server Analysis Services (SSAS), Data Lakes, Redshift, SQL Server Integration Services (SSIS), Microsoft SQL Server, SQL Server Reporting Services (SSRS), Alibaba Cloud, Azure SQL Databases, Azure SQL, SSAS Tabular, Amazon S3 (AWS S3), Google Cloud SQL

Other

Data Engineering, ETL Tools, Data Queries, ELT, Schemas, Data Transformation, CSV, CSV File Processing, Big Data, Data Warehousing, Informatica, Data Analysis, Natural Language Processing (NLP), Analytics, Data Analytics, Data Modeling, Query Optimization, Data Warehouse Design, API Integration, Integration, Performance Tuning, Automated Data Flows, Data Architecture, Change Data Capture, Data Processing, Relational Data Mapping, Google BigQuery, Amazon RDS, Consulting, Scaling, Reporting, CRM APIs, GPT, Generative Pre-trained Transformers (GPT), Visualization, Machine Learning, Scraping, APIs, Deep Learning, Data Visualization, Delta Lake, Business Intelligence (BI) Platforms, CI/CD Pipelines, Google Data Studio, Reports, Pub/Sub, Dashboards, Google Search Console, Artificial Intelligence (AI), Google Cloud Functions

Frameworks

Spark, Hadoop, Apache Spark, Flask, Django, Selenium

Libraries/APIs

PySpark, Keras, Pandas, Matplotlib, Scikit-learn

Education

2014 - 2018

Bachelor's Degree in Computer Science

National University of Computer and Emerging Sciences (FAST) - Pakistan

Certifications

JUNE 2022 - JUNE 2024

Professional Data Engineer

Google Cloud
