Syed Muneeb Hussain

Verified Expert in Engineering

SQL and Data Developer

Location

Karachi, Sindh, Pakistan

Toptal Member Since

July 20, 2022

Muneeb is an experienced data engineer proficient in Python and SQL, specializing in big data technologies. He has worked on cloud platforms such as Alibaba Cloud, Azure, and GCP, with transformation tools like dbt, and with ETL and orchestration tools including Airflow, Prefect, Azure Data Factory (ADF), Talend, and SSIS. He also builds data visualizations in Power BI, Grafana, and Apache Superset, and is well-versed in DataOps and CI/CD pipelines using Docker and GitHub.

Data Engineering, Analytics, Data Warehousing, Data Warehouse Design, Data Queries, Data Analytics, Data Modeling, Data Migration, SQL, Python, ETL, NumPy, Database Design, MySQL, Data Pipelines, Alibaba Cloud, Feature Engineering, Apache Superset

Portfolio

Dataquartz

Python 3, SQL, DuckDB, Grafana, Flask, REST, Docker, GitHub, Airbyte, Meltano...

Seeloz

Azure Logic Apps, Apache Hive, Google BigQuery, PySpark, Spark SQL, SQL...

Daraz | Alibaba Group

Apache Hive, SQL, Blink SQL, Alibaba Cloud, Data, Data Engineering...

Experience

Python - 8 years, Data Engineering - 8 years, SQL - 8 years, ETL - 6 years, Apache Airflow - 3 years, BigQuery - 3 years, PostgreSQL - 3 years, Data Build Tool (dbt) - 2 years

Availability

Part-time

Preferred Environment

SQL, Python 3, PyCharm, Talend, PostgreSQL, Grafana, Data Build Tool (dbt), Apache Airflow, Meltano, Docker

The most amazing...

...project I've developed is a user-friendly ETL platform that lets non-technical users build and orchestrate data workflows.

Work Experience

Lead Data Engineer

2022 - PRESENT

Dataquartz

Led the development of an in-house data ingestion product with Python, Flask, DuckDB, PostgreSQL, Grafana for dynamic visualization, and Prefect for ETL workflow management.
Orchestrated end-to-end ETL pipelines, incorporating audit logging and data integrity checks using Airflow and Prefect.
Implemented Prometheus and the Node Exporter for metrics-based monitoring of the application and its hosts.
Led bug tracking and resolution, as well as new feature development, for the data model.
Containerized the entire application using Docker for enhanced scalability and manageability in the data engineering workflow.

Technologies: Python 3, SQL, DuckDB, Grafana, Flask, REST, Docker, GitHub, Airbyte, Meltano, Apache Airflow, Prefect, Prometheus, Node Exporter, Data Engineering, Data Modeling, Data Warehousing, PostgreSQL, JSON, MinIO, S3 Buckets, Cloud, Data Build Tool (dbt), Pandas, NumPy, Apache Superset, Database Design, Google Cloud, Analytics, English, Agile, Data Science, Data Structures

Data Engineering Manager

2022 - 2022

Seeloz

Worked on the development of a data ingestion product using Prefect, SQL, Python, Flask, DuckDB, and PostgreSQL for the back end, and Grafana for the data visualization.
Built various ETL projects using SQL, PySpark, Scala, Azure Logic Apps, and more to pull data from multiple ERPs and various source systems.
Developed Azure Logic Apps to pull data from Microsoft Dynamics 365 data entities. Wrote ETL in Scala and PySpark to load them into the supply chain meta-model.
Implemented a monitoring framework using PySpark, PostgreSQL, and Grafana to ensure data correctness and integrity.
Worked on the development and optimizations of various data models and ETL pipelines for fast data processing.
Monitored daily data pipelines and ETL data load processes to ensure all the required data was loaded correctly in the supply chain data model.
Developed various dashboards using Grafana and Power BI to gauge important business metrics.

Technologies: Azure Logic Apps, Apache Hive, Google BigQuery, PySpark, Spark SQL, SQL, Database Design, Scala, Data Warehousing, Azure Blobs, Data Analysis, Data Engineering, Databricks, Slowly Changing Dimensions (SCD), Query Optimization, Big Data Architecture, Data Pipelines, Data Quality Analysis, IntelliJ IDEA, Shell, Data Integration, Data Queries, Analysis, ETL, Business Intelligence (BI), Python 3, PyCharm, Big Data, Data Warehouse Design, Azure, Cloud Infrastructure, ETL Tools, Databases, Python, CI/CD Pipelines, GitHub, Azure SQL, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Azure SQL Databases, Dedicated SQL Pool (formerly SQL DW), Azure SQL Data Warehouse, API Integration, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Reports, BI Reports, Apache Spark, Relational Databases, Data Modeling, Database Modeling, MariaDB, Business Logic, APIs, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, REST APIs, Azure Service Bus, JSON, Quality Management, MySQL, Dimensional Modeling, ELT, Pandas, Spark, Microsoft Azure, Schemas, Jupyter Notebook, Relational Data Mapping, BigQuery, Reporting, BI Reporting, Windows PowerShell, XML, AnyDesk, Apache Airflow, NoSQL, PostgreSQL, Database Optimization, DuckDB, Apache Flink, Data Extraction, CSV Export, CSV, Scripting, MongoDB, .NET, HTML, CSS, Cloud, Azure Data Factory, Data Management, Data Build Tool (dbt), Azure Databricks, Warehouses, Meltano, Flask, REST, NumPy, Google Cloud, Analytics, English, Agile, Data Science, Data Structures

Big Data Engineering and Governance Lead

2019 - 2022

Daraz | Alibaba Group

Built and managed a DWH architecture, and wrote automated ETL scripts using HiveQL, HDFS, HBase, Python, and Shell on a cloud platform for data ingestions.
Developed BI dashboards on Power BI, vShow, and FBI to gauge important metrics related to domains like customer funnel, marketing, and logistics.
Developed and maintained an enterprise data warehouse and monitored data ingestion pipelines on a daily basis using SQL, Python, Flink, ODPS, and ETL flows.
Optimized dozens of ETL pipelines and SQL queries for fast data processing to finish the execution in minutes instead of hours.
Worked closely with the departmental HODs to maintain optimum levels of communication to effectively and efficiently complete projects.
Managed incoming data analysis requests and efficiently distributed results to support decision strategies.

Technologies: Apache Hive, SQL, Blink SQL, Alibaba Cloud, Data, Data Engineering, Data Warehousing, Data Governance, Big Data, Python 3, Shell, Data Visualization, Business Intelligence (BI), Query Optimization, Data Integration, PostgreSQL, MySQL, Slowly Changing Dimensions (SCD), Data Analysis, Big Data Architecture, Data Pipelines, Data Quality Analysis, Data Queries, Analysis, ETL, Database Design, Data Warehouse Design, Azure, Cloud Infrastructure, ETL Tools, Databases, Python, CI/CD Pipelines, GitHub, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Azure SQL Databases, Azure SQL Data Warehouse, Dedicated SQL Pool (formerly SQL DW), Microsoft SQL Server, API Integration, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Reports, BI Reports, PySpark, Apache Spark, Relational Databases, Data Modeling, Database Modeling, Stored Procedure, Tableau, Dashboards, Dashboard Development, MariaDB, Business Logic, Microsoft Power BI, APIs, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, REST APIs, JSON, MySQL Workbench, Quality Management, IntelliJ IDEA, Dimensional Modeling, ELT, Pandas, Spark, Schemas, Jupyter Notebook, Relational Data Mapping, BigQuery, Reporting, BI Reporting, Windows PowerShell, XML, AnyDesk, Apache Airflow, NoSQL, Database Optimization, Hadoop, HDFS, Docker, Google BigQuery, Apache Flink, Data Extraction, CSV Export, CSV, Scripting, MongoDB, HTML, CSS, Cloud, Azure Data Factory, Data Management, Data Build Tool (dbt), Warehouses, REST, NumPy, Analytics, English, Agile, Data Structures

Technical Consultant

2019 - 2019

Qordata

Designed and developed end-to-end data ingestion pipelines to ensure data flow daily.
Implemented and managed data flow jobs for data modeling solutions relevant to the health and life science industry, using tools like SQL Server Integration Services (SSIS) and Microsoft SQL Server.
Developed SQL queries, stored procedures, and dynamic SQL and optimized existing complex SQL queries to speed up day-to-day processes.
Created ad-hoc data reports that clients requested following their requirements.

Technologies: SQL, SQL Server Integration Services (SSIS), SQL Server Management Studio (SSMS), Data Analysis, Data Quality Analysis, Data Queries, Query Plan, Query Optimization, SQL Stored Procedures, Slowly Changing Dimensions (SCD), Data Engineering, Data Pipelines, Shell, Data Integration, Analysis, ETL, Data Warehousing, Business Intelligence (BI), Database Design, Data Warehouse Design, ETL Tools, Databases, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Microsoft SQL Server, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Relational Databases, Data Modeling, Database Modeling, Business Logic, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, Visual Studio, Quality Management, MySQL, Dimensional Modeling, ELT, Schemas, Jupyter Notebook, Relational Data Mapping, Reporting, BI Reporting, Windows PowerShell, NoSQL, PostgreSQL, Database Optimization, Data Extraction, CSV Export, CSV, Scripting, MongoDB, .NET, Cloud, Data Management, Warehouses, NumPy, Analytics, English, Data Structures

Data Engineer

2017 - 2019

Afiniti

Designed and developed a database architecture and data model for a business flow using Talend Open Studio, SSIS, and MySQL Workbench.
Performed large-scale data conversions, migrations, and optimization to reduce resource and time costs while maintaining data integrity.
Wrote SQL stored procedures and Python scripts for data quality checks and ad-hoc analyses.
Implemented complex data processing jobs, including integrating customer relationship management (CRM) and third-party data into daily processes.
Established automated emails to have more visibility on the progress of regular data processing tasks.
Analyzed clients' business processes to propose optimal solutions for data requirements.

Technologies: SQL, MySQL, SQL Server Integration Services (SSIS), SQL Server Management Studio (SSMS), Talend, Talend ETL, Data Engineering, Data Pipelines, Data Analysis, Analysis, Data Visualization, Business Intelligence (BI), Slowly Changing Dimensions (SCD), Query Optimization, Data Quality Analysis, Shell, Data Integration, Data Queries, SQL Stored Procedures, ETL, Data Warehousing, Database Design, Python 3, Data Warehouse Design, ETL Tools, Databases, Python, Data Analytics, Database Analytics, RDBMS, Data Processing, Business Intelligence (BI) Platforms, Microsoft SQL Server, SQL DML, SQL Performance, Performance Tuning, T-SQL (Transact-SQL), Relational Databases, Data Modeling, Database Modeling, Stored Procedure, Business Logic, MariaDB, Microsoft Power BI, Data Architecture, Database Architecture, Logical Database Design, Database Schema Design, Relational Database Design, Visual Studio, MySQL Workbench, Quality Management, Dimensional Modeling, ELT, Pandas, Schemas, Jupyter Notebook, Relational Data Mapping, Reporting, BI Reporting, Windows PowerShell, AnyDesk, NoSQL, Database Optimization, Data Extraction, CSV Export, CSV, Scripting, .NET, HTML, CSS, Cloud, Data Management, Warehouses, NumPy, Analytics, English, Data Science, Data Structures

Experience

Automated ETL Tool

I led the development of an in-house data ingestion system, applying best practices in data engineering, development, and testing. The tech stack included Python, Flask, DuckDB, PostgreSQL, Grafana, Prefect, and Prometheus, and the design prioritized scalability, efficiency, and maintainability.

The result is a centralized application with an intuitive click-and-drop interface. It serves as a one-stop shop where both technical and non-technical users can design and deploy ETL pipelines. The architecture is designed for performance, reliability, and data integrity.

Following agile methodologies, we worked in iterations and deployed features rapidly. Unit, integration, and end-to-end tests kept the product reliable and stable, and automated testing frameworks streamlined the process, giving broad coverage and fast issue identification.

Meltano Custom Extractor

https://github.com/muneebsmh/meltano_custom_extractor

This project is a Python-based ELT (extract, load, transform) application built with Meltano on top of the SpaceX API. The application first extracts raw data from the SpaceX API, covering rocket launches, missions, and related details. The extracted data is then transformed with dbt (data build tool), whose data modeling, versioning, and testing features are used to refine, enrich, and structure it for specific business requirements and analytical needs. Finally, the transformed data is stored in a PostgreSQL database, ready for analysis and exploration.

The project also includes a custom Meltano extractor that fetches data from the SpaceX API at regular intervals. It supports both incremental and full-load modes, so the dataset stays up to date and complete.
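The difference between the full-load and incremental modes can be illustrated with a minimal sketch. This is not the code from the repository: the function name `extract`, the replication key `updated_at`, and the sample records are hypothetical and chosen only for illustration. The idea is that a full load returns every record, while an incremental load returns only records newer than a stored bookmark.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO-8601 timestamp such as '2022-01-01T00:00:00+00:00'."""
    return datetime.fromisoformat(value)


def extract(records, bookmark=None, key="updated_at"):
    """Return (rows_to_load, new_bookmark).

    bookmark=None -> full load: every record is returned.
    bookmark=<ts> -> incremental load: only records newer than the bookmark.
    The key name 'updated_at' is an assumption, not a SpaceX API field.
    """
    rows = [
        r for r in records
        if bookmark is None or parse_ts(r[key]) > bookmark
    ]
    # Advance the bookmark to the newest timestamp seen; keep the old
    # bookmark when nothing new arrived.
    newest = max((parse_ts(r[key]) for r in rows), default=bookmark)
    return rows, newest


# Hypothetical sample data standing in for API responses.
records = [
    {"id": 1, "updated_at": "2022-01-01T00:00:00+00:00"},
    {"id": 2, "updated_at": "2022-02-01T00:00:00+00:00"},
    {"id": 3, "updated_at": "2022-03-01T00:00:00+00:00"},
]

full_rows, bookmark = extract(records)                 # full load
later_rows, bookmark = extract(records, bookmark)      # incremental, nothing new
```

In a real extractor, the bookmark would be persisted between runs (Meltano keeps this kind of state for its taps) rather than held in a local variable.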

Hopsworks Feature Store Python Integration

https://github.com/muneebsmh/hopsworks-integrations

To address the limited documentation and developer support available for Hopsworks, I developed Python APIs for the feature store. Despite tight timelines and minimal community assistance, I delivered all the necessary APIs. I also published the "Hopsworks Integration" library on PyPI (pypi.org/project/Hopsworks-Integration/). This library has gained traction and is now recognized by the Hopsworks development community.

Payment Risk Engine | COD Blocking

A system that identifies customers with poor order histories and blocks the cash-on-delivery (COD) option for them. Previously, we had no way of tracking customer performance, so many customers rejected delivered orders at their doorsteps, and Daraz bore the failed logistics cost. The system enabled us to block COD for certain customers and require them to pay in advance for their orders. It rests on a delicate trade-off: it increases gross-to-net revenue but can also shrink the customer base, since some customers lose the COD option for parcel deliveries.

I first conducted a thorough data analysis to find the impact on the business and moved on to creating data pipelines and a performance dashboard that would gauge the impact of the system on the overall business of Daraz.

Delayed Order Notification System

An automated alert system that notifies customers about delayed orders based on specific logistics metrics in order to enhance the customer experience. I worked on developing the system's end-to-end data pipelines, designed the business flow, and made a BI dashboard to gauge the performance.

This project not only enhanced the customer experience but also helped in gauging Daraz's logistics performance and highlighted key metrics that needed to be fixed.

Dashboard Usage Analysis

Every data visualization dashboard consumes a certain amount of computing and memory resources. Knowing how much of the assigned cloud quota the dashboards consume is imperative when working in the eCommerce industry. Currently, there are more than 700 dashboards in Daraz. When these dashboards are refreshed daily, they consume many resources, slowing down other processes. Therefore, I needed to identify which dashboards were used most frequently and which were not, so that the unused ones could be decommissioned to save resources.

I created a meta dashboard that would rank the dashboards by tracking the daily, weekly, and monthly active users and their visits. Also, this meta dashboard tracked individual user history on multiple dashboards, i.e., the number of dashboards that a particular user regularly visits, which helped us filter out the executives' dashboards.
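The ranking logic behind such a meta dashboard can be sketched in a few lines. This is an illustrative example only, not the production implementation: the visit-log layout `(dashboard, user, day)`, the function names, and the sample data are all hypothetical. It counts distinct users per dashboard over a trailing window, so a window of 1, 7, or 30 days gives daily, weekly, or monthly active users.

```python
from collections import defaultdict
from datetime import date, timedelta


def active_users(visits, as_of, window_days):
    """Count distinct users per dashboard in the window ending on `as_of`."""
    start = as_of - timedelta(days=window_days - 1)
    users = defaultdict(set)
    for dashboard, user, day in visits:
        if start <= day <= as_of:
            users[dashboard].add(user)
    return {name: len(seen) for name, seen in users.items()}


def rank_dashboards(visits, as_of, window_days=30):
    """Dashboards ordered from most to least active users (ties by name)."""
    counts = active_users(visits, as_of, window_days)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical visit log: (dashboard, user, visit date).
visits = [
    ("sales", "a", date(2022, 1, 31)),
    ("sales", "b", date(2022, 1, 30)),
    ("ops", "a", date(2022, 1, 31)),
    ("legacy", "c", date(2021, 12, 1)),
]

ranking = rank_dashboards(visits, as_of=date(2022, 1, 31), window_days=30)
```

Dashboards with no visits inside the window do not appear in the ranking at all, which makes them the first candidates for decommissioning.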

Enterprise Data Warehouse

At my previous company, Afiniti, multiple clients used the Afiniti engine to optimize their call center performance through data-driven pairing of customers and agents. The legacy enterprise data portal that Afiniti used to gauge clients' performance had some limitations. For instance, it had no change data capture and no historical analysis of clients. Also, the optimization metrics used to calculate a client's performance, such as handle time and wait time, were not recorded historically.

The enterprise data warehouse (EDW) addresses all the limitations of the legacy portal and adds features such as a standardized model that fits different business requirements without any change in architecture. It helped us track historical changes in clients' performance and provided a holistic view of all clients in a single portal at any time.

I worked on creating the whole data warehouse from scratch, including developing all data pipelines and dimensional modeling.
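One common way to record this kind of history in a dimensional model is a Type 2 slowly changing dimension, where each change closes the current row and opens a new one. The sketch below assumes that technique for illustration; the source does not state which approach the EDW used, and the row layout, field names, and function name are hypothetical.

```python
from datetime import date


def apply_scd2(history, client_id, metric, as_of):
    """Record a client's optimization metric, keeping every past version.

    Each row is valid from `valid_from` until `valid_to`; the current
    row has valid_to=None. Field names are assumptions for this sketch.
    """
    current = next(
        (row for row in history
         if row["client_id"] == client_id and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["metric"] == metric:
            return history  # nothing changed, keep the open row
        current["valid_to"] = as_of  # close the previous version
    history.append({
        "client_id": client_id,
        "metric": metric,
        "valid_from": as_of,
        "valid_to": None,
    })
    return history


# Hypothetical example: a client switches from handle time to wait time.
history = []
apply_scd2(history, "client_1", "handle_time", date(2018, 1, 1))
apply_scd2(history, "client_1", "wait_time", date(2018, 6, 1))
```

With this layout, a query for any past date can find the row whose validity range covers it, which is what makes historical analysis possible.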

Data Pull from Dynamics 365 Using Azure Logic Apps

A data integration pipeline that pulls data from selected data entities in Microsoft Dynamics 365 into our supply chain meta-model at Seeloz. I developed the pipeline in Azure Logic Apps to fetch data from the data entities and load it into Azure Blob Storage, where our ETL jobs could pick it up later. All communication went through Azure Service Bus. The app was triggered by an HTTP POST request, with the required arguments passed in a JSON payload. Error handling and logging were implemented at each step.
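A trigger call of this kind can be sketched with the Python standard library. Everything specific here is an assumption made for illustration: the URL is a placeholder, and the payload fields `entity` and `container` are hypothetical, not the arguments the real app accepted. The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_trigger_request(url, entity, container):
    """Build an HTTP POST request that carries its arguments as JSON.

    The payload field names are hypothetical placeholders.
    """
    payload = {"entity": entity, "container": container}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint; a real Logic App exposes its own trigger URL.
request = build_trigger_request(
    "https://example.invalid/logic-app/trigger",
    entity="SalesOrderHeaders",
    container="raw-dynamics",
)
# Sending would be: urllib.request.urlopen(request, timeout=30)
```

Keeping the request construction separate from the send makes the payload easy to inspect and test without touching the network.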

Skills

Languages

SQL, Python, SQL DML, T-SQL (Transact-SQL), Stored Procedure, XML, HTML, CSS, Python 3, Scala

Libraries/APIs

NumPy, REST APIs, Pandas, PySpark, Flask-RESTful, SQLAlchemy

Tools

Query Plan, MySQL Workbench, Talend ETL, Visual Studio, GitHub, Tableau, Microsoft Power BI, BigQuery, Apache Airflow, Grafana, Spark SQL, Azure Logic Apps, PyCharm, IntelliJ IDEA, Shell, GitLab CI/CD, Celery

Paradigms

ETL, Database Design, Dimensional Modeling, REST, Business Intelligence (BI), Agile, Data Science

Storage

MySQL, Data Pipelines, Data Integration, SQL Stored Procedures, Databases, RDBMS, Microsoft SQL Server, SQL Performance, Relational Databases, Database Modeling, MariaDB, Database Architecture, NoSQL, SQL Server Integration Services (SSIS), Apache Hive, PostgreSQL, SQL Server Management Studio (SSMS), Azure SQL, Azure SQL Databases, HDFS, MongoDB, Google Cloud, Alibaba Cloud, Azure Blobs, JSON, Redis

Other

Data Warehousing, Quality Management, Data Warehouse Design, Slowly Changing Dimensions (SCD), Data Engineering, Query Optimization, Data Quality Analysis, Data Queries, ETL Tools, Data Analytics, Database Analytics, Data Processing, Business Intelligence (BI) Platforms, Performance Tuning, Data Modeling, Business Logic, Data Architecture, Logical Database Design, Database Schema Design, Relational Database Design, ELT, Schemas, Relational Data Mapping, Reporting, BI Reporting, AnyDesk, Data Migration, Database Optimization, Data Extraction, CSV Export, CSV, Data Management, Warehouses, Analytics, English, Data Visualization, Google BigQuery, Big Data, Data Analysis, Big Data Architecture, Analysis, Cloud Infrastructure, CI/CD Pipelines, API Integration, Reports, BI Reports, Dashboards, Dashboard Development, APIs, DuckDB, Scripting, Cloud, Azure Data Factory, Data Build Tool (dbt), Azure Databricks, Meltano, Prefect, Prometheus, Node Exporter, MinIO, S3 Buckets, Data Structures, OOP Designs, Blink SQL, Data, Data Governance, Azure Service Bus, Microsoft Azure, Apache Superset, Workflow Automation, Hopsworks, Feature Engineering

Frameworks

Windows PowerShell, Hadoop, .NET, Flask, Apache Spark, Spark

Platforms

Talend, Apache Flink, Azure, Azure SQL Data Warehouse, Jupyter Notebook, Docker, Dedicated SQL Pool (formerly SQL DW), Databricks, Amazon Web Services (AWS), Airbyte

Education

2018 - 2021

Master's Degree in Computer Science

National University of Computer and Emerging Sciences - Karachi, Pakistan

2013 - 2017

Bachelor's Degree in Computer Science

National University of Computer and Emerging Sciences - Karachi, Pakistan
