Senior Data Engineer
2020 - PRESENT, Stout Technologies
- Managed the Facebook videos pipeline, which carries attribute data such as genre, PG rating, and trending status, using Python and SQL.
- Optimized production SQL for throughput.
- Developed queries and built dashboards for business-critical video attributes (a representative query is sketched after this entry).
Technologies: Apache Hive, Python 3, Unidash, GitHub, Spark
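A minimal sketch of the kind of attribute query behind those dashboards, assuming a hypothetical video_attributes Hive table and PyHive for connectivity; the table, columns, and host are illustrative placeholders, not the production setup:

```python
# Hypothetical example: count trending videos by genre and rating for a dashboard.
# Table name, column names, and host are illustrative placeholders.
from pyhive import hive

ATTRIBUTE_QUERY = """
    SELECT genre,
           pg_rating,
           COUNT(*) AS trending_videos
    FROM   video_attributes
    WHERE  is_trending = TRUE
      AND  ds = '{ds}'
    GROUP  BY genre, pg_rating
"""

def fetch_trending_by_genre(ds, host="hive.example.com"):
    """Run the attribute query for one partition date and return the rows."""
    conn = hive.connect(host=host, port=10000)
    try:
        cursor = conn.cursor()
        cursor.execute(ATTRIBUTE_QUERY.format(ds=ds))
        return cursor.fetchall()
    finally:
        conn.close()
```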
Senior Data Engineer
2018 - 2021, Walmart Labs
- Architected, developed, and supported new features in the project’s data flow that calculated cumulative and daily metrics, such as converted visitors and first-time buyers, on the home and search pages.
- Performed ad-hoc analysis of user behavior on sensor and beacon data parsed into Hive.
- Automated the existing ETL pipeline with Python code that builds SQL on the fly against Hive map columns (sketched after this entry), cutting the 2-3 week development cycle for each new feature.
- Wrote a Hive UDF to replace the use of R for p-value calculation in the Hive pipeline. Supported existing processes and tools, mentored fellow engineers, and triaged data issues for timely resolution.
- Participated in the effort to migrate on-premises jobs to GCP.
Technologies: Unix, Spark, Apache Hive, MapReduce, Hadoop, SQL, Python, Data Warehouse Design, Data Warehousing, Databases, Kubernetes, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, Apache Spark, BigQuery, Google Cloud Composer, Looker
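A minimal sketch of the SQL-on-the-fly idea behind that automation, assuming hypothetical metric definitions and table names; the real pipeline, schema, and metric logic are not shown here:

```python
# Hypothetical sketch: generate HiveQL on the fly that packs configured metrics
# into a single Hive map column. Metric names and tables are illustrative only.
METRICS = {
    "converted_visitors": "SUM(IF(converted = 1, 1, 0))",
    "first_time_buyers":  "SUM(IF(is_first_purchase = 1, 1, 0))",
}

def build_daily_metrics_sql(source_table, target_table, ds):
    """Assemble an INSERT that writes every configured metric into one map column.

    Adding a new metric means adding one entry to METRICS -- no hand-written SQL.
    """
    map_entries = ",\n               ".join(
        f"'{name}', CAST({expr} AS STRING)" for name, expr in METRICS.items()
    )
    return f"""
        INSERT OVERWRITE TABLE {target_table} PARTITION (ds = '{ds}')
        SELECT page,
               MAP({map_entries}) AS daily_metrics
        FROM   {source_table}
        WHERE  ds = '{ds}'
        GROUP  BY page
    """

print(build_daily_metrics_sql("page_events", "daily_page_metrics", "2021-01-01"))
```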
Senior Software Engineer
2012 - 2018, eBay
- Converted Teradata SQL to Spark SQL for a migration project. Developed regex-based string-processing UDFs for Spark (see the first sketch after this entry).
- Wrote Pig, Hive, and MapReduce jobs on user-behavior clickstream data. Automated analyses, such as first-time buyer counts and conversion metrics on listings data, with Unix scripts scheduled through crontab.
- Prepared data for predictive and prescriptive modeling.
- Built tools and custom wrapper scripts in Python to automate Hadoop DistCp commands and log processing (see the second sketch after this entry).
- Developed ETL jobs, deployed them to production, and supported them; the jobs entailed both Teradata and Hadoop scripts.
Technologies: Teradata, Presto DB, Apache Hive, Spark, Hadoop, Python, Databases, Data Warehousing, Data Warehouse Design, AWS, Docker, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, Apache Spark, BigQuery, Unix Shell Scripting
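First sketch: one way a regex string-processing UDF could be registered for Spark SQL so migrated queries can call it the way a Teradata function would. The pattern, table, column names, and sample rows are assumptions for illustration only:

```python
# Hypothetical sketch of a regex UDF registered for Spark SQL during a
# Teradata-to-Spark migration. Pattern, table, and columns are illustrative.
import re
from pyspark.sql import SparkSession
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("regex-udf-example").getOrCreate()

ITEM_ID_PATTERN = re.compile(r"item[=/](\d+)")

def extract_item_id(url):
    """Pull the listing item id out of a clickstream URL, or return None."""
    if url is None:
        return None
    match = ITEM_ID_PATTERN.search(url)
    return match.group(1) if match else None

# Register the function so Spark SQL can call it like a built-in.
spark.udf.register("extract_item_id", extract_item_id, StringType())

# Tiny illustrative dataset standing in for clickstream logs.
rows = [("https://www.example.com/item=12345",),
        ("https://www.example.com/item=12345",),
        ("https://www.example.com/help",)]
spark.createDataFrame(rows, ["page_url"]).createOrReplaceTempView("clickstream")

spark.sql("""
    SELECT extract_item_id(page_url) AS item_id, COUNT(*) AS views
    FROM   clickstream
    GROUP  BY extract_item_id(page_url)
""").show()
```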
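Second sketch: one plausible shape for a Python wrapper that automates hadoop distcp runs and captures logs. Cluster URIs, paths, mapper count, and the log file are placeholders, not the actual tooling:

```python
# Hypothetical sketch of a DistCp wrapper with basic log capture.
# Source/destination URIs and the log file name are illustrative placeholders.
import logging
import subprocess

logging.basicConfig(filename="distcp_runs.log", level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(message)s")

def run_distcp(src, dest, num_mappers=20):
    """Copy a directory between clusters with DistCp and log the outcome."""
    cmd = ["hadoop", "distcp", "-m", str(num_mappers), "-update", src, dest]
    logging.info("Starting DistCp: %s", " ".join(cmd))
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        logging.error("DistCp failed (%s): %s", result.returncode, result.stderr.strip())
    else:
        logging.info("DistCp finished: %s", result.stdout.strip())
    return result.returncode

if __name__ == "__main__":
    run_distcp("hdfs://source-cluster/data/listings",
               "hdfs://backup-cluster/data/listings")
```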
Database Analyst
2008 - 2012, PeakPoint Technologies
- Performed data modeling and mapping, developed and deployed ETL code, and wrote advanced Teradata SQL.
- Developed extended stored procedures, DB links, packages, and parameterized dynamic PL/SQL to migrate schema objects per business requirements.
- Designed a logical data model and implemented it as a physical data model.
- Developed automated ETL jobs scheduled in the UC4 tool and placed them into production.
Technologies: Python, Teradata, SQL, T-SQL, PL/SQL, Databases, Data Warehousing, Data Warehouse Design, Data, Data Engineering, Data Modeling, Data Pipelines, Relational Databases, Dimensional Modeling, DevOps, ETL, Apache Spark