Muhammad Naeem Ahmed

Unix Shell Scripting Developer in San Jose, CA, United States

Member since June 18, 2020
Muhammad brings nearly 15 years of IT experience implementing data warehousing solutions. He delivers reliable, maintainable, and efficient code in SQL, Python, Perl, Unix shell, C/C++, and Java. His work helped eBay increase revenue and Walmart improve processes. Muhammad focuses on big data technologies, automating repetitive tasks to improve workflows, and delivering efficient, profitable client solutions.








Preferred Environment

Snowflake, Teradata SQL Assistant, DBeaver, Presto DB, PyCharm

The most amazing...

...project I've developed was converting buyers into sellers at eBay during a hackathon. The effort delivered an overall 0.1% revenue boost.


Work Experience

  • Senior Data Engineer

    2020 - PRESENT
    Stout Technologies
    • Managed the Facebook videos pipeline, which carries attribute data such as genre, PG rating, and trending status, using Python and SQL.
    • Optimized production SQL for throughput.
    • Developed queries and built dashboards for business-critical video attributes.
    Technologies: Apache Hive, Python 3, Unidash, GitHub, Spark
  • Senior Data Engineer

    2018 - 2021
    Walmart Labs
    • Architected, developed, and supported new features in the project’s data flow that calculated cumulative/daily metrics such as converted visitors and first-time buyers on the home and search pages.
    • Analyzed Hive sensor- and beacon-parsed data for ad-hoc analysis of user behavior.
    • Automated the ETL pipeline with Python that built SQL on the fly into Hive map columns, cutting the development cycle by 2-3 weeks for each new feature.
    • Wrote a Hive UDF to replace the use of R for calculating p-values in the Hive pipeline. Supported existing processes and tools, mentored fellow engineers, and triaged data issues to timely resolution.
    • Participated in migrating on-premises jobs to the GCP cloud.
    Technologies: Unix, Spark, Apache Hive, MapReduce, Hadoop, SQL, Python, Data Warehouse Design, Data Warehousing, Databases, Kubernetes, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, Apache Spark, BigQuery, Google Cloud Composer, Looker
  • Senior Software Engineer

    2012 - 2018
    eBay
    • Converted Teradata SQL to Spark SQL for a migration project. Developed Regex-related string processing UDFs for Spark.
    • Wrote Pig, Hive, and MapReduce jobs on user-behavior clickstream data. Automated Unix scripts through crontabs to run analyses such as first-time buyer counts and conversion metrics on listings data.
    • Prepared data for predictive and prescriptive modeling.
    • Built tools and custom wrapper scripts, using Python to automate DistCp Hadoop commands and logs processing.
    • Developed and supported ETL jobs into production. The jobs entailed both Teradata and Hadoop scripts.
    Technologies: Teradata, Presto DB, Apache Hive, Spark, Hadoop, Python, Databases, Data Warehousing, Data Warehouse Design, AWS, Docker, Customer Data, Data, Data Engineering, Apache Airflow, Data Modeling, Data Pipelines, Web Scraping, Relational Databases, Dimensional Modeling, PostgreSQL, DevOps, Google Cloud Platform (GCP), Elasticsearch, ETL, Apache Spark, BigQuery, Unix Shell Scripting
  • Database Analyst

    2008 - 2012
    PeakPoint Technologies
    • Performed data modeling and mapping; developed and deployed ETL code. Wrote advanced Teradata SQL.
    • Developed extended stored procedures, DB-link, packages, and parameterized dynamic PL/SQL to migrate the schema objects per business requirements.
    • Designed a logical data model and implemented it as a physical data model.
    • Developed automated ETL jobs, scheduled in the UC4 tool, and placed them into production.
    Technologies: Python, Teradata, SQL, T-SQL, PL/SQL, Databases, Data Warehousing, Data Warehouse Design, Data, Data Engineering, Data Modeling, Data Pipelines, Relational Databases, Dimensional Modeling, DevOps, ETL, Apache Spark


Projects

  • Teradata SQL to Spark SQL Migration Project

    Performed a detailed analysis of SQL and jobs written in Teradata SQL. The requirement was to convert the entire communications logical data model (CLDM) ETL pipeline to Spark; the pipeline had around 200 final tables and around 150 jobs. Because several Regex-related calculations handled seamlessly by built-in Teradata functions had no Spark equivalent at the time, I wrote UDFs to cover those cases.
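A minimal sketch of the kind of UDF involved, assuming PySpark; the function name, pattern, and registration are illustrative, not the project's actual code. It mimics Teradata's built-in REGEXP_SUBSTR, which Spark SQL lacked at the time:

```python
import re

def regexp_substr(s, pattern, occurrence=1):
    """Teradata-style REGEXP_SUBSTR: return the Nth regex match in s, or None."""
    if s is None:
        return None
    matches = re.findall(pattern, s)
    return matches[occurrence - 1] if len(matches) >= occurrence else None

# In the Spark job this would be registered as a UDF, e.g.:
#   spark.udf.register("regexp_substr", regexp_substr)
# and then called from Spark SQL just like the Teradata built-in:
#   SELECT regexp_substr(msg, '[0-9]+') FROM events

print(regexp_substr("order 123 of 456", r"\d+"))     # -> 123
print(regexp_substr("order 123 of 456", r"\d+", 2))  # -> 456
```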

  • Experimentation ETL Code Refactor

    Refactored Hive ETL SQL to be generated dynamically by Python from a YAML configuration file in which aggregations could be defined. This was a huge win because it removed the need to make SQL changes and retest before every release.
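A hedged sketch of this config-driven pattern, with illustrative table, column, and metric names; the dict stands in for what `yaml.safe_load` would return from the configuration file:

```python
# Config as it might appear after yaml.safe_load() on a file like:
#
#   table: video_events
#   group_by: [experiment_id, variant]
#   metrics:
#     converted_visitors: "COUNT(DISTINCT IF(converted, visitor_id, NULL))"
#     total_visits: "COUNT(*)"
config = {
    "table": "video_events",
    "group_by": ["experiment_id", "variant"],
    "metrics": {
        "converted_visitors": "COUNT(DISTINCT IF(converted, visitor_id, NULL))",
        "total_visits": "COUNT(*)",
    },
}

def build_sql(cfg):
    """Build a Hive aggregation query from the parsed config."""
    keys = ", ".join(cfg["group_by"])
    metrics = ",\n  ".join(f"{expr} AS {name}" for name, expr in cfg["metrics"].items())
    return f"SELECT {keys},\n  {metrics}\nFROM {cfg['table']}\nGROUP BY {keys}"

print(build_sql(config))
```

Adding a new metric then means editing one YAML line rather than changing and retesting hand-written SQL.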

  • Converting Buyers Into Sellers Through Purchase History

    As part of a Hackathon, I developed a code prototype at eBay for converting buyers into sellers. Based on purchase history and shelf life of items bought, buyers would be sent recommendations to sell the purchased items at the depreciated cost. The project was built in Teradata using user sessions and event/transaction-level information to derive recommendations. This effort increased the overall revenue by 0.1% and was considered a huge success.
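The depreciation idea can be sketched as below; the linear schedule and the numbers are assumptions for illustration, since the original logic lived in Teradata SQL over session and transaction data:

```python
from datetime import date

def suggested_resale_price(purchase_price, purchase_date, shelf_life_days, today=None):
    """Linearly depreciate an item over its shelf life (floored at zero)."""
    today = today or date.today()
    age_days = (today - purchase_date).days
    remaining = max(0.0, 1.0 - age_days / shelf_life_days)
    return round(purchase_price * remaining, 2)

# A buyer who paid $200 one year into a 4-year shelf life would be
# nudged to list the item at the depreciated price of $150.
price = suggested_resale_price(200.0, date(2023, 1, 1), 4 * 365, today=date(2024, 1, 1))
```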

  • Python Wrapper for Hadoop Administrative Commands

    Wrote a detailed and complex Python wrapper on top of Hadoop commands to secure data against unintentional slip-ups. The project was a huge success and grew into a robust repository of code that mitigated many pain points as more and more functionality was added.
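A minimal sketch of what such a guard wrapper might look like; the protected prefixes and function names are illustrative assumptions, not the original tool:

```python
import subprocess

# Illustrative list of paths the wrapper refuses to delete.
PROTECTED_PREFIXES = ("/data/prod", "/warehouse")

def check_delete(path):
    """Raise if a delete would touch (or contain) a protected prefix."""
    for p in PROTECTED_PREFIXES:
        if path == p or path.startswith(p + "/") or p.startswith(path):
            raise PermissionError(f"refusing to delete protected path: {path}")
    return path

def safe_rm(path, dry_run=True):
    """Run 'hdfs dfs -rm -r' only after the path passes the guard."""
    check_delete(path)
    cmd = ["hdfs", "dfs", "-rm", "-r", path]
    if dry_run:
        return " ".join(cmd)  # show what would run instead of running it
    return subprocess.run(cmd, check=True)
```

The same guard can front other destructive commands (`-rmdir`, `-mv` out of production paths) as functionality is added.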

  • Senior Data Engineer

    Developed ETL for the Facebook videos pipelines. Created diffs in Phabricator to deploy additions and removals of video attributes in production. Created deltoid metrics in MDF. Analyzed data and published reports in Unidash.

  • Facebook Watch Data Pipeline Engineer

    Built key metrics for videos, such as genres, trending videos, songs, and movies. Coded in Python and SQL (with GitHub for version control), powered the ETL pipeline, added new features, and tuned the performance of previously written SQL.

  • Senior Developer

    Key Responsibilities:
    • Researching crypto bots available in the market, their technical features, and their trading strategies.
    • Developing a Python codebase implementing crypto bot strategies that account for fear-and-greed signals and on-chain analysis of whale activity; writing auto-buy, auto-sell, and portfolio-rebalancing code.
    • Deep-diving into crawled web data on crypto news.
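The portfolio-rebalancing piece can be sketched as follows; the assets, prices, and equal-weight targets are illustrative assumptions, not the bot's actual strategy:

```python
def rebalance_orders(holdings, prices, targets):
    """Return per-asset order sizes in units (+ = buy, - = sell)
    that restore the target portfolio weights."""
    total_value = sum(holdings[a] * prices[a] for a in holdings)
    orders = {}
    for asset, weight in targets.items():
        target_value = total_value * weight
        current_value = holdings.get(asset, 0.0) * prices[asset]
        orders[asset] = round((target_value - current_value) / prices[asset], 8)
    return orders

# Portfolio worth $90,000 drifted to 2/3 BTC; rebalance back to 50/50.
holdings = {"BTC": 1.0, "ETH": 10.0}
prices = {"BTC": 60000.0, "ETH": 3000.0}
orders = rebalance_orders(holdings, prices, {"BTC": 0.5, "ETH": 0.5})
print(orders)  # sell 0.25 BTC, buy 5 ETH
```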


Skills

  • Languages

    Python, T-SQL, Snowflake, Python 3, SQL, Bash Script, JavaScript, GraphQL, C++, R, Java
  • Frameworks

    Apache Spark, Presto DB, Spark, Hadoop
  • Tools

    PyCharm, Teradata SQL Assistant, Erwin, Sqoop, Flume, BigQuery, Apache Airflow, Oozie, Tableau, Google Cloud Composer, Looker, Microsoft Power BI, GitHub
  • Paradigms

    ETL, Database Design, ETL Implementation & Design, MapReduce, DevOps, Dimensional Modeling, Business Intelligence (BI)
  • Platforms

    Azure, Unix, Hortonworks Data Platform (HDP), Apache Pig, Apache Kafka, Docker, Kubernetes, MapR, Google Cloud Platform (GCP)
  • Storage

    MySQL, Databases, NoSQL, DBeaver, PL/SQL, Data Pipelines, Amazon DynamoDB, Database Architecture, Database Modeling, Apache Hive, Elasticsearch, Teradata, SQL Server 2014, Oracle PL/SQL, Amazon S3 (AWS S3), PostgreSQL, Oracle 11g, Relational Databases
  • Other

    Data Modeling, Data Warehousing, Data Analysis, Data Architecture, ETL Tools, Data Engineering, APIs, Machine Learning, Big Data, ETL Development, Data Warehouse Design, Unix Shell Scripting, AWS, Customer Data, Data, Web Scraping, Microsoft Azure, Unidash


Education

  • Bachelor's Degree in Computer Science
    2001 - 2005
    FAST National University - Islamabad, Pakistan


Certifications

  • Teradata Certified Master V2R5
