Elaaf Shuja, Developer in Berlin, Germany

Elaaf Shuja

Verified Expert in Engineering

Data Engineer and Developer

Location
Berlin, Germany
Toptal Member Since
September 7, 2022

Elaaf is a seasoned data engineer who loves designing, building, and maintaining petabyte-scale data infrastructures. He is keen on working with on-premise, cloud, and hybrid data solutions, always striving for code quality, performance, and maintainability. With exceptional communication skills, Elaaf can contribute to challenging projects and help expand data-based businesses.

Portfolio

Delivery Hero
Data Engineering, Software Engineering, Python, Apache Airflow...
Keyrus
Python, SQL, Spark, Apache Airflow, Data Visualization, APIs, Data Scraping...
ADDO AI
Python, SQL, Azure, Google Cloud Platform (GCP), Kubernetes, Data Visualization...

Experience

Availability

Part-time

Preferred Environment

macOS, Visual Studio Code (VS Code), Slack

The most amazing product I've built is a custom data integration application that uses purely open-source technologies.

Work Experience

Senior Data Engineer

2022 - PRESENT
Delivery Hero
  • Worked on the global recommendations team, which provides personalized restaurant and cuisine recommendations to users of 12+ sub-brands in 70+ countries.
  • Developed and productionized the data pipelines and serving API for a new cuisine recommendation strategy, which yielded a 6% uplift in conversion rate (CVR) in an A/B test.
  • Reduced daily operational costs by 11% by optimizing Kubernetes node type/region, API code, GCP Dataflow pipelines, database resources, and Datadog logging.
  • Migrated the entire services stack and data pipelines from the GCP East Asia region to the Southeast Asia region, cutting costs by switching to the N2D machine type and reducing latency for end users.
  • Served as an on-call engineer for critical recommendation services across 11 clusters and five global regions.
Technologies: Data Engineering, Software Engineering, Python, Apache Airflow, Google Cloud Platform (GCP), SQL, PostgreSQL, Kubernetes, Terraform, FastAPI, APIs, Data Scraping, Data Analytics, Business Intelligence (BI), ETL, Redis, Azure SQL, Azure, Data Pipelines, Cloud

Senior Data Engineer

2021 - 2022
Keyrus
  • Led the design and development effort for a data integration platform using open-source technologies such as Airflow, Spark, and Airbyte.
  • Managed a petabyte-scale data warehouse for a retail company in the Middle East, spearheading data ingestion and modeling.
  • Developed a custom containerized Spark application to deploy to on-premise clusters.
Technologies: Python, SQL, Spark, Apache Airflow, Data Visualization, APIs, Data Scraping, Data Analytics, Business Intelligence (BI), ETL, Redis, Azure SQL, Azure, Azure Cosmos DB, Data Pipelines, Cloud, Consulting, Costs

Data Engineer

2018 - 2021
ADDO AI
  • Developed and performed unit, system integration, and user acceptance testing of ETL pipelines covering over 35 distinct business streams and 12 dimensions of varying load and frequency on the Apache Hive data lake.
  • Analyzed the existing Teradata SQL and worked with the data modeling team on its conversion to PySpark and Spark SQL.
  • Optimized Spark jobs and identified the most appropriate scheduling triggers using shell scripts based on business requirements and fact dependencies.
  • Designed and implemented a strategy for PII data masking and for moving data from different business streams between the raw, curated, and serving layers of the data lake.
Technologies: Python, SQL, Azure, Google Cloud Platform (GCP), Kubernetes, Data Visualization, APIs, Data Scraping, Data Analytics, Business Intelligence (BI), ETL, Redis, Data Pipelines, Cloud, Consulting, Costs

Custom Data Integration Tool

A custom tool built on top of open-source technologies such as Apache Spark, Apache Airflow, and Airbyte. It lets non-technical business users perform complex data engineering tasks through a no-code GUI.

User Stance Detection on Twitter

https://github.com/elaaf/stance-detect
This repo determines the stance of Twitter users on a divisive topic using unsupervised machine learning. I wrote a Python implementation based on an NLP research paper, which involved the following steps:
• Constructed feature vectors for each user (hashtags, retweeted accounts, unique tweets)
• Applied dimensionality reduction (t-SNE, UMAP)
• Clustered low-dimensional data (mean-shift clustering, DBSCAN)
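
The steps above can be illustrated with a minimal sketch. It is not the code from the linked repo: the sample users, the `build_features` and `dbscan` helpers, and the `eps` and `min_samples` values are all hypothetical, and the dimensionality-reduction step (t-SNE or UMAP) is skipped, so a small hand-written DBSCAN runs directly on cosine distances between per-user count vectors.

```python
import math
from collections import Counter


def build_features(user_activity):
    """Turn each user's hashtags and retweeted accounts into a sparse count vector."""
    features = {}
    for user, activity in user_activity.items():
        counts = Counter()
        for tag in activity.get("hashtags", []):
            counts["tag:" + tag.lower()] += 1
        for account in activity.get("retweeted", []):
            counts["rt:" + account.lower()] += 1
        features[user] = counts
    return features


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (1.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def dbscan(features, eps=0.5, min_samples=2):
    """Minimal DBSCAN; returns {user: cluster_id}, with -1 marking noise."""
    users = sorted(features)

    def neighbors(u):
        return [v for v in users if cosine_distance(features[u], features[v]) <= eps]

    labels = {}
    cluster_id = -1
    for u in users:
        if u in labels:
            continue
        nbrs = neighbors(u)
        if len(nbrs) < min_samples:
            labels[u] = -1  # provisional noise; may become a border point later
            continue
        cluster_id += 1
        labels[u] = cluster_id
        queue = [v for v in nbrs if v != u]
        while queue:
            v = queue.pop()
            if labels.get(v) == -1:
                labels[v] = cluster_id  # noise point reached from a core point
            if v in labels:
                continue
            labels[v] = cluster_id
            v_nbrs = neighbors(v)
            if len(v_nbrs) >= min_samples:  # v is a core point, keep expanding
                queue.extend(w for w in v_nbrs if w not in labels or labels[w] == -1)
    return labels


# Hypothetical sample data, for illustration only.
sample = {
    "alice": {"hashtags": ["VoteYes", "Reform"], "retweeted": ["yes_campaign"]},
    "bob": {"hashtags": ["voteyes"], "retweeted": ["yes_campaign"]},
    "carol": {"hashtags": ["VoteNo"], "retweeted": ["no_campaign"]},
    "dave": {"hashtags": ["voteno", "KeepIt"], "retweeted": ["no_campaign"]},
    "erin": {"hashtags": ["cats"], "retweeted": []},
}
labels = dbscan(build_features(sample), eps=0.5, min_samples=2)
print(labels)
```

In this toy data, users who share hashtags and retweet the same accounts end up in the same cluster, while a user with no overlapping activity is labeled as noise. With real data, reducing the vectors to a few dimensions before clustering, as the project description states, would likely matter far more than it does here.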

Languages

Python, SQL

Frameworks

Spark

Tools

Apache Airflow, Terraform

Paradigms

ETL, Business Intelligence (BI)

Platforms

Azure, Google Cloud Platform (GCP), Kubernetes, Airbyte

Storage

Data Pipelines, Redis, Azure SQL, PostgreSQL, Azure Cosmos DB

Other

Software Engineering, Data Engineering, ETL Tools, APIs, Cloud, Machine Learning, Data Visualization, Data Scraping, Data Analytics, Consulting, Costs, FastAPI

Education

2018 - 2020

Master's Degree in Computer Science

Information Technology University of the Punjab - Lahore, Punjab, Pakistan

2013 - 2017

Bachelor's Degree in Electrical Engineering

National University of Sciences and Technology - Islamabad, Pakistan

Certifications

DECEMBER 2021 - DECEMBER 2022

Microsoft Azure Data Engineer Associate

Microsoft
