
Nikhil Gupta

Verified Expert in Engineering

Database Developer

Location
Mumbai, Maharashtra, India
Toptal Member Since
October 12, 2022

Nikhil is a senior data engineer with over four years of experience who grasps new concepts quickly. He builds highly scalable, data-intensive applications, works across a wide range of applications and tech stacks, and manages clients and stakeholders at every level of the hierarchy. Beyond his technical depth and presentation skills, his most striking quality is his commitment to delivering high-quality solutions.

Portfolio

PepsiCo Global - Main
Data Analysis, SQL, Snowflake, Python, Data Management Platforms, NoSQL
Millicom International Cellular SA - Main
Data Engineering, Amazon Web Services (AWS), Big Data, AWS Lambda, Spark...
Zepto
Python 3, Python, SQL, Debezium, CDC, Change Data Capture, Apache Kafka...

Experience

Availability

Part-time

Preferred Environment

DevOps, Data Engineering, Kubernetes, Data Management Platforms

The most amazing...

...thing I've built is a BI product that translates data into insights with written explanations, scaling from 0 to 12 clients in less than two years.

Work Experience

Data Analyst

2024 - PRESENT
PepsiCo Global - Main
  • Built reporting to audit PepsiCo offers with Fetch Rewards and automated the entire process on Snowflake.
  • Ran significance tests across different user segments for various campaign advertisements (see the sketch below).
  • Built ThoughtSpot dashboards for high-level visibility into Fetch Rewards and the spend by PepsiCo and its brands across campaign types, and performed campaign analytics on top of them.
Technologies: Data Analysis, SQL, Snowflake, Python, Data Management Platforms, NoSQL
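
A minimal sketch of the kind of segment-level significance test referenced above, assuming a simple two-proportion z-test; the segment counts and conversion numbers are illustrative, not actual campaign data.

```python
# Minimal sketch: two-proportion z-test between two user segments exposed to a campaign.
# The counts below are illustrative placeholders.
from statsmodels.stats.proportion import proportions_ztest

conversions = [420, 515]      # users per segment who redeemed the offer
exposures = [10_000, 10_500]  # users per segment who saw the campaign

z_stat, p_value = proportions_ztest(count=conversions, nobs=exposures)
print(f"z = {z_stat:.3f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference between segments is statistically significant at the 5% level.")
else:
    print("No statistically significant difference between segments.")
```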

Data Engineer

2023 - PRESENT
Millicom International Cellular SA - Main
  • Developed Millicom's entire data governance and security framework in a Data Mesh Architecture.
  • Streamlined access requests across 80+ producer and consumer AWS accounts and empowered country teams to develop their own solutions while maintaining the central framework.
  • Maintained a gold standard in data sharing using AWS Lake Formation (see the sketch below). Designed and implemented a business data catalog using Amazon DataZone. Automated the deployment using Terraform and Python scripts.
Technologies: Data Engineering, Amazon Web Services (AWS), Big Data, AWS Lambda, Spark, AWS Glue, Amazon S3 (AWS S3), Apache Kafka, SQL, Python, Scala, Big Data Architecture, Data Transformation, Message Queues, Relational Databases, Data Pipelines, Amazon EC2, Amazon Athena, Amazon Elastic MapReduce (EMR), Amazon RDS, Redshift, Warehouses, Data Build Tool (dbt), Data Management Platforms, NoSQL
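
A minimal sketch of how a Lake Formation grant like the ones described above can be automated with boto3; the account IDs, role ARN, database, and table names are placeholders.

```python
# Minimal sketch: grant SELECT on a governed table to a consumer-account role
# via AWS Lake Formation (all identifiers below are placeholders).
import boto3

lakeformation = boto3.client("lakeformation", region_name="us-east-1")

def grant_table_select(consumer_role_arn: str, database: str, table: str, catalog_id: str) -> None:
    """Grant SELECT on a Lake Formation-governed table to a consumer principal."""
    lakeformation.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": consumer_role_arn},
        Resource={
            "Table": {
                "CatalogId": catalog_id,   # producer account that owns the table
                "DatabaseName": database,
                "Name": table,
            }
        },
        Permissions=["SELECT"],
        PermissionsWithGrantOption=[],
    )

grant_table_select(
    consumer_role_arn="arn:aws:iam::111122223333:role/analytics-consumer",
    database="sales",
    table="daily_revenue",
    catalog_id="444455556666",
)
```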

Data Engineer II

2023 - 2023
Zepto
  • Designed and implemented an event-driven pipeline triggered by file uploads (see the sketch below).
  • Implemented an end-to-end change data capture (CDC) pipeline capturing data from source tables in real time.
  • Optimized the existing Amazon Redshift cluster for better performance and to prevent frequent shutdowns.
  • Architected the FMCG Dynamic Pricing Engine and designed the data flow to automate price changes for FMCG items on the Zepto app. The estimated impact is around 0.2 million INR from minimizing the current revenue leakage.
  • Led the end-to-end development of an in-house streaming pipeline and achieved an SLA of 10 seconds, transmitting 100MB per second end-to-end. The pipeline involved Debezium, Kafka, Kafka Connect, PostgreSQL, ClickHouse, and Apache Pinot.
Technologies: Python 3, Python, SQL, Debezium, CDC, Change Data Capture, Apache Kafka, Back-end Development, Data Manipulation, Dashboards, Reports, Information Visualization, Data Warehousing, Data Warehouse Design, Data, Looker, AWS Glue, AWS Lambda, Amazon RDS, Big Data Architecture, Data Transformation, Message Queues, ETL Development, Database Optimization, Database Architecture, Data Architecture, Warehouses, Kubernetes, Data Management Platforms, NoSQL
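
A minimal sketch of the event-driven trigger mentioned above, assuming an S3 ObjectCreated notification invoking a Lambda function that starts an AWS Glue job; the job name and arguments are hypothetical.

```python
# Minimal sketch: Lambda handler invoked by an S3 ObjectCreated notification,
# starting a Glue job for each uploaded file (job name and arguments are placeholders).
import boto3

glue = boto3.client("glue")

def handler(event, context):
    """Kick off the ingestion job for every file referenced in the S3 event."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="ingest-uploaded-file",  # hypothetical Glue job
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
    return {"status": "started", "files": len(event["Records"])}
```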

Senior Data Engineer

2022 - 2023
Xpressbees
  • Designed an in-house scheduling framework that reduced Amazon MWAA costs by 50% by leveraging Snowflake tasks (see the sketch below).
  • Reduced engineering time and effort with the framework, which served as a self-service scheduler that let the analytics and client MIS teams schedule custom SQL and stored procedures.
  • Oversaw the design and implementation of this framework end to end.
Technologies: Apache Kafka, Apache Airflow, Python 3, SQL, Snowflake, Query Optimization, Data Engineering, ETL Tools, Web Scraping, Data Manipulation, Dashboards, Reports, Information Visualization, Data Warehousing, Data Warehouse Design, Data, Looker, AWS Glue, AWS Lambda, Amazon RDS, Big Data Architecture, Data Transformation, Message Queues, ETL Development, Database Optimization, Database Architecture, Data Architecture, Warehouses, Kubernetes, Data Management Platforms, NoSQL
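
A minimal sketch of the Snowflake-task mechanism behind the self-service scheduler described above; connection parameters, the warehouse, and the example task are placeholders.

```python
# Minimal sketch: register a user-submitted SQL job as a Snowflake task on a cron schedule.
# Connection details, warehouse, and task names are placeholders.
import snowflake.connector

def register_task(task_name: str, cron: str, sql: str) -> None:
    """Create (or replace) a Snowflake task that runs the given SQL on a cron schedule."""
    conn = snowflake.connector.connect(
        account="my_account", user="scheduler_svc", password="***",
        warehouse="SCHEDULER_WH", database="ANALYTICS", schema="JOBS",
    )
    cur = conn.cursor()
    try:
        cur.execute(
            f"""
            CREATE OR REPLACE TASK {task_name}
              WAREHOUSE = SCHEDULER_WH
              SCHEDULE = 'USING CRON {cron} UTC'
            AS
              {sql}
            """
        )
        cur.execute(f"ALTER TASK {task_name} RESUME")  # tasks are created suspended
    finally:
        cur.close()
        conn.close()

register_task("daily_mis_refresh", "0 6 * * *", "CALL mis.refresh_daily_report()")
```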

Data Engineer

2022 - 2022
PepsiCo
  • Built the entire data pipeline for their billing dashboard that helped PepsiCo track costs across different cloud vendors and services.
  • Collected cost and tagging data from AWS, Azure, Snowflake, and Datadog and consolidated it into the final data model; built the presentation layer using ThoughtSpot.
  • Created GitHub Actions workflows for different environments that built a Docker image packaging all the data build tool (dbt) models and pushed it to the ECR repository.
  • Wrote an Airflow DAG that ran this Docker image on a cadence to execute the dbt models in production (see the sketch below). Wrote the data transformation logic using dbt.
Technologies: Python, Snowflake, Apache Airflow, Terraform, GitHub, GitHub API, Continuous Delivery (CD), Continuous Integration (CI), DevOps, Amazon Web Services (AWS), Datadog, APIs, Microsoft Power BI, Data Visualization, Business Intelligence (BI), Azure, Data Analysis, Database Analytics, Docker, CI/CD Pipelines, Query Optimization, Data Engineering, ETL Tools, Back-end Development, Data Manipulation, Dashboards, Reports, Information Visualization, Data Warehousing, Pandas, Data Warehouse Design, Data, Amazon RDS, Big Data Architecture, Data Transformation, Message Queues, ETL Development, Database Architecture, Data Architecture, Warehouses, Data Build Tool (dbt), Data Management Platforms, NoSQL
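
A minimal sketch of the cadence job described above, assuming Airflow 2.x with the Docker provider; the image URI and environment variables are placeholders.

```python
# Minimal sketch: Airflow 2.x DAG that runs the dbt Docker image daily.
# The ECR image URI and environment variables are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.docker.operators.docker import DockerOperator

with DAG(
    dag_id="dbt_models_daily",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    run_dbt = DockerOperator(
        task_id="run_dbt_models",
        image="123456789012.dkr.ecr.us-east-1.amazonaws.com/dbt-models:latest",  # hypothetical image
        command="dbt run --target prod",
        environment={"SNOWFLAKE_ACCOUNT": "my_account"},  # credentials injected at runtime
    )
```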

Senior Data Engineer

2021 - 2022
Xpressbees
  • Wrote data transformations using SQL to onboard tables from new data sources onto the data platform.
  • Built directed acyclic graph (DAG) scripts to schedule hourly data loads from the source databases (Postgres, MySQL, and MongoDB) into the analytical layer tables of the analytics data warehouse.
  • Created Debezium connector configuration files for Kafka Connect to set up change data capture (CDC) from source databases into the data lake (see the sketch below).
  • Conducted code reviews and mentored junior data engineers in the team.
Technologies: Python, Python 3, Snowflake, CDC, SQL, Apache Airflow, PostgreSQL, MongoDB, MySQL, Data Warehousing, Data Warehouse Design, Pipelines, Data Pipelines, Amazon Web Services (AWS), Data Cleaning, Data Lakes, Big Data, ETL, Data Modeling, Dimensional Modeling, Data Extraction, DB, ELT, Databases, Oracle, Serverless, Relational Databases, Full-stack, Data Visualization, Business Intelligence (BI), Data Analysis, Database Analytics, Docker, Query Optimization, Data Engineering, ETL Tools, Web Scraping, Apache Kafka, Back-end Development, Data Manipulation, Dashboards, Reports, Information Visualization, Pandas, Data, AWS Glue, AWS Lambda, Amazon RDS, Big Data Architecture, Data Transformation, Message Queues, ETL Development, Database Architecture, Data Architecture, Warehouses, Data Build Tool (dbt), Kubernetes, Data Management Platforms, NoSQL
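
A minimal sketch of registering a Debezium Postgres connector with the Kafka Connect REST API, assuming Debezium 2.x-style properties; host, database, and table names are placeholders.

```python
# Minimal sketch: register a Debezium Postgres CDC connector via the Kafka Connect REST API.
# Hostnames, credentials, and table names are placeholders.
import requests

connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "orders-db.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "***",
        "database.dbname": "orders",
        "topic.prefix": "orders",
        "table.include.list": "public.orders,public.order_items",
        "plugin.name": "pgoutput",
    },
}

resp = requests.post(
    "http://kafka-connect.internal:8083/connectors",
    json=connector,
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["name"], "registered")
```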

Senior Data Engineer

2020 - 2021
vPhrase
  • Ingested approximately 80 GB of daily Phrazor product and plugin usage data into the Amazon S3 data lake from multiple client and in-house servers.
  • Imported the data from the data lake into the Snowflake data warehouse for transformations and analytics (see the sketch of this load step below).
  • Composed ingestion and transformation scripts in SQL to load data from raw and staging tables into the analytical layer tables, which were eventually used for analytics by the product manager and CTO.
  • Wrote the Airflow directed acyclic graph (DAG) scripts using Python and orchestrated the entire pipeline.
Technologies: Python, Snowflake, Amazon S3 (AWS S3), Apache Airflow, Amazon EC2, Data Analytics, Pipelines, Data Pipelines, ETL, SQL, Data Modeling, Terraform, Dimensional Modeling, Data Extraction, DB, ELT, Databases, Oracle, Serverless, Relational Databases, MySQL, Full-stack, Microsoft Power BI, Data Visualization, Business Intelligence (BI), Data Analysis, Database Analytics, Apache Spark, Docker, Query Optimization, Data Engineering, ETL Tools, Back-end Development, Data Manipulation, Dashboards, Reports, Information Visualization, Pandas, Data Warehouse Design, Data, Amazon RDS, Big Data Architecture, Data Transformation, ETL Development, Data Architecture, Warehouses, Kubernetes, Data Management Platforms
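
A minimal sketch of the lake-to-warehouse load step referenced above, assuming an external S3 stage in Snowflake; the stage, table, and connection details are placeholders.

```python
# Minimal sketch: COPY INTO a raw Snowflake table from an external S3 stage.
# Stage, table, and connection parameters are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="etl_svc", password="***",
    warehouse="ETL_WH", database="RAW", schema="USAGE",
)
cur = conn.cursor()
try:
    cur.execute(
        """
        COPY INTO raw.usage.plugin_events
        FROM @raw.usage.s3_usage_stage/plugin_events/
        FILE_FORMAT = (TYPE = 'JSON')
        ON_ERROR = 'CONTINUE'
        """
    )
    print(cur.fetchall())  # per-file load results
finally:
    cur.close()
    conn.close()
```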

Data Engineer

2019 - 2020
vPhrase
  • Designed the end-to-end ETL pipeline for a financial client to power their stocks and mutual fund recommendation algorithm.
  • Ingested data from 3rd-party vendor databases using Debezium, Kafka, and Kafka Connect for the CDC into the Amazon S3 (AWS S3) data lake.
  • Wrote the cleaning, transformation, and data processing scripts using Python and Spark to calculate around 100-150 financial KPIs (see the sketch below).
  • Orchestrated the entire ETL pipeline using Airflow running on an Amazon EC2 instance.
Technologies: Spark, Amazon Web Services (AWS), Data Lakes, Data Warehousing, Python, Python 3, SQL, ETL, Data Pipelines, Pipelines, Data Modeling, Terraform, Dimensional Modeling, Tableau, Data Extraction, DB, ELT, Databases, Oracle, Relational Databases, Data Visualization, Business Intelligence (BI), Data Analysis, Database Analytics, Apache Spark, Data Engineering, ETL Tools, Web Scraping, Data Manipulation, Reports, Data, Big Data Architecture, Data Transformation, ETL Development, Data Architecture, Warehouses
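
A minimal sketch of the Spark cleaning and KPI step referenced above; the input path, columns, and the two illustrative KPIs are assumptions, not the actual client logic.

```python
# Minimal sketch: clean raw price data and compute two illustrative financial KPIs
# (rolling volatility and a 252-day moving average). Paths and columns are placeholders.
from pyspark.sql import SparkSession, functions as F, Window

spark = SparkSession.builder.appName("financial-kpis").getOrCreate()

prices = (
    spark.read.parquet("s3://data-lake/cleansed/daily_prices/")  # hypothetical path
    .dropDuplicates(["symbol", "trade_date"])
    .filter(F.col("close").isNotNull())
)

by_symbol = Window.partitionBy("symbol").orderBy("trade_date")
window_252d = by_symbol.rowsBetween(-251, 0)

kpis = (
    prices
    .withColumn("daily_return", F.col("close") / F.lag("close").over(by_symbol) - 1)
    .withColumn("volatility_252d", F.stddev("daily_return").over(window_252d))
    .withColumn("sma_252d", F.avg("close").over(window_252d))
)

kpis.write.mode("overwrite").parquet("s3://data-lake/analytics/price_kpis/")
```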

Data Engineer

2017 - 2018
vPhrase
  • Built a BI software called Phrazor from scratch. It went from zero to 12 full-time clients, with over 200 licenses, in three years.
  • Modeled and designed the product's back-end data model and knowledge base, powering analytics on the user's reports and dashboards.
  • Led the design and implementation of formulas using Spark and Pandas, which crunched data to calculate industry-specific KPIs for users' reports.
  • Designed a multi-level drill-down feature to diagnose sudden drops or spikes in KPIs (see the sketch below).
  • Created and maintained unit tests to cover 90% of the codebase.
Technologies: Python, Spark SQL, Spark, Apache Airflow, Database Design, Database Modeling, Database Schema Design, Data Modeling, Dimensional Modeling, Consumer Packaged Goods (CPG), Tableau, Data Extraction, DB, ELT, Microsoft Power BI, Data Visualization, Business Intelligence (BI), Data Analysis, Database Analytics, Docker, Data Engineering, Beautiful Soup, Data, Data Transformation, Warehouses
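
A minimal sketch of the multi-level drill-down idea referenced above, using Pandas on a toy dataset; the dimensions and figures are illustrative.

```python
# Minimal sketch: break a month-over-month KPI change down level by level with Pandas.
# The dimensions (region, category) and numbers are illustrative.
import pandas as pd

sales = pd.DataFrame({
    "region":   ["North", "North", "South", "South", "North", "North", "South", "South"],
    "category": ["Snacks", "Drinks", "Snacks", "Drinks", "Snacks", "Drinks", "Snacks", "Drinks"],
    "month":    ["2018-01"] * 4 + ["2018-02"] * 4,
    "revenue":  [120, 80, 100, 90, 70, 85, 95, 92],
})

def drill_down(df: pd.DataFrame, levels: list[str], metric: str) -> None:
    """Print the month-over-month change of the metric at each successive drill-down level."""
    for depth in range(1, len(levels) + 1):
        keys = levels[:depth]
        pivot = df.pivot_table(index=keys, columns="month", values=metric, aggfunc="sum")
        pivot["change"] = pivot["2018-02"] - pivot["2018-01"]
        print(f"\nLevel: {' > '.join(keys)}")
        print(pivot.sort_values("change"))

# First level shows North dropped the most; the second level attributes it to Snacks.
drill_down(sales, levels=["region", "category"], metric="revenue")
```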

ETL Pipeline for Stocks and Mutual Funds Recommendation System

SCOPE
Built an end-to-end ETL pipeline that supplied data to a stocks and mutual funds recommendation algorithm for a leading trading firm in India.

DATA SOURCES
The project required the client's historical data, data from the client's third-party vendors, and data from various APIs.

TECH STACK AND OVERVIEW
Data was ingested using Debezium, Kafka, and Kafka Connect for CDC from vendor databases into the S3 data lake. Data cleaning and transformation were done with Python and Spark, followed by data processing that calculated around 100-150 financial KPIs on top of the ingested data. The cleaning and processing pipeline was orchestrated using Apache Airflow running on an EC2 instance.

Phrazor Product Usage Analytics Pipeline

SCOPE
Built an end-to-end ETL pipeline to ingest clickstream data from client and internal servers for analytics.

DATA SOURCES
Ingested approximately 80-100 GB of daily Phrazor product and plugin usage data into the Amazon S3 data lake from multiple client and in-house servers.

OVERVIEW
Cleaned and transformed the raw data and moved it from S3 into the Snowflake data warehouse for further analytics. The entire pipeline was orchestrated using Apache Airflow running on an Amazon EC2 instance.

IMPACT
This pipeline helped the product managers make smarter product decisions, run A/B tests, and analyze how users interact with the platform. This layer also supplied clean data to the data scientists for advanced analytics.

IEEE-CIS Fraud Detection Kaggle Competition

A project that earned a bronze medal by finishing in the top 7% of over 6,700 teams worldwide.

Given data about each credit card transaction, the solution had to identify fraudulent transactions. It was a classic example of highly skewed data, with the positive class making up less than 1% of the dataset.

Tested several techniques for handling data imbalance and ultimately settled on hard negative mining (sketched below). My team and I engineered many features that helped us reach the top 7%.
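
A minimal sketch of hard negative mining on synthetic imbalanced data; the model, thresholds, and data are illustrative and far simpler than the actual competition solution.

```python
# Minimal sketch of hard negative mining on synthetic, highly imbalanced data:
# train a first-pass model, keep only the negatives it finds hardest, and retrain.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# ~1% positive class, mimicking the skew of fraud data (synthetic, illustrative only).
X, y = make_classification(n_samples=20_000, weights=[0.99], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

# 1) First-pass model on the full imbalanced training set.
first_pass = GradientBoostingClassifier(random_state=42).fit(X_train, y_train)

# 2) Score the negatives and keep only the hardest ones (highest fraud-like scores).
neg_mask = y_train == 0
neg_scores = first_pass.predict_proba(X_train[neg_mask])[:, 1]
n_hard = int(neg_mask.sum() * 0.05)              # keep the hardest 5% of negatives
hard_idx = np.argsort(neg_scores)[-n_hard:]

X_pos, y_pos = X_train[~neg_mask], y_train[~neg_mask]
X_neg, y_neg = X_train[neg_mask][hard_idx], y_train[neg_mask][hard_idx]

# 3) Retrain on all positives plus the mined hard negatives only.
X_mined = np.vstack([X_pos, X_neg])
y_mined = np.concatenate([y_pos, y_neg])
mined_model = GradientBoostingClassifier(random_state=42).fit(X_mined, y_mined)

print("first-pass AUC:", roc_auc_score(y_test, first_pass.predict_proba(X_test)[:, 1]))
print("hard-negative-mined AUC:", roc_auc_score(y_test, mined_model.predict_proba(X_test)[:, 1]))
```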

Languages

Python 3, SQL, Snowflake, Python, Scala

Frameworks

Apache Spark, Spark

Libraries/APIs

Pandas, NumPy, Beautiful Soup, PySpark, Amazon EC2 API, GitHub API

Tools

Apache Airflow, Microsoft Power BI, Spark SQL, GitHub, Amazon Elastic MapReduce (EMR), Terraform, Tableau, Looker, Git, PyCharm, Sublime Text 3, AWS Glue, Amazon Athena

Paradigms

ETL, Business Intelligence (BI), Unit Testing, Data Science, Agile, Database Design, Dimensional Modeling, Continuous Integration (CI), Continuous Delivery (CD), DevOps

Platforms

Kubernetes, Apache Kafka, Amazon EC2, Amazon Web Services (AWS), Oracle, Docker, Linux, Ubuntu, Azure, AWS Lambda

Storage

PostgreSQL, MySQL, Databases, Relational Databases, Redshift, Data Lakes, Database Modeling, Data Pipelines, JSON, DB, Database Architecture, NoSQL, Amazon S3 (AWS S3), MongoDB, Azure SQL, Datadog

Other

Data Warehousing, ELT, Debezium, Big Data, Data Analytics, Data Engineering, Data Warehouse Design, Data Visualization, Data Extraction, Data Analysis, Database Analytics, ETL Tools, Data, Big Data Architecture, ETL Development, Data Architecture, Warehouses, Data Modeling, Database Schema Design, EMR, Data Cleaning, Data Processing, Star Schema, APIs, Pipelines, CI/CD Pipelines, Query Optimization, Web Scraping, Back-end Development, Dashboards, Reports, Information Visualization, Data Transformation, Message Queues, Database Optimization, Data Build Tool (dbt), CDC, Machine Learning, EDA, Parquet, Consumer Packaged Goods (CPG), Serverless, Full-stack, Change Data Capture, Real Estate, Data Manipulation, Amazon RDS, Data Management Platforms

2015 - 2019

Bachelor's Degree in Computer Science

Mumbai University - Mumbai, India

JANUARY 2018 - PRESENT

Data Science

GreyAtom School of Data Science
