Verified Expert in Engineering
Nikhil is a senior data engineer with over four years of experience who grasps new concepts quickly. He builds highly scalable, data-intensive applications, works comfortably across varied applications and tech stacks, and manages clients and stakeholders at every level of the hierarchy. Beyond his technical depth and presentation skills, Nikhil's most striking quality is his commitment to delivering high-quality solutions.
DevOps, Data Engineering, Kubernetes, Data Management Platforms
The most amazing...
...thing I've built is a BI product that translates data into insights with written explanations, scaling from 0 to 12 clients in less than two years.
PepsiCo Global - Main
- Built reporting to audit PepsiCo offers on Fetch Rewards and automated the entire process on Snowflake.
- Carried out significance testing among different user segments for various campaign advertisements.
- Built ThoughtSpot dashboards for high-level visibility into Fetch Rewards and the spend by PepsiCo and partner brands across campaign types, and ran campaign analytics on top of them.
Millicom International Cellular SA - Main
- Developed Millicom's entire data governance and security framework in a Data Mesh Architecture.
- Streamlined access requests across 80+ producer and consumer AWS accounts and empowered country teams to develop their own solutions while maintaining the central framework.
- Maintained a gold standard in data sharing using AWS Lake Formation, designed and implemented a business data catalog using Amazon DataZone, and automated the deployment using Terraform and Python scripts.
Data Engineer II
- Designed and implemented an event-driven pipeline that gets triggered based on file upload.
- Implemented an end-to-end change data capture (CDC) pipeline capturing data from source tables in real time.
- Optimized the existing Amazon Redshift cluster for better performance and to prevent frequent shutdowns.
- Architected the FMCG dynamic pricing engine and designed the data flow to automate price changes for FMCG items on the Zepto app. The project's impact is estimated at around INR 0.2 million through minimized revenue leakage.
- Led the end-to-end development of an in-house streaming pipeline and achieved an SLA of 10 seconds, transmitting 100MB per second end-to-end. The pipeline involved Debezium, Kafka, Kafka Connect, PostgreSQL, ClickHouse, and Apache Pinot.
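One common shape for the file-upload-triggered pipeline described above is an S3 event notification invoking a Lambda-style handler. A minimal sketch under that assumption (bucket and key names are hypothetical; the event shape follows AWS's documented S3 notification structure):

```python
def handle_upload(event, context=None):
    """Lambda-style handler: extract uploaded objects from an S3 event
    notification and return the (bucket, key) pairs to process."""
    uploads = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            uploads.append((bucket, key))
    # Downstream, each (bucket, key) would be handed to the ingestion job.
    return uploads

# Hypothetical sample payload in the documented S3 notification shape.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "raw-landing"},
                "object": {"key": "sales/2024-01-01.csv"}}}
    ]
}
```

In a real deployment the handler would enqueue or launch the ingestion job rather than return the list, but the event parsing is the same.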
Senior Data Engineer
- Designed an in-house scheduling framework that reduced Amazon MWAA costs by 50% by leveraging Snowflake tasks.
- Reduced engineering time and effort with a framework that served as a self-service scheduler for the analytics and client MIS teams to run custom SQL and stored procedures.
- Oversaw the design and implementation of this framework end to end.
- Built the entire data pipeline for their billing dashboard that helped PepsiCo track costs across different cloud vendors and services.
- Collected costing and tagging data from AWS, Azure, Snowflake, and Datadog, streamlined it into the final data model, and built the presentation layer using ThoughtSpot.
- Created GitHub Actions for different environments that built a Docker image—that packaged all the data build tool (dbt) models—and pushed it to the ECR repo.
- Wrote an Airflow DAG that ran this Docker image on a cadence to execute the dbt models in production. Wrote data transformation logic using dbt.
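A scheduling framework built on Snowflake tasks, as described above, might render task DDL from parameters supplied by the analytics teams. A minimal sketch under that assumption (function, task, and warehouse names are hypothetical; the DDL shape follows Snowflake's CREATE TASK syntax):

```python
def render_snowflake_task(name: str, cron: str, sql: str,
                          warehouse: str = "ANALYTICS_WH") -> str:
    """Render a CREATE TASK statement that runs a user-supplied SQL
    statement or stored-procedure call on a cron schedule."""
    return (
        f"CREATE OR REPLACE TASK {name}\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"  SCHEDULE = 'USING CRON {cron} UTC'\n"
        f"AS\n"
        f"  {sql};"
    )

# Hypothetical self-service request: refresh an MIS snapshot daily at 06:00.
ddl = render_snowflake_task(
    name="refresh_mis_daily",
    cron="0 6 * * *",
    sql="CALL mis.refresh_daily_snapshot()",
)
```

Generating native Snowflake tasks this way avoids holding an Airflow worker per schedule, which is the usual source of the MWAA cost saving.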
Senior Data Engineer
- Wrote data transformations using SQL to onboard tables from new data sources onto the data platform.
- Built directed acyclic graph (DAG) scripts to schedule hourly data loads from the source databases (PostgreSQL, MySQL, and MongoDB) into analytical-layer tables in the analytics data warehouse.
- Created Debezium Kafka Connect configuration files to set up change data capture (CDC) from source databases into the data lake.
- Conducted code reviews and mentored junior data engineers in the team.
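A Debezium connector registration of the kind described above is a small JSON document posted to Kafka Connect. An illustrative sketch, here built as a Python dict (hostnames, credentials, and table names are hypothetical; property names follow Debezium's documented PostgreSQL connector configuration):

```python
import json

# Hypothetical Debezium PostgreSQL source connector registration.
connector = {
    "name": "orders-cdc",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "orders-db.internal",
        "database.port": "5432",
        "database.user": "cdc_user",
        "database.password": "${secrets:cdc/password}",
        "database.dbname": "orders",
        "topic.prefix": "orders",
        "table.include.list": "public.orders,public.order_items",
        "plugin.name": "pgoutput",
    },
}

# This payload would be POSTed to the Kafka Connect REST API.
payload = json.dumps(connector, indent=2)
```

Each captured table then lands on its own Kafka topic (e.g. `orders.public.orders`), from which a sink writes change events into the data lake.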
Senior Data Engineer
- Ingested approximately 80 GB of daily Phrazor product and plugin usage data into Amazon S3 data lake from multiple client and in-house servers.
- Imported the data from the data lake into the Snowflake data warehouse for transformations and analytics.
- Composed ingestion and transformation scripts in SQL to load data from raw and staging tables into the analytical layer tables, which were eventually used for analytics by the product manager and CTO.
- Wrote the Airflow directed acyclic graph (DAG) scripts using Python and orchestrated the entire pipeline.
- Designed the end-to-end ETL pipeline for a financial client to power their stocks and mutual fund recommendation algorithm.
- Ingested data from third-party vendor databases using Debezium, Kafka, and Kafka Connect for CDC into the Amazon S3 data lake.
- Wrote the cleaning, transformation, and data processing scripts using Python and Spark to calculate around 100-150 financial KPIs.
- Orchestrated the entire ETL pipeline using Airflow running on an Amazon EC2 instance.
- Built a BI product called Phrazor from scratch; it grew from zero to 12 full-time clients and over 200 licenses in three years.
- Modeled and designed the product's back-end data model and knowledge base, powering analytics on the user's reports and dashboards.
- Led the design and implementation of the formula engine using Spark and pandas, which crunched data to calculate industry-specific KPIs for users' reports.
- Designed a multi-level drill-down feature to diagnose sudden drops or growths in KPIs.
- Created and maintained unit tests to cover 90% of the codebase.
ETL Pipeline for Stocks and Mutual Funds Recommendation System
Built an end-to-end ETL pipeline that supplied data to a stocks and mutual funds recommendation algorithm for a leading trading firm in India.
The project required historical data from that client, data from the client's third-party vendors, and data from various APIs.
TECH STACK AND OVERVIEW
Data was ingested using Debezium, Kafka, and Kafka Connect for CDC from vendor databases into the S3 data lake. Data cleaning and transformation were done with Python and Spark, and data processing involved calculating around 100-150 financial KPIs on top of the ingested data. The cleaning and processing pipeline was orchestrated using Apache Airflow running on an EC2 instance.
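The KPI-processing step might compute metrics like a trailing simple moving average over daily closing prices — one of the simpler indicators in such a set. An illustrative stdlib-only sketch (the real pipeline used Spark; the data here is made up):

```python
def moving_average(prices, window):
    """Trailing simple moving average over a price series; emits one
    value per position once a full window is available."""
    if window <= 0 or window > len(prices):
        return []
    out = []
    running = sum(prices[:window])
    out.append(running / window)
    for i in range(window, len(prices)):
        # Slide the window: add the new price, drop the oldest one.
        running += prices[i] - prices[i - window]
        out.append(running / window)
    return out

closes = [100.0, 102.0, 101.0, 105.0, 107.0]
sma3 = moving_average(closes, 3)
```

At scale the same logic maps onto a Spark window function partitioned by instrument and ordered by date.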
Phrazor Product Usage Analytics Pipeline
Built an end-to-end ETL pipeline to ingest clickstream data from client and internal servers for analytics.
Ingested approximately 80-100GB of daily Phrazor product and Phrazor plugin usage data into AWS S3 Data Lake from multiple client and in-house servers.
Cleaned and transformed the raw data and moved it from S3 into the Snowflake data warehouse for further analytics. The entire pipeline was orchestrated using Apache Airflow running on an AWS EC2 instance.
This pipeline helped the product managers make smarter product decisions, run A/B tests, and analyze how users use the platform. This layer also supplied clean data to the data scientists for advanced analytics.
IEEE-CIS Fraud Detection Kaggle Competition
Given credit card transaction data, the solution had to identify fraudulent transactions. It was a classic example of highly skewed data, with the positive class making up less than 1% of the dataset.
Tested several techniques for handling the data imbalance and ultimately went with hard negative mining. My team and I engineered many features that helped us finish in the top 7%.
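Hard negative mining, as used above, keeps the negatives the current model scores as most fraud-like and re-emphasizes them in the next training round. A toy stdlib-only sketch of the selection step (scores and labels are made up):

```python
def mine_hard_negatives(scores, labels, k):
    """Return indices of the k negative (label 0) examples the model
    scored highest, i.e. the negatives it most confuses with fraud."""
    negatives = [(s, i) for i, (s, y) in enumerate(zip(scores, labels)) if y == 0]
    negatives.sort(reverse=True)  # most fraud-like negatives first
    return [i for _, i in negatives[:k]]

scores = [0.9, 0.1, 0.8, 0.4, 0.95]   # model's fraud probabilities
labels = [1,   0,   0,   0,   1]      # 1 = fraud, 0 = legitimate
hard = mine_hard_negatives(scores, labels, 2)
```

The selected indices would then be upweighted or re-added to the training set so the next model learns the decision boundary around those confusing examples.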
Python 3, SQL, Snowflake, Scala
Apache Spark
Pandas, NumPy, Beautiful Soup, PySpark, Amazon EC2 API, GitHub API
Apache Airflow, Microsoft Power BI, Spark SQL, GitHub, Amazon Elastic MapReduce (EMR), Terraform, Tableau, Looker, Git, PyCharm, Sublime Text 3, AWS Glue, Amazon Athena
ETL, Business Intelligence (BI), Unit Testing, Data Science, Agile, Database Design, Dimensional Modeling, Continuous Integration (CI), Continuous Delivery (CD), DevOps
Kubernetes, Apache Kafka, Amazon EC2, Amazon Web Services (AWS), Oracle, Docker, Linux, Ubuntu, Azure, AWS Lambda
PostgreSQL, MySQL, Databases, Relational Databases, Redshift, Data Lakes, Database Modeling, Data Pipelines, JSON, DB, Database Architecture, NoSQL, Amazon S3 (AWS S3), MongoDB, Azure SQL, Datadog
Data Warehousing, ELT, Debezium, Big Data, Data Analytics, Data Engineering, Data Warehouse Design, Data Visualization, Data Extraction, Data Analysis, Database Analytics, ETL Tools, Data, Big Data Architecture, ETL Development, Data Architecture, Warehouses, Data Modeling, Database Schema Design, EMR, Data Cleaning, Data Processing, Star Schema, APIs, Pipelines, CI/CD Pipelines, Query Optimization, Web Scraping, Back-end Development, Dashboards, Reports, Information Visualization, Data Transformation, Message Queues, Database Optimization, Data Build Tool (dbt), CDC, Machine Learning, EDA, Parquet, Consumer Packaged Goods (CPG), Serverless, Full-stack, Change Data Capture, Real Estate, Data Manipulation, Amazon RDS, Data Management Platforms
Bachelor's Degree in Computer Science
Mumbai University - Mumbai, India
GreyAtom School of Data Science