
Dhruvi Pandya

Verified Expert in Engineering

Data Engineer and Software Developer

Location
Mumbai, Maharashtra, India
Toptal Member Since
September 1, 2023

Dhruvi is a data engineering professional with seven years of industry experience who started as a back-end developer. Her expertise lies in building, maintaining, and optimizing data pipelines, primarily with tools such as Spark, Airflow, Storm, Snowplow, Docker, and Kafka. She has hands-on experience with AWS services like EC2, EMR, Elastic Beanstalk, Athena, and Redshift. Dhruvi is also well-versed in Agile development and sprint planning.

Portfolio

Saltside Technologies
ETL, Redshift, Amazon Elastic MapReduce (EMR), AWS Elastic Beanstalk, Python 3...
AccionLabs
Python, JavaScript, Angular, Node.js, MongoDB, PostgreSQL, Apache Airflow...
AccionLabs
Node.js, MongoDB, Express.js, AngularJS, Karma, Protractor, MySQL, Jenkins...

Experience

Availability

Part-time

Preferred Environment

Linux, Visual Studio Code (VS Code), Git

The most amazing...

...thing I've done is optimize a Spark job running on Amazon EMR, bringing down the cost by 25%.

Work Experience

Senior Data Engineer

2021 - PRESENT
Saltside Technologies
  • Collaborated with the team on migrating from a Lambda architecture to a streaming architecture built on Snowplow Analytics, Apache Kafka, and Apache Storm.
  • Built and released a back-end service exposing an API that fetches user (seller and buyer) statistics from a Redis cache.
  • Optimized a long-running, resource-intensive AWS EMR Spark job, reducing costs by 25% (a sketch of this kind of tuning follows this entry).
  • Built and deployed SQL-based ELT pipelines for the AWS Redshift data warehouse and created aggregation tables to power KPI dashboards in Tableau.
Technologies: ETL, Redshift, Amazon Elastic MapReduce (EMR), AWS Elastic Beanstalk, Python 3, Apache Airflow, Snowplow Analytics, Data Engineering, Python, Documentation, Data Architecture, Data Warehouse Design, SQL, Database Modeling, Jupyter Notebook, OLAP, Amazon Web Services (AWS), Communication, Real-time Streaming, ELK (Elastic Stack), PyCharm, Debian Linux, Docker Hub, PostgreSQL 9, Scripting, MapReduce, Data Analytics, Data Warehousing, Big Data Architecture, Solution Architecture, Data Analysis, Apache Kafka, Lambda Architecture, Streaming
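
Below is a minimal, illustrative PySpark sketch of the kind of tuning behind the EMR cost reduction above; the app name, S3 paths, column names, and settings are assumptions, not the actual production job.

# Hedged sketch: common levers for trimming an EMR Spark job's runtime and cost.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("daily-aggregation")  # hypothetical job name
    # Let the cluster release idle executors instead of holding them for the
    # whole run, which directly reduces EMR instance hours.
    .config("spark.dynamicAllocation.enabled", "true")
    # Size shuffle parallelism to the data volume rather than the 200 default.
    .config("spark.sql.shuffle.partitions", "400")
    .getOrCreate()
)

events = spark.read.parquet("s3://example-bucket/events/")  # hypothetical path

# Selecting only the needed columns and filtering before the shuffle keeps the
# exchanged data small, which is usually where most of the runtime goes.
daily = (
    events.select("user_id", "event_type", "event_date")
    .filter(events.event_date >= "2023-01-01")
    .groupBy("event_date", "event_type")
    .count()
)

daily.write.mode("overwrite").parquet("s3://example-bucket/aggregates/daily/")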

Senior Developer

2020 - 2021
AccionLabs
  • Helped migrate SQL-based pipelines to Spark, improving data quality and the speed at which data becomes available in the warehouse.
  • Designed an end-to-end pipeline that brings the required data into the data mart for a product recommender system. Built data-crunching pipelines with Apache Airflow and exposed the recommendation data via AWS Lambda (see the sketch after this entry).
  • Created and maintained Spark streaming pipelines, powering the data warehouse. Collaborated with the data analysts on building reports on top of them.
Technologies: Python, JavaScript, Angular, Node.js, MongoDB, PostgreSQL, Apache Airflow, AWS Lambda, Unit Testing, Data Analysis, Spark, Spark Streaming, ETL, Data Engineering, SQL, PySpark, Jupyter Notebook, Apache Spark, Amazon Web Services (AWS), Communication, Real-time Streaming, Pytest, Debian Linux, Docker Hub, Scripting, Data Analytics, OLTP, PL/SQL, Database Architecture, Data Migration, Big Data Architecture, Solution Architecture
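
As referenced above, here is a minimal Airflow 2.x sketch of what such a recommender data mart pipeline could look like; the DAG ID, schedule, and task bodies are hypothetical, not the client's actual code.

# Hedged sketch of an Airflow DAG feeding a recommender data mart.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def build_features(**context):
    # Placeholder: pull raw events and write per-user features to the data mart.
    ...


def refresh_recommendations(**context):
    # Placeholder: score features and publish results where the AWS Lambda
    # serving layer can read them.
    ...


with DAG(
    dag_id="recommender_data_mart",  # hypothetical DAG ID
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    features = PythonOperator(task_id="build_features", python_callable=build_features)
    serve = PythonOperator(
        task_id="refresh_recommendations", python_callable=refresh_recommendations
    )
    features >> serve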

Full-stack Developer

2018 - 2020
AccionLabs
  • Collaborated with a team to develop 10+ internal applications for a major cloud provider client, each with 80%+ coverage across unit and end-to-end tests.
  • Worked with a team to build a generalized back end providing authentication and endpoints for creating new MongoDB collections, along with all the necessary CRUD operation endpoints, reducing back-end development time by almost 80% (see the sketch after this entry).
  • Coordinated with the onsite team on cross-team projects and agile sprint planning.
Technologies: Node.js, MongoDB, Express.js, AngularJS, Karma, Protractor, MySQL, Jenkins, Python, Flask-RESTful, Object-relational Mapping (ORM), Communication, OLTP, Database Architecture
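
The generalized CRUD back end above was built on Node.js and MongoDB; the following is only a Python/Flask-RESTful sketch of the same pattern (both also appear in this entry's technologies), with hypothetical routes and connection details and authentication omitted for brevity.

# Hedged sketch: one pair of resources serves CRUD for any MongoDB collection,
# so adding a new entity requires no new back-end code.
from bson.objectid import ObjectId
from flask import Flask, request
from flask_restful import Api, Resource
from pymongo import MongoClient

app = Flask(__name__)
api = Api(app)
db = MongoClient("mongodb://localhost:27017")["app_db"]  # hypothetical DSN


class CollectionResource(Resource):
    def get(self, collection):
        docs = list(db[collection].find())
        for doc in docs:
            doc["_id"] = str(doc["_id"])  # ObjectId is not JSON-serializable
        return docs

    def post(self, collection):
        result = db[collection].insert_one(request.get_json())
        return {"_id": str(result.inserted_id)}, 201


class DocumentResource(Resource):
    def put(self, collection, doc_id):
        db[collection].replace_one({"_id": ObjectId(doc_id)}, request.get_json())
        return {"updated": doc_id}

    def delete(self, collection, doc_id):
        db[collection].delete_one({"_id": ObjectId(doc_id)})
        return "", 204


api.add_resource(CollectionResource, "/api/<string:collection>")
api.add_resource(DocumentResource, "/api/<string:collection>/<string:doc_id>")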

Front-end Developer

2016 - 2018
AccionLabs
  • Worked alongside a team to develop the front end for a security surveillance startup, including displaying live camera feeds via socket-based communication and showing real-time bounding boxes for intruder detection.
  • Migrated 40% of an ongoing project's outdated components to Angular 5.
  • Documented 80% of the ongoing project's legacy code.
Technologies: Angular, React, Protractor, Selenium, HTML, CSS, CSS3, HTML5, JavaScript, Communication

Projects

CDC Pipeline

https://github.com/dhruvip/kafka-connect-cdc
Developed a change data capture (CDC) pipeline using Kafka. It streams real-time data out of a MySQL relational database with the Kafka stack, specifically Kafka Connect and Kafka clusters, and loads the data into an Elasticsearch cluster.
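
A hedged sketch of how such a pipeline is typically wired up through the Kafka Connect REST API; the connector classes shown are the common Debezium MySQL source and Confluent Elasticsearch sink, and all hostnames, credentials, and topic names are assumptions rather than the repository's actual configuration.

# Registers a MySQL CDC source and an Elasticsearch sink with Kafka Connect.
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # default Kafka Connect port

mysql_source = {
    "name": "mysql-cdc-source",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "cdc_user",      # hypothetical credentials
        "database.password": "cdc_password",
        "database.server.id": "1",
        "topic.prefix": "shop",           # topics become shop.<db>.<table>
        "database.include.list": "shop",
        # Debezium also needs a schema history topic in practice.
        "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
        "schema.history.internal.kafka.topic": "schema-changes.shop",
    },
}

es_sink = {
    "name": "elasticsearch-sink",
    "config": {
        "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
        "connection.url": "http://elasticsearch:9200",
        "topics": "shop.shop.orders",     # hypothetical table topic
        "key.ignore": "true",
    },
}

for connector in (mysql_source, es_sink):
    requests.post(CONNECT_URL, json=connector).raise_for_status()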

Mock Retail Store Analytics

An end-to-end data pipeline to power the analytics dashboard for a mock retail store.

This was a personal project where I showcased my skills in:
• Data architecture on cloud platforms like AWS
• Building robust data pipelines and orchestrating them with Airflow, Bash Scripting, Python, and plain SQL
• Data processing and exploratory analysis with Python, Pandas, and Jupyter
• Data modeling and dim/fact table creation with dbt
• Data visualization and dashboards with Metabase

I was able to gather insights on the following:
• Best-performing marketing campaigns
• Top 10 revenue-generating products in top-performing countries (sketched below)
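
As an example of the exploration behind the second insight, here is a small pandas sketch; the CSV path and column names are assumptions about the mart's layout, not the project's actual schema.

# Hypothetical sketch: top 10 revenue-generating products per country.
import pandas as pd

orders = pd.read_csv("data/orders.csv")  # hypothetical export from the mart

# Revenue per product within each country.
orders["revenue"] = orders["quantity"] * orders["unit_price"]
by_country = orders.groupby(["country", "product"], as_index=False)["revenue"].sum()

# Keep the ten highest-revenue products for each country.
top10 = (
    by_country.sort_values("revenue", ascending=False)
    .groupby("country")
    .head(10)
)
print(top10)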

Fast Food Chain Location Analysis

Jupyter Notebook-based exploratory data analysis examining the correlation between the locations chosen by successful food chains and their neighborhood landmarks. This helps a new restaurant owner understand what thriving businesses look for when scouting locations.
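
A hypothetical sketch of the notebook's core step, correlating landmark counts near each location with a success proxy; the dataset, columns, and metric are assumptions for illustration.

# Correlate nearby-landmark counts with a success proxy such as review volume.
import pandas as pd

locations = pd.read_csv("data/chain_locations.csv")  # hypothetical dataset

landmark_cols = ["schools_nearby", "offices_nearby", "transit_stops_nearby"]
correlations = locations[landmark_cols + ["review_count"]].corr()["review_count"]

# Landmark types that move together with success rise to the top.
print(correlations.drop("review_count").sort_values(ascending=False))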

Languages

Python 3, Python, SQL, C++, JavaScript, HTML, CSS, CSS3, HTML5, Scala

Frameworks

Spark, Apache Spark, Angular, Express.js, AngularJS, Protractor, Selenium, Jinja

Libraries/APIs

Pandas, PySpark, Spark Streaming, Node.js, Matplotlib, Flask-RESTful, React

Tools

Git, Apache Airflow, Docker Compose, Snowplow Analytics, PyCharm, Amazon Elastic MapReduce (EMR), Seaborn, Karma, Jenkins, ELK (Elastic Stack), Pytest, Docker Hub, Plotly

Paradigms

ETL, OLAP, Data Science, Unit Testing, Object-relational Mapping (ORM), MapReduce, Lambda Architecture

Platforms

Docker, Apache Kafka, Linux, Jupyter Notebook, Amazon Web Services (AWS), Debian Linux, AWS Lambda, Visual Studio Code (VS Code), AWS Elastic Beanstalk

Storage

Databases, MySQL, Redshift, Database Modeling, OLTP, Database Architecture, NoSQL, MongoDB, PostgreSQL, Apache Hive, PL/SQL

Other

Big Data, Data Analysis, Data Visualization, Shell Scripting, Data Modeling, Data Engineering, Documentation, Data Warehouse Design, Communication, Real-time Streaming, Scripting, Data Analytics, Data Warehousing, Data Structures, Data Migration, Big Data Architecture, Solution Architecture, Algorithms, APIs, Machine Learning, CDC, Data Architecture, Metabase, Data Build Tool (dbt), Cosmos, PostgreSQL 9, Streaming, Macros

Education

2012 - 2016

Bachelor's Degree in Computer Engineering

Ahmedabad University - Ahmedabad, India

Certifications

SEPTEMBER 2023 - SEPTEMBER 2025

dbt Fundamentals

dbt

JULY 2019 - PRESENT

IBM Data Science Professional Certificate

IBM | via Coursera
