Dhruvi Pandya
Verified Expert in Engineering
Data Engineer and Software Developer
Mumbai, Maharashtra, India
Toptal member since September 1, 2023
Dhruvi is a data engineering professional with seven years of industry experience who started as a back-end developer. Her expertise lies in building, maintaining, and optimizing data pipelines, mostly with stacks such as Spark, Airflow, Storm, Snowplow, Docker, and Kafka. She has hands-on experience with AWS services like EC2, EMR, Elastic Beanstalk, Athena, and Redshift. Dhruvi is also well-versed in Agile development and sprint planning methodologies.
Preferred Environment
Linux, Visual Studio Code (VS Code), Git
The most amazing...
...thing I've done is optimize a Spark job running on Amazon EMR, bringing its cost down by 25%.
Work Experience
Senior Data Engineer
Saltside Technologies
- Collaborated with the team on migrating a Lambda architecture to a streaming architecture using Snowplow Analytics, Apache Kafka, and Apache Storm.
- Built and released a back-end service exposing an API to fetch user (seller and buyer) statistics from a Redis cache.
- Optimized the long-running and resource-consuming Amazon EMR Spark job, reducing costs by 25%.
- Built and deployed SQL-based ELTs for the Amazon Redshift data warehousing platform and created the aggregation tables to power the KPI dashboards in Tableau.
- Implemented, maintained, and upgraded ETLs orchestrated with Apache Airflow and dbt to ingest, stitch, and aggregate the data that powers the KPI dashboards (a minimal orchestration sketch follows this list).
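A minimal sketch of the kind of Airflow-plus-dbt orchestration described above; the DAG name, paths, and dbt tag are hypothetical, not the production setup.

from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="kpi_aggregations",            # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Load the day's raw events into the warehouse staging schema.
    ingest = BashOperator(
        task_id="ingest_raw_events",
        bash_command="python /opt/pipelines/ingest_events.py --ds {{ ds }}",
    )
    # Stitch and aggregate with dbt; the tag selects only the KPI models.
    build_kpis = BashOperator(
        task_id="dbt_run_kpi_models",
        bash_command="dbt run --project-dir /opt/dbt --select tag:kpi",
    )
    ingest >> build_kpis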
Senior Developer
AccionLabs
- Helped migrate SQL-based pipelines to Spark, improving both the quality of the data and the speed at which it becomes available in the warehouse.
- Designed an end-to-end pipeline to bring the required data to the data mart for a product recommender system. Built data crunching pipelines with Apache Airflow, exposing the recommendation data via AWS Lambda.
- Created and maintained Spark streaming pipelines powering the data warehouse and collaborated with data analysts on building reports on top of them (a minimal streaming sketch follows this list).
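A minimal PySpark Structured Streaming sketch in the spirit of these pipelines: read JSON events from Kafka and append them to a landing area for the warehouse. The broker, topic, schema, and S3 paths are hypothetical.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("events_stream").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("occurred_at", TimestampType()),
])

events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # hypothetical broker
    .option("subscribe", "user-events")                # hypothetical topic
    .load()
    # Kafka values arrive as bytes; parse the JSON payload into columns.
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

# Append each micro-batch as Parquet for downstream warehouse loads.
query = (
    events.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/landing/events/")                  # hypothetical path
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .start()
)
query.awaitTermination()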
Full-stack Developer
AccionLabs
- Collaborated with a team in developing 10+ internal applications for a major cloud provider client; each application had 80%+ test coverage across both unit and end-to-end test cases.
- Worked with a team to build a generalized back end providing auth and endpoints for creating new MongoDB collections, along with all the necessary CRUD operation endpoints, reducing back-end development time by almost 80% (see the sketch after this list).
- Coordinated with the onsite team on cross-team projects and agile sprint planning.
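A minimal Flask and PyMongo sketch of the generalized-CRUD idea described above, with auth and validation omitted; the database name and routes are hypothetical.

from bson.objectid import ObjectId
from flask import Flask, jsonify, request
from pymongo import MongoClient

app = Flask(__name__)
db = MongoClient("mongodb://localhost:27017")["app_db"]  # hypothetical database

@app.route("/<collection>", methods=["POST"])
def create_document(collection):
    # Any collection name becomes a usable endpoint without extra back-end work.
    result = db[collection].insert_one(request.get_json())
    return jsonify({"id": str(result.inserted_id)}), 201

@app.route("/<collection>/<item_id>", methods=["GET"])
def read_document(collection, item_id):
    doc = db[collection].find_one({"_id": ObjectId(item_id)})
    if doc is None:
        return jsonify({"error": "not found"}), 404
    doc["_id"] = str(doc["_id"])
    return jsonify(doc)

@app.route("/<collection>/<item_id>", methods=["DELETE"])
def delete_document(collection, item_id):
    db[collection].delete_one({"_id": ObjectId(item_id)})
    return "", 204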
Front-end Developer
AccionLabs
- Worked alongside a team to develop the front end for a security surveillance startup, including displaying live camera feeds via socket-based communication and showing real-time bounding boxes for intruder detection.
- Migrated 40% of an ongoing project's outdated components to Angular 5.
- Documented 80% of the ongoing project's legacy code.
Experience
CDC Pipeline
https://github.com/dhruvip/kafka-connect-cdc
Mock Retail Store Analytics
This was a personal project where I showcased my skills in:
• Data architecture on cloud platforms like AWS
• Building robust data pipelines and orchestrating them with Airflow, Bash Scripting, Python, and plain SQL
• Data processing and exploratory analysis with Python, Pandas, and Jupyter
• Data modeling and dim/fact table creation with dbt
• Data visualization and dashboards with Metabase
I was able to gather insights on the following:
• Best-performing marketing campaigns
• Top 10 revenue-generating products in top-performing countries (a small Pandas sketch of this analysis follows)
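A small Pandas sketch of the exploratory step behind the second insight; the input file and the column names (country, product, revenue) are hypothetical.

import pandas as pd

orders = pd.read_csv("orders.csv")  # hypothetical export from the warehouse

# Rank countries by total revenue and keep the top performers.
top_countries = orders.groupby("country")["revenue"].sum().nlargest(5).index

# Within those countries, find the 10 highest-revenue products each.
top_products = (
    orders[orders["country"].isin(top_countries)]
    .groupby(["country", "product"], as_index=False)["revenue"].sum()
    .sort_values(["country", "revenue"], ascending=[True, False])
    .groupby("country")
    .head(10)
)
print(top_products)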
Fast Food Chain Location Analysis
Education
Bachelor's Degree in Computer Engineering
Ahmedabad University - Ahmedabad, India
Certifications
AWS Certified Solutions Architect – Associate
Amazon Web Services
dbt Fundamentals
dbt
IBM Data Science Professional Certificate
IBM | via Coursera
Skills
Libraries/APIs
Pandas, PySpark, Spark Streaming, Node.js, Matplotlib, Flask-RESTful, React
Tools
Git, Apache Airflow, Docker Compose, Snowplow Analytics, PyCharm, Amazon Elastic MapReduce (EMR), Seaborn, Karma, Jenkins, ELK (Elastic Stack), Pytest, Docker Hub, Plotly, Microsoft Excel, AWS IAM
Languages
Python 3, SQL, C++, JavaScript, HTML, CSS, CSS3, HTML5, Scala
Frameworks
Apache Spark, Angular, Express.js, AngularJS, Protractor, Selenium, Jinja
Paradigms
ETL, OLAP, Unit Testing, Object-relational Mapping (ORM), MapReduce, Lambda Architecture
Platforms
Docker, Apache Kafka, Linux, Jupyter Notebook, Amazon Web Services (AWS), Debian Linux, AWS Lambda, Visual Studio Code (VS Code), AWS Elastic Beanstalk
Storage
Databases, MySQL, Redshift, Database Modeling, OLTP, Database Architecture, JSON, NoSQL, MongoDB, PostgreSQL, Apache Hive, PL/SQL, Amazon S3 (AWS S3)
Other
Big Data, Data Analysis, Data Visualization, Shell Scripting, Data Modeling, Data Engineering, Documentation, Data Warehouse Design, Communication, Real-time Streaming, Scripting, Data Analytics, Data Warehousing, Data Science, Data Structures, Data Migration, Big Data Architecture, Solution Architecture, Algorithms, APIs, Machine Learning, CDC, Data Architecture, Metabase, Data Build Tool (dbt), Cosmos, PostgreSQL 9, Streaming, Macros, Architecture