Paarth Kumar Gupta, Developer in Ghaziabad, Uttar Pradesh, India

Verified Expert in Engineering

Bio

Paarth is a senior data engineer with more than four years of experience building scalable, efficient BI and warehousing solutions for large-scale systems. He specializes in RDBMS and big data analytics, working with technologies ranging from on-premises (SQL Server) to cloud-based (Azure, Hadoop, Spark). He has worked in domains such as customer analysis, payments, and lead management. Paarth looks forward to solving complex data problems with cutting-edge technologies and techniques.

Portfolio

OKR/Goal setting
Snowflake, AWS IoT, Python, Azure Synapse, Azure Data Factory (ADF)...
FinTech Industry
SQL, ETL Tools, Apache Hive, Data Engineering, Data Visualization...
OYO rooms
SQL, Apache Hive, Google Sheets, R, Data Visualization, Web Scraping...

Experience

  • Data Analysis - 7 years
  • ETL - 6 years
  • Azure - 5 years
  • Azure SQL Databases - 5 years
  • Azure Data Lake - 4 years
  • Azure Data Lake Analytics - 4 years
  • Azure Data Factory (ADF) - 4 years
  • SQL Server Integration Services (SSIS) - 3 years

Availability

Part-time

Preferred Environment

Spark, Apache Hive, Azure Data Lake Analytics, Azure Data Factory (ADF), SQL Server 2016, Microsoft Project, Microsoft Planner

The most amazing...

...project I've worked on was a BI solution that allowed store owners to evaluate the efficiency of promotions and adjust them accordingly in near real time.

Work Experience

Data Engineer

2020 - PRESENT
OKR/Goal setting
  • Optimized the existing architecture for serving data to end users in the UI, reducing page load latency from 2-3 minutes to under 10 seconds by utilizing the Snowflake data warehouse (see the sketch after this entry).
  • Developed ETL pipelines from third-party data stores to gather valuable data points into the DWH, enabling analysts to use these data points to generate reports.
  • Migrated and developed the DWH in Azure using Azure Synapse, ADF, and Azure Data Lake. Used SQL, Scala, and Python scripts to calculate metrics and generate data in the warehouse, making it available for querying from the UI.
Technologies: Snowflake, AWS IoT, Python, Azure Synapse, Azure Data Factory (ADF), Azure Data Lake, Spark SQL, Data Engineering, Data Warehousing, Microsoft Excel, JSON, Relational Databases, Relational Database Services (RDS), SOP Development, Data Migration, Amazon Web Services (AWS), PostgreSQL, Data Lakes, Database Administration (DBA), Windows, Tableau, Azure Cloud Services, Microsoft Project, APIs, Databricks, Automation, Database Development
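
A minimal sketch of how such a pre-aggregated Snowflake table might be served to a Python-backed UI layer, using the snowflake-connector-python package. The account, warehouse, database, and table names below are hypothetical placeholders, not the project's actual identifiers.

    import snowflake.connector  # pip install snowflake-connector-python

    def fetch_dashboard_metrics(account: str, user: str, password: str):
        """Query a pre-aggregated reporting table instead of raw events,
        so each UI request reads only a small, indexed rollup."""
        conn = snowflake.connector.connect(
            account=account,           # hypothetical account identifier
            user=user,
            password=password,
            warehouse="REPORTING_WH",  # assumed warehouse name
            database="ANALYTICS",      # assumed database
            schema="MARTS",            # assumed schema
        )
        try:
            cur = conn.cursor()
            # The daily rollup table is assumed to be maintained by the ETL layer.
            cur.execute(
                "SELECT metric_date, team_id, goal_progress "
                "FROM DAILY_GOAL_ROLLUP "
                "WHERE metric_date >= DATEADD(day, -30, CURRENT_DATE())"
            )
            return cur.fetchall()
        finally:
            conn.close()

Serving a rollup table like this, rather than aggregating raw events per request, is the general pattern behind the latency drop described above.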

Senior Software Engineer

2019 - 2020
FinTech Industry
  • Developed features enabling end users to view data in a concise, understandable format.
  • Optimized Hive queries and the data flow architecture to reduce data availability latency by 20-25%.
  • Participated in internal reviews and process setup to ensure smooth project deliveries, reduce production issues, and share learnings across the team.
Technologies: SQL, ETL Tools, Apache Hive, Data Engineering, Data Visualization, Microsoft Excel, JSON, Relational Databases, Relational Database Services (RDS), SOP Development, Data Lakes, Database Administration (DBA), APIs, Databricks, Jupyter Notebook, JupyterLab, Automation, Database Development

Analyst

2019 - 2019
OYO rooms
  • Created data-gathering processes from various sources such as Google Sheets, Hive, and Postgres.
  • Ensured data availability for the international operations department and other associates, enabling users to view metrics at various levels and make decisions quickly.
  • Created dashboards highlighting monthly, weekly, and daily trends in customer sentiment for properties across countries and regions.
  • Automated the data processing and reporting workflows, providing stakeholders with up-to-date metrics across countries such as the USA, UAE, Brazil, Spain, the UK, and Mexico (see the sketch after this entry).
Technologies: SQL, Apache Hive, Google Sheets, R, Data Visualization, Web Scraping, Microsoft Excel, JSON, Microsoft Data Transformation Services (now SSIS), Relational Databases, Relational Database Services (RDS), SOP Development, Data Migration, Database Administration (DBA), Tableau, Azure Cloud Services, Data Science, Jupyter Notebook, Automation, Database Development
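
A minimal sketch of one automated pull-and-report step of the kind described above, assuming a Postgres source and pandas for shaping the output. The connection details, table, and column names are hypothetical illustrations.

    import pandas as pd
    import psycopg2  # pip install psycopg2-binary

    def export_daily_sentiment_report(dsn: str, out_path: str):
        """Pull yesterday's property-level sentiment scores from Postgres
        and write a spreadsheet-friendly CSV for downstream dashboards."""
        conn = psycopg2.connect(dsn)  # e.g. "dbname=ops user=report_bot host=..." (hypothetical)
        try:
            df = pd.read_sql(
                """
                SELECT country, region, property_id,
                       AVG(sentiment_score) AS avg_sentiment,
                       COUNT(*)             AS review_count
                FROM property_reviews           -- assumed table name
                WHERE review_date = CURRENT_DATE - 1
                GROUP BY country, region, property_id
                """,
                conn,
            )
        finally:
            conn.close()
        df.to_csv(out_path, index=False)

A scheduler (cron or similar) running this kind of export daily is what keeps the country-level dashboards up to date without manual pulls.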

Software Engineer 2

2018 - 2019
MAQ Software
  • Decreased data latency by 30-35%, reducing the cost of running jobs on Hadoop clusters.
  • Oversaw the complete project lifecycle, from requirement gathering to sprint deployments.
  • Tracked project health metrics regularly to keep avoidable issues at bay.
Technologies: Microsoft Azure, SQL Server 2016, Data Engineering, Data Warehousing, Data Visualization, Microsoft Excel, JSON, Microsoft Data Transformation Services (now SSIS), Relational Databases, Relational Database Services (RDS), SOP Development, Data Migration, Data Lakes, Database Administration (DBA), Windows, Azure Cloud Services, Streaming Data, Data Science, Databricks, Jupyter Notebook, Automation, Database Development

Software Engineer 1

2016 - 2018
MAQ Software
  • Developed 5+ BI and warehousing solutions. Key metrics related to sales, customer identification, and lead management were part of the reporting layer.
  • Built a near real-time reporting solution for sales in retail stores during the holiday period.
  • Optimized various processes, architectures, and logic for efficient performance and optimal resource consumption.
Technologies: SQL, SQL Server Integration Services (SSIS), SSAS Tabular, Azure, Data Warehousing, Data Visualization, Microsoft Excel, JSON, Microsoft Data Transformation Services (now SSIS), Relational Databases, Relational Database Services (RDS), SOP Development, Data Migration, Data Lakes, Database Administration (DBA), Windows, Azure Cloud Services, Streaming Data, Data Science, Microsoft Project, Jupyter Notebook, Automation, Database Development

Experience

Customer Identification and Analytics

A warehousing back-end solution aimed at the identification and mapping of customers.

• Customer identification relied on unique identifiers (UID or email), depending on the customer's login preference.

• Identification was performed via APIs using C# utilities.

• Developed solutions to integrate big data streams into DWH for customer identification and purchase mapping.

• Enabled use cases for analytics based on customer purchase history, such as the propensity of a customer to purchase a to-be-launched product, target forecast, budget analysis, etc.

• Identified the areas of improvement at the architectural and code levels; optimization in these areas improved latency and resource consumption, thereby reducing the cost of running jobs on servers and clusters.

• The biggest challenge was optimizing the architecture to reduce data latency. I approached this by applying indexes to the fact tables and modularizing the data flow instead of running it as a single process (a sketch of the indexing step follows below).
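
A minimal sketch of the fact-table indexing idea, assuming a SQL Server warehouse reached through pyodbc. The connection string, table, and column names are hypothetical illustrations, not the project's actual schema.

    import pyodbc  # pip install pyodbc

    # Hypothetical connection string; the real warehouse details differed.
    CONN_STR = (
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=dw-server;DATABASE=CustomerDW;Trusted_Connection=yes;"
    )

    def add_fact_table_indexes():
        """Add a nonclustered index covering the join/filter columns of a
        purchase fact table so downstream lookups avoid full scans."""
        with pyodbc.connect(CONN_STR, autocommit=True) as conn:
            conn.execute(
                """
                CREATE NONCLUSTERED INDEX IX_FactPurchase_CustomerDate
                ON dbo.FactPurchase (CustomerKey, PurchaseDateKey)
                INCLUDE (Amount, ProductKey)   -- assumed columns
                """
            )

Pairing targeted indexes like this with smaller, modular load steps (rather than one monolithic process) is what reduced both latency and cluster cost in the project described above.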

Project for a Toptal Client

An ETL project to gather raw data from Postgres and transform it into a usable format, built with AWS Glue, PySpark (Python), S3, and Athena. Data landing in the raw bucket had to be collected, processed, de-duplicated, and written to a processed S3 bucket, with an Athena layer on top of S3 used to query the data points. A sketch of the raw-to-processed step follows below.
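
A minimal PySpark sketch of that raw-to-processed step, written as the body of a Glue Spark job. The bucket paths, business key, and partitioning column are hypothetical placeholders.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("raw_to_processed").getOrCreate()

    # Raw bucket holds JSON exports from Postgres (hypothetical paths).
    raw = spark.read.json("s3://example-raw-bucket/orders/")

    processed = (
        raw
        # Drop duplicates produced by overlapping extracts,
        # keeping one row per business key (assumed to be order_id).
        .dropDuplicates(["order_id"])
        # Normalize types so the Athena schema stays stable.
        .withColumn("order_ts", F.to_timestamp("order_ts"))
        .withColumn("order_date", F.to_date("order_ts"))
    )

    # Write partitioned Parquet that the Athena table points at.
    (
        processed.write
        .mode("overwrite")
        .partitionBy("order_date")
        .parquet("s3://example-processed-bucket/orders/")
    )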

Delayed Invoicing for Commission and Taxes

• Developed a mechanism to charge the trader/seller at a regular interval instead of on a per-purchase basis and provided the sellers with an option to choose their invoicing cycle.

• Worked in Hive to collect, load, and transform the data points used to identify sellers who opted for delayed invoicing.

• Optimized the existing flow using Hive configurations and parameters to incorporate these changes (a sketch follows below).
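
A minimal sketch of that kind of Hive-level tuning, shown through PySpark with Hive support enabled. The table names and the specific settings are illustrative examples of commonly used Hive parameters, not the project's actual configuration.

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("delayed_invoicing")
        .enableHiveSupport()          # read/write Hive-managed tables
        .getOrCreate()
    )

    # Typical Hive knobs for partitioned inserts (illustrative values).
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    # Rebuild the partition for sellers who opted into delayed invoicing
    # (table and column names are hypothetical).
    spark.sql("""
        INSERT OVERWRITE TABLE billing.delayed_invoice_sellers
        PARTITION (billing_cycle)
        SELECT seller_id, opted_in_date, billing_cycle
        FROM billing.seller_preferences
        WHERE delayed_invoicing_flag = 'Y'
    """)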

Sales Reporting for Online and Retail Stores

• The project provided near real-time insights and metrics to store owners and product stakeholders during crucial sale periods.

• I designed and developed the end-to-end architecture of a refreshable pipeline to collate data from multiple sources (spreadsheets, databases, events, etc) and create a data warehouse.

• It was a completely cloud-based architecture, to be refreshed every 15 minutes, bringing in around 500 million records every run.

• It involved collecting transactions in near real time and making them available to store owners and stakeholders as reports (MIS and Power BI) within 15 minutes, which required creating a handshake between environments that were not in sync.

• The complete architecture of this project was hosted on Azure Cloud, and the reporting was done through Power BI.

• The major challenge was bringing sources from outside the Azure environment into the Azure pipelines, since they were not in sync. I solved this by implementing a look-back time interval for the new dataset so that no records were missed (a sketch follows below).
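
A minimal PySpark sketch of the look-back idea, assuming each run reprocesses a fixed window behind the last successful watermark and de-duplicates on a business key. The paths, column names, and window size are hypothetical.

    from datetime import datetime, timedelta
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("sales_lookback_load").getOrCreate()

    LOOKBACK = timedelta(minutes=45)   # wider than the 15-minute refresh cadence

    def incremental_load(last_watermark: datetime):
        """Reprocess everything newer than (watermark - look-back window) so
        late-arriving records from the out-of-sync source are still captured."""
        window_start = last_watermark - LOOKBACK

        # Landing zone fed by the external (non-Azure) source; path is hypothetical.
        source = spark.read.parquet("abfss://landing@example.dfs.core.windows.net/sales/")

        recent = source.where(F.col("event_time") >= F.lit(window_start))

        # Overlapping windows create duplicates; keep one row per transaction.
        deduped = recent.dropDuplicates(["transaction_id"])

        # Append to the warehouse staging area (hypothetical path).
        deduped.write.mode("append").parquet(
            "abfss://dwh@example.dfs.core.windows.net/staging/sales/"
        )

Because the window overlaps the previous run, records that arrive late in the external source are picked up on a subsequent refresh, and the de-duplication step keeps the warehouse consistent.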

Education

2012 - 2016

Bachelor's Degree in Computer Science and Engineering

KIET, Ghaziabad - Ghaziabad, India

Certifications

MARCH 2019 - PRESENT

SQL 2016 Business Intelligence Development

Microsoft

DECEMBER 2017 - PRESENT

Analyzing and Visualizing Data with Microsoft Power BI

Microsoft

MARCH 2017 - PRESENT

Developing SQL Databases

Microsoft

FEBRUARY 2017 - PRESENT

Querying Data with Transact-SQL

Microsoft

Skills

Libraries/APIs

PySpark

Tools

Microsoft Power BI, Microsoft Excel, AWS Glue, Google Sheets, Spark SQL, Tableau, Microsoft Project

Languages

T-SQL (Transact-SQL), SQL, Python, Snowflake, C#, R

Paradigms

ETL, Business Intelligence (BI), Database Development, Database Design, Automation, Azure DevOps

Platforms

Azure, Amazon Web Services (AWS), Windows, Azure Event Hubs, Azure Synapse Analytics, AWS IoT, Azure Synapse, Databricks, Jupyter Notebook

Storage

MySQL, SQL Server 2016, SQL Server Integration Services (SSIS), Database Architecture, Microsoft SQL Server, Databases, JSON, Relational Databases, PostgreSQL, Data Lakes, Database Administration (DBA), Azure Cloud Services, Azure SQL Databases, Apache Hive, SQL Server Analysis Services (SSAS), SSAS Tabular, Azure SQL

Frameworks

Spark

Other

Data Engineering, Data Analysis, Microsoft Data Transformation Services (now SSIS), Azure Data Factory (ADF), Database Optimization, Azure Data Lake, Data Modeling, Big Data, Data Warehousing, Data Marts, Data Architecture, Data Warehouse Design, Data Analytics, Technical Documentation, Technical Writing, Data Visualization, Relational Database Services (RDS), Data Migration, Streaming Data, Data Science, APIs, Azure Data Lake Analytics, Microsoft Azure, DAX, ETL Tools, Azure Stream Analytics, ELT, Web Scraping, SOP Development, Microsoft Planner, JupyterLab
