
Paarth Kumar Gupta
Verified Expert in Engineering
Data Analysis Developer
Ghaziabad, Uttar Pradesh, India
Toptal member since August 25, 2020
Paarth is a senior data engineer with more than four years of experience building scalable and efficient BI and warehousing solutions for large-scale systems. He specializes in RDBMS and big data analytics, using technologies ranging from on-premises (SQL Server) to cloud-based (Azure, Hadoop, Spark). He has worked across numerous domains, such as customer analysis, payments, and lead management. Paarth looks forward to solving complex data problems using cutting-edge technologies and techniques.
Portfolio
Experience
- Data Analysis - 7 years
- ETL - 6 years
- Azure - 5 years
- Azure SQL Databases - 5 years
- Azure Data Lake - 4 years
- Azure Data Lake Analytics - 4 years
- Azure Data Factory (ADF) - 4 years
- SQL Server Integration Services (SSIS) - 3 years
Preferred Environment
Spark, Apache Hive, Azure Data Lake Analytics, Azure Data Factory (ADF), SQL Server 2016, Microsoft Project, Microsoft Planner
The most amazing...
...project I've worked on was a BI solution that allowed store owners to evaluate the efficiency of promotions and adjust them accordingly in near real time.
Work Experience
Data Engineer
OKR/Goal setting
- Optimized the existing architecture for surfacing data to end users in the UI, reducing page-load latency from 2-3 minutes to under 10 seconds by utilizing the Snowflake data warehouse.
- Developed ETL from third-party data stores to gather valuable data points into the DWH, enabling analysts to use these data points to generate reports.
- Migrated and developed the DWH in Azure using Azure Synapse, ADF, and Azure Data Lake. Used SQL, Scala, and Python scripts to calculate metrics and generate data in the warehouse, making it available for querying from the UI.
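The page-load improvement above came from letting the UI query pre-computed results rather than raw data. A minimal sketch of that pre-aggregation pattern (illustrative only; the metric, record shape, and function name are assumptions, not the actual warehouse schema):

```python
def precompute_daily_totals(orders):
    """Roll raw orders up into a small per-day summary table.

    The UI then queries the summary (a handful of rows) instead of
    scanning the raw fact data on every page load, which is how
    latency can drop from minutes to seconds.
    """
    totals = {}
    for o in orders:
        day = o["order_ts"][:10]  # 'YYYY-MM-DD' prefix of an ISO timestamp
        totals[day] = totals.get(day, 0) + o["amount"]
    return totals
```

In a real warehouse, this rollup would run as a scheduled job (or materialized view) so the summary stays fresh between UI requests.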
Senior Software Engineer
FinTech Industry
- Developed features enabling end users to view data in a concise and understandable format.
- Optimized Hive queries and the data flow architecture to reduce data availability latency by 20-25%.
- Participated in internal reviews and process setup to ensure smooth project deliveries, reduce production issues, and enable learning for the team.
Analyst
OYO Rooms
- Created data-gathering processes from various sources such as Google Sheets, Hive, and Postgres.
- Ensured data availability for the international operations department and other associates, enabling users to view metrics at various levels and make decisions quickly.
- Created dashboards highlighting the monthly, weekly, and daily progress of customer sentiment for properties across countries and regions.
- Automated the data processing and reporting processes that helped stakeholders with up-to-date metrics across countries (such as the USA, UAE, Brazil, Spain, UK, and Mexico).
Software Engineer 2
MAQ Software
- Decreased data latency by 30-35%, reducing the cost of running jobs on Hadoop clusters.
- Oversaw the complete project lifecycle, from requirement gathering to sprint deployments.
- Tracked project health metrics regularly to catch avoidable issues early.
Software Engineer 1
MAQ Software
- Developed 5+ BI and warehousing solutions. Key metrics related to sales, customer identification, and lead management were part of the reporting layer.
- Built a near real-time reporting solution for sales in retail stores during the holiday period.
- Optimized various processes, architectures, and logic for efficient performance and optimal resource consumption.
Experience
Customer Identification and Analytics
• Customers are identified by unique identifiers (UID or email), depending on the customer's login preference.
• Identification was performed through APIs using C# utilities.
• Developed solutions to integrate big data streams into DWH for customer identification and purchase mapping.
• Enabled use cases for analytics based on customer purchase history, such as the propensity of a customer to purchase a to-be-launched product, target forecast, budget analysis, etc.
• Identified the areas of improvement at the architectural and code levels; optimization in these areas improved latency and resource consumption, thereby reducing the cost of running jobs on servers and clusters.
• The biggest challenge was optimizing the architecture to reduce data latency. I approached this by applying indexes to the fact tables and modularizing the data flow instead of running it as a single process.
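The modularized flow described above can be sketched as separate, independently runnable stages (a minimal illustration; the stage names and record shapes are assumptions, not the client's actual pipeline):

```python
def extract(rows):
    """Stage 1: keep only rows with a usable customer identifier."""
    return [r for r in rows if r.get("uid") or r.get("email")]

def resolve_ids(rows):
    """Stage 2: collapse UID/email into a single customer key."""
    return [{**r, "customer_key": r.get("uid") or r["email"].lower()}
            for r in rows]

def aggregate(rows):
    """Stage 3: roll up purchase counts per customer (fact-table rollup)."""
    counts = {}
    for r in rows:
        counts[r["customer_key"]] = counts.get(r["customer_key"], 0) + 1
    return counts

def pipeline(rows):
    # Running the stages separately, rather than as one monolithic query,
    # lets each stage be indexed, tuned, or rerun independently.
    return aggregate(resolve_ids(extract(rows)))
```

Splitting a single process this way is what allows targeted optimizations (like fact-table indexes) to be applied at the stage that needs them.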
Project for a Toptal Client
Delayed Invoicing for Commission and Taxes
• Worked on Hive to collect, load, and transform data points involved in the identification of such sellers who opted for this delay.
• Optimized the existing flow using Hive configurations and parameters to incorporate these changes.
Sales Reporting for Online and Retail Stores
• I designed and developed the end-to-end architecture of a refreshable pipeline to collate data from multiple sources (spreadsheets, databases, events, etc.) and create a data warehouse.
• It was a completely cloud-based architecture, refreshed every 15 minutes and ingesting around 500 million records per run.
• It involved collecting transactions in real time and making them available to store owners and stakeholders as reports (MIS and Power BI) within 15 minutes. This required creating a handshake between different environments that were not in sync.
• The complete architecture of this project was hosted on Azure Cloud, and the reporting was done through Power BI.
• The major challenge was bringing sources from outside the Azure environment into Azure pipelines, since they were not in sync. I solved this by implementing a look-back time interval for the new dataset so that no record is missed.
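The look-back interval described above can be sketched as follows (a minimal sketch; the record shape and the 30-minute `lookback` default are assumptions, not the actual pipeline's values):

```python
from datetime import datetime, timedelta

def records_to_load(records, last_run, lookback=timedelta(minutes=30)):
    """Select records for an incremental load using a look-back window.

    Instead of loading only records newer than the last run, re-read a
    window before it so that late-arriving records from an out-of-sync
    source are not missed. The downstream load must then be idempotent
    (e.g. an upsert/MERGE) so re-read records are not duplicated.
    """
    cutoff = last_run - lookback
    return [r for r in records if r["event_time"] >= cutoff]
```

A record timestamped shortly before the last run (a late arrival from the out-of-sync source) still falls inside the look-back window and is picked up on the next 15-minute refresh.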
Education
Bachelor's Degree in Computer Science and Engineering
KIET, Ghaziabad - Ghaziabad, India
Certifications
SQL 2016 Business Intelligence Development
Microsoft
Analyzing and Visualizing Data with Microsoft Power BI
Microsoft
Developing SQL Databases
Microsoft
Querying Data with Transact-SQL
Microsoft
Skills
Libraries/APIs
PySpark
Tools
Microsoft Power BI, Microsoft Excel, AWS Glue, Google Sheets, Spark SQL, Tableau, Microsoft Project
Languages
T-SQL (Transact-SQL), SQL, Python, Snowflake, C#, R
Paradigms
ETL, Business Intelligence (BI), Database Development, Database Design, Automation, Azure DevOps
Platforms
Azure, Amazon Web Services (AWS), Windows, Azure Event Hubs, Azure Synapse Analytics, AWS IoT, Azure Synapse, Databricks, Jupyter Notebook
Storage
MySQL, SQL Server 2016, SQL Server Integration Services (SSIS), Database Architecture, Microsoft SQL Server, Databases, JSON, Relational Databases, PostgreSQL, Data Lakes, Database Administration (DBA), Azure Cloud Services, Azure SQL Databases, Apache Hive, SQL Server Analysis Services (SSAS), SSAS Tabular, Azure SQL
Frameworks
Spark
Other
Data Engineering, Data Analysis, Microsoft Data Transformation Services (now SSIS), Azure Data Factory (ADF), Database Optimization, Azure Data Lake, Data Modeling, Big Data, Data Warehousing, Data Marts, Data Architecture, Data Warehouse Design, Data Analytics, Technical Documentation, Technical Writing, Data Visualization, Relational Database Services (RDS), Data Migration, Streaming Data, Data Science, APIs, Azure Data Lake Analytics, Microsoft Azure, DAX, ETL Tools, Azure Stream Analytics, ELT, Web Scraping, SOP Development, Microsoft Planner, JupyterLab