Sandesh Pawar, Developer in Pune, Maharashtra, India

Sandesh Pawar

Verified Expert in Engineering

Big Data Developer

Location
Pune, Maharashtra, India
Toptal Member Since
June 18, 2020

Sandesh is a competent data architect and engineer specializing in SQL Server, MySQL, MongoDB, Spark, Databricks, Redshift, Tableau, and Power BI. He started his career in 2011 working with relational databases and then moved into a BI lead role. Recently, he has been working as a BI lead and database architect. Sandesh is equally skilled in Azure and AWS, with over five years of experience using public cloud platforms for data-related services.

Portfolio

PepsiCo
Azure Cosmos DB, Big Data, Microsoft Power BI, MSBI, Azure Data Lake...
Nexuus LLC
Data Engineering, Data Architecture, Data Analytics, Elasticsearch, Big Data...
IQVIA
Azure, Azure Databricks, Azure Data Factory, Azure Synapse...

Experience

Availability

Part-time

Preferred Environment

SQL Server Data Tools (SSDT), Workbench, SQL Server Management Studio, SQL, Linux, Windows, MongoDB Atlas

The most amazing...

...thing I've done was designing a data lake on AWS, which involved transforming data, storing it in the ORC format, and retrieving it quickly with Presto.

Work Experience

Senior Data Engineer

2020 - PRESENT
PepsiCo
  • Prototyped a solution for a continuous streaming ETL requirement using SQL Server CDC, Azure Event Hubs, and Azure Databricks.
  • Created PySpark Azure Databricks jobs to consume data from relational databases and CSV and JSON files.
  • Implemented sharding and partitioning in Azure SQL Database based on use cases.
  • Designed a data lake architecture and a partitioning strategy to store historical data.
  • Researched different options for implementing the metadata-driven SMART ETL concept. Implemented Azure Data Factory pipelines to move data from application databases to ODS and DWH for analytics.
  • Designed and implemented the collection schema in Azure Cosmos DB. Migrated the MongoDB collections to Cosmos DB. Evaluated and implemented the multi-master feature of Cosmos DB.
  • Implemented Tableau dashboards consuming sales data.
  • Developed an ROI solution comprising marketing and sales data. It draws on multiple data sources to help the business team understand net spending versus revenue for PepsiCo products.
  • Developed a number of dbt models and orchestrated pipelines using Airflow. Optimized DAGs and models two times to improve the overall execution time.
  • Designed final reporting tables in Snowflake. Played an important role in the data vault architecture of initial tables and evolved models over time to satisfy business needs.
Technologies: Azure Cosmos DB, Big Data, Microsoft Power BI, MSBI, Azure Data Lake, Microsoft SQL Server, Azure, Data Engineering, Schemas, Databases, Python, Pandas, Apache Airflow, Data Build Tool (dbt), Snowflake

Data Engineer Consultant

2022 - 2022
Nexuus LLC
  • Delivered a transaction monitoring data architecture that scales to billions of transactions for a financial product company.
  • Suggested best practices to scale existing infrastructure utilizing Azure SQL Databases.
  • Recommended a long-term architecture utilizing Azure Cosmos DB and Elasticsearch to scale the solution for bigger clients.
Technologies: Data Engineering, Data Architecture, Data Analytics, Elasticsearch, Big Data, Machine Learning, Big Data Architecture

Senior Data Engineer

2021 - 2022
IQVIA
  • Developed a custom ETL process to convert OMOP data into FHIR R4 resources.
  • Designed and developed a common data model aligning with USCDI standards for OMOP and site-mediated EMR files.
  • Built a data access layer to fetch FHIR R4 data files from 1upHealth.
  • Used the Microsoft FHIR Converter service to convert data from STU3 to R4 format using Liquid templates.
Technologies: Azure, Azure Databricks, Azure Data Factory, Azure Synapse, Fast Healthcare Interoperability Resources (FHIR), HL7 FHIR Standard, Electronic Medical Records (EMR), EHR

Snowflake Architect

2021 - 2021
Self-employed
  • Handled the migration of an on-prem data warehouse to Snowflake.
  • Performed data modeling for new data sources as per dimensional modeling standards.
  • Created multiple stored procedures to automate the data flow from different sources in S3 to Snowflake.
  • Implemented streams to automatically push data into Snowflake on top of S3 using SQS.
  • Reduced Snowflake credit costs by one-third by implementing best practices.
  • Migrated Snowflake tasks to Airflow to provide a better orchestration mechanism for data pipelines.
Technologies: Snowflake, SQL, Apache Airflow, Big Data Architecture, Amazon Web Services (AWS), Amazon S3 (AWS S3)

Python/Data Engineer

2019 - 2020
SupportLogic
  • Implemented a generic CRM importer in Python to cater to schema variance across different CRMs.
  • Explored CRM data models for Zendesk, Salesforce, ServiceNow, and Dynamics and implemented a metadata-driven importer that adheres to a common schema.
  • Used Fivetran connectors for different CRMs to pull data into the staging area.
  • Designed a data warehouse model and implemented best practices to optimize processing and cost for Google BigQuery.
  • Orchestrated different pipelines using Airflow and improved overall pipeline execution time by a factor of two.
Technologies: Google Cloud Platform (GCP), Python 3, PostgreSQL, Google BigQuery, PubSubJS, Google Cloud Storage, Data Build Tool (dbt), Apache Airflow, ServiceNow, Salesforce, Fivetran

Freelance Database Specialist

2018 - 2018
CartHook, Inc.
  • Designed and implemented a strategy for character encoding changes in MySQL, all without downtime.
  • Evaluated the one-way replication feature of Aurora RDS replicas for zero downtime.
  • Generated a script to modify a large number of tables and reduce turnaround time.
  • Prepared a dynamic script for verification of content before and after migration.
  • Suggested best practices for MySQL table design for better performance.
  • Handled the migration activity end to end in the staging and production environments.
Technologies: Database Migration, Percona, Amazon Aurora, MySQL, Amazon Web Services (AWS)

DBA Lead | Database Architect

2016 - 2018
NICE Ltd
  • Evaluated different NoSQL databases and selected them based on project requirements.
  • Created a multitenant and scalable schema design using MySQL and Aurora RDS.
  • Architected and implemented a data lake using Spark, Hive, and EMR Hadoop.
  • Designed and implemented Redshift DW as a central data store.
  • Created multiple PySpark jobs in AWS Glue to move data from MySQL RDS to Redshift.
  • Built a sample POC to analyze Nielsen Retail Scanner data and consumed the resulting insights in Amazon QuickSight.
  • Set up and managed MongoDB clusters on AWS EC2. Handled MongoDB data model design (embedded versus separate collection approach) and performance tuning.
  • Used the aggregation framework for analytics queries and migrated MongoDB clusters from AWS EC2 to Atlas.
Technologies: Amazon Web Services (AWS), Apache Kylin, Presto DB, Apache Hive, Spark, MySQL, Redshift, Microsoft SQL Server, Elasticsearch, MongoDB, Python, Jupyter Notebook

Business Intelligence (BI) Lead

2013 - 2016
Cognizant
  • Led a team of four individuals to implement different BI solutions for a healthcare organization's core systems, specifically implementing a central DW and SSAS cube.
  • Built SSRS reporting solutions for different clients.
  • Designed and prototyped scorecard and dashboard management reporting systems for claims turnaround time and processor productivity reports.
  • Implemented reconciliation reports to compare data across different source systems, resulting in significant FTE savings and improved SLA compliance.
  • Managed the smooth transition from SSRS 2005 to SSRS 2014 reports and SSRS to Power BI for multiple clients.
  • Designed packages to extract data from SQL and Sybase databases and flat files and load it into a SQL Server database.
  • Created a relational database design for a claims-and-financial data warehouse. ETL packages load the data into a centralized data warehouse.
  • Created different measure groups and dimensions and implemented MDX scripts for several reports.
  • Implemented an ad-hoc reporting solution with the help of SSAS for the finance data warehouse.
  • Developed and designed a data warehouse and cube and implemented an ad-hoc reporting solution.
Technologies: Database Administration (DBA), SQL Server Reporting Services (SSRS), SSAS, SQL Server Integration Services (SSIS), Microsoft SQL Server

Database Developer | Database Administrator (DBA)

2011 - 2013
Persistent Systems Limited
  • Gained significant hands-on experience in database schema design and complex stored procedures. Was also exposed to different BI development tools and DW development.
  • Designed and developed more than 50 tables; all the tables were indexed and tuned, then de-normalized when necessary to improve performance.
  • Developed more than 100 stored procedures complete with parameters, RETURN values, complex multi-table JOINs and cursors.
  • Performance-tested, troubleshot, and optimized using SQL profiler, execution plans, and DMVs.
  • Implemented database mirroring, log shipping, and transactional replication as high availability solutions for different customers, per the requirements specified in their SLAs.
  • Designed a collections schema in MongoDB for unstructured data from social networks; also automated the data flow process for real-time data.
  • Performed query tuning for reports developed in SQL Server, reducing response time by 60% in some cases.
  • Designed and developed an engagement analysis schema on top of an existing framework.
  • Wrote analysis reports using the open-source reporting tool JasperSoft to provide accurate reports about activities.
Technologies: MySQL, SQL Server Reporting Services (SSRS), SSAS, SQL Server Integration Services (SSIS), MongoDB, Microsoft SQL Server

Online Shopping Platform Database Design and Development

• Performed query tuning for reports developed in MySQL to reduce response time by 60% in some cases.
• Designed and developed an engagement analysis schema on top of an existing framework.
• Wrote analysis reports using the open-source reporting tool JasperSoft to provide accurate reports about activities.

Power BI Reporting Solution

• Designed and developed Power BI dashboards and reports.
• Designed a data model for Power BI reports.
• Deployed Power BI reports.

Data Lake Architecture Design and Spark Implementation

• Created the architecture for a data lake based on a number of AWS services.
• Evaluated ORC vs Parquet performance and made the decision to store data in the ORC format.
• Developed PySpark jobs to move data from different files.

ETL Solution using SSIS Packages

• Designed packages to extract data from SQL/Sybase databases and flat files and load it into a SQL Server database.
• Designed ETL packages with the SSIS framework. The packages read from different data sources (SQL Server, flat files, and Excel), apply different kinds of transformations using SQL Server Integration Services, and load the data into target data sources.
• Migrated a number of DTS packages (SQL Server 2000) to the SQL Server 2012 SSIS database.

SSRS Reporting Solution for Different Clients

• Designed and prototyped scorecard/dashboard management reporting systems for claims turnaround time and processor’s productivity reports.
• Implemented reconciliation reports to compare data across different source systems. It resulted in significant FTE savings and increased SLA.
• Managed a smooth transition from SSRS 2005 to SSRS 2014 reports and SSRS to Power BI for multiple clients.

Data Warehouse and SSAS Multidimensional/Tabular Model Design and Development

• Created the relational database design for a claims-and-financial data warehouse. With the help of ETL packages, the data gets loaded into a central data warehouse.
• Designed different measure groups and dimensions.
• Implemented MDX scripts for a number of reports.
• Implemented an ad-hoc reporting solution with the help of SSAS for the finance DW.

Database Design and Development for Gamification Projects

• Designed and developed more than 50 tables. All the tables were indexed and tuned, then de-normalized when necessary to improve performance.
• Developed more than 100 stored procedures with parameters, RETURN values, complex multi-table JOINs and cursors.
• Performance tested, troubleshot, and optimized (using SQL Profiler, execution plans, and DMVs).

Database High Availability Implementation and Production Support

• Implemented SQL Server HA solutions for clients using the following options:
1) Log shipping
2) Database mirroring
3) Transactional replication
4) Always On

Languages

SQL, T-SQL (Transact-SQL), Snowflake, Python, Python 3

Frameworks

Spark, Presto DB

Libraries/APIs

PySpark, Pandas, PubSubJS

Tools

MySQL Workbench, SQL Server BI, Amazon Elastic MapReduce (EMR), AWS Glue, Spark SQL, iReport Designer, Microsoft Power BI, MongoDB Atlas, Tableau, SSAS, Azure IoT Suite, Azure Machine Learning, Apache Airflow, Google Analytics

Paradigms

ETL, Dimensional Modeling, Business Intelligence (BI), Data Science, Fast Healthcare Interoperability Resources (FHIR), HL7 FHIR Standard

Platforms

Amazon Web Services (AWS), Azure, Databricks, Jupyter Notebook, Windows, Linux, Percona, Azure Event Hubs, Google Cloud Platform (GCP), Salesforce

Storage

Azure Cosmos DB, Microsoft SQL Server, SQL Server Management Studio, SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), SQL Server DBA, SQL Server 2014, Database Backups, Database Modeling, SQL Server Reporting Services (SSRS), Amazon DynamoDB, Data Lakes, Azure SQL Databases, Amazon S3 (AWS S3), Databases, Data Pipelines, MySQL, MongoDB, Amazon Aurora, Database Migration, Database Replication, Elasticsearch, Redshift, SSAS Tabular, Database Administration (DBA), SQL Server Data Tools (SSDT), Apache Hive, PostgreSQL, Google Cloud Storage

Other

Data Architecture, Data Engineering, Azure Data Lake, Azure Data Lake Analytics, MSBI, Big Data, Query Optimization, Log Shipping, Performance Optimization, Azure Synapse, Azure Data Factory, Azure SQL Data Warehouse (SQL DW), Azure Databricks, APIs, Data Warehousing, Schemas, Dashboards, Data Analysis, Data Visualization, Cloud, Business Continuity & Disaster Recovery (BCDR), Multidimensional Expressions (MDX), DAX, Always On, Data Vaults, Workbench, Apache Kylin, Azure Stream Analytics, Machine Learning, Big Data Architecture, Data Analytics, Data Build Tool (dbt), Software Development, Electronic Medical Records (EMR), EHR, Google BigQuery, ServiceNow, Fivetran, Electronic Health Records (EHR)

Education

2007 - 2011

Bachelor of Technology Degree in Computer Science and Engineering

Walchand College of Engineering, Sangli - Sangli, Maharashtra, India

2007 - 2007

Higher Secondary School Certificate in Basics

Pune University - Pune, Maharashtra, India

2005 - 2005

Secondary School Certificate in Basics

Pune University - Pune, Maharashtra, India

Certifications

DECEMBER 2022 - PRESENT

Apache Airflow Fundamentals

Astronomer

SEPTEMBER 2022 - PRESENT

SnowPro Core

Snowflake

OCTOBER 2021 - PRESENT

Microsoft Certified Azure Data Scientist

Microsoft

OCTOBER 2021 - PRESENT

Exam DP-900: Microsoft Azure Data Fundamentals

Microsoft

OCTOBER 2021 - PRESENT

Exam AZ-900: Microsoft Azure Fundamentals

Microsoft

OCTOBER 2021 - OCTOBER 2022

Microsoft Azure Data Engineer Associate

Microsoft

MAY 2021 - PRESENT

Databricks Certified Apache Spark 3.0 Developer

Databricks

OCTOBER 2017 - OCTOBER 2020

AWS Certified Solutions Architect Associate

AWS

APRIL 2015 - PRESENT

Microsoft Certified Technology Specialist SQL Server 2008 Business Intelligence and Development

Microsoft

JUNE 2013 - PRESENT

Microsoft Certified Technology Specialist SQL Server 2008 Implementation and Maintenance

Microsoft

NOVEMBER 2011 - PRESENT

Microsoft Certified Technology Specialist SQL Server 2008 Database Development

Microsoft

JANUARY 2011 - PRESENT

Microsoft Certified SQL Server Associate

Microsoft