
Sandesh Pawar
Verified Expert in Engineering
Data Warehousing Developer
Pune, Maharashtra, India
Toptal member since June 18, 2020
Sandesh is a skilled data architect and engineer specializing in SQL Server, MySQL, MongoDB, Spark, Databricks, Redshift, Tableau, and Power BI. He started his career in 2011 working with relational databases and later moved into a BI lead role; most recently, he has worked as a BI lead and database architect. Sandesh is equally skilled in Azure and AWS, with over five years of experience using public cloud platforms for data-related services.
Experience
- Big Data - 7 years
- Data Architecture - 5 years
- Azure - 5 years
- Snowflake - 4 years
- Databricks - 4 years
- Amazon Web Services (AWS) - 3 years
- Spark - 3 years
- Microsoft Power BI - 1 year
Preferred Environment
SQL Server Data Tools (SSDT), Workbench, SQL Server Management Studio (SSMS), SQL, Linux, Windows, MongoDB Atlas
The most amazing...
...thing I've done was designing a data lake on AWS, which involved transforming data, storing it in the ORC format, and enabling fast retrieval with Presto.
Work Experience
Azure Data Engineer via Toptal
Circles Sodexo
- Developed a generic ETL pipeline in Azure to cater to different schema tables from Salesforce Service Cloud.
- Contributed to a metadata-driven ETL that onboards new customers easily via a config file (see the sketch after this list).
- Sped up the overall ETL flow by 50% and significantly reduced Azure costs.
- Built a monitoring and logging solution so that business users can track ETL progress for multiple clients in a single uniform view.
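A minimal sketch of the config-driven onboarding pattern, assuming a hypothetical customers.json file and illustrative field names (the actual pipeline ran in Azure Data Factory):

```python
# Each customer is described by one config entry; a single generic pipeline
# loops over the entries instead of hard-coding per-customer logic.
import json

# customers.json (hypothetical), e.g.:
# [{"customer": "acme", "source_object": "Case", "target_table": "stg.case_acme"}]
def load_configs(path: str) -> list[dict]:
    with open(path) as f:
        return json.load(f)

def run_pipeline(cfg: dict) -> None:
    # In the real solution this would trigger a parameterized ADF pipeline;
    # here we only show the dispatch shape.
    print(f"Extract {cfg['source_object']} -> load {cfg['target_table']}")

if __name__ == "__main__":
    for cfg in load_configs("customers.json"):
        run_pipeline(cfg)
```

Onboarding a new customer then means appending one config entry rather than building a new pipeline.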
Lead Data Engineer via Toptal
PepsiCo
- Prototyped a continuous streaming ETL requirement with the help of SQL Server CDC, Azure Event Hubs, and Azure Databricks.
- Created PySpark Azure Databricks jobs to consume data from relational databases and from CSV and JSON files (a minimal sketch follows this list).
- Implemented sharding and partitioning in Azure SQL Database based on use cases.
- Designed a data lake architecture and a partitioning strategy to store historical data.
- Researched and prototyped different options to implement a metadata-driven "Smart ETL" concept. Implemented Azure Data Factory pipelines to move data from application databases to the ODS and DWH for analytics.
- Designed and implemented the collection schema in Azure Cosmos DB. Migrated MongoDB collections to Cosmos DB and evaluated and implemented Cosmos DB's multi-master feature.
- Implemented Tableau dashboards consuming sales data.
- Developed an ROI solution combining marketing and sales data from multiple sources to help the business team understand net spend versus revenue for PepsiCo products.
- Developed a number of dbt models and orchestrated pipelines using Airflow. Optimized DAGs and models, improving overall execution time twofold.
- Designed final reporting tables in Snowflake. Played an important role in the data vault architecture of initial tables and evolved models over time to satisfy business needs.
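As a rough illustration of the PySpark ingestion jobs mentioned above, here is a minimal sketch; the hosts, credentials, paths, and table names are placeholders, not the production code:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("ingest").getOrCreate()

# Relational source over JDBC (host, database, and credentials are placeholders).
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:sqlserver://<host>;databaseName=<db>")
          .option("dbtable", "dbo.orders")
          .option("user", "<user>").option("password", "<password>")
          .load())

# File sources: JSON events and CSV reference data.
events = spark.read.option("multiLine", True).json("/mnt/raw/events/")
refs = spark.read.option("header", True).csv("/mnt/raw/reference/")

# Land each source in the lake partitioned by load date, echoing the
# historical-data partitioning strategy described above.
for name, df in [("orders", orders), ("events", events), ("refs", refs)]:
    (df.withColumn("load_date", F.current_date())
       .write.mode("overwrite").partitionBy("load_date")
       .parquet(f"/mnt/lake/{name}/"))
```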
Data Engineer Consultant
Nexuus LLC
- Delivered a transaction-monitoring data architecture that scales to billions of transactions for a financial product company.
- Suggested best practices for scaling the existing Azure SQL Database infrastructure.
- Recommended long-term architecture utilizing Azure Cosmos DB and Elasticsearch to scale the solution for bigger clients.
Cosmos DB Engineer via Toptal
The Local Data Company Ltd
- Optimized the Cosmos DB collection and index design, doubling throughput.
- Refactored Databricks Spark notebooks to reduce processing time from 12 hours to 45 minutes.
- Handled the Azure Data Factory implementation and converted scikit-learn pipelines to Spark ML for distributed machine learning (sketched below).
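A hedged before/after sketch of the scikit-learn-to-Spark-ML port; the feature columns, input path, and model choice are assumptions for illustration:

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("sparkml-port").getOrCreate()
df = spark.read.parquet("/mnt/lake/features/")  # placeholder path

# A scikit-learn StandardScaler + LogisticRegression pipeline maps to:
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="raw")
scaler = StandardScaler(inputCol="raw", outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, scaler, lr]).fit(df)
scored = model.transform(df)  # runs distributed, unlike the single-node original
```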
Data Engineer Consultant
IQVIA
- Developed a custom ETL process to convert OMOP data into FHIR R4 resources (see the sketch after this list).
- Designed and developed a common data model aligning with USCDI standards for OMOP and site-mediated EMR files.
- Built a data access layer to fetch FHIR R4 data files from 1upHealth.
- Used the Microsoft FHIR Converter service to convert data from the STU3 to R4 format using Liquid templates.
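To make the OMOP-to-FHIR step concrete, here is a simplified sketch of mapping an OMOP person row to a FHIR R4 Patient resource; the gender concept IDs are the standard OMOP codes, but the field handling is an assumption, not the actual implementation:

```python
OMOP_GENDER_TO_FHIR = {8507: "male", 8532: "female"}  # standard OMOP concept IDs

def person_to_patient(person: dict) -> dict:
    """Convert one OMOP person row (as a dict) into a FHIR R4 Patient resource."""
    return {
        "resourceType": "Patient",
        "id": str(person["person_id"]),
        "gender": OMOP_GENDER_TO_FHIR.get(person.get("gender_concept_id"), "unknown"),
        "birthDate": "%04d-%02d-%02d" % (person["year_of_birth"],
                                         person.get("month_of_birth") or 1,
                                         person.get("day_of_birth") or 1),
    }

print(person_to_patient({"person_id": 42, "gender_concept_id": 8532,
                         "year_of_birth": 1980}))
```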
Snowflake Architect
Self-employed
- Handled the migration of an on-prem data warehouse to Snowflake.
- Performed data modeling for new data sources as per dimensional modeling standards.
- Created multiple stored procedures to automate the data flow from different sources in S3 to Snowflake.
- Implemented auto-ingest from S3 into Snowflake (S3 event notifications delivered via SQS) along with streams to process new data automatically; see the sketch after this list.
- Reduced Snowflake credit costs by one-third by implementing best practices.
- Migrated Snowflake tasks to Airflow to provide a better orchestration mechanism for data pipelines.
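A sketch of the S3-to-Snowflake auto-ingest setup, issued through the Snowflake Python connector; the object names, stage, and file format are illustrative, not the production DDL:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="<account>", user="<user>", password="<password>",
    warehouse="LOAD_WH", database="RAW", schema="PUBLIC",
)
cur = conn.cursor()

# A pipe with AUTO_INGEST = TRUE exposes an SQS queue ARN; pointing the S3
# bucket's event notifications at that queue makes new files load themselves.
cur.execute("""
    CREATE PIPE IF NOT EXISTS raw.public.orders_pipe AUTO_INGEST = TRUE AS
    COPY INTO raw.public.orders
    FROM @raw.public.s3_stage/orders/
    FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
""")

# A stream then tracks the newly loaded rows for downstream processing.
cur.execute("CREATE STREAM IF NOT EXISTS raw.public.orders_stream "
            "ON TABLE raw.public.orders")
```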
AWS ETL Expert via Toptal
Indigovern LLC
- Designed scalable and robust ETL architecture using different Amazon Web Services.
- Analyzed the existing Python code and refactored it in PySpark, improving performance by 50%.
- Implemented a generic connector for fetching details from Zendesk API.
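A minimal sketch of such a generic Zendesk fetcher, following the REST API's next_page links; the subdomain, credentials, and resource names are placeholders:

```python
import requests

def fetch_all(subdomain: str, resource: str, auth: tuple) -> list[dict]:
    """Page through a Zendesk listing endpoint until it is exhausted."""
    url = f"https://{subdomain}.zendesk.com/api/v2/{resource}.json"
    items = []
    while url:
        resp = requests.get(url, auth=auth, timeout=30)
        resp.raise_for_status()
        body = resp.json()
        items.extend(body.get(resource, []))
        url = body.get("next_page")  # None on the last page
    return items

tickets = fetch_all("<subdomain>", "tickets", ("<email>/token", "<api_token>"))
```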
Python/Data Engineer
SupportLogic
- Implemented a generic CRM importer in Python to cater to schema variance from different CRMs.
- Explored the CRM data models for Zendesk, Salesforce, ServiceNow, and Dynamics and implemented a metadata-driven importer that maps them to a common schema.
- Used Fivetran connectors for different CRMs to pull data into the staging area.
- Designed a data warehouse model and implemented best practices to optimize processing and cost for Google BigQuery.
- Orchestrated different pipelines using Airflow, making overall pipeline execution twice as fast (see the sketch after this list).
- Implemented different dbt models for transformations.
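A sketch of the Airflow-plus-dbt orchestration pattern described above; the DAG name, schedule, paths, and task breakdown are assumptions:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="crm_to_bigquery",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    # Stage CRM data (the real pipelines used Fivetran connectors).
    stage = BashOperator(task_id="stage_crm",
                         bash_command="python /opt/jobs/crm_importer.py")
    # Run dbt transformations, then dbt tests.
    transform = BashOperator(task_id="dbt_run",
                             bash_command="dbt run --project-dir /opt/dbt")
    test = BashOperator(task_id="dbt_test",
                        bash_command="dbt test --project-dir /opt/dbt")
    stage >> transform >> test
```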
Freelance Database Specialist
CartHook, Inc.
- Designed and implemented a strategy for character encoding changes in MySQL, all without downtime.
- Evaluated Aurora RDS one-way replication as a zero-downtime migration option.
- Generated a script to modify a large number of tables, reducing turnaround time (see the sketch after this list).
- Prepared a dynamic script for verification of content before and after migration.
- Suggested best practices for a MySQL table design for better performance.
- Handled the migration activity from end-to-end in the staging and production environments.
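An illustrative generator for the bulk ALTER statements used in such an encoding migration; it reads table names from information_schema and emits utf8mb4 conversions for review (the connection details and the zero-downtime execution strategy are assumptions):

```python
import pymysql

conn = pymysql.connect(host="<host>", user="<user>",
                       password="<password>", database="<db>")
with conn.cursor() as cur:
    cur.execute(
        "SELECT table_name FROM information_schema.tables "
        "WHERE table_schema = %s AND table_type = 'BASE TABLE'",
        ("<db>",),
    )
    for (table,) in cur.fetchall():
        # Print for review; in production these would run through an
        # online-schema-change tool to avoid locking.
        print(f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 "
              "COLLATE utf8mb4_unicode_ci;")
```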
DBA Lead | Database Architect
NICE Ltd
- Evaluated different NoSQL databases and selected the best fit based on project requirements.
- Created a multitenant and scalable schema design using MySQL and Aurora RDS.
- Architected and implemented a data lake using Spark, Hive, and EMR Hadoop.
- Designed and implemented Redshift DW as a central data store.
- Created multiple PySpark Jobs in AWS Glue to move data from MySQL RDS to Redshift.
- Built a proof of concept on Nielsen Retail Scanner data to surface insights and consumed the results in Amazon QuickSight.
- Set up and managed MongoDB clusters on AWS EC2; designed MongoDB data models (embedded vs. separate-collection approaches) and tuned MongoDB performance.
- Used the aggregation framework for analytics queries (example below) and migrated MongoDB clusters from AWS EC2 to Atlas.
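An example of the aggregation-framework style used for such analytics queries; the database, collection, and fields are hypothetical:

```python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<cluster>/")  # Atlas connection string
events = client["analytics"]["events"]

# Daily page-view counts, computed server-side by the aggregation pipeline.
daily_counts = list(events.aggregate([
    {"$match": {"type": "page_view"}},
    {"$group": {"_id": "$day", "views": {"$sum": 1}}},
    {"$sort": {"_id": 1}},
]))
```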
Business Intelligence (BI) Lead
Cognizant
- Led a team of four to implement different BI solutions for a healthcare client's core systems, specifically a central DW and an SSAS cube.
- Built SSRS reporting solutions for different clients.
- Designed and prototyped scorecard and dashboard management reporting systems for claims turnaround time and processor productivity reports.
- Implemented reconciliation reports to compare data across different source systems, resulting in significant FTE savings and improved SLA compliance.
- Managed the smooth transition from SSRS 2005 to SSRS 2014 reports and from SSRS to Power BI for multiple clients.
- Designed SSIS packages to extract data from SQL Server and Sybase databases and flat files, then load it into a SQL Server database.
- Created a relational database design for a claims-and-financial data warehouse; ETL packages loaded the data into the centralized warehouse.
- Built different measure groups and dimensions and implemented MDX scripts for several reports.
- Implemented an ad-hoc reporting solution with the help of SSAS for the finance data warehouse.
- Designed and developed a data warehouse and cube and implemented an ad-hoc reporting solution.
Database Developer | Database Administrator (DBA)
Persistent Systems Limited
- Gained significant hands-on experience in database schema design and complex stored procedures. Was also exposed to different BI development tools and DW development.
- Designed and developed more than 50 tables; all were indexed and tuned, then denormalized when necessary to improve performance.
- Developed more than 100 stored procedures complete with parameters, RETURN values, complex multi-table JOINs, and cursors.
- Performance-tested, troubleshot, and optimized using SQL Profiler, execution plans, and DMVs.
- Implemented database mirroring, log shipping, and transaction replication as a high availability solution for different customers as per requirements specified in SLA.
- Designed a collections schema in MongoDB for unstructured data from social networks; also automated the data flow process for real-time data.
- Performed query tuning for reports developed in SQL Server, reducing response time by 60% in some cases.
- Designed and developed an engagement analysis schema on top of an existing framework.
- Wrote analysis reports—using the open source reporting tool JasperSoft—to provide accurate reports about activities.
Experience
Online Shopping Platform Database Design and Development
• Designed and developed engagement analysis schema on top of an existing framework.
• Wrote analysis reports—using the open source reporting tool, JasperSoft—to provide accurate reports about activities.
Power BI Reporting Solution
• Designed a data model for Power BI reports.
• Deployed Power BI reports.
Data Lake Architecture Design and Spark Implementation
• Evaluated ORC vs. Parquet performance and decided to store data in the ORC format.
• Developed PySpark jobs to move data from different source files into the lake (see the sketch below).
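A minimal sketch of the ORC landing step, with placeholder paths; Presto then queries the same files through a Hive-registered table:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("to-orc").getOrCreate()

df = spark.read.option("header", True).csv("s3a://raw-bucket/input/")
(df.write.mode("overwrite")
   .format("orc")  # columnar format chosen after the ORC/Parquet evaluation
   .save("s3a://lake-bucket/curated/table/"))
```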
ETL Solution using SSIS Packages
• Designed ETL packages with the SSIS framework that read from different data sources (SQL Server, flat files, and Excel), apply various transformations, and load the data into target data stores.
• Migrated a number of DTS packages (SQL Server 2000) to the SQL Server 2012 SSIS catalog.
SSRS Reporting Solution for Different Clients
• Implemented reconciliation reports to compare data across different source systems, resulting in significant FTE savings and improved SLA compliance.
• Managed a smooth transition from SSRS 2005 to SSRS 2014 reports and from SSRS to Power BI for multiple clients.
Data Warehouse and SSAS Multidimensional/Tabular Model Design and Development
• Designed different measure groups and dimensions.
• Implemented MDX scripts for a number of reports.
• Implemented ad-hoc reporting solution with the help of SSAS for the finance DW.
Database Design and Development for Gamification Projects
• Developed more than 100 stored procedures with parameters, RETURN values, complex multi-table JOINs, and cursors.
• Performance tested, troubleshot, and optimized (using SQL Profiler, execution plans, and DMVs).
Database High Availability Implementation and Production Support
• Log shipping
• Database mirroring
• Transaction replication
• Always On availability groups
Education
Bachelor of Technology Degree in Computer Science and Engineering
Walchand College of Engineering, Sangli - Sangli, Maharashtra, India
Higher Secondary School Certificate in Basics
Pune University - Pune, Maharashtra, India
Secondary School Certificate in Basics
Pune University - Pune, Maharashtra, India
Certifications
Apache Airflow Fundamentals
Astronomer
SnowPro Core
Snowflake
Microsoft Certified Azure Data Scientist
Microsoft
Exam DP-900: Microsoft Azure Data Fundamentals
Microsoft
Exam AZ-900: Microsoft Azure Fundamentals
Microsoft
Microsoft Azure Data Engineer Associate
Microsoft
Databricks Certified Apache Spark 3.0 Developer
Databricks
AWS Certified Solutions Architect Associate
AWS
Microsoft Certified Technology Specialist SQL Server 2008 Business Intelligence and Development
Microsoft
Microsoft Certified Technology Specialist SQL Server 2008 Implementation and Maintenance
Microsoft
Microsoft Certified Technology Specialist SQL Server 2008 Database Development
Microsoft
Microsoft Certified SQL Server Associate
Microsoft
Skills
Libraries/APIs
PySpark, Pandas, PubSubJS, Zendesk API
Tools
MySQL Workbench, SQL Server BI, Amazon Elastic MapReduce (EMR), AWS Glue, Spark SQL, iReport Designer, Microsoft Power BI, Azure IoT Suite, MongoDB Atlas, Tableau, SSAS, Azure Machine Learning, Apache Airflow, Google Analytics, Azure ML Studio, Power Query, Amazon Simple Queue Service (SQS), AWS Step Functions
Languages
SQL, T-SQL (Transact-SQL), Snowflake, Python, Python 3
Frameworks
Spark, Data Fabric, Presto, Data Lakehouse
Paradigms
ETL, Dimensional Modeling, Business Intelligence (BI), Fast Healthcare Interoperability Resources (FHIR), HL7 FHIR Standard
Platforms
Amazon Web Services (AWS), Azure, Azure Synapse, Azure SQL Data Warehouse, Databricks, Dedicated SQL Pool (formerly SQL DW), Azure Event Hubs, Jupyter Notebook, Windows, Linux, Percona, Google Cloud Platform (GCP), Salesforce, Azure Synapse Analytics, Azure Functions
Storage
Azure Cosmos DB, Microsoft SQL Server, SQL Server Management Studio (SSMS), SQL Server Integration Services (SSIS), SQL Server Analysis Services (SSAS), SQL Server DBA, SQL Server 2014, Database Backups, Database Modeling, SQL Server Reporting Services (SSRS), Amazon DynamoDB, Data Lakes, Azure SQL Databases, Amazon S3 (AWS S3), Databases, Data Pipelines, MySQL, MongoDB, Amazon Aurora, Database Migration, Database Replication, Elasticsearch, Redshift, SSAS Tabular, Database Administration (DBA), SQL Server Data Tools (SSDT), Apache Hive, PostgreSQL, Google Cloud Storage
Other
Data Architecture, Data Engineering, Azure Data Lake, Azure Data Lake Analytics, MSBI, Big Data, Query Optimization, Log Shipping, Performance Optimization, Azure Data Factory (ADF), Azure Databricks, APIs, Data Warehousing, Schemas, Dashboards, Data Analysis, Data Visualization, Cloud, Business Continuity & Disaster Recovery (BCDR), Multidimensional Expressions (MDX), DAX, Always On, Azure Stream Analytics, Data Vaults, Retrieval-augmented Generation (RAG), OpenAI, Workbench, Apache Kylin, Data Science, Machine Learning, Big Data Architecture, Data Analytics, Data Build Tool (dbt), Software Development, Electronic Medical Records (EMR), Google BigQuery, ServiceNow, Fivetran, Electronic Health Records (EHR), Data Modeling, Data Governance, Pipelines