Data Engineer
2022 - PRESENT
BCG - Gamma
- Designed and developed the Snowflake data model for a greenfield project with heavy geospatial requirements, such as computing distances and area coverage.
- Created API endpoints for uploading and returning data.
- Used the Snowpark API to load and query the data, and Snowpark Python to transform the data and precalculate formulas.
- Built the complete change history for the major tables. Separated the core database from the local and sandbox databases so that each user can have their own data and use the core one as well.
- Used pandas for geospatial analysis, which was impossible with Snowflake due to the size of each geometry.
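A minimal sketch of the kind of geospatial distance work described above, done in pandas with the haversine formula (the site data, column names, and reference point are hypothetical; in the real project the data came from Snowflake):

```python
import math

import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical site table; the real geometries were too large for Snowflake.
sites = pd.DataFrame({
    "site": ["A", "B"],
    "lat": [52.52, 48.86],
    "lon": [13.40, 2.35],
})

# Distance of every site from a reference point (Berlin here).
ref_lat, ref_lon = 52.52, 13.40
sites["dist_km"] = sites.apply(
    lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]), axis=1
)
```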
Technologies: Data Engineering, Snowflake, Pandas, Python 3, SQL, ETL, Snowpark

Senior Data Engineer
2021 - 2022
Pfizer
- Developed the ETL to process large volumes of data in Redshift and move the results to Postgres, using Python and pandas for the ETL and Apache Airflow for orchestration.
- Built the ETL to move and process PostgreSQL data into the Neo4j graph using Python and pandas, executed with Apache Airflow.
- Created Cypher queries to get the genealogy of materials with high efficiency.
- Engineered advanced SQL in Redshift to process the data with speed and accuracy.
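The material-genealogy logic that the Cypher queries implemented can be sketched in plain Python as a graph traversal (the node names and the `derived_from` relationship are hypothetical stand-ins for the Neo4j edges):

```python
from collections import deque

# Hypothetical adjacency map: material -> materials it was derived from,
# standing in for (:Material)-[:DERIVED_FROM]->(:Material) edges in Neo4j.
derived_from = {
    "batch_c": ["batch_b"],
    "batch_b": ["batch_a", "raw_x"],
    "batch_a": ["raw_x"],
}

def genealogy(material: str) -> set[str]:
    """All ancestor materials, i.e. the result of a variable-length path
    query such as MATCH (m {id: $id})-[:DERIVED_FROM*]->(a) RETURN DISTINCT a."""
    seen: set[str] = set()
    queue = deque(derived_from.get(material, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(derived_from.get(node, []))
    return seen
```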
Technologies: SQL, Neo4j, Cypher, Redshift, PostgreSQL, Python, Pandas, Amazon Web Services (AWS)

Senior Data Engineer
2020 - 2021
BCG
- Helped the BCG marketing team move their existing SQL Server workload to the AWS cloud and Snowflake. Developed Snowflake JavaScript stored procedures and UDFs, alongside designing tables.
- Contributed to the design and developed the whole ETL flow to bring data from various sources into Snowflake using Fivetran, AWS Lambda, Python, AWS Glue, and so on.
- Collaborated on the design and development of Snowflake tables and other objects to efficiently implement ETL and the reporting requirements for the team.
- Developed efficient, optimized queries called by Tableau on behalf of the reporting team.
- Contributed to the automation and deployment of various AWS components using AWS SAM and AWS CloudFormation.
- Used Jenkins to automatically deploy Lambda and Glue jobs in various environments.
- Used dbt for all transformations within Snowflake, so that every transformation lives in Git (one place only) and applies across environments.
- Migrated the reporting stack from Microsoft Power BI to Tableau. Designed and developed the Snowflake queries used by these reports.
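The environment-agnostic, single-source-of-truth transformation pattern above can be sketched as a templated statement. All table, column, and database names below are hypothetical, and the real project expressed these transformations as dbt models rather than hand-built strings:

```python
def render_merge(database: str, table: str, staging: str, key: str, cols: list[str]) -> str:
    """Render an idempotent Snowflake-style MERGE, parameterized by
    environment database so the same model text runs in dev and prod."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    val_list = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {database}.public.{table} t "
        f"USING {database}.staging.{staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )

# Same model text, different environment database.
dev_sql = render_merge("DEV_DB", "customers", "customers_raw", "id", ["name", "region"])
prod_sql = render_merge("PROD_DB", "customers", "customers_raw", "id", ["name", "region"])
```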
Technologies: Amazon Web Services (AWS), Git, Fivetran, AWS Glue, Tableau, Amazon S3 (AWS S3), Microsoft SQL Server, Python, AWS Lambda, Snowflake, Data Build Tool (dbt), Azure SQL, Microsoft Power BI

Senior Data Engineer
2019 - 2021
Deutsche Börse Group
- Migrated the existing database from on-premises SQL Server to Azure using a lift-and-shift approach.
- Created the Azure Databricks solution to integrate data from various sources and stored the data in Snowflake and Azure Synapse warehouse.
- Wrote Azure Databricks Spark code using Python and Delta Lake technologies.
- Designed and developed the Snowflake database, tables, views, stored procedures, functions, and stages.
- Created the Azure Data Factory pipelines for running ETL.
Technologies: Azure Synapse, Azure SQL, Azure, Azure Data Factory, Snowflake, SQL, SQL Server 2016, Azure Databricks, Apache Spark

Data Engineer
2017 - 2019
Mobilityware
- Developed an hourly AWS Data Pipeline job that launched AWS EMR to run a Flink batch ETL process, reading raw data from S3 and loading the processed results back into S3.
- Designed and developed an AWS Redshift data warehouse handling terabytes of data, which was then used by the data analyst team for dashboards.
- Designed and tuned queries to run efficiently against Redshift.
- Designed the tables with appropriate distribution and sort keys for efficiency.
- Built a Python and AWS Athena solution for GDPR compliance, handling user requests to have their data deleted or returned to them.
- Developed a solution to find all of a user's data in S3 files using AWS Athena, then read the files, deleted the user's rows, and rewrote the files (all of the user's raw data was stored in S3).
- Enabled the return of user data through AWS Athena queries, and ensured that the data stored in Redshift was likewise deleted or returned to end users, which was easier since that data was more structured.
- Designed and developed Tableau dashboards based on Redshift data for various KPIs.
- Implemented real-time stream processing with Apache Kafka and AWS Kinesis for incoming data from various IoT devices, saving the processed data in AWS S3 and creating Athena tables for further querying and processing.
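The GDPR rewrite step above, splitting a user's rows out of a raw file, can be sketched as follows (the file layout and field names are hypothetical; the real pipeline located the relevant files with Athena):

```python
import csv
import io

def scrub_user(raw_csv: str, user_id: str) -> tuple[str, list[dict]]:
    """Split one raw CSV file into (rewritten file without the user's rows,
    the removed rows so they can be returned to the user)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept, removed = [], []
    for row in reader:
        (removed if row["user_id"] == user_id else kept).append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue(), removed

# Hypothetical raw event file pulled from S3.
raw = "user_id,event\nu1,login\nu2,login\nu1,purchase\n"
rewritten, returned = scrub_user(raw, "u1")
```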
Technologies: Amazon Web Services (AWS), Apache Kafka, AWS EMR, Amazon S3 (AWS S3), AWS Kinesis, AWS Lambda, AWS Data Pipeline Service, Redshift, Python, Java 8, Flink

Database Designer and Developer
2017 - 2017
Transparency AI
- Designed and developed a PostgreSQL database to collect data from different car dealerships, arriving in XML, CSV, and JSON formats.
- Conceptualized and built a Python ETL process to transform the XML, CSV, and JSON data as required by the data model.
- Wrote efficient PostgreSQL SQL, PL/pgSQL code, and other functions for reporting and loading data.
- Built a proof of concept (POC), developing dashboards in both Power BI and Tableau to determine which suited the requirements better.
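Normalizing the three dealership feed formats into one loadable record shape, as described above, can be sketched with the standard library (the field names and payloads are hypothetical):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def parse_records(payload: str, fmt: str) -> list[dict]:
    """Normalize a dealership feed into a list of dicts ready for loading."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "xml":
        root = ET.fromstring(payload)
        return [{child.tag: child.text for child in rec} for rec in root]
    raise ValueError(f"unsupported format: {fmt}")

# The same hypothetical record in all three inbound formats.
json_rows = parse_records('[{"vin": "1A2", "price": "9500"}]', "json")
csv_rows = parse_records("vin,price\n1A2,9500\n", "csv")
xml_rows = parse_records("<cars><car><vin>1A2</vin><price>9500</price></car></cars>", "xml")
```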
Technologies: PostgreSQL, Python

Senior Associate
2015 - 2017
JP Morgan Chase UK
- Designed and developed columnar database systems using Sybase IQ for better performance.
- Designed and developed an Apache Spark solution to handle complex business transformations for profit-and-loss reporting, generating reports with almost 1,000 columns.
- Used HDFS and Parquet files to handle schema-less data at high performance, with some rows having 100 columns and others 1,000.
- Designed and developed an Apache Kafka solution for real-time processing of events, providing real-time updates on the profit-and-loss dashboards to various analysts.
- Deployed and administered Apache Hadoop, HDFS, Spark, and Kafka.
- Used SQL for Sybase ASE and Sybase IQ related work.
- Used Java, Python, and Spark SQL for the big data work.
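The schema-flexible row handling above, where narrow and wide rows share one report, can be sketched in pandas (the book names and P&L columns are hypothetical; the real solution used Spark over Parquet):

```python
import pandas as pd

# Hypothetical P&L rows with different column sets, as in the
# schema-flexible layout described above.
rows = [
    {"book": "rates", "pnl_usd": 120.0},
    {"book": "fx", "pnl_usd": -45.0, "pnl_eur": -41.0},
]

# pandas unions the columns; missing cells become NaN, so narrow and
# wide rows coexist in one report frame.
report = pd.DataFrame(rows)
wide_columns = sorted(report.columns)
```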
Technologies: Java, Python, Apache Kafka, Apache Spark, Hadoop, Sybase, Spark SQL, SQL

Principal Consultant
2014 - 2014
Genpact Singapore
- Designed, modeled, and architected a new database system using Sybase ASE and MS SQL Server for scalability and performance.
- Optimized and performance-tuned existing and new procedures using SQL Server DMVs and Sybase monitoring tables, reducing queries that ran for hours to mere minutes.
- Used SQL Server trace, Profiler, and extended events to troubleshoot performance root causes (analysis and fixes).
- Designed and developed stored procedures, functions, triggers, views, and indexes in both Sybase and SQL Server.
- Conceptualized and implemented HA clustering and DR using database mirroring.
- Partitioned database tables for maintenance and performance tuning.
Technologies: Sybase, Microsoft SQL Server

Database Architect
2012 - 2014
McAfee India Pvt Ltd.
- Troubleshot performance root causes by analyzing and implementing fixes, using SQL Server trace, Profiler, and extended events.
- Designed and developed tables, stored procedures, and indexes for new development and enhancements.
- Worked on test-driven development and development using the agile methodology.
- Worked on data modeling for changes and new development.
- Monitored production server performance using DMVs and Perfmon, tuning the system, the application, and existing queries and objects as requirements demanded.
- Designed, tested, and extensively tuned big data and NoSQL technologies, including Cassandra and the Hadoop, Hive, and Pig stack, using Python to test different scenarios for migrating the existing application onto a big data platform.
- Designed and implemented HA clustering and DR using database mirroring.
- Partitioned a database table for maintenance and performance tuning.
Technologies: Microsoft SQL Server

Associate
2011 - 2012
JP Morgan Chase India
- Designed and developed stored procedures, functions, triggers, views, and indexes in Sybase.
- Used Sybase ASE's XML showplans, trace flags, and abstract query plans/statistics for performance root cause analysis.
- Optimized and performance-tuned existing and new procedures using monitoring tables, reducing query running times by 2 to 30 times.
- Worked on data modeling for changes and new development.
- Developed SQL and T-SQL code using Sybase.
- Developed Unix shell, Python, and Perl scripts for ETL and data analytics.
- Partitioned database tables for maintenance and performance tuning.
Technologies: Python, Sybase