Darshan Singh
Verified Expert in Engineering
Data Engineering Developer
Berlin, Germany
Toptal member since June 18, 2020
For the past 16 years, Darshan has worked as a database developer, architect, and performance-tuning expert using MS SQL, PostgreSQL, Redshift, and Snowflake. Since 2012, he’s been focusing on data and big data engineering projects using Spark, Hadoop, NoSQL, Python, Java 8, Kafka, and AWS, mainly building traditional ETL pipelines (Unix, Python, and SQL), big data ETL pipelines (Python, Spark, Hadoop, and HDFS), and real-time ETL pipelines (Kafka).
Preferred Environment
Linux, OS X, Windows
The most amazing...
...project I worked on was scaling an eight-terabyte SQL Server application from 4,000 requests per second to around 70,000 requests per second.
Work Experience
Data Engineer
BCG - Gamma
- Designed and developed the Snowflake data model for a greenfield project. The data has many geospatial requirements, such as finding distances and determining coverage by area.
- Created API endpoints for uploading and returning data.
- Used the Snowpark API to load and query the data and Snowpark Python to transform the data and precalculate the formulas.
- Created the entire history of the major tables. Separated the core database from the local and sandbox databases so that each user can have their own data and use the core one as well.
- Used pandas for geospatial analysis, which was impossible with Snowflake due to the size of each geometry.
Senior Data Engineer
Eon
- Designed and architected a Snowflake green data warehouse.
- Designed tables, SQL, and stored procedures using Snowpark.
- Integrated data from various sources, such as MySQL and Azure Blobs, into Snowflake.
- Created a data anonymization service using Snowflake, Python, pandas, and Polars.
- Used Apache Airflow to orchestrate the ETL pipelines.
Senior Data Engineer
Pfizer
- Developed the ETL to process a large amount of data in Redshift, then moved the results to Postgres. Used Python and pandas for ETL and executed using Apache Airflow.
- Built the ETL to move and process PostgreSQL data to the Neo4j graph using Python and pandas and executed using Apache Airflow.
- Created Cypher queries to get the genealogy of materials with high efficiency.
- Engineered advanced SQL in Redshift to process the data with speed and accuracy.
Senior Data Engineer
BCG
- Helped the BCG marketing team move their existing SQL Server to AWS Cloud and Snowflake. Developed Snowflake JavaScript stored procedures and UDFs alongside designing tables.
- Contributed to the design and developed the whole ETL flow to bring data from various sources into Snowflake using Fivetran, AWS Lambda, Python, AWS Glue, and so on.
- Collaborated on the design and development of Snowflake tables and other objects to efficiently implement ETL and the reporting requirements for the team.
- Helped develop the efficient, optimized queries that the reporting team uses and that Tableau calls.
- Contributed to the automation and deployment of various AWS components using AWS SAM and AWS CloudFormation.
- Used Jenkins to automatically deploy Lambda and Glue jobs in various environments.
- Used dbt for all transformations within Snowflake to keep all the transformations in Git (one place only), applicable across environments.
- Moved the Microsoft Power BI reporting stack to Tableau-based reporting. Worked on designing and developing the Snowflake queries used by these reports.
Senior Data Engineer
Deutsche Börse Group
- Migrated the existing database from on-premises SQL Server to Azure using lift and shift.
- Created the Azure Databricks ETL data pipeline solution to integrate data from various sources and stored the data in Snowflake and Azure Synapse warehouse.
- Wrote Azure Databricks Spark code using Python and Delta Lake technologies.
- Designed and developed the Snowflake database, tables, views, stored procedures, functions, and stages.
- Created the Azure Data Factory pipelines for running ETL.
Data Engineer
Mobilityware
- Developed an AWS data pipeline that launched AWS EMR, which ran an Apache Spark (PySpark) ETL process that read data from S3, processed it, and loaded the results back into S3.
- Designed and developed an AWS Redshift data warehouse to handle terabytes of data, which the data analyst team then used for dashboards.
- Designed and tuned Redshift queries for efficiency.
- Designed the tables using proper distribution keys and sort keys for efficiency.
- Built a solution in Python and AWS Athena to handle GDPR requests from users to delete or return their data.
- Developed a solution to find all the data for the users in S3 files using AWS Athena, then read the files, deleted the users' data, and rewrote the files (because all of the users' raw data was stored in S3).
- Enabled the return of user data using AWS Athena queries and made sure that the data stored in Redshift was also deleted or returned to the end users, which was much easier because that data was more structured.
- Designed and developed Tableau dashboards based on Redshift data for various KPIs.
- Implemented real-time stream processing using Apache Kafka and AWS Kinesis for incoming data from various IoT devices. Saved the processed data in AWS S3 and created Athena tables for further querying and processing.
Database Designer and Developer
Transparency AI
- Designed and developed a database in PostgreSQL to collect the data from different car dealerships. The data was in XML, CSV, and JSON format.
- Conceptualized and built a Python ETL process to transform the data into XML, CSV, and JSON formats as required by the data model.
- Wrote efficient PostgreSQL SQL, PL/pgSQL code, and other functions for reporting and loading data.
- Built a proof of concept (POC) and developed dashboards using Power BI and Tableau to determine which tool was the better fit.
Senior Associate
JP Morgan Chase UK
- Designed and developed columnar database systems using Sybase IQ for better performance.
- Designed and developed an Apache Spark solution to handle complex business transformations for profit and loss, where we had to generate reports with almost 1,000 columns.
- Used HDFS and Parquet files to handle schema-less data with high performance, with some rows having 100 columns and others 1,000 columns.
- Designed and developed an Apache Kafka solution for real-time event processing to provide real-time updates on the profit and loss dashboards to various analysts.
- Deployed and administered Apache Hadoop, HDFS, Spark, and Kafka.
- Used SQL for Sybase ASE and Sybase IQ-related work.
- Used Java, Python, and Spark SQL for the big data work.
Principal Consultant
Genpact Singapore
- Designed, modeled, and architected a new database system using Sybase ASE and MS SQL Server for scalability and performance.
- Optimized and performance-tuned existing and new procedures using SQL DMVs and Sybase monitoring tables, cutting the run time of queries that ran for hours down to mere minutes.
- Used SQL Server trace, profiler, and extended events to analyze and fix the root causes of performance problems.
- Designed and developed stored procedures, functions, triggers, views, and indexes in Sybase as well as SQL Server.
- Conceptualized and implemented HA clustering and DR using database mirroring.
- Partitioned the database table for maintenance and performance tuning.
Database Architect
McAfee India Pvt Ltd.
- Troubleshot the root causes of performance problems by analyzing them and implementing fixes. Used SQL Server trace, profiler, and extended events.
- Designed and developed tables, stored procedures, and indexes for new development and enhancement.
- Worked on test-driven development and development using the agile methodology.
- Worked on data modeling for changes and new development.
- Monitored production server performance using DMVs and Perfmon and, depending on the requirements, tuned the system, the application, and existing queries and objects.
- Designed, tested, and tuned extensively on big data and NoSQL technologies like Cassandra and the Hadoop, Hive, and Pig stack, using Python to test different scenarios for migrating the existing application onto a big data platform.
- Designed and implemented HA clustering and DR using database mirroring.
- Partitioned a database table for maintenance and performance tuning.
Associate
JP Morgan Chase India
- Designed and developed stored procedures, functions, triggers, views, and indexes in Sybase.
- Used Sybase ASE’s XML show plans, trace flags, and abstract query plans/statistics for performance root cause analysis.
- Optimized and performance-tuned the queries in existing and new procedures using monitoring tables, reducing query running times by a factor of 2 to 30.
- Worked on data modeling for changes and new development.
- Developed SQL and T-SQL code using Sybase.
- Developed Unix shell, Python, and Perl scripts for ETL and data analytics.
- Partitioned database tables for maintenance and performance tuning.
Experience
Redesign and Architecture of a Compliant Web Database System
Other Roles and Responsibilities:
• Designed, modeled, and architected a new database system using Sybase ASE, MS SQL Server, and Oracle for scalability and performance.
• Optimized and performance-tuned existing and new procedures using SQL DMVs, Sybase monitoring tables, and Oracle performance views, cutting the run time of queries from hours to minutes.
• Used SQL Server trace, profiler, and extended events to analyze and fix the root causes of performance problems.
• Used Sybase ASE’s XML show plans, trace flags, abstract query plans, and statistics for performance root cause analysis.
• Designed and developed stored procedures, functions, triggers, views, and indexes in Sybase as well as SQL Server.
Scaling the McAfee Mobile Security App Database System
Other Roles and Responsibilities:
• Used SQL Server trace, profiler, and extended events to analyze and fix the root causes of performance problems.
• Designed and developed tables, stored procedures, and indexes for new development and enhancement.
• Worked on test-driven development and development using Agile.
• Implemented data modeling for changes and new development.
• Monitored the production server performance using DMVs and Perfmon and, depending on the requirements, tuned the system, the application, and existing queries and objects.
• Designed, tested, and tuned extensively on big data and NoSQL technologies like Cassandra and the Hadoop, Hive, and Pig stack, using Python to test different scenarios for migrating the existing application onto a big data platform.
Sybase Database System Design and Development for the PB Credit System
Other Roles and Responsibilities:
• Designed and developed stored procedures, functions, triggers, views, and indexes in Sybase.
• Used Sybase ASE’s XML show plans, trace flags, abstract query plans, and statistics for a performance root cause analysis.
• Optimized and performance-tuned the queries in existing and new procedures using monitoring tables, reducing query running times by a factor of 2 to 30.
• Developed Unix shell, Python, and Perl scripts for ETL and data analytics.
• Implemented data modeling for changes and new development.
• Wrote SQL and T-SQL code using Sybase.
Credit Suisse Swap Database System
Other Roles and Responsibilities:
• Created new stored procedures, functions, triggers, and views in SQL Server.
• Optimized and performance-tuned the queries in existing and new procedures using DMVs.
• Developed ETL using SSIS 2008 and reports using SSRS 2008.
• Used SQL Server trace, profiler, and extended events for performance root cause analysis and fixes.
• Developed Unix shell, Python, and Perl scripts.
• Conducted an impact analysis for the migration from SQL Server 2000 to SQL Server 2008.
• Defined the capacity planning and designed the migration to SQL Server 2008 for performance improvement.
• Changed DTS packages to SSIS packages and changed SQL and T-SQL code to be compatible with SQL Server 2008.
Real-time Analytics Platform
Data Warehouse and Data Lake for Transparency
I cleaned up and enriched the data that was to be moved to Redshift, where we ran our reporting queries. I used PySpark and Python.
Education
Master of Science Degree in Data Science
Goldsmiths, University of London - London, UK
Skills
Libraries/APIs
Spark Streaming, Snowpark, Pandas
Tools
AWS CloudFormation, Amazon CloudWatch, Amazon Athena, Spark SQL, Amazon Elastic MapReduce (EMR), Kafka Streams, Azure Logic Apps, Amazon Kinesis Data Firehose, AWS Glue, Terraform, Microsoft Access, Flink, Git, Tableau, Tableau Desktop Pro, Microsoft Power BI, Apache Airflow
Languages
Snowflake, Python, T-SQL (Transact-SQL), SQL, Python 3, Java 8, Java, R, Cypher
Frameworks
Presto, Hadoop, Apache Spark
Paradigms
ETL, Database Design, ETL Implementation & Design, Agile, Scrum
Platforms
Databricks, AWS Lambda, Spark Core, Apache Kafka, Azure Functions, Linux, Azure, Kubernetes, Azure Event Hubs, Azure Synapse, Windows, OS X, Amazon Web Services (AWS)
Storage
HDFS, Apache Hive, Redshift, Amazon S3 (AWS S3), PostgreSQL, Sybase, Microsoft SQL Server, Database Modeling, Azure SQL, Data Pipelines, AWS Data Pipeline Service, Azure Blobs, Neo4j, SQL Server 2016
Industry Expertise
Healthcare
Other
Big Data Architecture, Data Architecture, Data Engineering, ETL Development, Data Build Tool (dbt), Data Warehousing, Azure Data Lake, Azure Data Lake Analytics, Azure Databricks, Fivetran, Data Science, Azure Data Factory, Azure Event Grid, Amazon Kinesis, Machine Learning