Data Engineer
2022 - PRESENT
BCG - Gamma
- Designed and developed the Snowflake data model for a greenfield project with heavy geospatial requirements, such as computing distances and area coverage.
- Created API endpoints for uploading and returning data.
- Used the Snowpark API to load and query the data, and Snowpark Python to transform the data and precalculate formulas.
- Built the complete change history for the major tables. Separated the core database from the local and sandbox databases so that each user can have their own data and use the core one as well.
- Used pandas for geospatial analysis, which was impossible with Snowflake due to the size of each geometry.
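A minimal sketch of the kind of geospatial distance work described above, done in pandas with the haversine formula (the site data, column names, and reference point are hypothetical; in the real project the data came from Snowflake):

```python
import math

import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Hypothetical site table; the real geometries were too large for Snowflake.
sites = pd.DataFrame({
    "site": ["A", "B"],
    "lat": [52.52, 48.86],
    "lon": [13.40, 2.35],
})

# Distance of every site from a reference point (Berlin here).
ref_lat, ref_lon = 52.52, 13.40
sites["dist_km"] = sites.apply(
    lambda r: haversine_km(ref_lat, ref_lon, r["lat"], r["lon"]), axis=1
)
```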
Technologies: Data Engineering, Snowflake, Pandas, Python 3, SQL, ETL, Snowpark

Senior Data Engineer
2021 - 2022
Pfizer
- Developed the ETL to process large volumes of data in Redshift and move the results to Postgres, using Python and pandas for the ETL and Apache Airflow for orchestration.
- Built the ETL to move and process PostgreSQL data into the Neo4j graph using Python and pandas, executed with Apache Airflow.
- Created Cypher queries to get the genealogy of materials with high efficiency.
- Engineered advanced SQL in Redshift to process the data with speed and accuracy.
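The material-genealogy logic that the Cypher queries implemented can be sketched in plain Python as a graph traversal (the node names and the `derived_from` relationship are hypothetical stand-ins for the Neo4j edges):

```python
from collections import deque

# Hypothetical adjacency map: material -> materials it was derived from,
# standing in for (:Material)-[:DERIVED_FROM]->(:Material) edges in Neo4j.
derived_from = {
    "batch_c": ["batch_b"],
    "batch_b": ["batch_a", "raw_x"],
    "batch_a": ["raw_x"],
}

def genealogy(material: str) -> set[str]:
    """All ancestor materials, i.e. the result of a variable-length path
    query such as MATCH (m {id: $id})-[:DERIVED_FROM*]->(a) RETURN DISTINCT a."""
    seen: set[str] = set()
    queue = deque(derived_from.get(material, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(derived_from.get(node, []))
    return seen
```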
Technologies: SQL, Neo4j, Cypher, Redshift, PostgreSQL, Python, Pandas, Amazon Web Services (AWS)

Senior Data Engineer
2020 - 2021
BCG
- Helped the BCG marketing team move their existing SQL Server workload to the AWS cloud and Snowflake. Developed Snowflake JavaScript stored procedures and UDFs, alongside designing tables.
- Contributed to the design and developed the whole ETL flow to bring data from various sources into Snowflake using Fivetran, AWS Lambda, Python, AWS Glue, and so on.
- Collaborated on the design and development of Snowflake tables and other objects to efficiently implement ETL and the reporting requirements for the team.
- Developed efficient, optimized queries called by Tableau on behalf of the reporting team.
- Contributed to the automation and deployment of various AWS components using AWS SAM and AWS CloudFormation.
- Used Jenkins to automatically deploy Lambda and Glue jobs in various environments.
- Used dbt for all transformations within Snowflake, so that every transformation lives in Git (one place only) and applies across environments.
- Migrated the reporting stack from Microsoft Power BI to Tableau. Designed and developed the Snowflake queries used by these reports.
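The environment-agnostic, single-source-of-truth transformation pattern above can be sketched as a templated statement. All table, column, and database names below are hypothetical, and the real project expressed these transformations as dbt models rather than hand-built strings:

```python
def render_merge(database: str, table: str, staging: str, key: str, cols: list[str]) -> str:
    """Render an idempotent Snowflake-style MERGE, parameterized by
    environment database so the same model text runs in dev and prod."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    val_list = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {database}.public.{table} t "
        f"USING {database}.staging.{staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )

# Same model text, different environment database.
dev_sql = render_merge("DEV_DB", "customers", "customers_raw", "id", ["name", "region"])
prod_sql = render_merge("PROD_DB", "customers", "customers_raw", "id", ["name", "region"])
```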
Technologies: Amazon Web Services (AWS), Git, Fivetran, AWS Glue, Tableau, Amazon S3 (AWS S3), Microsoft SQL Server, Python, AWS Lambda, Snowflake, Data Build Tool (dbt), Azure SQL, Microsoft Power BI

Senior Data Engineer
2019 - 2021
Deutsche Börse Group
- Migrated the existing database from on-premises SQL Server to Azure using a lift-and-shift approach.
- Created the Azure Databricks solution to integrate data from various sources and stored the data in Snowflake and Azure Synapse warehouse.
- Wrote Azure Databricks Spark code using Python and Delta Lake technologies.
- Designed and developed the Snowflake database, tables, views, stored procedures, functions, and stages.
- Created the Azure Data Factory pipelines for running ETL.
Technologies: Azure Synapse, Azure SQL, Azure, Azure Data Factory, Snowflake, SQL, SQL Server 2016, Azure Databricks, Apache Spark

Data Engineer
2017 - 2019
Mobilityware
- Developed an hourly AWS Data Pipeline job that launched AWS EMR to run a Flink batch ETL process, reading raw data from S3 and loading the processed results back into S3.
- Designed and developed an AWS Redshift data warehouse handling terabytes of data, which was then used by the data analyst team for dashboards.
- Designed and tuned queries to run efficiently against Redshift.
- Designed the tables with appropriate distribution and sort keys for efficiency.
- Built a Python and AWS Athena solution for GDPR compliance, handling user requests to have their data deleted or returned to them.
- Developed a solution to find all of a user's data in S3 files using AWS Athena, then read the files, deleted the user's rows, and rewrote the files (all of the user's raw data was stored in S3).
- Enabled the return of user data through AWS Athena queries, and ensured that the data stored in Redshift was likewise deleted or returned to end users, which was easier since that data was more structured.
- Designed and developed Tableau dashboards based on Redshift data for various KPIs.
- Implemented real-time stream processing with Apache Kafka and AWS Kinesis for incoming data from various IoT devices, saving the processed data in AWS S3 and creating Athena tables for further querying and processing.
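The GDPR rewrite step above, splitting a user's rows out of a raw file, can be sketched as follows (the file layout and field names are hypothetical; the real pipeline located the relevant files with Athena):

```python
import csv
import io

def scrub_user(raw_csv: str, user_id: str) -> tuple[str, list[dict]]:
    """Split one raw CSV file into (rewritten file without the user's rows,
    the removed rows so they can be returned to the user)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    kept, removed = [], []
    for row in reader:
        (removed if row["user_id"] == user_id else kept).append(row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(kept)
    return out.getvalue(), removed

# Hypothetical raw event file pulled from S3.
raw = "user_id,event\nu1,login\nu2,login\nu1,purchase\n"
rewritten, returned = scrub_user(raw, "u1")
```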
Technologies: Amazon Web Services (AWS), Apache Kafka, AWS EMR, Amazon S3 (AWS S3), AWS Kinesis, AWS Lambda, AWS Data Pipeline Service, Redshift, Python, Java 8, Flink

Database Designer and Developer
2017 - 2017
Transparency AI
- Designed and developed a PostgreSQL database to collect data from different car dealerships, arriving in XML, CSV, and JSON formats.
- Conceptualized and built a Python ETL process to transform the XML, CSV, and JSON data as required by the data model.
- Wrote efficient PostgreSQL SQL, PL/pgSQL code, and other functions for reporting and loading data.
- Built a proof of concept (POC), developing dashboards in both Power BI and Tableau to determine which suited the requirements better.
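Normalizing the three dealership feed formats into one loadable record shape, as described above, can be sketched with the standard library (the field names and payloads are hypothetical):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def parse_records(payload: str, fmt: str) -> list[dict]:
    """Normalize a dealership feed into a list of dicts ready for loading."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    if fmt == "xml":
        root = ET.fromstring(payload)
        return [{child.tag: child.text for child in rec} for rec in root]
    raise ValueError(f"unsupported format: {fmt}")

# The same hypothetical record in all three inbound formats.
json_rows = parse_records('[{"vin": "1A2", "price": "9500"}]', "json")
csv_rows = parse_records("vin,price\n1A2,9500\n", "csv")
xml_rows = parse_records("<cars><car><vin>1A2</vin><price>9500</price></car></cars>", "xml")
```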
Technologies: PostgreSQL, Python

Senior Associate
2015 - 2017
JP Morgan Chase UK
- Designed and developed columnar database systems using Sybase IQ for better performance.
- Designed and developed an Apache Spark solution to handle complex business transformations for profit-and-loss reporting, generating reports with almost 1,000 columns.
- Used HDFS and Parquet files to handle schema-less data at high performance, with some rows having 100 columns and others 1,000.
- Designed and developed an Apache Kafka solution for real-time processing of events, providing real-time updates on the profit-and-loss dashboards to various analysts.
- Deployed and administered Apache Hadoop, HDFS, Spark, and Kafka.
- Used SQL for Sybase ASE and Sybase IQ related work.
- Used Java, Python, and Spark SQL for the big data work.
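The schema-flexible row handling above, where narrow and wide rows share one report, can be sketched in pandas (the book names and P&L columns are hypothetical; the real solution used Spark over Parquet):

```python
import pandas as pd

# Hypothetical P&L rows with different column sets, as in the
# schema-flexible layout described above.
rows = [
    {"book": "rates", "pnl_usd": 120.0},
    {"book": "fx", "pnl_usd": -45.0, "pnl_eur": -41.0},
]

# pandas unions the columns; missing cells become NaN, so narrow and
# wide rows coexist in one report frame.
report = pd.DataFrame(rows)
wide_columns = sorted(report.columns)
```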
Technologies: Java, Python, Apache Kafka, Apache Spark, Hadoop, Sybase, Spark SQL, SQL

Principal Consultant
2014 - 2014
Genpact Singapore
- Designed, modeled, and architected a new database system using Sybase ASE and MS SQL Server for scalability and performance.
- Optimized and performance-tuned existing and new procedures using SQL Server DMVs and Sybase monitoring tables, reducing queries that ran for hours to mere minutes.
- Used SQL Server trace, Profiler, and extended events to troubleshoot performance root causes (analysis and fixes).
- Designed and developed stored procedures, functions, triggers, views, and indexes in both Sybase and SQL Server.
- Conceptualized and implemented HA clustering and DR using database mirroring.
- Partitioned database tables for maintenance and performance tuning.
Technologies: Sybase, Microsoft SQL Server

Database Architect
2012 - 2014
McAfee India Pvt Ltd.
- Troubleshot performance root causes by analyzing and implementing fixes, using SQL Server trace, Profiler, and extended events.
- Designed and developed tables, stored procedures, and indexes for new development and enhancements.
- Worked on test-driven development and development using the agile methodology.
- Worked on data modeling for changes and new development.
- Monitored production server performance using DMVs and Perfmon, tuning the system, the application, and existing queries and objects as requirements demanded.
- Designed, tested, and extensively tuned big data and NoSQL technologies, including Cassandra and the Hadoop, Hive, and Pig stack, using Python to test different scenarios for migrating the existing application onto a big data platform.
- Designed and implemented HA clustering and DR using database mirroring.
- Partitioned a database table for maintenance and performance tuning.
Technologies: Microsoft SQL Server

Associate
2011 - 2012
JP Morgan Chase India
- Designed and developed stored procedures, functions, triggers, views, and indexes in Sybase.
- Used Sybase ASE's XML showplans, trace flags, and abstract query plans/statistics for performance root cause analysis.
- Optimized and performance-tuned existing and new procedures using monitoring tables, reducing query running times by 2 to 30 times.
- Worked on data modeling for changes and new development.
- Developed SQL and T-SQL code using Sybase.
- Developed Unix shell, Python, and Perl scripts for ETL and data analytics.
- Partitioned database tables for maintenance and performance tuning.
Technologies: Python, Sybase