Abishek Duraiswamy, Developer in Toronto, Canada

Abishek Duraiswamy

Software Developer

Toronto, Canada

Toptal member since June 9, 2026

Bio

Abishek is a senior data engineer with over 10 years of experience in designing scalable data pipelines and automating ETL workflows. His expertise spans AWS, Azure, GCP, Snowflake, and Databricks, with notable work at VML MAP and Pelmorex Corp. He successfully led large-scale cloud migrations and architected cost-effective data privacy solutions for enterprise environments.

Portfolio

VML
PySpark, Python, Databricks, Snowflake, AWS Glue, Step Functions, Amazon Athena...
Pelmorex
Encryption, Google Cloud Platform (GCP), AWS Glue, Apache Iceberg...
Experian
Linux, Shell, Python Script, AWS Glue, Amazon CloudWatch...

Experience

  • SQL - 10 years
  • ETL - 10 years
  • Spark - 7 years
  • Python - 7 years
  • AWS IoT - 6 years
  • Google Cloud Platform (GCP) - 4 years
  • Databricks - 2 years
  • Snowflake - 2 years

Preferred Environment

AWS IoT, Azure, Google Cloud Platform (GCP), Snowflake, Databricks, Kibana, Apache Airflow, Terraform, QlikView

The most amazing...

...data pipeline I've built involved migrating legacy systems to Snowflake and Databricks for real-time analytics at scale.

Work Experience

Senior Data Engineer

2023 - PRESENT
VML
  • Spearheaded the development of data products, providing actionable insights to business users.
  • Designed and implemented scalable ETL pipelines using a config-driven framework to process batch and real-time data with PySpark, Python, Databricks, and Snowflake.
  • Developed Spark applications using AWS Glue and utilized DynamoDB, Step Functions, Athena, Lambda, and S3.
  • Architected modular dbt projects using a multi-layer approach and refactored legacy SQL stored procedures into version-controlled dbt models.
  • Implemented dbt macros to automate repetitive SQL logic for dynamic environment handling.
  • Engineered pipelines to provide data to graph database teams using Neo4j and Cloud Spanner.
  • Worked closely with the legal team to address GDPR and CCPA requests within strict SLAs.
  • Helped teams build AI agents that assist analysts, such as a SQL agent for SFMC, to reduce manual effort.
Technologies: PySpark, Python, Databricks, Snowflake, AWS Glue, Step Functions, Amazon Athena, AWS Lambda, Amazon S3 (AWS S3), Data Build Tool (dbt), SQL, Neo4j, Data Engineering, Data Pipelines, BigQuery, Composer, Dataproc

Senior Data Engineer

2021 - 2023
Pelmorex
  • Modified the legacy GCP Dataflow code to incorporate encryption logic for data privacy.
  • Implemented the privacy framework in the Dataflow pipelines, which ingest data from various cloud sources, including AWS S3 and GCS.
  • Designed an efficient BigQuery data model.
  • Explored, developed, and tested encryption and hashing techniques available in GCP.
  • Created ETL pipelines using AWS Glue and persisted the data into Iceberg.
  • Created dashboards that show metrics on pipeline health.
Technologies: Encryption, Google Cloud Platform (GCP), AWS Glue, Apache Iceberg, Data Engineering, Data Pipelines, Python, SQL, BigQuery, Composer, Dataproc

Data Engineer

2020 - 2021
Experian
  • Handled the monthly load of PII data and designed and implemented a cloud-based architecture, moving the data pipeline from an on-premises system to the AWS Cloud using Glue, CloudWatch, SNS, and Lambda.
  • Added new functions and features to the stand-alone Java application used by analysts to submit and monitor jobs.
  • Supported UAT activities involving the coordination of analysts and engineers from various teams to test new features.
  • Collaborated with DevOps to resolve IAM policy, Terraform configuration, and API rate-limiting issues, ensuring secure and efficient data infrastructure.
Technologies: Linux, Shell, Python Script, AWS Glue, Amazon CloudWatch, Amazon Simple Notification Service (SNS), AWS Lambda, DevOps, Terraform, Data Engineering, Python, SQL

Data Engineer

2018 - 2020
Bose
  • Created data pipelines to ingest data from various sources, including devices, apps, survey data, S3 buckets, and databases such as SAP HANA, MySQL, SQL Server, Amazon Redshift, and PostgreSQL.
  • Collected metrics about data pipelines and stored them in Elasticsearch indexes. Created Kibana dashboards that facilitated the quick identification of data loss and anomalies.
  • Partnered with DevOps teams to design and deploy scalable cloud infrastructure using Terraform for seamless integration with data pipelines and platforms.
  • Worked on custom PySpark libraries to push columns from data in various formats to Collibra, a data governance tool.
  • Successfully migrated data pipeline jobs from Oozie to Airflow.
  • Worked as a liaison between data, legal, and governance teams to provide and delete users’ data based on end customer requests, adhering to GDPR and CCPA rules.
Technologies: MySQL, SQL Server, Amazon Redshift, PostgreSQL, Elasticsearch, Kibana, DevOps, Terraform, PySpark, Collibra, Oozie, Apache Airflow, Data Engineering, Data Pipelines, Python, SQL, Composer

Data Engineer

2017 - 2018
American Express Global Business Travel
  • Used Spark's Java and streaming APIs to transform JSON data pulled from Kafka on the fly and persist it to HDFS in ORC format.
  • Used Spark checkpointing to maintain Kafka offsets for application recovery from failure.
  • Assisted the data science team in productionizing and modularizing PySpark code while maintaining coding standards.
  • Performed ETL using Pig, Hive, and MapReduce to transform transactional data to a de-normalized form.
  • Leveraged HBase for maintaining the latest set of records in projects wherever necessary, thus eliminating the complex CDC process in Hive.
  • Extended Hive and Pig core functionality with custom user-defined functions (UDFs) and user-defined aggregate functions (UDAFs) written in Java.
  • Developed a Java UDTF to parse JSON data into rows using the Gson parser.
  • Developed Java RESTful client applications to pull data from 3rd-party APIs, which were validated and ingested into the data lake.
Technologies: Kafka, HDFS, Apache Pig, Apache Hive, MapReduce, HBase, Data Engineering, SQL

Data Engineer

2015 - 2017
Shell
  • Built a scalable distributed data solution using Hadoop on a 30-node cluster in the AWS cloud to run analysis on over 25 terabytes of data.
  • Developed several new MapReduce and Spark programs to analyze and transform the data for insights into customer usage patterns.
  • Performed ETL using Pig, Hive, and MapReduce to transform transactional data into a de-normalized form.
  • Configured periodic incremental imports of data from DB2 into HDFS using Sqoop.
  • Worked extensively on importing metadata into Hive and migrated existing tables for use on Hive and the AWS cloud.
Technologies: Hadoop, MapReduce, Apache Pig, Apache Hive, IBM Db2, HDFS, Data Engineering, SQL

Programmer Analyst

2013 - 2015
SoftHQ
  • Extracted data from SQL Server and Oracle 10g using Sqoop Connectors into HDFS for processing with Pig and Hive.
  • Developed simple and complex MapReduce jobs using Hive and Pig.
  • Debugged performance issues by using Oracle hints, explain plans, and table partitioning.
  • Worked with heterogeneous source systems like Oracle and SQL Server.
  • Analyzed existing code and optimized and enhanced existing procedures and SQL statements for better performance.
  • Provided structured data from Databricks pipelines to the QlikView team, enabling seamless dashboard development.
Technologies: SQL Server, Oracle 10g, HDFS, Apache Pig, Apache Hive

Experience

End-to-end Cloud Data Platform Migrations and Pipeline Automation

I architected and executed complex enterprise data migrations, moving legacy on-premises systems and traditional databases onto secure, highly optimized cloud data platforms. These included large-scale migrations of on-premises Cloudera/Hadoop and rigid infrastructure (Linux, IBM Db2) to AWS and Snowflake.

I designed scalable, config-driven ETL/ELT frameworks for batch and real-time data using PySpark, Databricks, and Snowflake. I also refactored legacy stored procedures into modular, version-controlled dbt models using custom macros to automate environment handling. In addition, I built robust security into core platform architectures by implementing advanced encryption logic and managing high-risk PII data loads. I also partnered with legal teams to build automated pipelines, ensuring strict compliance with GDPR and CCPA under tight SLAs.

Education

2010 - 2012

Master's Degree in Computer Engineering

Florida Atlantic University - Florida, USA

2006 - 2010

Bachelor's Degree in Electronics and Communication Engineering

Anna University - Tamil Nādu, India

Skills

Libraries/APIs

PySpark

Tools

BigQuery, Composer, AWS Glue, Amazon Athena, Apache Iceberg, Shell, Amazon CloudWatch, Amazon Simple Notification Service (SNS), Terraform, Kibana, Collibra, Oozie, Apache Airflow, Apache Sqoop

Languages

Python, SQL, Java, Snowflake, Python Script

Frameworks

Spark, Hadoop

Paradigms

ETL, DevOps, MapReduce

Platforms

Google Cloud Platform (GCP), Databricks, AWS Lambda, Linux, Apache Pig, AWS IoT, Azure, QlikView

Storage

Data Pipelines, Amazon S3 (AWS S3), Neo4j, MySQL, PostgreSQL, Elasticsearch, HDFS, Apache Hive, HBase, IBM Db2, Oracle 10g, Amazon DynamoDB

Other

Data Engineering, Dataproc, Step Functions, Data Build Tool (dbt), Encryption, SQL Server, Amazon Redshift, Kafka, GDPR, California Consumer Privacy Act (CCPA), Software Engineering, Electronics, Communication
