Nidhin Nandhakumar, Developer in Hamilton, ON, Canada

Nidhin Nandhakumar

Verified Expert in Engineering

Big Data Developer

Hamilton, ON, Canada
Toptal Member Since
October 25, 2022

Nidhin is a big data developer who is always looking for exciting, challenging projects. He has spent most of his career in the data domain and brings extensive experience with cloud ecosystems. He has delivered cloud migration projects ranging from Amazon Web Services (AWS) to Google Cloud Platform (GCP), GCP to AWS, and AWS to Databricks on AWS. He specializes in cloud migrations and data warehouse design.






Preferred Environment

Python, SQL, Amazon Web Services (AWS), Google Cloud Platform (GCP)

The most amazing...

...thing I've developed is the metadata management system at Coursera for organization-wide metadata tracking and management.

Work Experience

Senior Data Engineer

2021 - 2022
Coursera
  • Led the design and contributed to the implementation of a DataHub metadata management service for Coursera. Led the ingestion of back-end, data warehouse, scheduling, and BI dashboard metadata into DataHub for easy metadata management.
  • Guided the marketing and LME data projects, including ingesting data from various third-party sources for easy consumption in Looker for analysis and dashboarding. Designed a unified platform for all marketing-spend data.
  • Directed the eventing data ingestion model that provided critical datasets for the machine learning team to analyze and build their models.
  • Coordinated and managed quarterly planning and project prioritization for marketing and LME teams.
Technologies: Data Engineering, DataHub, Apache Airflow, Amazon Elastic Container Service (Amazon ECS), AWS Lambda, Python, Redshift, Terraform, Databricks, ETL Development, MySQL, Data Build Tool (dbt), Amazon Web Services (AWS), Amazon RDS, Data Pipelines, Amazon S3 (AWS S3), ELT, Amazon Kinesis, Amazon Elastic MapReduce (EMR)

Senior Data Engineer

2019 - 2021
FreshBooks
  • Contributed as a core developer on the data engineering team that owned the data team's core infrastructure, designing and maintaining the Google Cloud Platform components and the overall data infrastructure.
  • Managed the entire data infrastructure with Terraform and implemented user access and policy management.
  • Led the core data-ingestion framework development and deployment for ingesting back-end data through a third-party ingestion platform.
  • Onboarded new team members and mentored junior developers.
  • Optimized performance and reduced cost with improved design in Redshift.
  • Held a core developer role in designing and building the Gen2 pipeline in GCP for the AWS migration work.
Technologies: Big Data, Google Cloud, Google Cloud Composer, Amazon Elastic Container Service (Amazon ECS), Google BigQuery, Terraform, ETL Development, MySQL, Data Build Tool (dbt), Data Pipelines, ELT, BigQuery

Data Engineer

2017 - 2019
Mobivity
  • Contributed as a core developer of Mobivity's data warehouse, using Python and Spark programming combined with AWS infrastructure.
  • Extracted and processed data from multiple sources, including legacy SQL servers and raw data feeds, using Python and Spark programming.
  • Processed a real-time stream of POS transactional data using Python, Spark, and Amazon Kinesis on the AWS platform.
  • Designed the data warehouse on Amazon Redshift with a star schema architecture for cost-efficient, optimal performance.
  • Processed ETL jobs with Hive, Amazon Kinesis Data Firehose, AWS Lambda, Amazon Simple Notification Service (SNS), and Spark for faster processing of extensive data.
  • Designed data-warehouse job flow using AWS Data Pipeline service.
  • Managed the data warehouse team with agile-based project planning.
  • Reduced operational cost by one-tenth for EMR-based ETL processing.
Technologies: Python, Redshift, EMR, ETL Development, MySQL, Amazon Web Services (AWS), Amazon RDS, Data Pipelines, Amazon S3 (AWS S3), ELT, Amazon Kinesis, Amazon Athena, AWS Glue, Amazon Elastic MapReduce (EMR)

Senior Data Analyst

2014 - 2015
Barclays India
  • Contributed as a key developer in the foreign-exchange business vertical.
  • Developed customer- and product-oriented reports using Teradata and shell scripting based on business requirements, helping senior business partners make critical decisions and generating £15 million in revenue for the bank.
  • Created scheduled reports for customer profiling and product impact analysis for business partners using automated shell scripting and Teradata SQL.
  • Held a key developer position producing accurate, detailed regulatory reports for Barclays' foreign-exchange vertical.
  • Recognized by the employer for maintaining strong client relationships while handling multiple tasks and meeting deadlines with high-quality work.
Technologies: Teradata, MySQL, Amazon RDS, Data Pipelines

Senior Software Developer

2011 - 2014
  • Created scripts that pulled information from various upstream systems and moved it to downstream applications after complex processing.
  • Developed custom tools using shell scripting and Teradata queries to simplify the team's tasks.
  • Analyzed various data sources. Analyzed legacy projects for compatibility with new technologies such as Vertica.
  • Developed intermediary modules and batch programs, scheduled with Autosys, that acted as data processing units for further downstream consumption.
  • Contributed as a core developer in converting the Teradata environment of various initiatives to another platform (Vertica), which required extensive planning, designing, and impact testing.
  • Single-handedly built, tested, and deployed various critical modules for the customer profiling system within tight deadlines.
Technologies: Teradata, Vertica, Shell

DataHub Deployment
Coursera wanted to create a centralized location for storing and managing all the metadata for various data assets. These data assets included:
• data warehouse metadata
• back-end database metadata
• job scheduling and automation metadata
• bi-platform metadata
• and various others, including SFMC, SFDC, etc.

I led this project and implemented an open-source solution called DataHub (originally developed by LinkedIn), which allowed Coursera to store and manage the above metadata components. Coursera employees could log in through two-factor authentication, then browse and read the metadata for all their data assets.

Some of the use cases included:
• understanding what each data warehouse table contains: column names, data types, descriptions, tagging, domain categorization, documentation, etc.
• understanding the downstream and upstream impact of each table
• understanding the lineage of each table and its Airflow jobs
• understanding which dashboards are derived from which tables in the data warehouse
• understanding each table's test results and whether the table passes all quality tests
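The impact and lineage use cases above come down to traversing a lineage graph of data assets. This is a minimal illustration in plain Python (not DataHub's actual API); the asset names and edges are hypothetical:

```python
from collections import deque

# Hypothetical lineage edges: asset -> assets that read from it.
LINEAGE = {
    "raw.enrollments": ["staging.enrollments"],
    "staging.enrollments": ["reporting.course_stats"],
    "reporting.course_stats": ["dashboard.growth"],
}

def downstream_impact(asset):
    """Breadth-first walk of the lineage graph from one asset."""
    seen, queue = set(), deque([asset])
    while queue:
        node = queue.popleft()
        for child in LINEAGE.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)

print(downstream_impact("raw.enrollments"))
# → ['dashboard.growth', 'reporting.course_stats', 'staging.enrollments']
```

In DataHub itself, this traversal is what powers the lineage view: changing `raw.enrollments` would flag every table and dashboard the walk returns.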

Data Warehouse Design in Google Cloud

FreshBooks wanted to develop an end-to-end ELT and reporting system on Google BigQuery. The requirements were to regularly ingest data from both streaming and nonstreaming offline databases and to create an analytics reporting layer for data analysts to use in Looker dashboards.

I was a core developer in architecting and modeling the data warehouse on Google BigQuery, using Cloud Composer for scheduling the pipelines, Pub/Sub and Dataflow for streaming data, and dbt and Python for transforming and creating the final reporting layers. I also spearheaded the Terraform codebase for creating, organizing, and maintaining the infrastructure as code.
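The transformation step of such a pipeline rolls raw events up into reporting tables. A small sketch of that kind of aggregation in plain Python (the field names and figures are invented for illustration; in the real pipeline this logic lived in dbt models over BigQuery):

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw invoice events as they might land in the warehouse.
raw_events = [
    {"client_id": "c1", "amount": 120.0, "ts": "2021-03-01T10:00:00"},
    {"client_id": "c1", "amount": 80.0,  "ts": "2021-03-15T09:30:00"},
    {"client_id": "c2", "amount": 200.0, "ts": "2021-03-07T14:45:00"},
]

def monthly_revenue(events):
    """Roll raw events up into a (client, month) -> revenue reporting table."""
    table = defaultdict(float)
    for e in events:
        month = datetime.fromisoformat(e["ts"]).strftime("%Y-%m")
        table[(e["client_id"], month)] += e["amount"]
    return dict(table)

print(monthly_revenue(raw_events))
# → {('c1', '2021-03'): 200.0, ('c2', '2021-03'): 200.0}
```

The reporting layer built from aggregates like this is what the Looker dashboards query directly.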

Data Warehouse Design in AWS

I worked on building an AWS end-to-end reporting platform from the ground up for one of the biggest educational platforms in the world.

The project involved organizing and ingesting data from various source systems, including real-time and batch models, transforming and cleaning data, and creating reporting tables for dashboards and data science models.

I used Apache Airflow for orchestrating the models and transformations, dbt for the reporting and transformation logic, and Redshift as the data warehouse. The stack also included Terraform for IaC, Kinesis for real-time data ingestion, and Lambda for triggers.
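The real-time path in a setup like this typically has Kinesis invoke a Lambda whose handler decodes the base64-encoded records before loading them onward. A minimal standard-library sketch (the payload fields are hypothetical, and the real handler would write to S3 or Redshift rather than return):

```python
import base64
import json

def handler(event, context=None):
    """Decode records from a Kinesis-triggered Lambda event (illustration only)."""
    rows = []
    for record in event["Records"]:
        # Kinesis delivers each record's data base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))
    # A production handler would load `rows` into S3/Redshift here.
    return {"processed": len(rows), "rows": rows}

# Simulate the event shape Lambda receives from a Kinesis trigger.
fake_event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(
            json.dumps({"order_id": 1, "total": 9.99}).encode()).decode()}}
    ]
}
print(handler(fake_event))
```

Testing the handler locally against a hand-built event like `fake_event` is a common way to validate the decode logic before deploying.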


Languages

Python, SQL


Tools

Apache Airflow, DataHub, Terraform, Google Cloud Composer, BigQuery, Amazon Elastic Container Service (Amazon ECS), Shell, Amazon Athena, AWS Glue, Amazon Elastic MapReduce (EMR), Cloud Dataflow


Storage

Redshift, Google Cloud, Amazon S3 (AWS S3), DB, MySQL, Data Pipelines, Teradata, Vertica


Other

Data Engineering, ELT, ETL Development, Data Build Tool (dbt), Amazon RDS, Amazon Kinesis, Data Warehousing, Data Warehouse Design, Machine Learning, Big Data, Google BigQuery, EMR, Pub/Sub, VM, Dagster




Platforms

AWS Lambda, Databricks, Amazon Web Services (AWS), Google Cloud Platform (GCP)

Education

2015 - 2017

Master's Degree in Computer Science

Dalhousie University - Halifax, Nova Scotia, Canada


Certifications

Professional Data Engineer

Google Cloud
