Narotam Aggarwal
Verified Expert in Engineering
Data Engineer and Developer
Narotam is an experienced data engineer who has worked with Big Data, Spark, Hive, Kafka, Scala, Python, data modeling, and many other related technologies. He builds enterprise applications that help data analytics teams and data scientists prepare reports and build machine learning models. With a palpable enthusiasm for data engineering, Narotam is a lifelong learner committed to personal and professional growth.
Preferred Environment
Teradata, StreamSets, Hadoop, Apache Kafka, Apache Hive, Python 3, Spark, Azure Databricks, ADF, PySpark
The most amazing...
...thing I've built is a data pipeline for fraud and scam analytics that handles 22 million payments.
Work Experience
Senior Data Engineer
Cognizant
- Implemented ISO 20022 changes in the payments data.
- Developed real-time data pipelines to ingest payment data into the payment investigation system.
- Managed the production and deployment of data pipelines.
Senior Data Engineer
BigSpark
- Developed near real-time payment applications to consume data from Kafka for data analytics (a rough sketch of this pattern follows these highlights).
- Created reusable components to archive and purge data on the Hadoop Distributed File System (HDFS) and Amazon S3 cloud object storage.
- Built a data pipeline to ingest feature bank data for a machine learning (ML) model.
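The linked repositories don't cover this engagement, but purely as an illustration, a near real-time Kafka consumer for payment analytics could be sketched with PySpark Structured Streaming as below. The broker address, topic name, event schema, and output paths are all invented placeholders, not details of the actual system.

```python
# Illustrative sketch only: consume payment events from Kafka with PySpark
# Structured Streaming and land them as Parquet for analytics. Requires the
# spark-sql-kafka-0-10 package on the Spark classpath. All names below
# (broker, topic, schema, paths) are assumptions for the example.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("payments-analytics").getOrCreate()

# Assumed shape of a payment event.
payment_schema = StructType([
    StructField("payment_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("currency", StringType()),
])

# Read the topic as a stream; Kafka delivers the value as raw bytes.
payments = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "payments")                   # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), payment_schema).alias("p"))
    .select("p.*")
)

# Write the parsed events to Parquet so downstream analytics jobs can use them.
query = (
    payments.writeStream.format("parquet")
    .option("path", "/data/payments")                           # assumed path
    .option("checkpointLocation", "/data/checkpoints/payments")
    .start()
)
query.awaitTermination()
```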
Senior ETL Engineer
DataWave
- Designed and implemented a solution to a data ingestion problem in a shared source system, where any change to the source layout impacted multiple downstream systems.
- Implemented record-level versioning so that a layout change affected only the intended downstream system and no other project had to undergo regression testing.
- Created data pipelines and workflows to load data into the enterprise data warehouse for data analytics, adhering to the Financial Services Logical Data Model (FSLDM).
- Led an Agile development team to deploy new, domain-specific features.
ETL Developer
Cognizant
- Identified patterns in data pipelines and workflows and built an Excel-driven code generator for the Informatica ETL tool, saving the organization three to four months of effort and cost (a toy illustration of the idea follows these highlights).
- Built data warehouse applications using the Informatica ETL tool, an Oracle database, the Linux operating system, and the Autosys scheduler.
- Performed data analysis for the transfer agency data.
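The profile gives no code for that generator, but the underlying idea can be sketched as a toy: read one mapping definition per spreadsheet row and render it through a template. The column layout, template, and output format below are entirely invented for illustration; real Informatica mappings are far richer.

```python
# Toy illustration of pattern-driven code generation from an Excel sheet.
# Assumed layout: row 1 is a header; column A = mapping name,
# B = source table, C = target table. All names here are invented.
from string import Template

from openpyxl import load_workbook

MAPPING_TEMPLATE = Template(
    "-- mapping: $mapping_name\n"
    "SOURCE $source_table -> TARGET $target_table\n"
)


def generate_mappings(xlsx_path: str) -> str:
    """Turn one spreadsheet row per mapping into generated code text."""
    sheet = load_workbook(xlsx_path).active
    chunks = []
    for name, source, target in sheet.iter_rows(min_row=2, values_only=True):
        chunks.append(MAPPING_TEMPLATE.substitute(
            mapping_name=name, source_table=source, target_table=target))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(generate_mappings("mappings.xlsx"))  # assumed input file
```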
Experience
WeBazaar eCommerce Website
Apache Airflow on Docker with AWS S3
Conducted the following tasks for this project:
a) Created a weblog file using a Python script
b) Uploaded the file created in the previous step to an AWS S3 bucket
c) Connected to AWS S3 using AWS CLI for object validation
I completed the Airflow setup and started Docker by following the steps below, after which I was able to run a pipeline in Airflow and retrieve the data. A simplified sketch of the resulting DAG appears after the steps.
GitHub link for complete code:
Github.com/narotam333/de-project-1
1. Configured Docker for Airflow.
2. Configured Docker for Airflow's extended image.
3. Configured Docker for AWS.
4. Executed the Docker image to create a container.
5. Created the DAG and tasks in Airflow.
6. Executed the DAG from the Airflow UI.
7. Accessed the S3 bucket and objects using the AWS CLI.
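As a rough illustration of steps 5 and 6, assuming Airflow 2.x and boto3, a minimal version of the DAG might look like the following. The bucket name, local path, and log line are placeholders; the complete working code is in the GitHub repository linked above.

```python
# Minimal sketch of the pipeline described above: one task writes a weblog
# file, the next uploads it to S3. Assumes Airflow 2.x and boto3; the bucket
# name and paths are placeholders, not the repository's actual values.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

BUCKET = "webazaar-weblogs"     # assumed bucket name
LOCAL_PATH = "/tmp/weblog.txt"  # assumed local file path


def generate_weblog():
    """Write a sample access-log line to a local file."""
    with open(LOCAL_PATH, "w") as f:
        f.write('127.0.0.1 - - [01/Jan/2024] "GET /index.html HTTP/1.1" 200\n')


def upload_to_s3():
    """Push the generated file to S3; credentials come from the AWS CLI setup."""
    boto3.client("s3").upload_file(LOCAL_PATH, BUCKET, "logs/weblog.txt")


with DAG(
    dag_id="webazaar_weblog_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # triggered manually from the Airflow UI (step 6)
    catchup=False,
) as dag:
    create = PythonOperator(task_id="generate_weblog", python_callable=generate_weblog)
    upload = PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
    create >> upload
```

The uploaded object can then be validated from the command line (step 7), for example with aws s3 ls s3://webazaar-weblogs/logs/ (bucket name assumed).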
How to use variables and runtime config in Apache Airflow
https://medium.com/@narotam333/how-to-use-variables-and-runtime-config-in-apache-airflow-15731b4b168a
Conducted the following tasks for this project:
a) Created a weblog file.
b) Uploaded the weblog file to an AWS S3 bucket.
c) Processed the file before uploading it again to an AWS S3 bucket.
I followed the steps below to complete this project and understand variables and runtime config in Apache Airflow; a minimal sketch of the pattern appears after the steps.
1. Wrote an ETL DAG and Task to generate a weblog with a dynamic filename.
2. Wrote a Task to upload the weblog to AWS S3 and store the dynamic filename using variables.
3. Wrote a Task to process the weblog file using S3FileTransformOperator, runtime config, and variables.
4. Executed the DAG using runtime config and checked the variable values.
5. Masked variable values in Airflow.
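A minimal sketch of that pattern is below, assuming Airflow 2.x with the Amazon provider package installed. The bucket, key layout, transform script path, and the dest_folder config key are illustrative assumptions; the complete code is in the GitHub repository linked below.

```python
# Illustrative sketch: store a dynamic filename in an Airflow Variable, then
# let runtime config (dag_run.conf) and that Variable drive an
# S3FileTransformOperator. Assumes Airflow 2.x plus the Amazon provider;
# bucket, keys, and script path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.s3 import S3FileTransformOperator

BUCKET = "webazaar-weblogs"  # assumed bucket name


def generate_weblog():
    """Create a weblog with a dynamic name and remember it in a Variable."""
    filename = f"weblog_{datetime.now():%Y%m%d%H%M%S}.txt"
    Variable.set("weblog_filename", filename)
    # ... write the file and upload it to s3://<bucket>/raw/<filename> here,
    # as in the previous sketch ...


with DAG(
    dag_id="webazaar_weblog_transform",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create = PythonOperator(task_id="generate_weblog", python_callable=generate_weblog)

    # Jinja templating resolves the filename from the Variable and the target
    # folder from the runtime config supplied when the DAG is triggered.
    transform = S3FileTransformOperator(
        task_id="process_weblog",
        source_s3_key=f"s3://{BUCKET}/raw/{{{{ var.value.weblog_filename }}}}",
        dest_s3_key=(
            f"s3://{BUCKET}/{{{{ dag_run.conf['dest_folder'] }}}}/"
            "{{ var.value.weblog_filename }}"
        ),
        transform_script="/opt/scripts/clean_weblog.py",  # assumed script
        replace=True,
    )

    create >> transform
```

Triggering the DAG with a config such as {"dest_folder": "processed"} fills in the templated destination at run time. For step 5, Airflow automatically masks a Variable's value in the UI and logs when its key contains a sensitive keyword such as "secret" or "password".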
GitHub link for complete code:
Github.com/narotam333/de-project-1a
Skills
Languages
SQL, Snowflake, Scala, Python, Perl
Paradigms
ETL
Platforms
Databricks, Apache Kafka, Amazon Web Services (AWS), Amazon EC2, Docker, Magento 2
Storage
Data Pipelines, MySQL, PostgreSQL, Teradata, Apache Hive, MongoDB, Amazon S3 (AWS S3)
Other
Informatica, StreamSets, Data Engineering, Data Warehousing, Azure Databricks, Data Warehouse Design, Data Analysis, AWS SDK for Python (Boto3), Amazon RDS, Data Architecture, Unix Shell Scripting
Frameworks
Hadoop, Spark, ADF
Tools
Apache Airflow, Git, Autosys, Terraform, Docker Compose, GitHub, AWS SDK
Libraries/APIs
PySpark
Certifications
Confluent Certified Developer for Apache Kafka (CCDAK)
Confluent
AWS Certified Developer Associate
AWS