Narotam Aggarwal
Verified Expert in Engineering
Data Engineer and Developer
Narotam is an experienced data engineer who has worked with Big Data, Spark, Hive, Kafka, Scala, Python, data modeling, and many other related technologies. He builds enterprise applications that help data analytics teams and data scientists prepare reports and build machine learning models. With a palpable enthusiasm for data engineering, Narotam is a lifelong learner committed to personal and professional growth.
Preferred Environment
Teradata, StreamSets, Hadoop, Apache Kafka, Apache Hive, Python 3, Spark, Azure Databricks, ADF, PySpark
The most amazing...
...thing I've built is a data pipeline for fraud and scam analytics that handles 22 million payments.
Work Experience
Senior Data Engineer
Cognizant
- Implemented ISO 20022 changes in the payments data.
- Developed real-time data pipelines to ingest payment data into the payment investigation system.
- Managed the production and deployment of data pipelines.
Senior Data Engineer
BigSpark
- Developed near real-time payment applications to consume data from Kafka for data analytics (a rough sketch of this pattern follows these highlights).
- Created reusable components to archive and purge data on the Hadoop Distributed File System (HDFS) and Amazon S3 cloud object storage.
- Built a data pipeline to ingest feature bank data for a machine learning (ML) model.
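The linked repositories don't cover this engagement, but purely as an illustration, a near real-time Kafka consumer for payment analytics could be sketched with PySpark Structured Streaming as below. The broker address, topic name, event schema, and output paths are all invented placeholders, not details of the actual system.

```python
# Illustrative sketch only: consume payment events from Kafka with PySpark
# Structured Streaming and land them as Parquet for analytics. Requires the
# spark-sql-kafka-0-10 package on the Spark classpath. All names below
# (broker, topic, schema, paths) are assumptions for the example.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("payments-analytics").getOrCreate()

# Assumed shape of a payment event.
payment_schema = StructType([
    StructField("payment_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("currency", StringType()),
])

# Read the topic as a stream; Kafka delivers the value as raw bytes.
payments = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # assumed broker
    .option("subscribe", "payments")                   # assumed topic
    .load()
    .select(from_json(col("value").cast("string"), payment_schema).alias("p"))
    .select("p.*")
)

# Write the parsed events to Parquet so downstream analytics jobs can use them.
query = (
    payments.writeStream.format("parquet")
    .option("path", "/data/payments")                           # assumed path
    .option("checkpointLocation", "/data/checkpoints/payments")
    .start()
)
query.awaitTermination()
```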
Senior ETL Engineer
DataWave
- Designed and implemented a solution to a data ingestion problem in a shared source system, where any change to the source layout impacted multiple downstream systems.
- Implemented record-level versioning so that a layout change affected only the intended downstream system and no other project had to undergo regression testing.
- Created data pipelines and workflows to load data into the enterprise data warehouse for data analytics, adhering to the Financial Services Logical Data Model (FSLDM).
- Led an Agile development team to deploy new, domain-specific features.
ETL Developer
Cognizant
- Identified patterns in data pipelines and workflows and built an Excel-driven code generator for the Informatica ETL tool, saving the organization three to four months of effort and cost (a toy illustration of the idea follows these highlights).
- Built data warehouse applications using the Informatica ETL tool, an Oracle database, the Linux operating system, and the Autosys scheduler.
- Performed data analysis for the transfer agency data.
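The profile gives no code for that generator, but the underlying idea can be sketched as a toy: read one mapping definition per spreadsheet row and render it through a template. The column layout, template, and output format below are entirely invented for illustration; real Informatica mappings are far richer.

```python
# Toy illustration of pattern-driven code generation from an Excel sheet.
# Assumed layout: row 1 is a header; column A = mapping name,
# B = source table, C = target table. All names here are invented.
from string import Template

from openpyxl import load_workbook

MAPPING_TEMPLATE = Template(
    "-- mapping: $mapping_name\n"
    "SOURCE $source_table -> TARGET $target_table\n"
)


def generate_mappings(xlsx_path: str) -> str:
    """Turn one spreadsheet row per mapping into generated code text."""
    sheet = load_workbook(xlsx_path).active
    chunks = []
    for name, source, target in sheet.iter_rows(min_row=2, values_only=True):
        chunks.append(MAPPING_TEMPLATE.substitute(
            mapping_name=name, source_table=source, target_table=target))
    return "\n".join(chunks)


if __name__ == "__main__":
    print(generate_mappings("mappings.xlsx"))  # assumed input file
```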
Experience
WeBazaar eCommerce Website
Apache Airflow on Docker with AWS S3
Conducted the following tasks for this project:
a) Created a weblog file using a Python script
b) Uploaded the file created in the previous step to an AWS S3 bucket
c) Connected to AWS S3 using AWS CLI for object validation
I completed the Airflow setup and started Docker by following the steps below, after which I was able to run a pipeline in Airflow and retrieve the data. A simplified sketch of the resulting DAG appears after the steps.
GitHub link for complete code:
Github.com/narotam333/de-project-1
1. Configured Docker for Airflow.
2. Configured Docker for Airflow's extended image.
3. Configured Docker for AWS.
4. Executed the Docker image to create a container.
5. Created the DAG and tasks in Airflow.
6. Executed the DAG from the Airflow UI.
7. Accessed the S3 bucket and objects using the AWS CLI.
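As a rough illustration of steps 5 and 6, assuming Airflow 2.x and boto3, a minimal version of the DAG might look like the following. The bucket name, local path, and log line are placeholders; the complete working code is in the GitHub repository linked above.

```python
# Minimal sketch of the pipeline described above: one task writes a weblog
# file, the next uploads it to S3. Assumes Airflow 2.x and boto3; the bucket
# name and paths are placeholders, not the repository's actual values.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator

BUCKET = "webazaar-weblogs"     # assumed bucket name
LOCAL_PATH = "/tmp/weblog.txt"  # assumed local file path


def generate_weblog():
    """Write a sample access-log line to a local file."""
    with open(LOCAL_PATH, "w") as f:
        f.write('127.0.0.1 - - [01/Jan/2024] "GET /index.html HTTP/1.1" 200\n')


def upload_to_s3():
    """Push the generated file to S3; credentials come from the AWS CLI setup."""
    boto3.client("s3").upload_file(LOCAL_PATH, BUCKET, "logs/weblog.txt")


with DAG(
    dag_id="webazaar_weblog_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # triggered manually from the Airflow UI (step 6)
    catchup=False,
) as dag:
    create = PythonOperator(task_id="generate_weblog", python_callable=generate_weblog)
    upload = PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
    create >> upload
```

The uploaded object can then be validated from the command line (step 7), for example with aws s3 ls s3://webazaar-weblogs/logs/ (bucket name assumed).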
How to use variables and runtime config in Apache Airflow
https://medium.com/@narotam333/how-to-use-variables-and-runtime-config-in-apache-airflow-15731b4b168a
Conducted the following tasks for this project:
a) Created a weblog file.
b) Uploaded the weblog file to an AWS S3 bucket.
c) Processed the file before uploading it again to an AWS S3 bucket.
I followed the steps below to complete this project and understand variables and runtime config in Apache Airflow; a minimal sketch of the pattern appears after the steps.
1. Wrote an ETL DAG and Task to generate a weblog with a dynamic filename.
2. Wrote a Task to upload the weblog to AWS S3 and store the dynamic filename using variables.
3. Wrote a Task to process the weblog file using S3FileTransformOperator, runtime config, and variables.
4. Executed the DAG using runtime config and checked the variable values.
5. Masked variable values in Airflow.
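A minimal sketch of that pattern is below, assuming Airflow 2.x with the Amazon provider package installed. The bucket, key layout, transform script path, and the dest_folder config key are illustrative assumptions; the complete code is in the GitHub repository linked below.

```python
# Illustrative sketch: store a dynamic filename in an Airflow Variable, then
# let runtime config (dag_run.conf) and that Variable drive an
# S3FileTransformOperator. Assumes Airflow 2.x plus the Amazon provider;
# bucket, keys, and script path are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator
from airflow.providers.amazon.aws.operators.s3 import S3FileTransformOperator

BUCKET = "webazaar-weblogs"  # assumed bucket name


def generate_weblog():
    """Create a weblog with a dynamic name and remember it in a Variable."""
    filename = f"weblog_{datetime.now():%Y%m%d%H%M%S}.txt"
    Variable.set("weblog_filename", filename)
    # ... write the file and upload it to s3://<bucket>/raw/<filename> here,
    # as in the previous sketch ...


with DAG(
    dag_id="webazaar_weblog_transform",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    create = PythonOperator(task_id="generate_weblog", python_callable=generate_weblog)

    # Jinja templating resolves the filename from the Variable and the target
    # folder from the runtime config supplied when the DAG is triggered.
    transform = S3FileTransformOperator(
        task_id="process_weblog",
        source_s3_key=f"s3://{BUCKET}/raw/{{{{ var.value.weblog_filename }}}}",
        dest_s3_key=(
            f"s3://{BUCKET}/{{{{ dag_run.conf['dest_folder'] }}}}/"
            "{{ var.value.weblog_filename }}"
        ),
        transform_script="/opt/scripts/clean_weblog.py",  # assumed script
        replace=True,
    )

    create >> transform
```

Triggering the DAG with a config such as {"dest_folder": "processed"} fills in the templated destination at run time. For step 5, Airflow automatically masks a Variable's value in the UI and logs when its key contains a sensitive keyword such as "secret" or "password".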
GitHub link for complete code:
Github.com/narotam333/de-project-1a
Skills
Languages
SQL, Snowflake, Scala, Python, Perl
Paradigms
ETL
Platforms
Databricks, Apache Kafka, Amazon Web Services (AWS), Amazon EC2, Docker, Magento 2
Storage
Data Pipelines, MySQL, PostgreSQL, Teradata, Apache Hive, MongoDB, Amazon S3 (AWS S3)
Other
Informatica, StreamSets, Data Engineering, Data Warehousing, Azure Databricks, Data Warehouse Design, Data Analysis, AWS SDK for Python (Boto3), Amazon RDS, Data Architecture, Unix Shell Scripting
Frameworks
Hadoop, Spark, ADF
Tools
Apache Airflow, Git, Autosys, Terraform, Docker Compose, GitHub, AWS SDK
Libraries/APIs
PySpark
Certifications
Confluent Certified Developer for Apache Kafka (CCDAK)
Confluent
AWS Certified Developer Associate
AWS