Dinesh Kumar Agarwal Vijayakumar, Developer in Chennai, Tamil Nadu, India

Dinesh Kumar Agarwal Vijayakumar

Verified Expert  in Engineering

Data Engineer and Developer

Location
Chennai, Tamil Nadu, India
Toptal Member Since
June 3, 2022

Dinesh holds a master's degree in business analytics from NUS Business School and has experience in business intelligence and data engineering. He specializes in building distributed systems for NLP workloads and creating cost- and performance-efficient data pipelines. Dinesh has worked across all phases of the data lifecycle and is an adaptable, fast learner who is comfortable working in cross-functional teams.

Portfolio

RegASK
Azure, MongoDB, Docker, Kubernetes, RabbitMQ, CI/CD Pipelines, Python 3, APIs...
ST Electronics Infosoft
IBM Cloud, IBM InfoSphere (DataStage), IBM Db2, Python 3, Regression Modeling...
ST Electronics Infosoft
Python 3, Predictive Modeling, Machine Learning, R, APIs...

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), Google Cloud, Azure, Docker, Cloud Native, Git, Linux, Windows

The most amazing...

...project I've delivered is an asynchronous, distributed web monitoring NLP pipeline that was the organization's core.

Work Experience

Data Engineer

2020 - 2022
RegASK
  • Architected and built an Azure-based web monitoring solution that automates data collection, ingestion to an operational data store, and text processing in a distributed asynchronous cloud-native environment.
  • Created REST APIs using FastAPI on Python to facilitate content management, centralize shared data, and serve data to downstream systems.
  • Built NLP workloads, including named entity recognition, language detection, text extraction from PDF articles, and document translation in a containerized environment.
  • Automated ETL health check and ingest reports to SharePoint for users.
Technologies: Azure, MongoDB, Docker, Kubernetes, RabbitMQ, CI/CD Pipelines, Python 3, APIs, Data Modeling, Cloud Architecture, Data Cleaning, Data Analysis, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Data Architecture, SQL, NoSQL, Data Pipelines, SharePoint

Data Analyst

2019 - 2020
ST Electronics Infosoft
  • Set up an end-to-end data analytics platform on a secure private cloud environment.
  • Designed and implemented ETL pipelines to ingest and warehouse department-specific data on their respective data stores and enable secure data governance using catalogs.
  • Built prediction models based on historical data using regression modeling techniques.
Technologies: IBM Cloud, IBM InfoSphere (DataStage), IBM Db2, Python 3, Regression Modeling, Data Matching, Data Cleaning, Data Analysis, Data Quality, Data Architecture, Exploratory Data Analysis, R, SQL, Data Analytics, Data Pipelines, Business Intelligence (BI), Dashboards

Data Science Intern

2019 - 2019
ST Electronics Infosoft
  • Performed exploratory data analysis to identify departure flight trends from the Singapore Changi airport. The result of this activity aided in identifying potential factors that could be used to predict passenger loads of outgoing flights.
  • Built a hybrid predictive model, with 95% precision, for passenger loads of outgoing flights to help the operations research team with daily operations planning.
  • Developed statistical models to analyze and predict passenger traffic within the airport for optimal resource allocation.
Technologies: Python 3, Predictive Modeling, Machine Learning, R, APIs, Exploratory Data Analysis, Data Analytics, Dashboards, Relational Database Design, Data Reporting

Research Associate

2018 - 2019
National University of Singapore
  • Worked with a semiconductor manufacturer on an operations research project focused on identifying improvements to the manufacturing process using regression modeling.
  • Collaborated with a real estate client on a geospatial analytics project focused on building a predictive model to predict property prices based on historic data and geospatial entities.
  • Migrated an analytical warehouse hosted on Apache Drill to Google BigQuery for a cosmetics client on a data migration project.
Technologies: R, Python 3, Apache Drill, Google BigQuery, Pentaho, Exploratory Data Analysis, Geospatial Analytics, Predictive Modeling, Operations Research, Regression Modeling, SQL, Data Analytics, Data Pipelines

Data Analyst Intern

2018 - 2019
Anywhr
  • Migrated legacy data from varied sources to the updated transactional database.
  • Designed and built an ETL pipeline to warehouse transactional data from AWS RDS to Google BigQuery using a Python client library for Google Cloud.
  • Built visual dashboards on Tableau to facilitate periodic reporting for the marketing team.
Technologies: Amazon Web Services (AWS), Google BigQuery, Tableau, Python 3, PostgreSQL, SQL, Data Analytics, Data Pipelines, ETL, ELT, Business Intelligence (BI), Dashboards, Relational Database Design, BigQuery, Data Analysis, Data Reporting

System Engineer (Data Engineer)

2014 - 2018
Tata Consultancy Services
  • Translated existing ETL logic for a business intelligence project to build dashboards and logical views on clickstream data hosted on Google BigQuery.
  • Reduced the query cost on BigQuery by 40% using performance-efficient query logic.
  • Migrated supply chain and logistical data from varied data sources to Google BigQuery for a North American eCommerce client.
  • Warehoused sensitive customer information and built analytical views for a British banking client.
  • Reduced the overnight batch runtime by two hours by optimizing batch process schedules.
  • Managed a small data team responsible for maintaining and enhancing a department-specific data store.
  • Contributed to data warehousing, extracting analytical reports, and batch processing data required by downstream systems for an insurance client.
  • Reduced runtime by 35% by modifying existing ETL logic.
Technologies: Google Cloud, Google BigQuery, IBM Db2, Teradata, Oracle SQL, Microsoft Data Transformation Services (now SSIS), Informatica ETL, SAP BusinessObjects (BO), Tableau, Big Data Architecture, Data Migration, Data Warehousing, Data Warehouse Design, ETL, Data Matching, Data Cleaning, Data Quality, Exploratory Data Analysis, SQL, Team Leadership, Leadership, ELT, Data Pipelines, Data Analytics, Business Intelligence (BI), Dashboards, Google Data Studio, Databases, Relational Database Design, Data Reporting

Projects

Source Monitoring Pipeline

A cloud-native, distributed, asynchronous pipeline that ingests data through multiple entry points and performs multiple NLP tasks, including text extraction from PDF articles, language detection, named entity recognition, and text translation.

As the sole data engineer, I designed and implemented the pipeline in a cloud-native environment. The prototype was a monolith, which I split into Dockerized microservices deployed on Azure Kubernetes Service. The monolith crawled only five web pages, and their metadata had to be inserted directly into the database; the distributed version crawls 300+ sources. A built-in REST API manages the source metadata, so users can configure and maintain sources independently.
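The staged, asynchronous design described above can be illustrated with a minimal sketch. It is not the production code: in-process `asyncio.Queue` objects stand in for the RabbitMQ queues between services, and `detect_language`, `tag_entities`, `worker`, and `run_pipeline` are hypothetical names with placeholder logic instead of real NLP models.

```python
import asyncio


async def detect_language(doc):
    # Placeholder heuristic, not a real language detector (assumption).
    doc["language"] = "en" if doc["text"].isascii() else "unknown"
    return doc


async def tag_entities(doc):
    # Placeholder: treat capitalized words as entity candidates (assumption).
    doc["entities"] = [w for w in doc["text"].split() if w[:1].isupper()]
    return doc


async def worker(stage, inbox, outbox):
    # Each worker plays the role of one microservice: consume, process, publish.
    while True:
        doc = await inbox.get()
        try:
            await outbox.put(await stage(doc))
        finally:
            inbox.task_done()


async def run_pipeline(texts):
    # Queues stand in for the message broker between services.
    q_lang, q_ner, q_done = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    workers = [
        asyncio.create_task(worker(detect_language, q_lang, q_ner)),
        asyncio.create_task(worker(tag_entities, q_ner, q_done)),
    ]
    for text in texts:
        await q_lang.put({"text": text})
    # Wait until every document has passed through both stages.
    await q_lang.join()
    await q_ner.join()
    for w in workers:
        w.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return [q_done.get_nowait() for _ in range(q_done.qsize())]


if __name__ == "__main__":
    for doc in asyncio.run(run_pipeline(["Changi Airport opens", "hello world"])):
        print(doc)
```

Because each stage only talks to its queues, a stage can be scaled or replaced without touching the others, which is the main benefit of moving away from a monolith.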

Online Business Intelligence

An ETL pipeline focused on warehousing and utilizing terabyte-scale clickstream data.

I collaborated with a cross-functional team to translate pre-built ETL logic and worked closely with business analysts to build real-time analytical dashboards for understanding consumer behavior. The data was hosted on Google BigQuery, optimized for performance and query cost.

Pax Load Prediction

I built a predictive model to assist an airport operations research team as part of the capstone project for my master's degree. The model predicted the number of departing passengers on each flight so that the operations team could better plan their daily activities.

This project had all the components of a data science project, such as data engineering and transformation, exploratory data analysis, data visualization, and data modeling. I was required to build a predictive model using machine learning and to come up with a data-backed story that could convince the stakeholders of the real-life use case of my model.

The modeling phase went through iterations, starting from basic regression modeling to building complex machine-learning models by stacking weak learners together. The model's error rate was +/- 5%, which enabled the operations research team to plan their operations better.
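As a rough illustration of stacking, the sketch below blends two simple base learners with a weight fitted on out-of-fold predictions. It is not the capstone model: the base learners, the single-feature data, and the names `fit_mean`, `fit_line`, `out_of_fold`, and `fit_stack` are all assumptions chosen to keep the example dependency-free.

```python
from statistics import mean


def fit_mean(xs, ys):
    # Base learner A: always predicts the training mean.
    m = mean(ys)
    return lambda x: m


def fit_line(xs, ys):
    # Base learner B: ordinary least squares on a single feature.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def out_of_fold(fit, xs, ys, k=3):
    # Predict each point with a model that never saw it during training.
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(fold, len(xs), k):
            preds[i] = model(xs[i])
    return preds


def fit_stack(xs, ys, k=3):
    # Meta-learner: least-squares weight alpha for alpha * A + (1 - alpha) * B.
    pa = out_of_fold(fit_mean, xs, ys, k)
    pb = out_of_fold(fit_line, xs, ys, k)
    den = sum((a - b) ** 2 for a, b in zip(pa, pb))
    num = sum((y - b) * (a - b) for y, a, b in zip(ys, pa, pb))
    alpha = num / den if den else 0.5
    # Refit both base learners on all data for the final blended model.
    model_a, model_b = fit_mean(xs, ys), fit_line(xs, ys)
    return lambda x: alpha * model_a(x) + (1 - alpha) * model_b(x)


if __name__ == "__main__":
    xs = list(range(9))
    ys = [2 * x + 1 for x in xs]
    stacked = fit_stack(xs, ys)
    print(round(stacked(10), 3))
```

Fitting the blend weight on out-of-fold predictions keeps the meta-learner from rewarding a base learner that merely memorized its training data.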

It was also the first project in which I learned to productionize a machine learning model by wrapping it in a RESTful API.

Languages

Python 3, SQL, R

Tools

BigQuery, RabbitMQ, IBM InfoSphere (DataStage), Informatica ETL, Tableau, Azure Kubernetes Service (AKS), SQL Server BI, Git

Paradigms

ETL, Business Intelligence (BI)

Storage

PostgreSQL, Data Pipelines, Databases, MongoDB, Google Cloud, NoSQL, IBM Db2, Teradata, Oracle SQL

Other

Google BigQuery, Data Migration, Data Warehousing, Data Engineering, Data Architecture, ELT, Relational Database Design, Regression Modeling, Natural Language Processing (NLP), Machine Learning, Cloud Architecture, Data Cleaning, Data Matching, Data Quality, Data Analysis, Data Warehouse Design, Exploratory Data Analysis, Data Analytics, Google Data Studio, GPT, Generative Pre-trained Transformers (GPT), Big Data Architecture, Software Engineering, Artificial Intelligence (AI), CI/CD Pipelines, APIs, Data Modeling, IBM Cloud, Microsoft Data Transformation Services (now SSIS), SAP BusinessObjects (BO), Cloud Storage, Google Cloud ML, Geospatial Analytics, Predictive Modeling, Operations Research, Data Visualization, Team Leadership, Leadership, Dashboards, Causal Inference, Business Analysis, Data Reporting

Libraries/APIs

Google Cloud API

Platforms

Azure, Docker, Cloud Native, Kubernetes, Google Cloud SDK, Apache Kafka, Linux, Windows, Amazon Web Services (AWS), Pentaho, KNIME, SharePoint

Frameworks

Apache Drill, Apache Spark

Education

2018 - 2019

Master of Science in Business Analytics

National University of Singapore - Singapore, Singapore

2010 - 2014

Bachelor of Technology in Computer Science and Engineering

SRM University - Chennai, India

Certifications

DECEMBER 2020 - DECEMBER 2022

Professional Cloud Architect

Google Cloud

DECEMBER 2018 - PRESENT

Data Engineering on Google Cloud Platform (GCP)

Google Cloud | via Coursera
