Dinesh Kumar Agarwal Vijayakumar
Verified Expert in Engineering
Data Engineer and Developer
Dinesh has a master's degree in business analytics from NUS Business School with experience in business intelligence and data engineering. He specializes in building distributed systems for NLP workloads and creating cost- and performance-efficient data pipelines. Dinesh has also worked across all data lifecycle phases and is an adaptable, fast learner comfortable working in cross-functional teams.
Visual Studio Code (VS Code), Google Cloud, Azure, Docker, Cloud Native, Git, Linux, Windows
The most amazing...
...project I've delivered is an asynchronous, distributed web monitoring NLP pipeline that was the organization's core.
- Architected and built an Azure-based web monitoring solution that automates data collection, ingestion to an operational data store, and text processing in a distributed asynchronous cloud-native environment.
- Created REST APIs using FastAPI on Python to facilitate content management, centralize shared data, and serve data to downstream systems.
- Built NLP workloads, including named entity recognition, language detection, text extraction from PDF articles, and document translation in a containerized environment.
- Automated the delivery of ETL health-check and ingest reports to SharePoint for users.
ST Electronics Infosoft
- Set up an end-to-end data analytics platform on a secure private cloud environment.
- Designed and implemented ETL pipelines to ingest and warehouse department-specific data in their respective data stores and to enable secure data governance using catalogs.
- Built prediction models based on historical data using regression modeling techniques.
Data Science Intern
ST Electronics Infosoft
- Performed exploratory data analysis to identify trends in departing flights from Singapore Changi Airport. The results helped identify potential factors for predicting passenger loads of outgoing flights.
- Built a hybrid predictive model, with 95% precision, for passenger loads of outgoing flights to assist the operations research team with daily operations planning.
- Developed statistical models to analyze and predict passenger traffic within the airport for optimal resource allocation.
National University of Singapore
- Worked with a semiconductor manufacturer on an operations research project focused on identifying improvements to the manufacturing process using regression modeling.
- Collaborated with a real estate client on a geospatial analytics project focused on building a model to predict property prices from historical data and geospatial entities.
- Migrated an analytical warehouse hosted on Apache Drill to Google BigQuery for a cosmetics client on a data migration project.
Data Analyst Intern
- Migrated legacy data from varied sources to the updated transactional database.
- Designed and built an ETL pipeline to warehouse transactional data from AWS RDS to Google BigQuery using a Python client library for Google Cloud.
- Built visual dashboards on Tableau to facilitate periodic reporting for the marketing team.
System Engineer (Data Engineer)
Tata Consultancy Services
- Translated ETL logic used to build dashboards and logical views on clickstream data hosted on Google BigQuery for a business intelligence project.
- Reduced the query cost on BigQuery by 40% using performance-efficient query logic.
- Migrated supply chain and logistical data from varied data sources to Google BigQuery for a North American eCommerce client.
- Warehoused sensitive customer information and built analytical views for a British banking client.
- Reduced the overnight batch runtime by two hours by optimizing batch process schedules.
- Managed a small data team responsible for maintaining and enhancing a department-specific data store.
- Contributed to data warehousing, extracting analytical reports, and batch processing data required by downstream systems for an insurance client.
- Reduced runtime by 35% by modifying existing ETL logic.
Source Monitoring Pipeline
As the sole data engineer, I had the opportunity to design and implement the pipeline in a cloud-native environment. The pipeline prototype was a monolith, which was split into Dockerized microservices deployed on Azure Kubernetes Service. While the monolith crawled data from five web pages whose metadata had to be ingested directly into the database, the distributed environment crawled data from 300+ sources. Its built-in REST API handled the metadata, enabling users to configure and manage the source metadata independently.
Online Business Intelligence
I collaborated with a cross-functional team to translate pre-built ETL logic and worked closely with business analysts to build real-time analytical dashboards for understanding consumer behavior. The data was hosted on Google BigQuery, optimized for performance and query cost.
Pax Load Prediction
This project had all the components of a data science project, such as data engineering and transformation, exploratory data analysis, data visualization, and data modeling. I was required to build a predictive model using machine learning and to come up with a data-backed story that could convince the stakeholders of the real-life use case of my model.
The modeling phase went through several iterations, progressing from basic regression modeling to complex machine learning models built by stacking weak learners together. The model's error rate was +/- 5%, which enabled the operations research team to plan their operations better.
It was also the first project in which I learned to productionize a machine learning model by wrapping it in a RESTful API.
Python 3, SQL, Python, R
BigQuery, RabbitMQ, IBM InfoSphere (DataStage), Informatica ETL, Tableau, Azure Kubernetes Service (AKS), SQL Server BI, Git
ETL, Business Intelligence (BI)
PostgreSQL, Data Pipelines, Databases, MongoDB, Google Cloud, NoSQL, IBM Db2, Teradata, Oracle SQL
Google BigQuery, Data Migration, Data Warehousing, Data Engineering, Data Architecture, ELT, Relational Database Design, Regression Modeling, Natural Language Processing (NLP), Machine Learning, Cloud Architecture, Data Cleaning, Data Matching, Data Quality, Data Analysis, Data Warehouse Design, Exploratory Data Analysis, Data Analytics, Google Data Studio, GPT, Generative Pre-trained Transformers (GPT), Big Data Architecture, Software Engineering, Artificial Intelligence (AI), CI/CD Pipelines, APIs, Data Modeling, IBM Cloud, Microsoft Data Transformation Services (now SSIS), SAP BusinessObjects (BO), Cloud Storage, Google Cloud ML, Geospatial Analytics, Predictive Modeling, Operations Research, Data Visualization, Team Leadership, Leadership, Dashboards, Causal Inference, Business Analysis, Data Reporting
Google Cloud API
Azure, Docker, Cloud Native, Kubernetes, Google Cloud SDK, Apache Kafka, Linux, Windows, Amazon Web Services (AWS), Pentaho, KNIME, SharePoint
Apache Drill, Apache Spark, Spark
Master of Science in Business Analytics
National University of Singapore - Singapore, Singapore
Bachelor of Technology in Computer Science and Engineering
SRM University - Chennai, India
Professional Cloud Architect
Data Engineering on Google Cloud Platform (GCP)
Google Cloud | via Coursera