Dinesh Kumar Agarwal Vijayakumar, Developer in Chennai, Tamil Nadu, India

Dinesh Kumar Agarwal Vijayakumar

Verified Expert in Engineering

Data Engineer and Developer

Chennai, Tamil Nadu, India

Toptal member since June 3, 2022

Bio

Dinesh is a results-driven senior data engineer with 9+ years of experience. He's an expert in building ETL pipelines, optimizing data flows, and applying machine learning to solve business problems. Dinesh is proficient in cloud-based BI solutions, driving cost efficiencies and enhancing data quality. He's certified in Google Cloud and skilled in AWS, Kubernetes, Git, R, Python, and Azure. Dinesh is dedicated to delivering data solutions that empower informed decision-making.

Portfolio

Google
Google Cloud, SQL, Python, Kotlin, ETL, BI Reporting, Machine Learning...
Not Just Travel (Agency) Limited
SQL, Google Data Studio, Relational Database Design, Databases, Knack...
PropertyGuru
Elasticsearch, MongoDB, Python, BigQuery, Google Cloud...

Experience

  • Data Engineering - 8 years
  • SQL - 8 years
  • ETL - 8 years
  • Data Warehousing - 7 years
  • Data Warehouse Design - 6 years
  • Google Cloud - 5 years
  • Python - 5 years
  • Azure - 2 years

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), Google Cloud, Azure, Docker, Cloud Native, Git, Linux, Windows

The most amazing...

...project I've delivered is an asynchronous, distributed web monitoring NLP pipeline that served as the organization's core system.

Work Experience

Business Systems Analyst

2024 - PRESENT
Google
  • Partnered with stakeholders to identify and address business challenges, delivering tailored solutions.
  • Collaborated with vendor teams to ensure seamless implementation and alignment with business objectives.
  • Architected and developed a robust ETL framework utilizing Kotlin and GoogleSQL, enhancing data processing efficiency.
Technologies: Google Cloud, SQL, Python, Kotlin, ETL, BI Reporting, Machine Learning, Large Language Model Operations (LLMOps), Database Design

Database Developer

2022 - PRESENT
Not Just Travel (Agency) Limited
  • Collaborated with stakeholders to gather requirements, design solutions that aligned with business objectives, and ensure data infrastructure supported evolving needs.
  • Set up and maintained scalable cloud infrastructure, ensuring high availability and reliability for all reporting and data processing tasks.
  • Implemented an end-to-end self-serve reporting system using Google Data Studio and BigQuery, enabling real-time data access for business users.
  • Developed and managed pipelines from Knack to BigQuery, automating ETL processes to support dynamic reporting needs.
  • Built time-sensitive data pipelines from an SFTP server to Knack, ensuring timely and accurate data flow for critical operations.
Technologies: SQL, Google Data Studio, Relational Database Design, Databases, Knack, Google Cloud, Google Cloud Platform (GCP), Google BigQuery, Apache Airflow, GitHub Actions, FastAPI, Windows Server, Looker Studio, API Integration, Performance Tuning, Reporting, BI Reporting, ETL Tools, Database Design, Pandas

Senior Data Engineer

2022 - 2024
PropertyGuru
  • Led the data platform engineering team, driving project ownership, solution design, and collaboration with junior engineers.
  • Set up an end-to-end data platform that auto-generates workflows and DAGs using GitHub Actions, enabling data engineers to stream changed data from MySQL tables to Amazon S3 via AWS DMS and load it into BigQuery in batches.
  • Developed and deployed data ingestion pipelines to Google BigQuery using Cloud Composer (Apache Airflow) for orchestration.
  • Established a BigQuery usage analysis environment, helping the organization reduce monthly query and storage costs by 20%.
  • Implemented a data governance and quality framework, integrating data quality measures with the ETL environment and enhancing organizational data awareness.
  • Conducted a proof of concept (PoC) in Snowflake to compare its performance, scalability, and cost-effectiveness with Google BigQuery, providing strategic insights that influenced the organization’s data platform decisions.
  • Automated Looker and GCP access management using Microsoft AD, Terraform, and Google client libraries, centralizing access control and governance across the organization to enhance security and streamline access processes.
Technologies: Elasticsearch, MongoDB, Python, BigQuery, Google Cloud, Amazon Web Services (AWS), AWS Glue, MySQL, DataHub, Apache Airflow, GitHub Actions, Change Data Capture, Apache Kafka, Debezium, AWS Database Migration Service (DMS), Artificial Intelligence (AI), Cloud Firestore, Snowflake, Data Build Tool (dbt), API Integration, Looker, Looker Modeling Language (LookML), Performance Tuning, Reporting, BI Reporting, AWS Lambda, Amazon Athena, ETL Tools, Amazon RDS, Database Design, DevOps, NumPy, Pandas, Terraform
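The workflow auto-generation described above can be sketched as a simple templating step run in CI: one Airflow DAG file is rendered per replicated MySQL table. Everything in this sketch (the table names, the DAG body, the schedule) is hypothetical and only illustrates the pattern, not the actual platform.

```python
from string import Template

# Hypothetical DAG template a CI job (e.g., GitHub Actions) could render once
# per replicated table. The real DAG body would batch-load DMS change files
# from object storage into BigQuery; here it is elided.
DAG_TEMPLATE = Template("""\
from airflow import DAG

# Batch-load CDC files for table "${table}" into BigQuery.
with DAG(dag_id="load_${table}", schedule_interval="@hourly") as dag:
    ...
""")

def render_dag(table: str) -> str:
    """Render the DAG definition for one replicated source table."""
    return DAG_TEMPLATE.substitute(table=table)

# A CI job would loop over the table catalog and write one file per table:
for table in ["listings", "agents"]:
    dag_source = render_dag(table)
```

Committing the generated files (rather than generating DAGs dynamically at parse time) keeps each pipeline reviewable in version control.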

Data Engineer

2020 - 2022
RegASK
  • Architected and built an Azure-based web monitoring solution that automates data collection, ingestion to an operational data store, and text processing in a distributed asynchronous cloud-native environment.
  • Created REST APIs using FastAPI in Python to facilitate content management, centralize shared data, and serve data to downstream systems.
  • Built NLP workloads, including named entity recognition, language detection, text extraction from PDF articles, and document translation in a containerized environment.
  • Automated ETL health check and ingest reports to SharePoint for users.
Technologies: Azure, MongoDB, Docker, Kubernetes, RabbitMQ, CI/CD Pipelines, Python 3, APIs, Data Modeling, Cloud Architecture, Data Cleaning, Data Analysis, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Architecture, SQL, NoSQL, Python, Data Pipelines, SharePoint, Artificial Intelligence (AI), Data Science, API Integration, Performance Tuning, ETL Tools, Database Design, DevOps, NumPy, Pandas

Data Analyst

2019 - 2020
ST Electronics Infosoft
  • Set up an end-to-end data analytics platform on a secure private cloud environment.
  • Designed and implemented ETL pipelines to ingest and warehouse department-specific data on their respective data stores and enable secure data governance using catalogs.
  • Built prediction models based on historical data using regression modeling techniques.
Technologies: IBM Cloud, IBM InfoSphere (DataStage), IBM Db2, Python 3, Regression Modeling, Data Matching, Data Cleaning, Data Analysis, Data Quality, Data Architecture, Exploratory Data Analysis, R, SQL, Data Analytics, Data Pipelines, Business Intelligence (BI), Dashboards, Data Science, ETL Tools, Database Design, NumPy, Pandas

Data Science Intern

2019 - 2019
ST Electronics Infosoft
  • Performed exploratory data analysis to identify departure flight trends from Singapore Changi Airport. The results helped identify potential factors for predicting passenger loads of outgoing flights.
  • Built a hybrid predictive model with 95% precision for passenger loads of outgoing flights, helping the operations research team with daily operations planning.
  • Developed statistical models to analyze and predict passenger traffic within the airport for optimal resource allocation.
Technologies: Python 3, Predictive Modeling, Machine Learning, R, APIs, Exploratory Data Analysis, Data Analytics, Dashboards, Relational Database Design, Data Reporting, Data Science, Amazon RDS, NumPy

Research Associate

2018 - 2019
National University of Singapore
  • Worked with a semiconductor manufacturer on an operations research project focused on identifying improvements to the manufacturing process using regression modeling.
  • Collaborated with a real estate client on a geospatial analytics project focused on building a predictive model to predict property prices based on historic data and geospatial entities.
  • Migrated an analytical warehouse hosted on Apache Drill to Google BigQuery for a cosmetics client on a data migration project.
Technologies: R, Python 3, Apache Drill, Google BigQuery, Pentaho, Exploratory Data Analysis, Geospatial Analytics, Predictive Modeling, Operations Research, Regression Modeling, SQL, Data Analytics, Data Pipelines, NumPy

Data Analyst Intern

2018 - 2019
Anywhr
  • Migrated legacy data from varied sources to the updated transactional database.
  • Designed and built an ETL pipeline to warehouse transactional data from AWS RDS to Google BigQuery using a Python client library for Google Cloud.
  • Built visual dashboards on Tableau to facilitate periodic reporting for the marketing team.
Technologies: Amazon Web Services (AWS), Google BigQuery, Tableau, Python 3, PostgreSQL, SQL, Data Analytics, Data Pipelines, ETL, ELT, Business Intelligence (BI), Dashboards, Relational Database Design, BigQuery, Data Analysis, Data Reporting, Reporting, BI Reporting

System Engineer | Data Engineer

2014 - 2018
Tata Consultancy Services
  • Translated ETL logic, implemented to build dashboards and logical views on clickstream data hosted on Google BigQuery, for a business intelligence project.
  • Reduced the query cost on BigQuery by 40% using performance-efficient query logic.
  • Migrated supply chain and logistical data from varied data sources to Google BigQuery for a North American eCommerce client.
  • Warehoused sensitive customer information and built analytical views for a British banking client.
  • Reduced the overnight batch runtime by two hours by optimizing batch process schedules.
  • Managed a small data team that maintained and enhanced a department-specific data store.
  • Contributed to data warehousing for an insurance client by extracting analytical reports and batch-processing the data required by downstream systems.
  • Reduced runtime by 35% by modifying existing ETL logic.
Technologies: Google Cloud, Google BigQuery, IBM Db2, Teradata, Oracle SQL, Microsoft Data Transformation Services (now SSIS), Informatica ETL, SAP BusinessObjects (BO), Tableau, Big Data Architecture, Data Migration, Data Warehousing, Data Warehouse Design, ETL, Data Matching, Data Cleaning, Data Quality, Exploratory Data Analysis, SQL, Team Leadership, Leadership, ELT, Data Pipelines, Data Analytics, Business Intelligence (BI), Dashboards, Google Data Studio, Databases, Relational Database Design, Data Reporting, ClickStream, Web Analytics, T-SQL (Transact-SQL), Microsoft SQL Server, Oracle, Performance Tuning, SQL Performance, Reporting, BI Reporting, Database Design
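BigQuery's on-demand pricing charges per byte scanned, so query-cost reductions like the one above typically come from scanning less data (partition filters, column pruning, avoiding SELECT *). A minimal sketch of how such savings can be estimated, with an illustrative price per TiB (check current BigQuery pricing before relying on the figure):

```python
def estimate_query_cost(bytes_scanned: int, price_per_tib: float = 6.25) -> float:
    """Estimate BigQuery on-demand cost in USD from bytes scanned.

    price_per_tib is an illustrative default, not an authoritative rate.
    """
    tib = bytes_scanned / 2**40  # 1 TiB = 2**40 bytes
    return tib * price_per_tib

# Comparing a full-table scan with a partition-pruned scan of one day:
full_scan = estimate_query_cost(5 * 2**40)        # hypothetical 5 TiB table
pruned = estimate_query_cost(int(0.2 * 2**40))    # one partition, ~0.2 TiB
savings = 1 - pruned / full_scan
```

BigQuery reports the bytes a query will scan in its dry-run response, so estimates like this can be produced before any cost is incurred.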

Experience

Source Monitoring Pipeline

A cloud-native, distributed, asynchronous pipeline that ingests data through multiple entry points and performs multiple NLP tasks, including text extraction from PDF articles, language detection, named entity recognition, and text translation.

As the sole data engineer, I designed and implemented the pipeline in a cloud-native environment. The prototype was a monolith, which I sharded into Dockerized microservices deployed on Azure Kubernetes Service. While the monolith crawled five web pages whose metadata had to be ingested directly into the database, the distributed environment crawled 300+ sources, and its built-in REST API handled the metadata, enabling users to configure and manage source metadata independently.
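The asynchronous fan-out at the heart of such a pipeline can be sketched in miniature with the standard library: a crawler stage enqueues raw articles and concurrent workers run the NLP steps. In production the broker was RabbitMQ and the workers ran on Kubernetes; here asyncio.Queue stands in for the broker and the "NLP" step is a placeholder.

```python
import asyncio

async def crawl(sources, queue):
    """Crawler stage: enqueue one raw article per source, then a sentinel."""
    for src in sources:
        await queue.put({"source": src, "text": f"raw text from {src}"})
    await queue.put(None)  # sentinel: no more work

async def nlp_worker(queue, results):
    """Worker stage: run placeholder NLP on each queued article."""
    while True:
        item = await queue.get()
        if item is None:
            await queue.put(None)  # re-enqueue so sibling workers also stop
            break
        item["lang"] = "en"  # placeholder for real language detection
        results.append(item)

async def main(sources):
    queue, results = asyncio.Queue(), []
    # One crawler and two workers run concurrently, like scaled-out consumers.
    await asyncio.gather(
        crawl(sources, queue),
        nlp_worker(queue, results),
        nlp_worker(queue, results),
    )
    return results

processed = asyncio.run(main(["site-a", "site-b", "site-c"]))
```

Adding throughput is then a matter of adding workers (or, with a real broker, pods), with no change to the crawler.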

Online Business Intelligence

An ETL pipeline focused on warehousing and utilizing terabyte-scale clickstream data.

I collaborated with a cross-functional team to translate pre-built ETL logic and worked closely with business analysts to build real-time analytical dashboards for understanding consumer behavior. The data was hosted on Google BigQuery, optimized for performance and query cost.

Pax Load Prediction

I built a predictive model to assist an airport operations research team as part of the capstone project for my master's degree. The model predicted the number of departing passengers on each flight so that the operations team could plan their daily activities better.

This project had all the components of a data science project, such as data engineering and transformation, exploratory data analysis, data visualization, and data modeling. I was required to build a predictive model using machine learning and to come up with a data-backed story that could convince the stakeholders of the real-life use case of my model.

The modeling phase went through iterations, starting from basic regression modeling to building complex machine-learning models by stacking weak learners together. The model's error rate was +/- 5%, which enabled the operations research team to plan their operations better.

It is also the first project where I learned to productionize my machine learning model by wrapping it in a RESTful API.
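The stacking idea above, combining weak learners into a stronger ensemble, can be sketched in miniature. The two base models and fixed meta-weights below are deliberately simplistic and purely illustrative; the real project used proper ML libraries and iterative tuning.

```python
def mean_model(train_y):
    """Weak learner: always predict the training mean."""
    mu = sum(train_y) / len(train_y)
    return lambda x: mu

def last_value_model(train_y):
    """Weak learner: always predict the most recent observation."""
    last = train_y[-1]
    return lambda x: last

def stacked(models, weights):
    """Meta-model: combine base predictions with fixed weights."""
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))

# Hypothetical passenger-load history for one flight:
history = [100, 120, 110, 130]
ensemble = stacked([mean_model(history), last_value_model(history)], [0.5, 0.5])
prediction = ensemble(None)  # these toy learners ignore the input features
```

In a real stack, the meta-weights would themselves be fitted on held-out predictions of the base learners rather than fixed by hand.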

Education

2018 - 2019

Master of Science in Business Analytics

National University of Singapore - Singapore, Singapore

2010 - 2014

Bachelor of Technology in Computer Science and Engineering

SRM University - Chennai, India

Certifications

DECEMBER 2020 - DECEMBER 2022

Professional Cloud Architect

Google Cloud

DECEMBER 2018 - PRESENT

Data Engineering on Google Cloud Platform (GCP)

Google Cloud | via Coursera

Skills

Libraries/APIs

Google Cloud API, Pandas, NumPy

Tools

BigQuery, RabbitMQ, IBM InfoSphere (DataStage), Informatica ETL, Tableau, Azure Kubernetes Service (AKS), SQL Server BI, Git, AWS Glue, DataHub, Apache Airflow, Knack, Looker, Amazon Athena, Terraform

Languages

Python 3, SQL, Python, R, T-SQL (Transact-SQL), Snowflake, Looker Modeling Language (LookML), Kotlin

Paradigms

ETL, Database Design, Business Intelligence (BI), DevOps

Storage

PostgreSQL, Data Pipelines, Databases, MongoDB, Google Cloud, NoSQL, IBM Db2, Teradata, Oracle SQL, Elasticsearch, MySQL, Cloud Firestore, Microsoft SQL Server, SQL Performance

Platforms

Azure, Docker, Cloud Native, Amazon Web Services (AWS), Kubernetes, Google Cloud SDK, Apache Kafka, Linux, Windows, Pentaho, KNIME, SharePoint, Debezium, Google Cloud Platform (GCP), Firebase, Windows Server, Oracle, AWS Lambda

Frameworks

Apache Drill, Apache Spark, Spark

Other

Google BigQuery, Data Migration, Data Warehousing, Data Engineering, Data Architecture, ELT, Relational Database Design, Regression Modeling, Natural Language Processing (NLP), Machine Learning, Cloud Architecture, Data Cleaning, Data Matching, Data Quality, Data Analysis, Data Warehouse Design, Exploratory Data Analysis, Data Analytics, Google Data Studio, Generative Pre-trained Transformers (GPT), Web Analytics, Big Data Architecture, Software Engineering, Artificial Intelligence (AI), CI/CD Pipelines, APIs, Data Modeling, IBM Cloud, Microsoft Data Transformation Services (now SSIS), SAP BusinessObjects (BO), Cloud Storage, Google Cloud ML, Geospatial Analytics, Predictive Modeling, Operations Research, Data Visualization, Team Leadership, Leadership, Dashboards, Causal Inference, Business Analysis, Data Reporting, ClickStream, GitHub Actions, Change Data Capture, AWS Database Migration Service (DMS), FastAPI, Data Science, Data Build Tool (dbt), Looker Studio, API Integration, Performance Tuning, Reporting, BI Reporting, ETL Tools, Large Language Model Operations (LLMOps), Amazon RDS
