
Rituraj Kumar

Verified Expert in Engineering

Data Engineer and Developer

Mumbai, Maharashtra, India

Toptal member since August 23, 2024

Bio

Rituraj has over six years of experience in data engineering and MLOps and excels in crafting scalable data models and deploying ML workflows. His experience spans marketing, healthcare, fintech, and retail industries, where he bridged technical and business teams to drive data-driven insights and innovation. Rituraj is excited to apply his expertise to impactful projects.

Portfolio

Zeals
Python, PySpark, Google BigQuery, Data Build Tool (dbt), Apache Airflow...
Quantiphi
Apache Airflow, Data Analysis, Data Build Tool (dbt), Data Warehousing...
Quantiphi
Apache Airflow, Communication, Data Analysis, Data Marts, Data Modeling, Python...

Experience

  • Python - 7 years
  • ETL - 6 years
  • Google BigQuery - 6 years
  • Data Warehousing - 5 years
  • PySpark - 5 years
  • Apache Airflow - 5 years
  • Machine Learning Operations (MLOps) - 4 years
  • Data Build Tool (dbt) - 3 years

Availability

Part-time

Preferred Environment

Python, PySpark, Apache Airflow, Data Build Tool (dbt), SQL, Data Modeling, ETL, Data Warehousing, Machine Learning Operations (MLOps), Kubeflow

The most amazing...

...thing I've done was lead a retail chatbot project, building a scalable data warehouse and deploying a recommendation engine that enhanced user engagement by 40%.

Work Experience

Senior Data and MLOps Engineer

2021 - PRESENT
Zeals
  • Developed and optimized scalable data marts using ETL pipelines in Python, SQL, Spark, and Airflow, reducing query times by 35% over 12 months.
  • Engineered a robust data pipeline using dbt, GCP BigQuery, Data Catalog, and Apache Airflow to process and analyze terabytes of data weekly, improving data retrieval times by 50% and enabling more accurate predictive modeling.
  • Built a Vertex AI Pipelines-based production pipeline for ML use cases, enhancing training efficiency by 50% and availability by 40%.
  • Implemented real-time streaming pipelines using PySpark for a recommendation engine, leading to a 20% increase in user engagement and a 15% boost in conversion rates by delivering personalized offers in real time.
  • Optimized the data processing pipeline, cutting costs and processing time by 85–90%, roughly a tenfold speedup.
Technologies: Python, PySpark, Google BigQuery, Data Build Tool (dbt), Apache Airflow, Google Cloud Platform (GCP), SQL, MongoDB, ETL, Data Warehousing, Data Modeling, Data Marts, Machine Learning Operations (MLOps), Model Deployment, Model Monitoring, Tableau, Google Data Studio, Vertex AI, Kubeflow, Data Visualization, Data Masking, Batch and Stream Pipeline, Cloud Run, MLflow, Google Cloud SQL, Machine Learning, Business Analysis, ELT, Shopify, Cloud Scheduler
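The data-mart bullets above follow the classic extract-transform-load pattern. As a minimal sketch of the idea (using SQLite from the standard library in place of BigQuery, with invented table and column names), a mart build might look like this:

```python
import sqlite3

def build_orders_mart(conn: sqlite3.Connection) -> None:
    """Aggregate raw order events into a daily revenue mart (illustrative schema)."""
    cur = conn.cursor()
    # Rebuild the mart idempotently, as a scheduled warehouse job would.
    cur.execute("DROP TABLE IF EXISTS mart_daily_revenue")
    cur.execute("""
        CREATE TABLE mart_daily_revenue AS
        SELECT order_date, COUNT(*) AS orders, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date
    """)
    conn.commit()

# Demo with hard-coded in-memory data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?)",
    [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
)
build_orders_mart(conn)
print(conn.execute("SELECT * FROM mart_daily_revenue ORDER BY order_date").fetchall())
# [('2024-01-01', 2, 15.0), ('2024-01-02', 1, 7.5)]
```

In a production setting, the same SELECT ... GROUP BY shape would run as a scheduled Airflow task against BigQuery, or as a dbt model.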

Senior Data Engineer

2019 - 2021
Quantiphi
  • Developed and deployed a healthcare analytics platform and data warehouse on GCP Cloud, utilizing BigQuery for data storage and analytics and Dataflow for efficient data processing.
  • Implemented automated testing frameworks using Python and pytest, achieving a 40% reduction in manual testing time and enhancing the reliability of data pipelines.
  • Achieved a 60% improvement in data accessibility and reduced processing time by 70% through optimized data pipelines.
  • Utilized data engineering tools, including Airflow for workflow management and dbt for data transformation on GCP, and provided actionable insights to healthcare professionals, improving the delivery efficiency of business KPIs by 30%.
  • Collaborated with data scientists to enhance AI models for predictive analytics in patient outcomes and personalized treatments.
  • Integrated AI models into production using the GCP AI platform and TensorFlow, leveraging Vertex Pipelines for machine learning operations, which improved prediction accuracy by 20%.
  • Applied federated learning to cross-country data analysis and model training, preserving data privacy and governance, and used NVIDIA Clara for enhanced data security and compliance.
Technologies: Apache Airflow, Data Analysis, Data Build Tool (dbt), Data Warehousing, Data Modeling, Data Marts, ETL, Google Cloud Platform (GCP), Google BigQuery, Apache Kafka, Machine Learning Operations (MLOps), Python, PySpark, SQL, NoSQL, Teamwork, Communication, Slack, Jira, Vertex AI, Data Visualization, Data Masking, Batch and Stream Pipeline, Cloud Run, MLflow, Google Cloud SQL, Business Analysis, ELT, Shopify, Cloud Scheduler
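The automated-testing bullet above describes validating data pipelines with pytest. A minimal sketch of such data-quality checks (the schema and sample records here are invented) could be plain assert-based functions that pytest would collect and run:

```python
def check_no_nulls(rows, required_fields):
    """Fail if any required field is missing or None in any row."""
    for i, row in enumerate(rows):
        for field in required_fields:
            assert row.get(field) is not None, f"row {i}: null in {field!r}"

def check_row_count(rows, minimum):
    """Fail if the extract produced fewer rows than expected."""
    assert len(rows) >= minimum, f"expected >= {minimum} rows, got {len(rows)}"

def test_patients_extract():
    # In the real pipeline these rows would come from a warehouse query;
    # here they are hard-coded sample records with a hypothetical schema.
    rows = [
        {"patient_id": 1, "admit_date": "2020-03-01"},
        {"patient_id": 2, "admit_date": "2020-03-02"},
    ]
    check_no_nulls(rows, ["patient_id", "admit_date"])
    check_row_count(rows, minimum=1)
```

Running such checks after each pipeline stage is what replaces manual spot-checking of the loaded data.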

Data Engineer

2018 - 2019
Quantiphi
  • Enhanced business insights through advanced data engineering techniques, leading to a 20% increase in sales efficiency and more effective targeting by sales teams.
  • Created a scalable GCP data warehouse, optimizing data accessibility and analytics capabilities for managing large datasets effectively using Airflow, PySpark, and GCP BigQuery.
  • Optimized data processing and analysis workflows with GCP services and PySpark, improving operational efficiency and decision-making, which resulted in a 30% reduction in data processing time and a 25% reduction in infrastructure cost.
  • Implemented CRM solutions integrating GA 360, Salesforce Marketing Cloud, and other data sources to enhance user profiling and targeted marketing strategies, resulting in a 15% increase in conversion rates.
  • Developed a real-time streaming solution using PySpark for marketing analytics projects, resulting in a 30% reduction in data processing time and accurate campaign performance insights, driving a 25% increase in marketing ROI.
Technologies: Apache Airflow, Communication, Data Analysis, Data Marts, Data Modeling, Python, Salesforce Sales Cloud, Google Analytics 360, REST APIs, SOAP APIs, YouTube API, GoToWebinar, Databases, SQL, Docker, Flask, Google BigQuery, Data Loss Prevention (DLP), Data Lakes, Delta Lake, FTP Servers, Data Visualization, Data Masking, Batch and Stream Pipeline, Google App Engine, Cloud Run, Google Cloud SQL, Business Analysis, ELT, Cloud Scheduler
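A production version of the streaming work above would run on PySpark Structured Streaming; the core windowing logic can be illustrated cluster-free with a standard-library sketch of tumbling-window event counting (the event shape and window size are invented):

```python
from collections import defaultdict

def tumbling_window_counts(events, window_seconds=60):
    """Count (timestamp, campaign_id) click events per fixed-size time window.

    Mirrors the shape of a Spark groupBy(window(...), "campaign_id").count(),
    but over an in-memory list instead of a stream.
    """
    counts = defaultdict(int)
    for ts, campaign_id in events:
        window_start = ts - (ts % window_seconds)  # floor to window boundary
        counts[(window_start, campaign_id)] += 1
    return dict(counts)

events = [(5, "a"), (30, "a"), (61, "a"), (62, "b")]
print(tumbling_window_counts(events))
# {(0, 'a'): 2, (60, 'a'): 1, (60, 'b'): 1}
```

The real job adds watermarking for late events and writes each window's aggregates to a sink the campaign dashboards read from.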

Software Engineer

2017 - 2018
Quantiphi
  • Designed and implemented a GCP-hosted microservices platform for speech recognition analytics, ensuring secure and efficient resource access.
  • Deployed data workflows and pipelines on GCP, reducing data processing time by 30% and accelerating model training cycles.
  • Collaborated with a data scientist to improve speech recognition accuracy, driving business growth and customer satisfaction.
  • Developed back-end services integrating analytics KPIs, such as user interaction metrics and speech analytics use cases, providing useful data features for enhancing model performance, resulting in a 25% increase in speech recognition accuracy.
  • Demonstrated expertise in data engineering, MLOps, and cloud infrastructure to deliver impactful solutions aligned with business objectives.
Technologies: Python, Docker, Kubernetes, Google Cloud Platform (GCP), Amazon Web Services (AWS), REST APIs, Flask, SQL, Databases, Google BigQuery, Communication, Apache Kafka, Data Loss Prevention (DLP), Data Warehousing, Google Cloud SQL, Business Analysis

Experience

Chatbot Analytics and Recommendation

Acted as a senior data and MLOps engineer at Zeals, where I spearheaded the development of a chatbot recommendation and analytics platform and implemented the scalable data warehouse that underpins it.

The platform was designed to prioritize personalized user interactions and campaign optimization, aiming to improve user engagement and enhance campaign performance through advanced data analysis and machine learning techniques.

Healthcare Data Analytics Platform

Developed a healthcare analytics platform on Google Cloud Platform (GCP) to manage and analyze terabytes of data every week. Its underlying data lake serves as a foundation for machine learning use cases, enabling the integration of advanced analytics and predictive modeling into the platform.

I assisted in setting up an automated MLOps pipeline to streamline the deployment, monitoring, and maintenance of machine learning models, ensuring efficient and consistent insight delivery.

Marketing Analytics Platform

To improve sales efficiency and enable more effective targeting by sales teams, I developed a scalable GCP-based data warehouse for handling large datasets and delivering real-time insights for marketing analytics. I enhanced user profiling and targeting strategies by integrating Google Analytics 360 and Salesforce Marketing Cloud. I implemented a PySpark-powered real-time streaming solution to optimize data processing and analysis workflows. These efforts led to a 20% increase in sales efficiency, a 15% boost in conversion rates, a 30% reduction in data processing time, and a 25% increase in marketing ROI, enhancing customer satisfaction and operational efficiency.

Speech Analytics Platform

To enhance speech recognition accuracy and operational efficiency, I designed and implemented a GCP-hosted microservices platform for speech recognition analytics.

The platform ensured secure and efficient resource access while integrating back-end services with analytics KPIs, such as user interaction metrics and speech analytics use cases. The integrations provided valuable data features, resulting in a 25% increase in speech recognition accuracy. By deploying optimized data workflows and pipelines on GCP, I reduced data processing time by 30% and accelerated model training cycles.

I collaborated closely with a data scientist and leveraged my expertise in data engineering, MLOps, and cloud infrastructure to deliver impactful solutions aligned with business objectives, driving growth and enhancing customer satisfaction.

Education

2013 - 2017

Bachelor's Degree in Information Technology

VIT University - Vellore, Tamil Nadu, India

Certifications

NOVEMBER 2020 - PRESENT

Machine Learning for Business

Coursera

SEPTEMBER 2020 - PRESENT

Associate Cloud Engineer

Google Cloud

OCTOBER 2018 - PRESENT

Serverless Data Analysis with Google BigQuery and Cloud Dataflow

Coursera

JUNE 2018 - PRESENT

Big Data Integration and Processing

Coursera

Skills

Libraries/APIs

PySpark, REST APIs, SOAP APIs, YouTube API

Tools

Cloud Scheduler, Apache Airflow, Tableau, Salesforce Sales Cloud, Slack, Jira, Apache Beam

Platforms

Google Cloud Platform (GCP), Software Design Patterns, Vertex AI, Kubeflow, Apache Kafka, Google Analytics 360, Docker, Kubernetes, Amazon Web Services (AWS), Google App Engine, Cloud Run, Shopify

Languages

Python, SQL

Frameworks

Flask, Hadoop, Apache Spark

Paradigms

ETL, Management

Storage

Databases, MongoDB, NoSQL, Data Lakes, Google Cloud SQL, Apache Hive

Other

Google BigQuery, Teamwork, Communication, ELT, Data Build Tool (dbt), Data Modeling, Data Warehousing, Machine Learning Operations (MLOps), Data Analysis, System Design, Data Marts, Model Deployment, Model Monitoring, Google Data Studio, GoToWebinar, Data Loss Prevention (DLP), Delta Lake, FTP Servers, Business Analysis, Data Visualization, Data Masking, Batch and Stream Pipeline, Machine Learning, Big Data, MLflow
