
Rituraj Kumar
Verified Expert in Engineering
Data Engineer and Developer
Mumbai, Maharashtra, India
Toptal member since August 23, 2024
Rituraj has over six years of experience in data engineering and MLOps and excels in crafting scalable data models and deploying ML workflows. His experience spans marketing, healthcare, fintech, and retail industries, where he bridged technical and business teams to drive data-driven insights and innovation. Rituraj is excited to apply his expertise to impactful projects.
Portfolio
Experience
- Python - 7 years
- ETL - 6 years
- Google BigQuery - 6 years
- Data Warehousing - 5 years
- PySpark - 5 years
- Apache Airflow - 5 years
- Machine Learning Operations (MLOps) - 4 years
- Data Build Tool (dbt) - 3 years
Preferred Environment
Python, PySpark, Apache Airflow, Data Build Tool (dbt), SQL, Data Modeling, ETL, Data Warehousing, Machine Learning Operations (MLOps), Kubeflow
The most amazing...
...thing I've done was lead a chatbot retail project that built a scalable data warehouse and deployed a recommendation engine, enhancing user engagement by 40%.
Work Experience
Senior Data and MLOps Engineer
Zeals
- Developed and optimized scalable data marts using ETL pipelines in Python, SQL, Spark, and Airflow, reducing query times by 35% over 12 months.
- Engineered a robust data pipeline using dbt, GCP BigQuery, Data Catalog, and Apache Airflow to process and analyze terabytes of data weekly, improving data retrieval times by 50% and enabling more accurate predictive modeling.
- Built a Vertex AI Pipelines-based production pipeline for ML use cases, enhancing training efficiency by 50% and availability by 40%.
- Implemented real-time streaming pipelines using PySpark for a recommendation engine, leading to a 20% increase in user engagement and a 15% boost in conversion rates by delivering personalized offers in real time.
- Optimized the data processing pipeline, reducing costs and processing time by 85-90% and making processing ten times faster.
Senior Data Engineer
Quantiphi
- Developed and deployed a healthcare analytics platform and data warehouse on GCP, utilizing BigQuery for data storage and analytics and Dataflow for efficient data processing.
- Implemented automated testing frameworks using Python and pytest, achieving a 40% reduction in manual testing time and enhancing the reliability of data pipelines.
- Achieved a 60% improvement in data accessibility and reduced processing time by 70% through optimized data pipelines.
- Utilized data engineering tools on GCP, including Airflow for workflow management and dbt for data transformation. Provided actionable insights to healthcare professionals, resulting in a 30% improvement in the efficiency of business KPI delivery.
- Collaborated with data scientists to enhance AI models for predictive analytics in patient outcomes and personalized treatments.
- Integrated AI models into production using the GCP AI platform and TensorFlow, leveraging Vertex Pipelines for machine learning operations, which improved prediction accuracy by 20%.
- Used federated learning for cross-country data analysis and model training to meet data sensitivity and governance requirements, utilizing NVIDIA Clara for enhanced data security and compliance.
Data Engineer
Quantiphi
- Enhanced business insights through advanced data engineering techniques, leading to a 20% increase in sales efficiency and more effective targeting by sales teams.
- Created a scalable GCP data warehouse, optimizing data accessibility and analytics capabilities for managing large datasets effectively using Airflow, PySpark, and GCP BigQuery.
- Optimized data processing and analysis workflows with GCP services and PySpark, improving operational efficiency and decision-making, which resulted in a 30% reduction in data processing time and a 25% reduction in infrastructure cost.
- Implemented CRM solutions integrating GA 360, Salesforce Marketing Cloud, and other data sources to enhance user profiling and targeted marketing strategies, resulting in a 15% increase in conversion rates.
- Developed a real-time streaming solution using PySpark for marketing analytics projects, resulting in a 30% reduction in data processing time and accurate campaign performance insights, driving a 25% increase in marketing ROI.
Software Engineer
Quantiphi
- Designed and implemented a GCP-hosted microservices platform for speech recognition and analytics, ensuring secure and efficient resource access.
- Deployed data workflows and pipelines on GCP, reducing data processing time by 30% and accelerating model training cycles.
- Collaborated with a data scientist to improve speech recognition accuracy, driving business growth and customer satisfaction.
- Developed back-end services integrating analytics KPIs, such as user interaction metrics and speech analytics use cases, providing useful data features for enhancing model performance, resulting in a 25% increase in speech recognition accuracy.
- Demonstrated expertise in data engineering, MLOps, and cloud infrastructure to deliver impactful solutions aligned with business objectives.
Experience
Chatbot Analytics and Recommendation
The platform was designed to prioritize personalized user interactions and campaign optimization, aiming to improve user engagement and enhance campaign performance through advanced data analysis and machine learning techniques.
Healthcare Data Analytics Platform
I assisted in setting up an automated MLOps pipeline to streamline the deployment, monitoring, and maintenance of machine learning models, ensuring efficient and consistent insight delivery.
Marketing Analytics Platform
Speech Analytics Platform
The platform ensured secure and efficient resource access while integrating back-end services with analytics KPIs, such as user interaction metrics and speech analytics use cases. The integrations provided valuable data features, resulting in a 25% increase in speech recognition accuracy. By deploying optimized data workflows and pipelines on GCP, I reduced data processing time by 30% and accelerated model training cycles.
I collaborated closely with a data scientist and leveraged my expertise in data engineering, MLOps, and cloud infrastructure to deliver impactful solutions aligned with business objectives, driving growth and enhancing customer satisfaction.
Education
Bachelor's Degree in Information Technology
VIT University - Vellore, Tamil Nadu, India
Certifications
Machine Learning for Business
Coursera
Associate Cloud Engineer
Google Cloud
Serverless Data Analysis with Google BigQuery and Cloud Dataflow
Coursera
Big Data Integration and Processing
Coursera
Skills
Libraries/APIs
PySpark, REST APIs, SOAP APIs, YouTube API
Tools
Cloud Scheduler, Apache Airflow, Tableau, Salesforce Sales Cloud, Slack, Jira, Apache Beam
Platforms
Google Cloud Platform (GCP), Software Design Patterns, Vertex AI, Kubeflow, Apache Kafka, Google Analytics 360, Docker, Kubernetes, Amazon Web Services (AWS), Google App Engine, Cloud Run, Shopify
Languages
Python, SQL
Frameworks
Flask, Hadoop, Apache Spark
Paradigms
ETL, Management
Storage
Databases, MongoDB, NoSQL, Data Lakes, Google Cloud SQL, Apache Hive
Other
Google BigQuery, Teamwork, Communication, ELT, Data Build Tool (dbt), Data Modeling, Data Warehousing, Machine Learning Operations (MLOps), Data Analysis, System Design, Data Marts, Model Deployment, Model Monitoring, Google Data Studio, GoToWebinar, Data Loss Prevention (DLP), Delta Lake, FTP Servers, Business Analysis, Data Visualization, Data Masking, Batch and Stream Pipeline, Machine Learning, Big Data, MLflow