Shivayogi is available for hire

Shivayogi Math Doddabasayya

Verified Expert in Product Management

Product Manager

Bengaluru, Karnataka, India

Toptal member since October 15, 2025

Expertise

Agile Product Management
AI Product Management

Bio

Shivayogi is a product leader with over 20 years of technology experience, including four years in AI/ML product development. He connects AI capabilities to measurable business outcomes, using intelligent automation to turn operational challenges into strategic advantages. Shivayogi has led AI initiatives that reduced operational costs, improved customer satisfaction by 28%, and automated 65% of repetitive workflows.

Project Highlights

ML Model to Optimize Delivery Time & Predict Driver Demand

Inefficient Information Retrieval from Educational Course Documentation

End-to-end MLOps & Kubernetes Deployment for AI-driven Text Analytics

Expertise

App Development
Automation
Cloud
Continuous Deployment
Continuous Integration (CI)
Databases
Kubernetes
Python 3

Work Experience

Senior Lead

2020 - PRESENT

Kyndryl

Modernized legacy applications for cloud deployment at a client whose lack of automated CI/CD infrastructure caused deployment delays and manual errors.
Defined deployment success criteria with stakeholders: zero-downtime releases, automated rollbacks, and an 80% reduction in deployment time. Prioritized Amazon EKS over alternatives based on the client's existing cloud footprint and team skills.
Established monitoring and observability as non-negotiable from day one. Reduced deployment time from hours to minutes, enabled multiple releases per day instead of monthly deployment windows, and recorded zero production incidents during the rollout.

Senior Lead

2020 - PRESENT

Kyndryl

Led product strategy and execution for an intelligent automation platform. Focused on ticket classification first (highest ROI, fastest deployment) before expanding to auto-resolution.
Worked with IT service managers and support teams to define success metrics: resolution time, SLA compliance, and agent productivity. Deployed in stages to build stakeholder confidence and refine based on real usage.
Chose to build custom NLP models when off-the-shelf solutions couldn't meet accuracy requirements for enterprise workflows. Delivered 40% faster resolution times (from 6.8 hours to 4.1 hours), directly improving employee productivity.
Integrated Kubernetes deployments with auto-scaling and secure access using IAM roles and ALB-based Ingress for production-ready environments. Delivered $1.8 million annual cost savings through automation of 65% of routine tickets.
Shifted support teams from reactive firefighting to proactive problem-solving. The system processes over 5,000 tickets daily and can scale to three times that volume without additional headcount.
Built a multi-database disaster recovery platform supporting MS SQL, MySQL, PostgreSQL, and Oracle. Delivered automated failover capabilities, reducing downtime by 85% and protecting $50+ million in business-critical data for enterprise clients.
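The automated failover described above can be sketched as a small controller that promotes a standby database after several consecutive failed health checks. This is a hypothetical illustration, not the platform's code: the class name, the three-failure threshold, and the database names are assumptions, and production setups usually rely on each database engine's own replication and failover tooling.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverController:
    """Hypothetical sketch: promote the standby after `threshold` consecutive
    failed health checks on the primary."""

    primary: str
    standby: str
    threshold: int = 3
    failures: int = 0
    history: list = field(default_factory=list)

    def observe(self, primary_healthy: bool) -> str:
        """Record one health-check result and return the current primary."""
        if primary_healthy:
            self.failures = 0  # a single success resets the counter
            return self.primary
        self.failures += 1
        if self.failures >= self.threshold:
            # Swap roles: the standby becomes the new primary.
            self.history.append((self.primary, self.standby))
            self.primary, self.standby = self.standby, self.primary
            self.failures = 0
        return self.primary


controller = FailoverController(primary="db-a", standby="db-b")
for healthy in [True, False, False, False]:
    current = controller.observe(healthy)
print(current)  # db-b
```

Resetting the counter on any successful check keeps a brief network blip from triggering a failover.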

Staff Engineer

2017 - 2020

McAfee

Worked with security operations teams to understand their daily workflow pain points. Prioritized precision over recall—better to catch fewer threats accurately than flood teams with noise.
Designed scalable ingestion pipelines on AWS and GCP for real-time data processing. Built iterative feedback loops to continuously improve model performance based on analyst input.
Reduced false positive rate by 60%, allowing analysts to focus on genuine threats. Improved threat detection accuracy while decreasing investigation time. Solution became a key differentiator in client renewals.
Leveraged Ansible for infrastructure automation across disaster recovery environments, orchestrating database deployments and configuration management. Reduced manual provisioning time by 70% and eliminated configuration drift across 500+ servers.
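Prioritizing precision over recall usually comes down to where the alert threshold sits. The following is a minimal sketch. It assumes the detector emits one score per event and that analysts have labeled past events (1 = genuine threat). The function picks the lowest threshold that still meets a precision floor, which gives the most recall that floor allows. The function name and sample data are illustrative.

```python
def pick_threshold(scores, labels, min_precision):
    """Return the lowest score threshold whose precision meets `min_precision`.

    Recall can only grow as the threshold drops, so the lowest qualifying
    threshold gives the highest recall under the precision floor.
    Returns None when no threshold qualifies.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [label for score, label in zip(scores, labels) if score >= t]
        precision = sum(flagged) / len(flagged)
        if precision >= min_precision:
            best = t
    return best


scores = [0.9, 0.8, 0.7, 0.6, 0.5]  # detector scores (illustrative)
labels = [1, 1, 0, 1, 0]            # 1 = confirmed threat
print(pick_threshold(scores, labels, min_precision=0.9))  # 0.8
```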

Lead

2014 - 2017

Wipro

Collaborated with clinical stakeholders to define must-have compliance requirements vs. nice-to-have features. Balanced regulatory constraints with innovation—chose an architecture that allowed iterative AI capability additions.
Enabled real-time clinical decision support for medical professionals. Achieved 100% regulatory compliance with HL7 v2.5.1 standards. Reduced image analysis time from hours to minutes.
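HL7 v2 messages are plain text. Segments are separated by carriage returns, fields by `|`, and components by `^`. The sketch below shows that structure using a hypothetical imaging-result (ORU^R01) message. It assumes the default delimiters and skips repetitions, escapes, and sub-components, so it is an illustration rather than a compliant parser. The function names and sample message are assumptions.

```python
def parse_hl7(message: str) -> dict:
    """Split an HL7 v2 message into {segment_id: [field0, field1, ...]}.

    Sketch only: assumes default delimiters, ignores repetitions (~),
    escapes and sub-components, and keeps the last segment per ID.
    """
    segments = {}
    for raw in filter(None, message.replace("\n", "\r").split("\r")):
        fields = raw.split("|")
        if fields[0] == "MSH":
            # MSH-1 is the field separator itself, so re-insert it to keep
            # list indexes equal to HL7 field numbers.
            fields.insert(1, "|")
        segments[fields[0]] = fields
    return segments


def component(value: str, n: int) -> str:
    """Return the n-th (1-based) ^-delimited component, or '' if absent."""
    parts = value.split("^")
    return parts[n - 1] if n <= len(parts) else ""


msg = ("MSH|^~\\&|PACS|HOSP|EHR|HOSP|20240101120000||ORU^R01|MSG001|P|2.5.1\r"
       "PID|1||12345^^^HOSP||DOE^JANE")
seg = parse_hl7(msg)
print(seg["MSH"][12], component(seg["MSH"][9], 1), component(seg["PID"][5], 1))
# 2.5.1 ORU DOE
```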

Lead

2011 - 2014

Zynga

Defined platform requirements with game development teams and operations.
Prioritized plugin architecture for flexibility as new games launched. Measured success by infrastructure uptime, player experience metrics, and operational cost per user. Outcomes: Supported 10+ million concurrent sessions with 99.9% uptime.
Reduced infrastructure management overhead by 50%. Enabled rapid deployment of new game titles without platform rewrites.
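A plugin architecture like the one described above can be sketched as a registry. Each game title plugs in its own handlers while the platform code stays unchanged. Everything here is hypothetical, including the decorator, the event name, and the `farm_game` title.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]
_PLUGINS: Dict[str, Dict[str, Handler]] = {}  # game -> event -> handler


def register(game: str, event: str):
    """Decorator a game title uses to plug a handler into the platform."""
    def decorator(fn: Handler) -> Handler:
        _PLUGINS.setdefault(game, {})[event] = fn
        return fn
    return decorator


def dispatch(game: str, event: str, payload: dict) -> dict:
    """Platform entry point: route an event to the game's plugin, if any."""
    handler = _PLUGINS.get(game, {}).get(event)
    if handler is None:
        return {"status": "ignored"}  # unknown game/event never breaks the platform
    return handler(payload)


@register("farm_game", "session_start")
def farm_session_start(payload: dict) -> dict:
    return {"status": "ok", "player": payload["player"]}


print(dispatch("farm_game", "session_start", {"player": "p1"}))
# {'status': 'ok', 'player': 'p1'}
```

Launching a new title then means registering new handlers, not rewriting the platform.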

Project History

ML Model to Optimize Delivery Time & Predict Driver Demand

83% — Accuracy
75% — Demand Forecast
25% — Churn Reduction

A mid-sized food delivery platform faced critical operational challenges with its legacy hosted system, which resulted in poor customer retention and high costs. Their on-time delivery rate was 65%, meaning 35% of orders arrived late. This caused customer dissatisfaction and a 5% monthly churn rate, leading to $250,000 in monthly lost revenue from 100,000 orders.

PAIN POINTS
• Operational Inefficiency: Manual delivery partner assignment took 10-15 minutes per order, creating peak-hour bottlenecks.
• Inaccurate Time Predictions: Rule-based ETA ignored real-time traffic, weather, and demand, causing late deliveries.
• No Demand Forecasting: The platform couldn’t predict order volume, causing idle partners or delayed orders in various zones.
• Slow Deployment: The legacy system took 4-6 months to onboard new clients, compared to 8-12 weeks for SaaS competitors.
• Limited Scalability: The monolithic architecture restricted expansion and updates, requiring downtime and custom development.

CONTEXT AND DATA
• 15,000+ historical orders across 22 cities
• 65% on-time delivery rate vs. a 75-80% industry standard
• 5% monthly churn: 500 customers lost per 10,000
• Average customer lifetime value of $500
• Customer acquisition cost of $20-50
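The $250,000 monthly loss quoted above matches these figures if it is read as customer lifetime value lost to churn. That reading is an assumption. Under it, 5% of a 10,000-customer base is 500 customers, each worth $500:

```python
customers = 10_000    # customer base from the list above
churn_pct = 5         # 5% monthly churn
lifetime_value = 500  # average customer lifetime value, USD

lost_customers = customers * churn_pct // 100  # customers lost per month
lost_value = lost_customers * lifetime_value   # lifetime value lost per month, USD
print(lost_customers, lost_value)  # 500 250000
```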

This set the stage for building an ML-powered SaaS solution to improve prediction accuracy, automate operations, and reduce time-to-value.

5-STEP SOLUTION

Analyzed 15,000+ orders from my IISc capstone, shadowed 5-6 operations managers, and reviewed support tickets. Discovered that the rule-based system ignored traffic and weather, which held the on-time rate at 65%.
Built 20+ input features: distance, real-time traffic from the Google Maps API, weather, delivery partner rating, time of day, and festivals. Used sin/cos transformations for cyclical patterns and lag features for time trends.
Tested linear regression, random forest, and XGBoost; XGBoost performed best. Used Optuna to tune hyperparameters across 100+ combinations, ran 5-fold cross-validation with time-based splits, and achieved a mean absolute error of 4.2 minutes.
Deployed a FastAPI microservice with sub-200-millisecond responses, Redis caching with a 1-hour TTL (cutting monthly API costs from $500 to $200), and Kubernetes auto-scaling for rush hours. Rolled out gradually from 20% to 100% of traffic under A/B testing, backed by a multi-layer fallback.
Set up model drift tracking, automatic weekly retraining on Sundays with new data, MLflow version tracking for rollback, and a way for operations managers to flag bad predictions, so the system learns from its mistakes and keeps improving.
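Drift tracking can be illustrated with the Population Stability Index (PSI), a common drift metric. The source does not say which metric was used. Treat PSI, the 0.25 retraining threshold, and the sample data below as assumptions.

```python
import bisect
import math


def psi(reference, live, edges):
    """Population Stability Index between a reference sample (e.g. training
    data) and live data, binned by the sorted interior cut points `edges`."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect.bisect_right(edges, v)] += 1
        # A small floor avoids log(0) when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


RETRAIN_THRESHOLD = 0.25  # common rule of thumb, assumed here

training_minutes = [22, 25, 28, 30, 33, 35, 38, 41]  # illustrative delivery times
this_week = [35, 38, 41, 44, 47, 50, 52, 55]
print(psi(training_minutes, this_week, edges=[30, 40]) > RETRAIN_THRESHOLD)  # True
```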

UNIQUE VALUE
• Proven ML results: 83% accuracy from the capstone project vs. a 70% industry benchmark
• Cost-conscious: Redis saved $300 monthly per customer
• Reliability: a 3-layer fallback means the service never fails completely
• Business impact: 72% on-time rate, 30% churn reduction, $177,000 yearly savings
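Here is a minimal sketch of the caching and fallback idea. It assumes three layers: a fresh cached prediction, the live model, and a rule-based ETA. The layer order, the in-memory `TTLCache` standing in for Redis, and the 20 km/h heuristic speed are all illustrative assumptions. In Redis itself, the 1-hour TTL would be set with the `EX` option of `SET`.

```python
import time


class TTLCache:
    """In-memory stand-in for Redis `SET key value EX 3600` (1-hour TTL)."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)

    def get(self, key):
        item = self._data.get(key)
        if item is None or item[1] <= self.clock():
            self._data.pop(key, None)  # missing or expired
            return None
        return item[0]


def rule_based_eta(distance_km, avg_speed_kmh=20.0):
    """Last-resort heuristic; the 20 km/h average speed is an assumption."""
    return distance_km / avg_speed_kmh * 60


def predict_eta(key, distance_km, model, cache):
    """Return (minutes, source): fresh cache hit, else live model, else rules."""
    cached = cache.get(key)
    if cached is not None:
        return cached, "cache"
    try:
        eta = model(distance_km)
    except Exception:
        return rule_based_eta(distance_km), "rules"  # model down: never fail outright
    cache.set(key, eta)
    return eta, "model"
```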

Achieved an 83% R² score for delivery-time prediction using XGBoost with Optuna hyperparameter tuning, a 75% driver demand forecast score with an XGBoost classifier, and a 25% reduction in customer churn through accurate delivery-time estimates and proactive delay notifications.
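The feature engineering and validation steps can be sketched in plain Python. The sketch covers sin/cos encoding for cyclical time, lag features, and expanding-window, time-based folds scored by mean absolute error. The fold logic follows the same idea as scikit-learn's `TimeSeriesSplit`. The `fit_predict` callback is a placeholder for the tuned XGBoost model, and all names and toy data are illustrative.

```python
import math


def cyclical(value, period):
    """Place hour-of-day or day-of-week on the unit circle so 23:00 sits next to 00:00."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def add_lags(series, lags=(1, 2)):
    """Attach the value seen k steps earlier for each k in `lags` (None if too early)."""
    return [
        {"value": v, **{f"lag_{k}": (series[i - k] if i >= k else None) for k in lags}}
        for i, v in enumerate(series)
    ]


def time_series_splits(n_samples, n_splits=5):
    """Yield (train_idx, test_idx) with an expanding training window, so every
    test fold lies strictly after the data the model was trained on."""
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold
        test_end = n_samples if k == n_splits else train_end + fold
        yield list(range(train_end)), list(range(train_end, test_end))


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_mae(y, fit_predict, n_splits=5):
    """Average MAE over time-ordered folds. `fit_predict(train_y, n_test)` is a
    placeholder for training the real model and predicting the next n_test points."""
    scores = []
    for train_idx, test_idx in time_series_splits(len(y), n_splits):
        preds = fit_predict([y[i] for i in train_idx], len(test_idx))
        scores.append(mean_absolute_error([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


# Naive "repeat the last value" baseline on toy data (minutes):
print(cv_mae(list(range(12)), lambda train, n: [train[-1]] * n))  # 1.5
```

Time-ordered folds matter here because shuffled folds would let the model train on future traffic and demand patterns.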

RESULTS

Delivery Performance

• 11% improvement in on-time delivery, from 65% to 72%
• 65% better prediction accuracy, with mean absolute error down from 12 minutes to 4.2 minutes
• 35% fewer monthly customer complaints, from 120 to 78

Business Metrics

• 30% drop in customer churn, from 5% to 3.5%, equivalent to $75,000 in yearly savings
• 50% cut in monthly delivery waste, from $20,000 to $10,000, equivalent to $120,000 in yearly savings
• 60% faster deployment, from 4-6 months to 16 weeks
• $237,000 total annual savings per customer
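The percentages above are relative changes, not percentage-point differences. For example, moving on-time delivery from 65% to 72% is a gain of 7 points, or about 11% in relative terms. A quick check of several of the figures:

```python
def relative_change(before, after):
    """Relative change in percent (not percentage points), rounded to an integer."""
    return round((after - before) / before * 100)


print(relative_change(65, 72))          # 11   on-time delivery rate
print(relative_change(12, 4.2))         # -65  prediction error, minutes
print(relative_change(120, 78))         # -35  monthly complaints
print(relative_change(5, 3.5))          # -30  churn rate
print(relative_change(20_000, 10_000))  # -50  monthly delivery waste
```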

User Adoption

• Delivery app: 90% within 30 days
• Manager usage: 85% daily
• ML adoption: 75% within 60 days

Technical Excellence

• 200-millisecond API response
• 99.9% uptime
• 83% R² score in model accuracy

LONG-TERM BENEFITS
A data flywheel (more orders, better data, more accurate predictions, happier customers, and more orders again); 83% ML accuracy that rule-based competitors can’t match; 18% higher customer lifetime value; cloud infrastructure that supports 10x growth; and stronger brand loyalty, with the 72% on-time rate driving 25% repeat orders and 40% more referrals.

Inefficient Information Retrieval from Educational Course Documentation

15% — Answer Accuracy Improvement
40% — Faster Retrieval Workflow
60% — Troubleshooting Time Reduction
50% — Perceived Response Time Boost

Students and instructors in the AI/MLOps program struggled to efficiently extract specific information from multiple assignment instruction PDFs and course materials.

PAIN POINTS
• Time-consuming Manual Search: Students spend 15-20 minutes manually reading multiple PDF documents using Ctrl+F keyword search to find specific assignment requirements, submission guidelines, or grading rubrics, reducing productive learning time.
• Keyword Search Limitations: Traditional PDF search tools cannot understand semantic relationships. For example, searching for “deployment instructions” won’t find related information about “Hugging Face Spaces setup,” even though they refer to the same concept, so keyword searches missed 40% of relevant information.
• Repetitive Instructor Queries: Teaching assistants receive 50-100 repetitive questions weekly about information already documented in assignment PDFs, consuming 10-15 hours that could have been spent on meaningful mentorship.
• Context Loss Across Documents: Students lack a unified query interface to search across all documents simultaneously.
• No Contextual Understanding: Existing tools returned no direct answers, so documents had to be read manually.
• Late Discovery of Requirements: Students often discover critical requirements—team activity, mentor presentation timing, attendance rules—buried in the middle or end of PDFs only after starting work, leading to 30% of submissions having format or process errors.

As a solution, I developed an intelligent document query system using LangChain and LangGraph to orchestrate retrieval workflows with vector embeddings, semantic search, and LLMs.

Ingestion: Used PyPDFLoader and DirectoryLoader to load PDFs and extract text with metadata.
Chunking: Applied RecursiveCharacterTextSplitter with tuned chunk_size and chunk_overlap to retain context.
Embedding and Storage: Generated embeddings with Hugging Face embedding models chosen from the MTEB leaderboard; stored them in FAISS and also tested Chroma and Pinecone.
RAG Pipeline: Built a RetrievalQA chain combining a LangChain retriever with Llama-2-7B/Mistral-7B. Added ConversationalRetrievalChain for multi-turn Q&A with memory, compared standard vs. MMR retrievers, and used PromptTemplate to ground answers in retrieved content.
Orchestration in LangGraph: Created a state machine with query analysis, retrieval, reranking, generation, and validation nodes, plus conditional edges that reroute low-confidence queries. Added self-reflection for rephrasing and re-retrieval.
Validation: Tested with five realistic questions—grading, HF Spaces deployment, deadlines, project differences, and dataset choices—and evaluated groundedness via QAEvalChain.
Gradio Interface: Wrapped ConversationalRetrievalChain in Gradio with history and “show sources” to display chunks.
Deployment: Containerized and deployed on Hugging Face Spaces with GPU support and enabled streaming responses via callbacks.
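Two pieces of the pipeline above can be sketched without any libraries: overlapping chunking and Maximal Marginal Relevance (MMR) retrieval. The chunker is a simplified stand-in for `RecursiveCharacterTextSplitter`, which also prefers to split on paragraph, line, and word boundaries. The `lambda_mult` argument mirrors the name of LangChain's MMR trade-off parameter. The vectors and function names are illustrative.

```python
import math


def chunk_text(text, chunk_size=500, chunk_overlap=50):
    """Fixed-size character windows; consecutive chunks share `chunk_overlap`
    characters so text cut at one boundary reappears at the start of the next."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def cosine(a, b):
    return sum(x * y for x, y in zip(a, b)) / (math.hypot(*a) * math.hypot(*b))


def mmr(query, docs, k=2, lambda_mult=0.5):
    """Maximal Marginal Relevance: greedily pick chunks that are relevant to the
    query but not redundant with chunks already picked. Returns indexes."""
    selected, candidates = [], list(range(len(docs)))

    def score(i):
        redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
        return lambda_mult * cosine(query, docs[i]) - (1 - lambda_mult) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
# Two identical chunks plus one different chunk: MMR skips the duplicate.
print(mmr([1, 0.5], [[1, 0], [1, 0], [0, 1]], k=2))  # [0, 2]
```

With `lambda_mult=1.0` the redundancy term vanishes and MMR reduces to plain similarity ranking, which here returns the duplicate pair `[0, 1]`.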

Improved query answer accuracy by 15%, made the retrieval workflow 40% faster, and improved perceived response time by 50%.

Orchestration Efficiency:
• LangGraph State Management: Complex multi-step retrieval workflows ran 40% faster than equivalent custom pipeline code.
• Conditional Logic: LangGraph’s conditional routing automatically retries with expanded search when initial retrieval confidence is low, improving answer accuracy by 15%.
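The low-confidence retry described above can be sketched as a plain-Python loop. In the project, this routing lived in LangGraph conditional edges, which this sketch does not use. The 0.7 threshold, the doubling of `k`, and the `retrieve`/`rephrase` callbacks are assumptions.

```python
def answer_with_retry(question, retrieve, rephrase, threshold=0.7, max_attempts=3):
    """Plain-Python sketch of a retrieve -> grade -> (rephrase -> retrieve) loop.

    `retrieve(query, k)` returns [(chunk, score), ...]; `rephrase(query)` returns
    a reworded query. Both are hypothetical callbacks.
    """
    query, k = question, 4
    for attempt in range(1, max_attempts + 1):
        hits = retrieve(query, k)
        confidence = max((score for _, score in hits), default=0.0)
        if confidence >= threshold:
            return {"route": "generate", "query": query,
                    "chunks": [chunk for chunk, _ in hits], "attempts": attempt}
        if attempt < max_attempts:
            # Low confidence: reword the query and widen the search.
            query, k = rephrase(query), k * 2
    return {"route": "fallback", "query": query, "chunks": [], "attempts": max_attempts}


def toy_retrieve(query, k):
    """Toy retriever: only the reworded query finds the deployment chunk."""
    if "Hugging Face" in query:
        return [("Deploy the app to Hugging Face Spaces", 0.9)]
    return [("unrelated chunk", 0.3)]


result = answer_with_retry("deployment instructions", toy_retrieve,
                           lambda q: q + " Hugging Face Spaces")
print(result["route"], result["attempts"])  # generate 2
```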

Developer Productivity:
• LangChain Abstractions: Reduced development time from three weeks to one week using LangChain’s pre-built components.
• Prompt Engineering: LangChain’s PromptTemplate and FewShotPromptTemplate enabled rapid experimentation with 10+ prompt variations.

Maintainability:
• Modular Architecture: LangGraph nodes can be updated independently.
• LangSmith Integration: Logged all LangChain runs for debugging, showing exact chunks retrieved and LLM reasoning, reducing troubleshooting time by 60%.

Advanced Features:
• Conversational Memory: LangChain’s memory buffers enable follow-up questions.
• Agent-Based Routing: LangGraph agents can route queries to different knowledge bases based on query type.

Scalability:
• LangChain LCEL: Parallel retrieval execution from multiple vector stores reduced query latency from five seconds to two seconds.
• Streaming: LangChain streaming callbacks provide token-by-token LLM output, improving perceived response time by 50%.
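Parallel retrieval reduces latency because total wait time tracks the slowest store rather than the sum of all stores. LCEL expresses this with `RunnableParallel`. The sketch below shows the same idea with plain `asyncio`, using hypothetical store names and simulated delays.

```python
import asyncio


async def search_store(name, query, delay_s):
    """Stand-in for one vector-store lookup; `delay_s` simulates its latency."""
    await asyncio.sleep(delay_s)
    return [f"{name}: result for {query!r}"]


async def parallel_retrieve(query, stores):
    """Query every store concurrently and merge hits in store order.
    Wall time is roughly the slowest store, not the sum of all of them."""
    results = await asyncio.gather(*(search_store(n, query, d) for n, d in stores))
    return [hit for hits in results for hit in hits]


hits = asyncio.run(parallel_retrieve("grading rubric", [("faiss", 0.05), ("chroma", 0.1)]))
print(hits)
```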

End-to-end MLOps & Kubernetes Deployment for AI-driven Text Analytics

0.88-0.94 — F1 Score
0.90 — Precision
85-92% — Multi-class Accuracy
5% — Model Drift Detection Accuracy

PROBLEM
Text classification challenges in ticket routing: fastText model limitations in the production environment.

Poor Contextual Understanding

• fastText used a bag-of-words representation with no semantic context
• “urgent server down” and “server down resolved” landed in the same category
• Accuracy was 72% against an 85% target
• A 15% misrouting rate caused $180,000 in yearly rework

Out-of-vocabulary Failures

• Product codes and error messages had a 20% error rate
• “SAP timeout” vs. “Oracle timeout” were treated identically
• Wrong routing caused 4-6-hour delays

Multilingual Gaps

• English 72%, Mandarin 58%, German 61%
• Separate models per language tripled the training effort
• Code-mixed tickets failed

Intent Misclassification

• “Critical issue” and “routine query” tickets received the same priority
• False escalations were at 25%, wasting senior engineers’ time

TRIED MISTRAL
• 800 milliseconds to 1.2 seconds latency, exceeding the under-500-millisecond SLA
• GPU costs of $450 per instance per month
• 8% hallucination rate on technical jargon
• Long tickets were truncated

WHAT WASN’T WORKING
• fastText: fast at 50 milliseconds, but only 72% accurate
• Mistral: 88% accurate, but slow and expensive
• Need: a balance of speed, accuracy, and cost at 10,000+ tickets per day

SOLUTION
Optimized BERT for production text classification

Week 1-2: Model Selection
• Evaluated BERT-base, DistilBERT, ALBERT vs. fastText
• Selected DistilBERT, which is about 40% smaller and 60% faster than BERT-base while retaining 97% of its accuracy
• Measured 150-200-millisecond latency, meeting the under-500-millisecond SLA
• Tested on 5,000 labeled tickets and validated on real data

Week 3-4: Hybrid Architecture

• fastText pre-filter: 50 milliseconds, handling the 60% of tickets that are simple cases
• BERT for complex tickets: 200 milliseconds, providing semantic understanding
• Result: 95-millisecond average latency vs. 200 milliseconds for full BERT
• Infrastructure: AWS EKS with GPU nodes, batch inference, ONNX Runtime (2.3x speedup), and FP16 quantization (50% less memory)
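The hybrid routing above can be sketched as a confidence-based cascade. The 0.9 threshold and the toy classifiers are assumptions. In the write-up, the fast model is fastText and the slow one is BERT.

```python
from typing import Callable, Tuple

Classifier = Callable[[str], Tuple[str, float]]  # text -> (label, confidence)


def cascade(text: str, fast: Classifier, slow: Classifier, threshold: float = 0.9):
    """Try the cheap model first; escalate to the slower, more accurate model
    only when the cheap model is not confident. Returns (label, model_used)."""
    label, confidence = fast(text)
    if confidence >= threshold:
        return label, "fast"
    label, _ = slow(text)
    return label, "slow"


def toy_fast(text):
    """Toy stand-in for fastText: confident only on password tickets."""
    return ("password_reset", 0.97) if "password" in text else ("other", 0.40)


def toy_slow(text):
    """Toy stand-in for BERT."""
    return ("sap_timeout", 0.88)


print(cascade("please reset my password", toy_fast, toy_slow))      # ('password_reset', 'fast')
print(cascade("SAP timeout on order posting", toy_fast, toy_slow))  # ('sap_timeout', 'slow')
```

The threshold controls the trade-off: raising it sends more tickets to the slower, more accurate model.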

Week 5-6: Training Approach
• 42,000 historical tickets cleaned
• Handled class imbalance with a weighted loss (40% of labels in the zero class)
• Fine-tuned pre-trained DistilBERT for three domain-specific epochs
• Trained a single multilingual mBERT model covering English, Mandarin, and German
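A weighted loss typically scales each class's loss by a weight inversely proportional to that class's frequency. The sketch below shows the common "balanced" heuristic, which uses the same formula as scikit-learn's `compute_class_weight`. The resulting weights could be passed, for example, to the `weight` argument of PyTorch's `CrossEntropyLoss`. The source does not specify the exact weighting scheme, so this is an assumption.

```python
from collections import Counter


def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]): rare classes get larger
    weights, so every class carries the same total weight in the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


print(balanced_class_weights([0, 0, 0, 0, 1]))  # {0: 0.625, 1: 2.5}
```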

Week 8-10: Deployment and Validation
• A/B tested BERT on 10% of traffic, monitoring accuracy and latency
• Results: 91% accuracy (vs. 72%) and 4% misrouting (vs. 15%)
• Used Prometheus/Grafana for real-time performance monitoring
• Auto-scaled with Kubernetes HPA
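The Kubernetes documentation describes the Horizontal Pod Autoscaler's core rule: it scales replicas in proportion to how far the observed metric is from its target. Below is a simplified sketch. The replica bounds are assumptions, and the real controller also applies stabilization windows and pod readiness checks.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0, then clamped."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target:
print(hpa_desired_replicas(4, 90, 60))  # 6
```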

KEY INNOVATIONS
• A hybrid pipeline for speed and accuracy
• Production-ready ONNX optimization
• mBERT eliminated three separate models
• Applied zero-inflated data handling from the food delivery project

Transferred techniques from the food delivery ML project (92% R², 54 clusters, and zero-inflated data handling) to enterprise text classification. Achieved 91% accuracy and sub-200-millisecond latency through a hybrid architecture, delivering 3.1x ROI and 39% faster resolution.

BERT IMPLEMENTATION OUTCOMES

Classification Accuracy

• 26% accuracy improvement from 72% to 91%
• 73% misrouting reduction from 15% to 4%
• Unified multilingual performance: English rose from 72% to 91% and Mandarin from 58% to 87%

Operational Efficiency

• $180,000 yearly savings in manual re-classification costs.
• 39% faster resolution time from 6.2 hours to 3.8 hours
• 450 engineer hours saved per year as false escalations dropped from 25% to 6%
• 18% better support ticket routing, reducing ticket volume

System Performance

• 120-millisecond hybrid latency, vs. 50 milliseconds for fastText and 800 milliseconds for Mistral
• 2.4x throughput increase from 5,000 to 12,000 tickets daily
• SLA compliance improved from 71% to 94%

Business Impact

• 62% increase in customer NPS, from 42 to 68
• Customer churn rate dropped from 28% to 19% with improved service quality
• 3.1x ROI with $96,000 infrastructure cost and $300,000 savings

LONG-TERM BENEFITS
• Scalability: Single mBERT supports new languages (no retraining)
• Adaptability: Client onboarding dropped from two weeks to three days
• Competitive edge: AI routing vs. competitors’ rule-based systems
• Proactive insights: Semantic clustering detects emerging issues

AI-driven IT Service Desk Automation

92% — Accuracy Ratio
100% — Validation
40% — Reduction in Ticket Resolution Time

Back in 2020, I worked as a traditional infrastructure engineer managing IT service desk operations. Our team was drowning in repetitive tickets (password resets, system access requests, and basic troubleshooting) that consumed 60% of our daily bandwidth. Talented engineers spent their time on mundane tasks instead of solving complex problems that could truly impact business outcomes.

I automated service desk operations by building AI-driven ticket classification and routing using NLP and MLOps pipelines. Bots and self-service workflows handled repetitive tasks, cutting manual effort by over 50%. Engineers moved to high-impact work, while monitoring, dashboards, and automated retraining kept the models accurate. This improved SLA compliance, reduced backlog, and increased customer satisfaction through faster, more reliable resolutions. It also lowered costs, improved auditability, and built strong leadership trust in AI-driven operations.

• Our BERT-based ticket classification system achieved 92% accuracy.
• Our AI-driven system reduced ticket resolution time by 40%, freeing the team to focus on strategic initiatives instead of repetitive work.

Education

2024 - 2025

Postgraduate Advanced Certification Program in AI and MLOps

Indian Institute of Science - Bangalore, India

2012 - 2014

Master's Degree in Business Administration – Finance

Symbiosis Institute of Business Management - Pune, India

2000 - 2003

Master's Degree in Computer Science

BMS College of Engineering - Bangalore, India

Certifications

MARCH 2025 - PRESENT

AIMLOps

IISc

JUNE 2017 - JUNE 2018

Certified Scrum Master

Scaled Agile Inc

Skills

Tools

Zoom, Ansible

Paradigms

Continuous Deployment, Agile Product Management, DevOps, Scrum, Azure DevOps

Other

Automation, Python 3, Cloud, Deployment, App Development, Kubernetes, Continuous Integration (CI), Databases, Generative Artificial Intelligence (GenAI), Agile Product Delivery, AIOps, Machine Learning Operations (MLOps), Computer Science, Business Administration, Finance, Security, LangGraph, LangChain, Vector Databases, Transformers, Monitoring, BERT, Terraform, Cloud Automation, Product Ownership, Machine Learning, Artificial Intelligence (AI), Python 2, RESTFul APIs
