
Amit Yadav
Verified Expert in Engineering
MLOps Developer
Lucknow, Uttar Pradesh, India
Toptal member since August 16, 2024
Amit is a certified Google Cloud Platform professional cloud architect. He is a senior ML engineer and data scientist with years of experience in MLOps, infrastructure development, and automation and development in Python. Skilled in AI research and analyst environments, Amit exhibits excellent organizational and problem-solving skills and works well in team environments.
Experience
- Machine Learning - 9 years
- Model Monitoring - 8 years
- ML Pipelines - 5 years
- Large Language Models (LLMs) - 4 years
- Kubernetes - 3 years
- Kubeflow - 3 years
- Milvus - 2 years
- Large Language Model Operations (LLMOps) - 2 years
Preferred Environment
Kubeflow, ML Pipelines, Model Monitoring, Model Deployment, MySQL, Deep Learning, Machine Learning, TensorFlow, PyTorch, Python 3
The most amazing...
...thing I've developed is a conversational application that uses LLMs to complete and answer the user's requested output.
Work Experience
Machine Learning Operations (MLOps) Engineer
Publicis Groupe
- Built and fine-tuned large language models (LLMs) with LoRA and generative AI. Developed the SAGE project using the RAG pipeline, LLMs, and prompt engineering. Deployed MLOps pipeline with Kubeflow, TensorFlow Extended (TFX), and Docker.
- Worked with cross-functional teams to integrate AI into the hybrid integration platform (HIP). Automated descriptive analysis using Python, GCP Composer, and unit tests. Designed the automated ML pipeline with Kubeflow, TFX, Docker, and BigQuery.
- Automated 1D CNN pipelines for wind turbine anomaly detection, deployed with Azure CI/CD and Docker. Streamlined data preprocessing using Python scripts.
- Developed the YOLO (you only look once) model for turbine blade defect detection; deployed in the cloud and integrated with drones.
MLOps Engineer (GCP)
General Mills
- Designed and implemented a fully automated ML pipeline using Kubeflow, orchestrated with TensorFlow Extended. Containerized ML components using Docker and utilized BigQuery as the data warehouse for seamless data management.
- Orchestrated data and ML pipelines with Airflow and GCP Composer as managed services for demand forecasting data pipelines. Leveraged BigQuery for data warehousing, ensuring efficient data flow and processing within the pipeline.
- Implemented model serving and monitoring to detect model and data drift, ensuring the models' performance and accuracy remain optimal over time.
- Utilized GCP Cloud Functions for event-driven data processing and orchestration tasks, enhancing pipeline flexibility and scalability.
- Implemented real-time data ingestion and processing using Cloud Pub/Sub, enabling timely and efficient data flow through the ML pipelines.
- Integrated Cloud Storage for scalable and durable storage of datasets and model artifacts, ensuring high availability and accessibility.
MLOps Engineer
Suzlon
- Developed and automated deep learning 1D CNN model pipelines to detect anomalies in wind turbine gearboxes and main bearing failures. Implemented these pipelines in Azure CI/CD with Docker containers for seamless integration and deployment.
- Developed and deployed machine learning and time series models to detect main bearing failures from oil sample data, enhancing predictive maintenance capabilities and orchestrating the pipeline.
- Created automated data preprocessing tasks using Python scripts to streamline the data preparation workflow.
- Designed and implemented reinforcement learning agents using DDPG (Deep Deterministic Policy Gradient) for optimizing real-time wind turbine torque control under varying wind conditions, balancing energy output with mechanical wear reduction.
- Developed discrete action-space agents using Q-learning to model turbine fault mitigation and adaptive maintenance decision policies, enabling proactive interventions in simulated wind farm scenarios.
Data Scientist
thyssenkrupp
- Created CI/CD pipelines in Azure for machine learning models, ensuring efficient deployment and integration into production environments.
- Developed and trained a Facebook Prophet model to forecast demand for car parts for the client Volkswagen Portugal, improving inventory management and planning.
- Integrated unit tests in GitLab for MLOps CI/CD pipelines, ensuring robustness and reliability of the machine learning workflows. Developed and deployed a computer vision model using YOLOv5 to detect defects in conveyor belts in cement plants.
Data Analyst
Convergys
- Performed predictive analytics on telecommunication data to forecast trends and improve customer retention strategies. Applied machine learning concepts, including data processing, supervised learning, and unsupervised learning.
- Built a conversational chatbot to provide elementary information to telecommunication customers, utilizing NLP techniques for effective communication.
- Fetched data from Amazon Redshift using SQL queries, ensuring seamless data retrieval for analysis and reporting.
- Collected, interpreted, and analyzed large datasets to derive actionable insights for business decision-making.
Experience
Senior AI/MLOps & Platform Engineering
Multi-agent System for Charles Schwab | Google GenAI
My contributions included:
Architecture and Development: Designed and implemented a complex multi-agent architecture, focusing on creating custom tools to empower agents with specific functionalities. I engineered robust systems for information sharing among sub-agents and handled persistent memory bank management and user session management to ensure contextual and coherent interactions.
Deployment and Evaluation: Successfully deployed the agent system on the Vertex AI Agent Engine platform, utilizing Cloud Run for scalable and efficient operation. A critical part of my role was conducting rigorous performance and accuracy assessments. I developed and executed a detailed evaluation strategy using both automated pytest frameworks and advanced Google GenAI evaluation techniques to validate agent effectiveness and reliability.
AutoFlow Agentic Framework
The UserProxyAgent enabled users to clearly define their news preferences, initiating seamless agent collaboration. The ResearcherAgent employed web scraping and search tools to fetch targeted articles from sources like BBC and TechCrunch, while the SummarizerAgent distilled complex articles into concise summaries through Gemini LLM integration. Quality assurance was managed by the CriticAgent, ensuring summaries closely aligned with user-specified topics and standards. The DesignerAgent structured content into engaging digests, enhancing readability with organized layouts and optional visual elements.
My role encompassed the end-to-end development of agent interactions, integration of AF2’s dynamic communication tools, prompt engineering for specialized agents, and ensuring robust coordination via GroupChatManager. This significantly improved content personalization and operational efficiency.
LangGraph Agentic Framework on GCP
Torque Control Optimization in Wind Turbines Using DDPG
PROBLEM STATEMENT
Traditional torque control logic resulted in inefficiencies in energy output and mechanical wear. A learning-based adaptive control mechanism was needed.
APPROACH
• Designed a custom continuous-action Gym environment simulating torque-wind dynamics.
• Defined a reward function balancing power output vs. bearing stress.
• Trained DDPG agents to learn optimal torque application strategies under noisy wind profiles.
• Integrated the agent into a simulated turbine SCADA environment.
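A minimal sketch of what a continuous-action environment of this kind could look like is shown here. It is illustrative only: the class name `TorqueEnv`, the dynamics, the constants, and the `stress_weight` coefficient are assumptions rather than the original simulator, and it mimics the Gym reset/step interface in plain Python instead of depending on the Gym package. A DDPG agent would take the place of the placeholder policy passed to `run_episode`.

```python
import random


class TorqueEnv:
    """Toy torque-control environment with a Gym-style reset/step interface.

    All dynamics and constants are illustrative assumptions.
    """

    def __init__(self, stress_weight=0.5, horizon=50, seed=0):
        self.stress_weight = stress_weight  # hypothetical power-vs-stress trade-off
        self.horizon = horizon              # number of steps per episode
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.wind = 8.0   # wind speed in m/s (assumed starting value)
        self.rotor = 1.0  # normalized rotor speed
        return (self.wind, self.rotor)

    def step(self, torque):
        # Clip the continuous action to the valid range [0, 1].
        torque = max(0.0, min(1.0, float(torque)))
        # Noisy wind profile: a bounded random walk.
        self.wind = max(3.0, min(15.0, self.wind + self.rng.gauss(0.0, 0.5)))
        # Rotor speeds up with wind and slows down under generator torque.
        self.rotor = max(0.0, min(2.0, self.rotor + 0.05 * (self.wind / 10.0 - torque)))
        power = torque * self.rotor
        stress = torque ** 2  # stand-in for a bearing stress indicator
        # Reward balances power output against mechanical stress.
        reward = power - self.stress_weight * stress
        self.t += 1
        done = self.t >= self.horizon
        return (self.wind, self.rotor), reward, done, {"power": power, "stress": stress}


def run_episode(env, policy):
    """Roll out one episode with any callable mapping observation -> torque."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total
```

For example, `run_episode(TorqueEnv(), lambda obs: 0.5)` evaluates a constant-torque baseline against which a learned policy could be compared.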
IMPACT
• Improved simulated energy output by 12%.
• Reduced mechanical stress indicators by 8%, as per synthetic benchmark testing.
• Paved the way for self-adjusting turbine control systems.
Fault Mitigation Strategy Using Q-learning in Simulated Wind Farms
PROBLEM STATEMENT
Turbine faults, such as overheating or gearbox failure, led to production loss. Existing logic-based mitigation was reactive.
APPROACH
• Modeled turbine behavior and maintenance schedules using discrete state-action pairs.
• Implemented Q-learning to learn optimal actions like shutdown, continue, or schedule inspection.
• Simulated multiple episodes with stochastic fault occurrences and repair costs.
INNOVATIONS
• Designed fault-specific state representations for Q-table optimization.
• Introduced reward shaping to penalize downtime and promote preventive actions.
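The approach above can be sketched as tabular Q-learning over a small discrete state space. Everything specific in this sketch is an assumption made for illustration: the three fault states, the transition probabilities, and the reward values (including the downtime penalty used for reward shaping) are hypothetical, not the figures from the project.

```python
import random

ACTIONS = ["continue", "inspect", "shutdown"]
STATES = ["healthy", "degraded", "failed"]  # hypothetical fault states


def simulate(state, action, rng):
    """Return (next_state, reward) for a toy turbine model (all numbers assumed)."""
    if action == "shutdown":
        return "healthy", -5.0  # planned repair: fixed cost, turbine restored
    if action == "inspect":
        if state == "failed":
            return "failed", -10.0  # inspection alone does not fix a failure
        return "healthy", -1.0  # cheap preventive action
    # action == "continue"
    if state == "failed":
        return "failed", -10.0  # shaped penalty for unplanned downtime
    if state == "degraded":
        return ("failed" if rng.random() < 0.5 else "degraded"), 1.0
    return ("degraded" if rng.random() < 0.2 else "healthy"), 1.0


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for _ in range(episodes):
        state = "healthy"
        for _ in range(steps):
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            next_state, reward = simulate(state, action, rng)
            best_next = max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move toward reward + discounted best next value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each state."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}
```

Calling `greedy_policy(train())` returns one action per state. In this toy setup, the downtime penalty is what makes shutting down a failed turbine more attractive than continuing to run it.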
RESULTS
• Reduced cumulative maintenance cost by around 20% in simulated runs.
• Increased uptime and aligned maintenance with fault trends.
Demand Forecasting Machine Learning Operations (MLOps) Orchestration
The GitLab CI pipeline containerized each component with Docker and included stages such as Docker build and push to GCR. It was triggered automatically by a push to a branch and by a merge into the main branch. In the Kubeflow pipeline orchestration system, the Kubeflow Pipelines domain-specific language (DSL) was used to define each component and the pipeline, with each component referencing its Docker image. Once created, the pipeline is submitted to Kubeflow and can be viewed on the Kubeflow dashboard. Prometheus and Grafana dashboards monitor the model, and the retraining pipeline is executed upon model drift.
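As a rough illustration of a drift check that could gate a retraining run, the sketch below computes a population stability index (PSI) between a reference sample and a recent sample. The choice of PSI, the function names `psi` and `should_retrain`, the bin count, and the 0.2 threshold (a common rule of thumb) are assumptions for this example; the drift metric actually used in the project is not stated here.

```python
import math


def psi(reference, current, bins=10):
    """Population stability index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if the reference is constant

    def histogram(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            # Values outside the reference range fall into the edge bins.
            counts[max(0, min(bins - 1, idx))] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = histogram(reference), histogram(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def should_retrain(reference, current, threshold=0.2):
    """Return True when the drift score exceeds the (assumed) threshold."""
    return psi(reference, current) > threshold
```

In a setup like the one described, a check of this kind would run on a schedule and submit the retraining pipeline only when `should_retrain` returns `True`.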
Generative AI Application
To enhance the accuracy and relevance of the generated code, I implemented a retrieval-augmented generation (RAG) pipeline. This pipeline first retrieves context relevant to the user's request from the existing database. The context is then fed into the model, significantly boosting the accuracy and relevance of the responses. The application employs two fine-tuned large language models (LLMs): Llama and Generative Pre-trained Transformer 2 (GPT-2). These models were fine-tuned explicitly on objective data using low-rank adaptation (LoRA) adapters, which allow for more efficient and targeted learning. As a result, the product is highly effective at generating accurate and context-aware code snippets and solutions tailored to developers' needs, enhancing their productivity and streamlining the coding process.
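The retrieve-then-generate flow can be sketched in a few lines. This is a simplified stand-in, not the application's implementation: it ranks documents with bag-of-words cosine similarity instead of embeddings in a vector database, and the function names and prompt layout are assumptions. The resulting prompt is what would be passed to the fine-tuned model.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = Counter(tokenize(query))
    ranked = sorted(documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved context to the user's request."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Context:\n{context}\n\nRequest: {query}\nAnswer:"
```

Swapping `retrieve` for a vector search over embeddings changes the ranking quality but not the overall shape of the pipeline.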
LLMOps Pipeline with vLLM Serving and Kubeflow on Azure Cluster
Predictive Forecasting & Industrial Operations Optimization
I architected ETL workflows using Azure Data Factory and built scalable data pipelines across Databricks and Snowflake to process high-volume turbine telemetry. Models were containerized with Docker and deployed via Azure CI/CD for production-grade integration. I also built Suzlatics, a predictive monitoring platform visualizing turbine health metrics in near real-time.
This role strengthened my expertise in time-series forecasting, anomaly detection, constraint-based maintenance planning, and operational optimization in high-risk industrial environments.
Demand Forecasting & Cloud Data Pipelines | General Mills
I provisioned GKE clusters and containerized ML components for scalable deployment. Using BigQuery as a centralized warehouse, I developed data pipelines handling structured historical sales data and automated retraining workflows to adapt to seasonality and demand shifts.
I implemented model monitoring and drift-detection strategies to maintain forecast stability in production. CI/CD automation with GitHub Actions ensured reliable testing and deployment.
This role deepened my expertise in cloud-native forecasting architectures, scalable ETL systems, and production-grade MLOps aligned with finance and operational decision-making.
Demand Forecasting for Car Parts | Volkswagen
Education
Master's Degree in Artificial Intelligence
Aegis School of Business - Mumbai, India
Certifications
Professional Cloud Architect
Google Cloud
Skills
Libraries/APIs
TensorFlow, XGBoost, PyTorch, OpenCV
Tools
Grafana, Google AI Platform, GCP Security, BigQuery, Composer, TensorFlow Serving, Apache Airflow, Azure ML Studio, Amazon SageMaker, Tableau, Azure Kubernetes Service (AKS), Windows ADK, ARIMA, SARIMA, Prophet ERP, Google Kubernetes Engine (GKE)
Languages
Python, Python 3, JavaScript, Snowflake, SQL
Platforms
Kubeflow, Vertex AI, Google Cloud Platform (GCP), Docker, Kubernetes, Azure, Firebase, Databricks, Cloud Run
Storage
Google Cloud, MySQL, Azure SQL Databases, Data Lakes, PostgreSQL
Frameworks
Multi-armed Bandits (MABs), Flask, Agentic Frameworks
Paradigms
ETL, Model Context Protocol (MCP)
Other
ML Pipelines, Model Monitoring, Model Deployment, Deep Learning, Machine Learning, Artificial Intelligence (AI), Machine Learning Operations (MLOps), Pipelines, Prometheus, Model Drift, Google Container Registry (GCR), Data Science, Random Forests, Statistical Modeling, Computer Vision, Big Data, Large Language Models (LLMs), Generative Artificial Intelligence (GenAI), Security, Endpoint Creation, Retrieval-augmented Generation (RAG), Llama 2, LoRA, Forecasting, Large Language Model Operations (LLMOps), Milvus, Q-learning, Agentic AI, Generative Pre-trained Transformer 2 (GPT-2), GitHub Actions, CI/CD Pipelines, Google Cloud Functions, Cloud Pub/Sub, MLflow, Amazon Forecast, Azure Databricks, Amazon SageMaker Pipelines, Data Warehousing, Natural Language Processing (NLP), Classification, Regression, Statistics, Autoflow, Gemini API, Prompt Engineering, Vector Search, ChromaDB, multimodel, AI Agents, Gemini, Google BigQuery, Multi-agent Systems, Vector Databases, Multistage LLM Chains, OpenAI GPT-4 API, DDPG, Reinforcement Learning, OpenAI, agent development kit, Multimodal GenAI, agent engine, agent evaluation, Agent Deployment, memory bank, A2A, SCADA, fb prophet, facebook prophet, Demand Forecasting, A/B Testing