
Vaibhav Patel
Verified Expert in Engineering
Back-end Developer
Abu Dhabi, United Arab Emirates
Toptal member since January 6, 2021
Vaibhav is an AI engineer who builds and evaluates production LLM and multi-agent systems in Python, specializing in source-cited retrieval, agentic orchestration (MCP, LangGraph, ADK, and Semantic Kernel), fine-tuning, quantization, and open-weight model serving. He brings a background in computational science and high-performance computing (HPC), including CUDA and multi-core programming.
Experience
- Python - 8 years
- Node.js - 6 years
- Deep Learning - 5 years
- Natural Language Processing (NLP) - 4 years
- C++ - 4 years
- Large Language Models (LLMs) - 3 years
- Retrieval-augmented Generation (RAG) - 2 years
- OpenAI GPT-4 API - 1 year
Preferred Environment
Amazon Web Services (AWS), Python, Node.js, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), C++
The most amazing...
...thing I've created is an AI GitHub review system with RAG that performs root cause analysis (RCA) on code issues, suggests fixes, and integrates into existing workflows.
Work Experience
AI Engineer
Investment Fund
- Designed and deployed a production multi-agent LLM system for executive decision workflows, coordinating planner, retrieval, and task-specific agents to produce source-cited, human-approvable recommendations over heterogeneous enterprise data.
- Built specialized detector-style agents that combine deterministic domain rules with LLM reasoning to surface high-value findings, each carrying a confidence score and exact source citations for downstream human review.
- Owned the evaluation harness: a gold-set regression suite plus an LLM-as-judge pipeline calibrated against human labels, gating releases in CI to catch accuracy regressions and silent degradation before they ship.
- Orchestrated agentic pipelines with Semantic Kernel, LangGraph-style execution graphs, and MCP tool integrations, enabling dynamic tool use, multi-step reasoning, and reliable routing across structured and unstructured data.
- Engineered the retrieval stack—hybrid dense and BM25 search, cross-encoder reranking, and context compression—to raise factual grounding and reduce hallucination in high-stakes decision workflows.
- Instrumented the system with end-to-end tracing and observability (token usage, latency, tool-call traces, and retrieval diagnostics) to monitor quality and cost in production.
- Implemented data-security controls for sensitive financial data—access scoping, PII handling, and guardrails around tool and context exposure—to meet institutional governance requirements.
- Designed a cost-aware model routing strategy directing work between frontier APIs and self-hosted open-weight models by task complexity, balancing accuracy against latency and token cost.
- Prototyped small-language-model pretraining and transformer internals in PyTorch (tokenization, training loops, and inference) to ground model-behavior and efficiency trade-off decisions in first principles.
- Partnered with cross-functional stakeholders to deploy agents into live operational workflows with the robustness, observability, and auditability that regulated decision-making requires.
Back-end Engineer
ReturnQueen
- Fine-tuned an open-weight LLM with LoRA and PEFT for business-specific email parsing—curating the training set and benchmarking the tuned model against prompt-only baselines to prove a measurable lift in structured-extraction accuracy.
- Quantized and served the fine-tuned model for high-throughput, low-cost inference, integrating it into existing back-end systems under production load.
- Productionized an LLM-powered extraction pipeline that turns messy, unstructured business documents into reliable structured outputs, with high-throughput inference wired into back-end services.
- Built validation and regression checks for extraction quality, catching format and edge-case failures before they reached customer-facing workflows.
- Designed agent orchestration for automated return flows—intent understanding, policy lookup, data extraction, and multi-step execution—coordinating LLM reasoning with back-end tools and business rules.
- Implemented stateful orchestration with retry and fallback handling across multi-step execution paths, managing intermediate decisions and tool outputs.
- Grounded automated decisions in retrieval over return policies, order data, and workflow constraints, keeping actions aligned with operational rules.
- Implemented guardrails and human-review fallback paths for automated actions, improving reliability on edge cases and reducing failure risk in customer-facing flows.
- Built scalable Python back-end services and distributed pipelines (Kafka and Spark) on AWS, sustaining high-throughput ingestion with reliable horizontal scaling under production load.
Research Engineer
Raxter
- Built a robust document-ingestion pipeline for messy, heterogeneous research-paper PDFs—combining and reconciling the outputs of multiple PDF parsers into clean, structured text; deployed on AWS.
- Created and deployed a document-understanding service performing structured, sentence-level extraction over full paper text—classifying each sentence into buckets such as research goal, novelty, and limitations; served on EC2 and AWS Lambda.
- Productionized a figure- and layout-extraction library for non-text document elements, deployed on EC2 with health-check alarms, auto-scaling, and reporting.
- Developed and deployed advanced NLP and text-to-speech models, enhancing the platform's capability to convert PDFs and web pages into audio notes with high accuracy and naturalness.
- Led rigorous testing and validation of NLP and speech models, reducing errors and output mismatches by around 20% before release.
- Developed and optimized NLP and multilingual text-to-speech models in PyTorch (English, Spanish, and Mandarin), improving output quality by approximately 30% and lifting user engagement by approximately 25% once integrated into the platform.
- Containerized model deployments with Docker for portability and scaling across environments.
Full-stack Developer
Lyearn
- Created an internal npm library for reporting that fetches and formats data from Elasticsearch, S3, and DynamoDB.
- Implemented new Express.js (Node.js) logic for granting assets to and revoking them from sub-accounts, reducing computation and database costs 100-fold.
- Worked on the platform's dark theme, live training, reporting, and live class features.
Research Engineer and Full-stack Engineer
Visionion
- Provided services for prototyping, development, and deployment of machine learning algorithms.
- Worked on image classification, object detection, and image super-resolution.
- Contributed to an IoT project in Python and React that uses serial communication and PicoScope hardware.
Machine Learning Engineer
Infocusp
- Explored deep learning algorithms for predicting subsequent frames from an existing user-drawn sketch.
- Proposed a novel pipeline combining optical flow, RNNs, and CNNs to solve this frame-prediction problem.
- Implemented the algorithm using TensorFlow, Keras, and PyTorch, then trained and deployed the models on AWS EC2.
Experience
GitHub Pull Requests Analysis System with RCA and RAG Integration
Classification of Plant Disease Using Convolutional Neural Networks
Motion-based Image Super Resolution
Tone Mapping HDR Images
Rice Classification Using CNNs
GPU Kernel Optimization and Competitive GPU Programming (CUDA)
I apply this GPU-programming depth to modern LLM serving: the GPU substrate behind quantization (INT8/INT4, GGUF/AWQ), KV-cache management, and continuous batching, as well as the latency-throughput trade-offs that drive cost in vLLM/TGI-style inference.
Education
Bachelor's Degree in Computer Science
DA-IICT - Gandhinagar, India
Certifications
Neural Networks for Machine Learning
Coursera
Machine Learning
Coursera
Skills
Libraries/APIs
Node.js, PyTorch, REST APIs, Claude API, React, spaCy, TensorFlow, Keras, Scikit-learn, NumPy, SciPy, OpenCV, Matplotlib, SQLAlchemy, Playwright, LSTM, Pandas, MobX, llama.cpp, vLLM
Tools
Jupyter, ChatGPT, AI Prompts, Claude, Azure OpenAI Service, Claude Code, Git, Zapier, n8n, Codex, Amazon Elastic Container Service (ECS), Apache Airflow, Claude Agent SDK
Languages
Python, SQL, C++, Python 3, C, JavaScript 6, JavaScript, CSS, HTML, HTML5, TypeScript
Paradigms
Model Context Protocol (MCP), Event-driven Architecture, Microservices Architecture, Serverless Architecture, Microservices, API Architecture, High-performance Computing (HPC)
Platforms
NVIDIA CUDA, Amazon Web Services (AWS), AWS Lambda, Docker, Databricks, Kubernetes, Azure, Google Cloud Platform (GCP)
Frameworks
Flask, Selenium, LangGraph, Caffe, Redux, Angular, OpenCL, LlamaIndex
Storage
Amazon DynamoDB, Elasticsearch, MySQL, Data Pipelines, MongoDB, PostgreSQL, PostGIS
Other
Deep Learning, Computer Vision, Machine Learning, Natural Language Processing (NLP), Artificial Intelligence (AI), Optical Character Recognition (OCR), Generative Pre-trained Transformers (GPT), Data Analysis, Data Engineering, BERT, Large Language Models (LLMs), Text-to-Speech (TTS), GPU Computing, OpenAI GPT-4 API, OpenAI GPT-3 API, Retrieval-augmented Generation (RAG), LangChain, Multi-agent Systems, Prompt Engineering, OpenAI, AI Agents, CI/CD Pipelines, Agentic AI, RAG Architecture, FastAPI, Software Architecture, Agentic RAG Systems, APIs, Anthropic, Machine Learning (ML) APIs, AI Integration, API Integration, Workflow Automation, Architecture, RAG Pipelines, Solution Architecture, Vector Databases, Cursor AI, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Pattern Recognition, Image Classification, Computer Vision Algorithms, Recommendation Systems, Pinecone, Vector Search, Data Scraping, AI Tools, Reinforcement Learning from Human Feedback (RLHF), Data Anonymization, Supabase, AI Chatbots, Large Language Model Operations (LLMOps), Financial System Implementation, Finance, Fine-tuning, Data Security, Observability, AI Assistants, Workflow Automation & System Integration, Mathematics, Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), Big Data, Data Science, Data Architecture, Custom BERT, Document Processing, Object Detection, Unsupervised Learning, Object Tracking, Video Analysis, OpenCL/GPU, Time Series, Time Series Analysis, Neural Networks, Business Requirements, Modal, Phonemes, Serverless GPUs, Image Processing, Generative Artificial Intelligence (GenAI), Full-stack, Chatbots, LoRA, CUDA Kernel, Quantization, Optimization, Llama 3, Meta Llama