Vaibhav Patel

Verified Expert in Engineering

Back-end Developer

Abu Dhabi, United Arab Emirates

Toptal member since January 6, 2021

Expertise

Machine Learning Deep Learning Artificial Intelligence NLP Data Engineering Computer Vision Data Analysis LLM RAG Prompt Engineering OpenAI Python Node.js AWS

Bio

Vaibhav is an AI engineer who builds and evaluates production LLM and multi-agent systems in Python, specializing in source-cited retrieval, agentic orchestration (MCP, LangGraph, ADK, and Semantic Kernel), fine-tuning, quantization, and open-weight model serving. He brings a background in computational science and high-performance computing (HPC), including CUDA and multi-core programming.

Portfolio

Investment Fund

Artificial Intelligence (AI), Python, Docker...

ReturnQueen

Python, Large Language Models (LLMs), Natural Language Processing (NLP)...

Raxter

Artificial Intelligence (AI), Document Processing, Custom BERT, MySQL...

Experience

Python - 8 years
Node.js - 6 years
Deep Learning - 5 years
Natural Language Processing (NLP) - 4 years
C++ - 4 years
Large Language Models (LLMs) - 3 years
Retrieval-augmented Generation (RAG) - 2 years
OpenAI GPT-4 API - 1 year

Preferred Environment

Amazon Web Services (AWS), Python, Node.js, Large Language Models (LLMs), Retrieval-augmented Generation (RAG), C++

The most amazing...

...thing I've created is an AI GitHub review system with RAG that performs RCA on code issues, suggests fixes, and automates its integration into existing workflows.

Work Experience

AI Engineer

2026 - PRESENT

Investment Fund

Designed and deployed a production multi-agent LLM system for executive decision workflows, coordinating planner, retrieval, and task-specific agents to produce source-cited, human-approvable recommendations over heterogeneous enterprise data.
Built specialized detector-style agents that combine deterministic domain rules with LLM reasoning to surface high-value findings, each carrying a confidence score and exact source citations for downstream human review.
Owned the evaluation harness: a gold-set regression suite plus an LLM-as-judge pipeline calibrated against human labels, gating releases in CI and catching accuracy regressions before they ship—protecting against silent degradation.
Orchestrated agentic pipelines with Semantic Kernel, LangGraph-style execution graphs, and MCP tool integrations, enabling dynamic tool use, multi-step reasoning, and reliable routing across structured and unstructured data.
Engineered the retrieval stack—hybrid dense and BM25 search, cross-encoder reranking, and context compression—to raise factual grounding and reduce hallucination in high-stakes decision workflows.
Instrumented the system with end-to-end tracing and observability (token usage, latency, tool-call traces, and retrieval diagnostics) to monitor quality and cost in production.
Implemented data-security controls for sensitive financial data—access scoping, PII handling, and guardrails around tool and context exposure—to meet institutional governance requirements.
Designed a cost-aware model routing strategy directing work between frontier APIs and self-hosted open-weight models by task complexity, balancing accuracy against latency and token cost.
Prototyped small-language-model pretraining and transformer internals in PyTorch (tokenization, training loops, and inference) to ground model-behavior and efficiency trade-off decisions in first principles.
Partnered with cross-functional stakeholders to deploy agents into live operational workflows with the robustness, observability, and auditability that regulated decision-making requires.

Technologies: Artificial Intelligence (AI), Python, Docker, Retrieval-augmented Generation (RAG), Kubernetes, Azure, Large Language Models (LLMs), LoRA, Claude, OpenAI GPT-4 API, OpenAI, Azure OpenAI Service, Model Context Protocol (MCP), Agentic RAG Systems, Claude API, Anthropic, Claude Agent SDK, AI Integration, API Integration, Architecture, Large Language Model Operations (LLMOps), RAG Pipelines, Event-driven Architecture, Financial System Implementation, Solution Architecture, Vector Databases, GPU Computing, Finance, Claude Code, Cursor AI, Codex, Fine-tuning, Data Security, vLLM, Playwright, AI Assistants, Workflow Automation & System Integration
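
The hybrid dense and BM25 retrieval described above needs some way to merge two ranked lists into one. A minimal sketch of one common option, reciprocal rank fusion, follows. It is an illustrative assumption, not the method used in the production system; the function name, the document IDs, and the constant k=60 are all hypothetical choices.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical document IDs returned by each retriever, best first.
dense_hits = ["doc_b", "doc_a", "doc_d"]
bm25_hits = ["doc_a", "doc_c", "doc_b"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that both retrievers rank highly rise to the top, and the fused list can then be passed to a reranker. Because the fusion uses ranks only, the incompatible score scales of dense and BM25 search never need to be normalized.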

Back-end Engineer

2021 - 2026

ReturnQueen

Fine-tuned an open-weight LLM with LoRA and PEFT for business-specific email parsing—curating the training set and benchmarking the tuned model against prompt-only baselines to prove a measurable lift in structured-extraction accuracy.
Quantized and served the fine-tuned model for high-throughput, low-cost inference, integrating it into existing back-end systems under production load.
Productionized an LLM-powered extraction pipeline that turns messy, unstructured business documents into reliable structured outputs, with high-throughput inference wired into back-end services.
Built validation and regression checks for extraction quality, catching format and edge-case failures before they reached customer-facing workflows.
Designed agent orchestration for automated return flows—intent understanding, policy lookup, data extraction, and multi-step execution—coordinating LLM reasoning with back-end tools and business rules.
Implemented stateful orchestration with retry and fallback handling across multi-step execution paths, managing intermediate decisions and tool outputs.
Grounded automated decisions in retrieval over return policies, order data, and workflow constraints, keeping actions aligned with operational rules.
Implemented guardrails and human-review fallback paths for automated actions, improving reliability on edge cases and reducing failure risk in customer-facing flows.
Built scalable Python back-end services and distributed pipelines (Kafka and Spark) on AWS, sustaining high-throughput ingestion with reliable horizontal scaling under production load.

Technologies: Python, Large Language Models (LLMs), Natural Language Processing (NLP), OpenAI GPT-4 API, Retrieval-augmented Generation (RAG), Prompt Engineering, Generative Artificial Intelligence (GenAI), Pinecone, Vector Search, OpenAI, AI Agents, Data Scraping, CI/CD Pipelines, AI Tools, Agentic AI, ChatGPT, LlamaIndex, AI Prompts, Kubernetes, Claude, LangGraph, FastAPI, Software Architecture, Azure OpenAI Service, Model Context Protocol (MCP), APIs, Reinforcement Learning from Human Feedback (RLHF), Claude API, Anthropic, Machine Learning (ML) APIs, AI Integration, Data Anonymization, API Integration, Workflow Automation, Zapier, n8n, Supabase, AI Chatbots, Architecture, RAG Pipelines, Event-driven Architecture, Solution Architecture, Vector Databases, Llama 3, llama.cpp, Meta Llama, Cursor AI, Codex, Fine-tuning, Data Security, Observability
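
The retry, fallback, and human-review handling mentioned above can be illustrated with a small sketch. It assumes a generic step function that may raise, plus a fallback that routes the case to human review. All names here (run_with_fallback, flaky_policy_lookup, send_to_human_review) are hypothetical and do not come from the production code.

```python
import time


def run_with_fallback(step, fallback, max_attempts=3, base_delay=0.0):
    """Try `step` up to max_attempts times, then hand off to `fallback`.

    Returns (result, path), where path is "primary" or "fallback".
    """
    last_error = None
    for attempt in range(max_attempts):
        try:
            return step(), "primary"
        except Exception as exc:  # a real service would catch narrower errors
            last_error = exc
            if attempt < max_attempts - 1:
                # Exponential backoff between attempts: base_delay, 2x, 4x, ...
                time.sleep(base_delay * (2 ** attempt))
    return fallback(last_error), "fallback"


# Hypothetical flaky tool: fails twice, then succeeds.
calls = {"count": 0}


def flaky_policy_lookup():
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("policy service timed out")
    return {"returnable": True}


def always_fails():
    raise ValueError("unparseable document")


def send_to_human_review(error):
    # Fallback path: record why automation gave up so a person can decide.
    return {"status": "needs_review", "reason": str(error)}


ok_result, ok_path = run_with_fallback(flaky_policy_lookup, send_to_human_review)
review_result, review_path = run_with_fallback(always_fails, send_to_human_review)
print(ok_path, ok_result)
print(review_path, review_result)
```

Returning the path alongside the result lets downstream code and monitoring tell automated outcomes apart from cases escalated to a person.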

Research Engineer

2020 - 2021

Raxter

Built a robust document-ingestion pipeline for messy, heterogeneous research-paper PDFs—combining and reconciling the outputs of multiple PDF parsers into clean, structured text; deployed on AWS.
Created and deployed a document-understanding service performing structured, sentence-level extraction over full paper text—classifying each sentence into buckets such as research goal, novelty, and limitations; served on EC2 and AWS Lambda.
Productionized a figure- and layout-extraction library for non-text document elements, deployed on EC2 with health-check alarms, auto-scaling, and reporting.
Developed and deployed advanced NLP and text-to-speech models, enhancing the platform's capability to convert PDFs and web pages into audio notes with high accuracy and naturalness.
Led rigorous testing and validation of NLP and speech models, reducing errors and output mismatches by around 20% before release.
Developed and optimized NLP and multilingual text-to-speech models in PyTorch (English, Spanish, and Mandarin), improving output quality by approximately 30% and lifting user engagement by approximately 25% once integrated into the platform.
Containerized model deployments with Docker for portability and scaling across environments.

Technologies: Artificial Intelligence (AI), Document Processing, Custom BERT, MySQL, SQLAlchemy, Recommendation Systems, Amazon Web Services (AWS), Python, Flask, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), SpaCy, Docker, Google Cloud Platform (GCP), Time Series, Time Series Analysis, Neural Networks, Data Analysis, Selenium, Data Engineering, Databricks, modal, Phonemes, Data Scraping, CI/CD Pipelines, Kubernetes, FastAPI, Software Architecture, APIs, Machine Learning (ML) APIs, API Integration, Architecture, Event-driven Architecture, Solution Architecture, Data Security

Full-stack Developer

2018 - 2020

Lyearn

Created an internal npm library for reporting that fetches and formats data from Elasticsearch, S3, and DynamoDB.
Implemented new Express.js (Node.js) logic for granting and revoking assets on sub-accounts, which reduced computation and database costs 100-fold.
Worked on the platform's dark theme, live training, reporting, and live class features.

Technologies: JavaScript, Data Architecture, API Architecture, REST APIs, Microservices, JavaScript 6, React, AWS Lambda, Amazon DynamoDB, Node.js, Azure, Business Requirements, Data Pipelines, SQL, Data Engineering, CI/CD Pipelines, Full-stack, MongoDB, Software Architecture, APIs, API Integration, Event-driven Architecture, Solution Architecture, Data Security

Research Engineer and Full-stack Engineer

2018 - 2019

Visionion

Provided services for prototyping, development, and deployment of machine learning algorithms.
Worked on image classification, object detection, and image super-resolution.
Contributed to an IoT project built with Python and React that uses serial communication and PicoScope hardware.

Technologies: Object Detection, Optical Character Recognition (OCR), Amazon Web Services (AWS), Keras, PyTorch, TensorFlow, Machine Learning, Deep Learning, Python, Object Tracking, Video Analysis, Artificial Intelligence (AI), Data Pipelines, Data Analysis, Data Engineering, CI/CD Pipelines, Agentic AI, ChatGPT, LlamaIndex, Chatbots, LoRa, RAG Architecture, APIs, Machine Learning (ML) APIs, API Integration

Machine Learning Engineer

2018 - 2018

Infocusp

Explored various deep learning algorithms to predict the next frames from an existing user-drawn sketch.
Proposed a novel algorithm pipeline using optical flow, RNN, and CNN to solve the problem.
Implemented the algorithm using TensorFlow, Keras, and PyTorch. Trained and deployed the models on AWS EC2.

Technologies: Python, Computer Vision, Deep Learning, Artificial Intelligence (AI), NVIDIA CUDA, OpenCL, OpenCL/GPU, Data Analysis, APIs, Machine Learning (ML) APIs, API Integration

Experience

GitHub Pull Requests Analysis System with RCA and RAG Integration

I developed an advanced GitHub-based solution that integrates large language models (LLMs) and retrieval-augmented generation (RAG) using LangChain to analyze and review pull requests (PRs). The system performs automated root cause analysis (RCA) on issues linked to PRs, leveraging multi-agent systems for comprehensive analysis and resolution suggestions. It also has direct access to repositories, enabling it to suggest and implement fixes autonomously, streamlining the development process for complex systems.

Classification of Plant Disease Using Convolutional Neural Networks

As part of the Summer Innovation Challenge 2017 hosted by the Gujarat State Government, I trained a CNN with transfer learning to classify plant diseases from images. I implemented flexible, fast code that supports online training.

Motion-based Image Super Resolution

Proposed and implemented a general-purpose deep learning architecture that can learn multi-image to multi-image mapping. I also investigated its application in image registration, image super-resolution, and photometric stereo.

Tone Mapping HDR Images

Tone-mapped high dynamic range (HDR) images using a generative adversarial network (GAN). I proposed a GAN architecture that tone-maps HDR images from the outputs of multiple existing tone-mapping operator (TMO) algorithms, built as an extension of the pix2pix image translation network.

Rice Classification Using CNNs

Proposed a convolutional neural network architecture with transfer learning that outperforms state-of-the-art methods. I also trained a 5-class model for classifying basmati rice using 4,000 training images.

GPU Kernel Optimization and Competitive GPU Programming (CUDA)

I practice on LeetGPU, writing and optimizing CUDA kernels for throughput and memory efficiency by applying memory coalescing, shared-memory tiling, occupancy tuning, and warp-level primitives to push kernels toward the hardware's limits. The work builds on a computational science and HPC foundation (CUDA, multi-core Fortran, and parallel algorithms) from my academic background.

I apply that depth to modern LLM serving by understanding the GPU substrate behind quantization (INT8/INT4, GGUF/AWQ), KV-cache management, and continuous batching, as well as the latency-throughput trade-offs that drive cost in vLLM/TGI-style inference.
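
To make the quantization idea concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization in plain Python. It is a deliberately simplified illustration: formats such as GGUF and AWQ use grouped scales and more elaborate schemes, and the function names and sample weights below are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; guard against an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q, scale):
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


weights = [0.5, -1.27, 0.0, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest integer code bounds the per-weight error by scale / 2.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight drops from 32 bits to 8 bits, at the cost of a rounding error bounded by half the scale. That trade of memory and bandwidth against accuracy is what drives much of the latency and cost behavior in quantized inference.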

Education

2014 - 2018

Bachelor's Degree in Computer Science

DA-IICT - Gandhinagar, India

Certifications

AUGUST 2017 - PRESENT

Neural Networks for Machine Learning

Coursera

SEPTEMBER 2016 - PRESENT

Machine Learning

Coursera

Skills

Libraries/APIs

Node.js, PyTorch, REST APIs, Claude API, React, SpaCy, TensorFlow, Keras, Scikit-learn, NumPy, SciPy, OpenCV, Matplotlib, SQLAlchemy, Playwright, LSTM, Pandas, MobX, llama.cpp, vLLM

Tools

Jupyter, ChatGPT, AI Prompts, Claude, Azure OpenAI Service, Claude Code, Git, Zapier, n8n, Codex, Amazon Elastic Container Service (ECS), Apache Airflow, Claude Agent SDK

Languages

Python, SQL, C++, Python 3, C, JavaScript 6, JavaScript, CSS, HTML, HTML5, TypeScript

Paradigms

Model Context Protocol (MCP), Event-driven Architecture, Microservices Architecture, Serverless Architecture, Microservices, API Architecture, High-performance Computing (HPC)

Platforms

NVIDIA CUDA, Amazon Web Services (AWS), AWS Lambda, Docker, Databricks, Kubernetes, Azure, Google Cloud Platform (GCP)

Frameworks

Flask, Selenium, LangGraph, Caffe, Redux, Angular, OpenCL, LlamaIndex

Storage

Amazon DynamoDB, Elasticsearch, MySQL, Data Pipelines, MongoDB, PostgreSQL, PostGIS

Other

Deep Learning, Computer Vision, Machine Learning, Natural Language Processing (NLP), Artificial Intelligence (AI), Optical Character Recognition (OCR), Generative Pre-trained Transformers (GPT), Data Analysis, Data Engineering, BERT, Large Language Models (LLMs), Text-to-Speech (TTS), GPU Computing, OpenAI GPT-4 API, OpenAI GPT-3 API, Retrieval-augmented Generation (RAG), LangChain, Multi-agent Systems, Prompt Engineering, OpenAI, AI Agents, CI/CD Pipelines, Agentic AI, RAG Architecture, FastAPI, Software Architecture, Agentic RAG Systems, APIs, Anthropic, Machine Learning (ML) APIs, AI Integration, API Integration, Workflow Automation, Architecture, RAG Pipelines, Solution Architecture, Vector Databases, Cursor AI, Deep Neural Networks (DNNs), Convolutional Neural Networks (CNNs), Pattern Recognition, Image Classification, Computer Vision Algorithms, Recommendation Systems, Pinecone, Vector Search, Data Scraping, AI Tools, Reinforcement Learning from Human Feedback (RLHF), Data Anonymization, Supabase, AI Chatbots, Large Language Model Operations (LLMOps), Financial System Implementation, Finance, Fine-tuning, Data Security, Observability, AI Assistants, Workflow Automation & System Integration, Mathematics, Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs), Big Data, Data Science, Data Architecture, Custom BERT, Document Processing, Object Detection, Unsupervised Learning, Object Tracking, Video Analysis, OpenCL/GPU, Time Series, Time Series Analysis, Neural Networks, Business Requirements, modal, Phonemes, Serverless GPUs, Image Processing, Generative Artificial Intelligence (GenAI), Full-stack, Chatbots, LoRA, CUDA Kernel, Quantization, Optimization, Llama 3, Meta Llama