
Chirag Kalra

Verified Expert in Engineering

ML/AI Engineer and Developer

Gurugram, Haryana, India

Toptal member since February 18, 2026

Expertise

Machine Learning, Computer Vision, Neural Network, Deep Learning, Artificial Intelligence, Python, PyTorch, TensorFlow, NumPy, Grafana, Kubernetes, Docker, WebRTC, Web Scraping

Bio

Chirag is a senior computer vision engineer specializing in production-grade AI infrastructure and real-time streaming. He architects scalable GPU inference systems for GenAI, driving decisions that significantly reduce latency and infrastructure costs. An expert in Python, C++, and Kubernetes, Chirag turns experimental models into reliable production systems. He focuses on high-performance deployment, optimization, and system reliability for enterprise clients.

Portfolio

Alethia.AI

PyTorch, Python, Real Time Streaming, FastAPI...

Alethia.AI

Computer Vision, Convolutional Neural Networks (CNNs)...

Experience

System Architecture - 3 years
Kubernetes - 3 years
PyTorch - 3 years
Computer Vision - 3 years
Real Time Streaming - 3 years
Machine Learning Operations (MLOps) - 3 years
C++ - 3 years
NVIDIA TensorRT - 3 years

Preferred Environment

Linux, Python, PyTorch, Computer Vision, FastAPI

The most amazing...

...result I delivered was a tenfold reduction in generative AI video inference costs, alongside a fivefold cut in model latency, from 6 seconds to 1.2 seconds, in real-time contexts.

Work Experience

Senior Computer Vision Engineer

2025 - PRESENT

Alethia.AI

Architected and deployed a modular GPU autoscaling platform on cost-effective cloud GPU marketplaces, maintaining 99% production uptime and enabling six-figure annualized infrastructure savings versus Kubernetes-based GPU orchestration.
Cut end-to-end lipsync latency from six seconds to 1.2 seconds (5x reduction) by implementing asynchronous I/O operations to remove bottlenecks, and increased model throughput by 80% by building and deploying a custom TensorRT inference engine.
Architected and owned a real-time RTMP streaming API from scratch using coroutines, multithreading, and multiprocessing across CPUs and GPUs to handle network, disk, and compute workloads with low latency, HD output, and real-time visual effects.

Technologies: PyTorch, Python, Real Time Streaming, FastAPI, Machine Learning Operations (MLOps), Machine Learning, Computer Vision, NVIDIA Triton, NVIDIA TensorRT, AI Model Training, Model Evaluation, Fine-tuning, Deep Learning, Artificial Intelligence (AI), Azure, NumPy, Pandas, Python API, Amazon Web Services (AWS), System Architecture, Google Cloud Platform (GCP), Linux, Grafana, Prometheus, AI Pipeline, Solution Architecture, Architecture, Kubernetes, Generative Adversarial Networks (GANs), C++, Docker, Docker Compose, CoreWeave, Vast.AI, RunPod, WebRTC, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), AI-generated Video, Workflows, 3D Pose Estimation, Object Detection, Image Generation, Full-stack Development, Diffusion Models, Image Processing, Video Processing, ComfyUI

Computer Vision Engineer

2023 - 2025

Alethia.AI

Awarded "Employee of the Quarter" for significant contributions to AI research and engineering.
Led company-wide architectural decisions for image generation by benchmarking and integrating third-party APIs, reducing internal GPU deployment/maintenance overhead while improving scalability and end-user visual quality.
Designed a scalable end-to-end architecture for server-to-client streaming, reducing first-frame latency by over 95% using AWS Kinesis WebRTC streaming, offering an overall smoother, more responsive experience for the end users.
Optimized animation and lipsync inference pipelines, reducing inference time by more than 75% and achieving sustained 30+ FPS through batching, JIT compilation, efficient video encoding, and auxiliary model optimizations.

Technologies: Computer Vision, Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), Deep Learning, AI Model Training, RunPod, CoreWeave, Vast.AI, WebRTC, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), AI-generated Video, Workflows, 3D Pose Estimation, Object Detection, Image Generation, Full-stack Development, Diffusion Models, Image Processing, Video Processing, ComfyUI

Experience

Dust It | Android Gallery App

https://github.com/ChiragKalra/DustIt

I architected an Android gallery app that automatically cleans up junk images on the user's phone. It runs an on-device hybrid MobileNet V3-based model that achieved 96% precision and 90% recall in image classification.

Fitter | Fitness App

https://github.com/ChiragKalra/Fitter

Fitter tracks calorie intake by detecting the food type from the user’s camera. It achieved 95% top-5 accuracy in classifying Indian food using transfer learning, fine-tuning, and hyperparameter optimization. I also built a custom training dataset by scraping over 200,000 images from online repositories, covering 300 unique classes of fruits and dishes.

Organiso | SMS Organiser

https://organiso.web.app/

Organiso is a minimalist SMS app that automatically organizes messages from the user’s inbox into useful categories. I trained a classifier on 10,300 samples to sort messages into five categories, such as spam/promotions, with 93% accuracy. I also developed a Discord bot to streamline data labeling, improving labeling efficiency by 500% through a user-friendly interface.

Education

2019 - 2023

Bachelor's Degree in Information Technology

J.C. Bose University of Science and Technology - Faridabad, India

Certifications

AUGUST 2024 - PRESENT

GANs Specialization

DeepLearning.AI

JANUARY 2022 - PRESENT

Deep Learning Specialization

DeepLearning.AI

Skills

Libraries/APIs

PyTorch, TensorFlow, NumPy, Pandas, Python API, WebRTC

Tools

Grafana, Docker Compose, ComfyUI

Languages

Python, C++, SQL, Kotlin

Platforms

Kubernetes, Docker, RunPod, Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), Android, Linux

Other

Machine Learning, Computer Vision, Generative Adversarial Networks (GANs), FastAPI, Real Time Streaming, Neural Networks, Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Deep Learning, AI Model Training, Model Evaluation, Fine-tuning, Artificial Intelligence (AI), System Architecture, Prometheus, AI Pipeline, Solution Architecture, Architecture, CoreWeave, Vast.AI, Generative Artificial Intelligence (GenAI), AI-generated Video, Workflows, 3D Pose Estimation, Object Detection, Sequence Models, Image Generation, Diffusion Models, Image Processing, Video Processing, Machine Learning Operations (MLOps), NVIDIA Triton, NVIDIA TensorRT, Web Scraping, Edge AI, Large Language Models (LLMs), Full-stack Development
