
Chirag Kalra
Verified Expert in Engineering
ML/AI Engineer and Developer
Gurugram, Haryana, India
Toptal member since February 18, 2026
Chirag is a senior computer vision engineer specializing in production-grade AI infrastructure and real-time streaming. He architects scalable GPU inference systems for GenAI, driving decisions that significantly reduce latency and infrastructure costs. Expert in Python, C++, and Kubernetes, Chirag transforms experimental models into reliable production systems. He focuses on high-performance deployment, optimization, and system reliability for enterprise clients.
Portfolio
Experience
- System Architecture - 3 years
- Kubernetes - 3 years
- PyTorch - 3 years
- Computer Vision - 3 years
- Real Time Streaming - 3 years
- Machine Learning Operations (MLOps) - 3 years
- C++ - 3 years
- NVIDIA TensorRT - 3 years
Preferred Environment
Linux, Python, PyTorch, Computer Vision, FastAPI
The most amazing...
...result I delivered was cutting generative AI video inference costs by 10x while reducing model latency fivefold, from six seconds to 1.2 seconds, in real-time contexts.
Work Experience
Senior Computer Vision Engineer
Alethia.AI
- Architected and deployed a modular GPU autoscaling platform on cost-effective cloud GPU marketplaces, maintaining 99% production uptime and enabling six-figure annualized infrastructure savings versus Kubernetes-based GPU orchestration.
- Cut end-to-end lipsync latency from six seconds to 1.2 seconds (5x reduction) by implementing asynchronous I/O operations to remove bottlenecks, and increased model throughput by 80% by building and deploying a custom TensorRT inference engine.
- Architected and owned a real-time RTMP streaming API from scratch using coroutines, multithreading, and multiprocessing across CPUs and GPUs to handle network, disk, and compute workloads with low latency, HD output, and real-time visual effects.
Computer Vision Engineer
Alethia.AI
- Awarded "Employee of the Quarter" for significant contributions to AI research and engineering.
- Led company-wide architectural decisions for image generation by benchmarking and integrating third-party APIs, reducing internal GPU deployment/maintenance overhead while improving scalability and end-user visual quality.
- Designed a scalable end-to-end architecture for server-to-client streaming, reducing first-frame latency by over 95% using AWS Kinesis WebRTC streaming and giving end users a smoother, more responsive experience.
- Optimized animation and lipsync inference pipelines, reducing inference time by more than 75% and achieving sustained 30+ FPS through batching, JIT compilation, efficient video encoding, and auxiliary model optimizations.
Experience
Dust It | Android Gallery App
https://github.com/ChiragKalra/DustIt
Fitter | Fitness App
https://github.com/ChiragKalra/Fitter
Organiso | SMS Organiser
https://organiso.web.app/
Education
Bachelor's Degree in Information Technology
J.C. Bose University of Science and Technology - Faridabad, India
Certifications
GANs Specialization
DeepLearning.AI
Deep Learning Specialization
DeepLearning.AI
Skills
Libraries/APIs
PyTorch, TensorFlow, NumPy, Pandas, Python API, WebRTC
Tools
Grafana, Docker Compose, ComfyUI
Languages
Python, C++, SQL, Kotlin
Platforms
Kubernetes, Docker, RunPod, Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), Android, Linux
Other
Machine Learning, Computer Vision, Generative Adversarial Networks (GANs), FastAPI, Real Time Streaming, Neural Networks, Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Deep Learning, AI Model Training, Model Evaluation, Fine-tuning, Artificial Intelligence (AI), System Architecture, Prometheus, AI Pipeline, Solution Architecture, Architecture, CoreWeave, Vast.AI, Generative Artificial Intelligence (GenAI), AI-generated Video, Workflows, 3D Pose Estimation, Object Detection, Sequence Models, Image Generation, Diffusion Models, Image Processing, Video Processing, Machine Learning Operations (MLOps), NVIDIA Triton, NVIDIA TensorRT, Web Scraping, Edge AI, Large Language Models (LLMs), Full-stack Development