Shivam Garg
Verified Expert in Engineering
Computer Vision Engineer and Developer
Delhi, India
Toptal member since August 1, 2023
Shivam is a senior AI engineer with 5+ years of hands-on experience in deep learning and artificial intelligence. Proficient in deep learning frameworks such as PyTorch, Hugging Face, TensorFlow, and Keras, he excels in generative AI, Stable Diffusion, and large language models (LLMs). Shivam also brings extensive expertise in classical computer vision.
Preferred Environment
Python, TensorFlow, Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Large Language Models (LLMs), LangChain, Machine Learning, AI Agents, OpenAI GPT-4 API, English, Amazon Web Services (AWS), Retrieval-augmented Generation (RAG), LangGraph, TypeScript, Vector Databases, Hugging Face, Llama 3, Full-stack, Kubernetes, Minimum Viable Product (MVP), Team Leadership, Video Processing, Azure Databricks, Data Modeling, AWS Lambda, Model Tuning, Serverless, Amazon Bedrock, Node.js, SQL, Snowflake, Google Colaboratory (Colab), Jupyter, Kubeflow, Data Annotation, Keras, Amazon RDS, Data Cleansing, GraphDB, Generative Adversarial Networks (GANs), Text to Image AI, DreamBooth, SpaCy, Google Vision API, Neural Networks, Transformers, API Integration, Data Engineering
The most amazing...
...generative AI model I've delivered uses Stable Diffusion and LLMs to animate stories from news articles. It helped secure Y Combinator funding.
Work Experience
AI Engineer 3
Self-Employed
- Developed a travel assistant using retrieval-augmented generation (RAG) and agentic workflows. Implemented chain-of-thought prompting and context-aware templates for trip planning, flight bookings, and Q&A.
- Engineered a dynamic lead scoring system for a cloud-based project management SaaS. Utilized an RLHF-tuned Llama 2 with few-shot learning to analyze prospect interactions and prioritize high-value leads.
- Developed a virtual house agent via CrewAI and fine-tuned Llama 3 via LoRA to reduce home maintenance costs by 30% and improve service efficiency. The AI processes homeowner requests, then generates quick estimates and schedules professionals.
- Implemented a personal finance assistant using LangChain and PEFT fine-tuned Mistral AI to increase savings by 15-20%. AI categorizes expenses, creates budgets, and optimizes investment portfolios based on risk profiles and market conditions.
- Developed an AI supply chain optimizer using AutoGen, GPT-4, and RAG. Enhanced inventory management and disruption prediction, reducing costs by 20% and improving delivery efficiency by 15%.
- Fine-tuned Llama 3 8B using instruction tuning, preference alignment (ORPO), and hyperparameter optimization for improved recipe recommendations.
- Created a bot for auditing clinical trials for J&J utilizing chain-of-thought prompting, data-driven prompting, and ElasticDB-based retrieval-augmented generation (RAG).
- Created an AI-powered language exchange platform using React, FastAPI, and NLP models for real-time translation and analysis. Increased employee language proficiency by 45% and improved cross-cultural communication efficiency by 30%.
- Developed an AI-powered language learning platform using transformer models and SpaCy. Implemented adaptive difficulty and personalized feedback, boosting engagement by 55% and retention by 40%. Integrated 20+ language corpora for diverse content.
- Developed a real-time social media sentiment analyzer using RoBERTa and Kafka. Achieved 92% accuracy in multilingual emotion detection, reducing customer response time by 33% for major brands.
Senior LLM Engineer
ContractPodAi
- Collaborated with a sales intelligence platform to build a RAG system using LangGraph. Optimized Llama 3 8B models, reducing size by 75% and latency by 50% while maintaining 0.95 recall and a BLEU score of 30.
- Developed a personalized sales and marketing bot using LangChain with few-shot and chain-of-thought prompting. Improved lead qualification by 40% and content personalization by 25%.
- Developed an end-to-end multi-agent contract management tool for extracting, drafting, and managing legal contracts. Utilized a Llama 3 model fine-tuned and quantized with QLoRA, plus Graph RAG with Neo4j via LangChain. It was deployed using vLLM.
- Developed a legal copilot using a multi-level RAG approach with MongoDB Atlas and the OpenAI Assistants API. The system queries over 100,000 documents for efficient and accurate contract insights, achieving a BLEU score of 0.95.
- Led a team of five developers to architect a multi-tenant AI back end, supporting over 10,000 concurrent users. Set up an MLOps pipeline, reducing deployment time from two hours to one hour, and a CI/CD pipeline for faster software delivery.
- Designed a system to automate legal contract drafting and review by fine-tuning Llama 3 with LoRA. It generates templates, suggests clauses, and detects inconsistencies, reducing drafting time by 50% and errors by 35% compared to traditional methods.
- Worked with the solutions and sales teams to rapidly prototype and build AI proof of concepts (POCs) and responses to proposal requests (RFPs). Some of the successful POCs and RFPs include finance assistants, trading bots, and legal advisors.
- Created an insurance copilot using a fine-tuned OpenAI GPT-4 API. This enhanced customer interactions by providing accurate, real-time responses, automating routine tasks, and ensuring a seamless user experience.
- Implemented security measures such as AES-256 encryption, TLS 1.3, RBAC, MFA, and data privacy controls, and provided detailed documentation and training for effective system use, ensuring robust data security and user efficiency.
- Partnered with a financial advisory firm to create an AI assistant using GPT-4. Developed prompt templates with constrained generation for tailored investment advice and market insights.
Senior Machine Learning Engineer
IBM
- Built a weather forecasting system for The Weather Company (IBM) using Mistral. The LLM enhanced real-time weather insights, improving forecast accuracy to 85% and reducing alert processing time by 35%, ensuring timely and precise weather updates.
- Created a fashion sales forecasting system for an eCommerce platform using CLIP embeddings and GPT-3. Analyzed product images and descriptions to predict trends, improving inventory management and boosting quarterly sales by 18%.
- Created a tool to search for similar patents using OpenAI Ada model embeddings via LangChain and Pinecone for improved indexing. This resulted in a 90% precision rate for relevant patent identification.
- Utilized AWS for cloud computing, cutting model deployment time by 50% and improving resource efficiency by 40%. Enhanced model streaming and observability, boosting real-time performance tracking by 30% and reducing operational overhead by 25%.
- Implemented keystroke dynamics for eCommerce authentication with Python for feature extraction and TensorFlow with LSTM networks for deep learning. Achieved 95% accuracy, cut authentication time by 30%, and reduced security breaches by 45%.
- Implemented and managed MLOps practices to ensure efficient lifecycle management of ML models. Worked with GitHub Actions to enhance the code integration and deployment process automation for CI/CD.
- Developed and managed AI/ML solutions with skills in ML, MLOps, cloud computing, and CI/CD practices.
- Contributed to the development of LLMs to advance natural language processing and AI. Applied technologies and methodologies to drive innovation and enhance ML capabilities.
- Developed the automated Personality Assessment and Reporting System (PARS) using LLMs and retrieval-augmented generation (RAG); Mistral, fine-tuned via PEFT, RAG with LangChain, and Pinecone for data extraction and cleaning.
- Utilized OpenAI GPT-4 with chain-of-thought prompting and LangChain with Pinecone to compare traits and create visual reports. The model was refined with BitFit, cross-validation, and A/B testing, achieving 95% accuracy.
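Several of the entries above (the patent search and the RAG-based systems) rest on embedding similarity search. The following is a minimal sketch of that idea only, using plain Python lists in place of a vector database. The document IDs, the three-dimensional vectors, and the helper names `cosine_similarity` and `top_k` are hypothetical and are not taken from the projects described; a production system would delegate this step to a service such as Pinecone.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction, so treat it as dissimilar to everything.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the IDs of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy index; real embeddings have hundreds or thousands of dimensions.
index = {
    "patent_a": [1.0, 0.0, 0.0],
    "patent_b": [0.9, 0.1, 0.0],
    "patent_c": [0.0, 1.0, 0.0],
}
results = top_k([1.0, 0.05, 0.0], index, k=2)
print(results)
```

A brute-force scan like this is only practical for small collections; vector databases replace it with approximate nearest-neighbor indexes.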
AI Engineer 3
Avatarin Inc
- Created a system to assist human Kanji writing through imitation learning and OpenCV, using Kanji videos to generate Kanji images and predict poses for robotic arms.
- Implemented a model that detects suspicious activity at airports using VideoMAE. It prioritized high accuracy, low latency, and efficient deployment on the client's Linux server.
- Built a shot detection system for table tennis (TT) games for the World Table Tennis Organization, using YOLOv5 and OpenCV for object detection and VideoMAE for shot recognition.
- Integrated an infrared object detection system using the YOLO architecture with DeepSORT, achieving high accuracy with an mAP of 0.88 for detecting and tracking vehicles.
- Created a satellite image segmentation system using U-Net and Mask R-CNN models, achieving a Dice score of 0.94, greatly enhancing agricultural field detection and analysis.
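The Dice score quoted for the segmentation work measures overlap between a predicted mask and a ground-truth mask: twice the intersection divided by the sum of both mask sizes. Below is a minimal sketch of that metric, assuming binary masks stored as nested lists of 0/1. The function name and the toy masks are illustrative and are not drawn from the project.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)
    # Two empty masks agree perfectly, so define the score as 1.0.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: 3 predicted pixels, 3 true pixels, 2 in common.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
score = dice_score(pred, target)
print(round(score, 4))
```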
Senior AI Engineer
AlphaICs
- Implemented a motion transfer system using a first-order model, achieving high-quality motion transfer between faces while preserving the identity and facial expressions of the target face.
- Built a quantization software development kit (SDK) for 4-bit and 8-bit quantization, enabling the efficient implementation and optimization of deep learning models on Edge (CPU-based) hardware, which enhanced performance and capabilities.
- Benchmarked different computer vision and generative models with a custom quantization and optimization SDK for IoT and custom Edge devices.
- Worked on brain image segmentation using deep learning, training neural networks to accurately identify and classify structures in brain images linked to Alzheimer's disease.
- Rolled out a 3D object detection and tracking system for autonomous vehicles using lidar data and the VoxelNet algorithm, enhancing the vehicles' perception and tracking capabilities in a 3D environment.
- Developed an infrared object detection system using the You Only Look Once (YOLO) architecture, achieving high accuracy in detecting objects in infrared images and providing reliable identification and tracking capabilities.
- Created a satellite image segmentation system for detecting agricultural fields using a cascade of U-Net and Mask R-CNN models, improving agricultural analysis and decision-making processes.
Machine Learning Engineer
UnrealAI
- Built a real-time Yoga pose estimation system for Android and iOS using OpenPifPaf. It achieves 95% accuracy and offers detailed feedback to refine users' yoga practice.
- Built a topic model using LDA and NMF algorithms for latent topic extraction from text corpora, and applied clustering algorithms to group similar topics, providing a better understanding and organization of the text documents.
- Built a computer vision system for accurately detecting items in the kitchen, with high accuracy and low latency. The system was optimized for real-time performance on mobile devices.
- Detected income tax fraud using an ensemble of supervised anomaly detection, unsupervised clustering, and rule-based backtracking.
Experience
Text-to-video Generation for Mathematical Equations
Legal Law Chatbot with RAG, Pinecone Integration, Streamlit UI, and GPT-4
Text to Video
https://drive.google.com/drive/folders/15vLHASESD4HWIQ5nY3OtxrvzSsYb_IDe
System and Method for On-device Edge Learning
https://patents.justia.com/patent/20230386194
The method includes training an artificial intelligence (AI) model for extracting the visual embeddings with pre-trained visual deployment networks; checking the performance of the AI model by feeding real-time data and by performing an inference; initiating edge learning; extracting visual embeddings with pre-trained visual deployment networks; performing the inference and adding a text image embedding; taking the text embeddings using text embedders embeddings; converting the text to image embeddings to generate augmented image embeddings and adding text embeddings.
News to Infographics
The process begins with news articles being first summarized using GPT-3.5 Turbo and Davinci, facilitated by LangChain. Subsequently, videos are generated using the fine-tuned Stable Diffusion 2.1 technique, resulting in engaging and dynamic visual representations of the news stories.
System and Method for Integer-only Quantization-aware Training for Edge
https://patents.justia.com/patent/20230342613
I developed the pseudo-cross entropy loss function and designed the quantization scheme for integer-only quantization-aware training. Additionally, an SDK was developed to enable the utilization of this system on low-power edge computing devices. The SDK has been successfully used to quantize models on Jetson and the vendors' custom hardware.
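For readers unfamiliar with integer quantization, the sketch below shows the generic affine (scale and zero-point) mapping between floats and 8-bit integers that many quantization schemes build on. It is an illustration under assumptions, not the patented scheme or the SDK described above: the function names, the unsigned 8-bit range, and the sample weights are all hypothetical, and the pseudo-cross entropy loss is not represented.

```python
def quantize_params(x_min, x_max, num_bits=8):
    """Derive an affine scale and zero point for an unsigned integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # Widen the range to include zero so that 0.0 maps to an exact integer.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # Degenerate all-zero range; any positive scale works.
    zero_point = int(round(qmin - x_min / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, num_bits=8):
    """Map floats to integers, clamping to the representable range."""
    qmax = 2 ** num_bits - 1
    return [max(0, min(qmax, int(round(v / scale)) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integers back to approximate floats."""
    return [scale * (q - zero_point) for q in q_values]


# Hypothetical weights used only to exercise the round trip.
weights = [-0.8, -0.1, 0.0, 0.4, 1.2]
scale, zero_point = quantize_params(min(weights), max(weights))
q_weights = quantize(weights, scale, zero_point)
restored = dequantize(q_weights, scale, zero_point)
print(q_weights)
print([round(r, 3) for r in restored])
```

Each restored value differs from the original by at most about one quantization step (`scale`), which is the precision traded for the smaller integer representation.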
Personalized Art Generation Bot
Interactive NFT (2D-3D)
Selective 3D inpainting involves the advanced process of filling in missing or damaged regions in the 2D images, resulting in a complete and visually appealing 3D representation. This technique helps to enhance the overall quality and realism of the generated 3D models.
Depth estimation is another critical component of the system as it enables the determination of the spatial depth information from 2D images. This depth information is essential for creating a sense of depth and perspective in the resulting 3D models.
By leveraging Stable Diffusion, the system ensures a stable and consistent generation process, delivering high-quality and accurate 3D representations of the NFTs from their 2D counterparts. The resulting 3D models can significantly enrich users' viewing and interaction experience in various applications, ranging from virtual galleries to augmented reality environments.
Yoga Pose Correction
The trained model was thoughtfully quantized and converted to a TensorFlow Lite format to enhance usability and integration. This conversion facilitated the easy incorporation of the model into Android applications, providing a user-friendly tool for yoga enthusiasts to refine their practice and gain a deeper understanding of different postures.
Toonify Pets
Real-time Table Tennis Tracking and Shot Detection System
https://drive.google.com/file/d/1Mic7692BmlfOSGTcIbefIBM9ElFPZoBd/view?usp=sharing
Fake News Classification
The project involved preprocessing text data, employing the SetFit model and LSTM, and developing an ensemble of SetFit and LSTM to identify fake news accurately.
Additionally, k-means clustering was used to group the fake news by type. The end goal was to create a reliable tool to combat the spread of misinformation. The environment used for this project included Linux, TensorFlow, k-means clustering, scikit-learn, Python, and SetFit.
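The description does not say how the SetFit and LSTM outputs were combined. One common option is soft voting, where each model's predicted probability is blended with a weight. The sketch below assumes that approach; the function name, the equal weighting, the 0.5 threshold, and the probability values are all hypothetical.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Blend two models' P(fake) estimates and return (probability, label)."""
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    blended = weight_a * prob_a + (1.0 - weight_a) * prob_b
    label = "fake" if blended >= threshold else "real"
    return blended, label


# Hypothetical per-article probabilities from the two models.
setfit_probs = [0.9, 0.2, 0.6]
lstm_probs = [0.7, 0.4, 0.3]

labels = [soft_vote(a, b)[1] for a, b in zip(setfit_probs, lstm_probs)]
print(labels)
```

In practice the weight would be tuned on a validation set, so the stronger model counts for more.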
Text and Voice Assistant for Students with Dyslexia
By fine-tuning the Mixtral-8x7B-Instruct model using Reinforcement Learning from Human Feedback (RLHF) and leveraging Azure DevOps for CI/CD and deployment, with document ingestion handled via Azure, the system significantly improved its ability to manage atypical spelling and grammar. We implemented RAG to enhance data extraction and query responses, further improving comprehension of unusual queries. We also integrated text-to-speech functionality to aid students with reading difficulties, ensuring a more accessible and supportive learning experience.
ASL-to-Speech for Healthcare
The system employs MediaPipe for precise hand tracking, extracting key landmarks from video input. Custom convolutional neural networks (CNNs) process these landmarks, capturing spatial relationships crucial for sign language interpretation. A bidirectional long short-term memory (LSTM) network models the temporal sequences of gestures, enabling context-aware translation. The translated text is then converted to speech using Tacotron 2 for mel-spectrogram generation and WaveNet as a vocoder, producing high-quality, natural-sounding audio output. This end-to-end pipeline achieves 92% accuracy in ASL translation, significantly enhancing communication between deaf individuals and hearing individuals in healthcare settings.
The system's modular architecture allows for easy updates and improvements to individual components, ensuring scalability and adaptability to different sign languages or use cases.
Education
Bachelor of Technology Degree in Computer Science
University School of Information, Communication and Technology - Dwarka, Delhi, India
Skills
Libraries/APIs
PyTorch, TensorFlow, Scikit-learn, SpaCy, OpenCV, Pandas, LSTM, Google Speech-to-Text API, NumPy, React, Node.js, OpenAI API, Google Vision API, Keras, Fast.ai, Gradio, OpenAI Assistants API
Tools
You Only Look Once (YOLO), Git, ChatGPT, Notion, Haystack, Azure Machine Learning, Whisper, Jira, Microsoft Power BI, Amazon Athena, Google Sheets, AI Prompts, Amazon Textract, Make, Zapier, Amazon SageMaker, Google Bard, Open Neural Network Exchange (ONNX), GNU Autogen
Languages
Python, C++, Falcon, JavaScript, TypeScript, SQL, Snowflake, Regex, Bash Script, Python 3
Frameworks
TensorFlow Lite, Flask, LlamaIndex, Django, DSPy, LangGraph, Unity, Streamlit, JSON Web Tokens (JWT), AWS Serverless Application Model (SAM)
Paradigms
ETL, Azure DevOps, Agile, Rapid Prototyping, Continuous Development (CD), Continuous Integration (CI), HIPAA Compliance, Search Engine Optimization (SEO)
Platforms
Docker, AWS IoT, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2, Amazon Web Services (AWS), HubSpot, Azure, Kubernetes, Azure AI Studio, Kubeflow, CrewAI, Jupyter Notebook, Firebase, iOS, Linux, Civitai, Vae, LangSmith, Ollama, Apache Kafka
Storage
MySQL, MongoDB, Neo4j, Google Cloud, Graph Databases, PostgreSQL, Databases
Other
Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Quantization, Models, Machine Learning, LangChain, Statistics, Data Science, Depth Estimation, DreamBooth, Time Series, Hugging Face, Detectron, Generative Pre-trained Transformers (GPT), Generative Adversarial Networks (GANs), Large Language Models (LLMs), Artificial Intelligence (AI), OCR, Convolutional Neural Networks (CNNs), Image Processing, OpenAI GPT-4 API, OpenAI GPT-3 API, Machine Learning Operations (MLOps), LoRA, Text to Image, Diffusion Models, NLU, Deep Neural Networks (DNNs), Language Models, Statistical Analysis, Data Analysis, Image Analysis, Image Generation, Chatbots, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Model Development, Video & Audio Processing, Prompt Engineering, OpenAI, APIs, HubSpot CRM, Retrieval-augmented Generation (RAG), Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, BERT, Reinforcement Learning, PEFT, 2D, Speech to Text, Point Clouds, Point Cloud Data, Text to Speech (TTS), Speech to Text AI, AI Agents, Llama 3, English, Gemini, Claude, Scalable Vector Databases, Vector Databases, Mistral AI, Full-stack, Full-stack Development, Multi-agent Systems, Reinforcement Learning from Human Feedback (RLHF), Data Analytics, Sentiment Analysis, Computer Vision Algorithms, Minimum Viable Product (MVP), Team Leadership, Plugins, Video Processing, Azure Data Factory, Azure Databricks, Data Modeling, Biometrics, Model Tuning, Serverless, Amazon Bedrock, Google Colaboratory (Colab), Jupyter, Modeling, Medical Imaging, Speech Analytics, Voice Analysis, Data Annotation, Amazon RDS, Data Cleansing, GraphDB, Technical Leadership, Human Resources (HR), Generative Pre-trained Transformer 4 (GPT-4), Text to Image AI, Web Scraping, AI Art Visualization, DALL-E, Hand Tracking, Open-source LLMs, Handwriting Recognition, Tesseract, Diffusion-based AI Models, Neural Networks, Transformers, API Integration, Deep Reinforcement Learning, Data Engineering, NVIDIA TensorRT, FastAPI, Pose Estimation, 3D Reconstruction, Outbound Marketing, Trading, Binary Option Trading, Futures & Options, Option Pricing, Options Trading, Sign Language, K-means Clustering, Edge AI, Pruning, Benchmarking, Object Detection, Product Matching, ControlNet, Videos, Conversational AI, Multimodal GenAI, ChatGPT Prompts, CI/CD Pipelines, GitHub Actions, Large Language Model Operations (LLMOps), MORA, Object Tracking, DeepSORT, StyleGAN, VideoMAE, OpenMidas, Mixtral, Estimations, vLLM, Qdrant, RoBERTa