Shivam is available for hire

Shivam Garg

Verified Expert in Engineering

Computer Vision Engineer and Developer

Location

Delhi, India

Toptal Member Since

August 1, 2023

Shivam is a senior AI engineer with 4+ years of hands-on experience in deep learning and artificial intelligence. Proficient in various deep learning frameworks such as TensorFlow, PyTorch, and Keras, he excels in generative AI, Stable Diffusion, and large language models (LLMs). Furthermore, Shivam stands out for his extensive expertise in classical computer vision and machine learning.

Portfolio

Self-employed

Python, Generative Artificial Intelligence (GenAI), Stable Diffusion...

Avatarin Inc

3D Reconstruction, Python, Computer Vision, OCR...

AlphaICs

Python, Deep Learning, Quantization, Computer Vision, NVIDIA TensorRT...

Experience

Python - 5 years Natural Language Processing (NLP) - 4 years Computer Vision - 4 years PyTorch - 4 years Generative Artificial Intelligence (GenAI) - 4 years Stable Diffusion - 2 years Large Language Models (LLMs) - 1 year LangChain - 1 year

Availability

Part-time

Preferred Environment

Python, PyTorch, TensorFlow, Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Docker, LangChain, Large Language Models (LLMs), Machine Learning, Data Science, Image Generation, Chatbots, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Notion, APIs, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, 2D, JavaScript, Text to Speech (TTS), Generative AI

The most amazing...

...generative AI model I've delivered uses Stable Diffusion and LLMs to animate stories from news articles and helped secure Y Combinator funding.

Work Experience

Senior AI Consultant

2023 - PRESENT

Self-employed

Developed a Stable Diffusion model with ControlNet to convert a sketch into a photorealistic image conditioned on pose inputs. The cross-attention layers were fine-tuned with LoRA to reduce the storage footprint of the trained model.
Delivered a generative AI model using Stable Diffusion and LLMs, capable of generating animated stories from news articles, which helped the client secure Y Combinator funding.
Developed a unique approach to transform animal images into animated cartoons by training a GAN on unpaired animal images, leveraging StyleGAN architecture, and enhancing the output with CLIP and a feature extractor.
Built a system to convert 2D images of non-fungible tokens (NFTs) into 3D models using selective 3D inpainting via Stable Diffusion and depth estimation.
Developed a text-to-art system using techniques such as fine-tuning, autoencoders, and prompt engineering, successfully generating visually appealing art from text descriptions.
Created a system to detect and classify fake news in India using ML and natural language processing (NLP). Preprocessed text data, employed SetFit and long short-term memory (LSTM) models, and created an ensemble for precise identification.
Built a tool that searches for similar patents in the United States Patent and Trademark Office (USPTO) database, using OpenAI ada model embeddings through LangChain and FAISS for indexing and searching the patent embeddings.
Created an eCommerce product matching system by comparing visual embeddings from the CLIP model with OCR-derived textual embeddings via LLM (ada model), enhancing accuracy and efficiency.
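
The product matching idea above (comparing visual embeddings alongside textual embeddings) can be sketched roughly as follows. This is a minimal illustration, not the delivered system: the toy vectors stand in for CLIP and ada embeddings, and the names `cosine`, `match_score`, `best_match`, and the `image_weight` parameter are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_score(img_a, txt_a, img_b, txt_b, image_weight=0.5):
    """Blend visual and textual similarity (the weighting is an assumption)."""
    visual = cosine(img_a, img_b)
    textual = cosine(txt_a, txt_b)
    return image_weight * visual + (1 - image_weight) * textual


def best_match(query, catalog, image_weight=0.5):
    """Return the catalog key whose (image, text) embeddings best match the query."""
    q_img, q_txt = query
    return max(
        catalog,
        key=lambda name: match_score(
            q_img, q_txt, *catalog[name], image_weight=image_weight
        ),
    )


# Toy 2-D embeddings standing in for real model outputs.
catalog = {
    "red_mug": ([1.0, 0.0], [1.0, 0.0]),
    "blue_shirt": ([0.0, 1.0], [0.0, 1.0]),
}
query = ([0.9, 0.1], [1.0, 0.0])
matched = best_match(query, catalog)
```

In practice the embeddings would come from the respective models, and a vector index such as FAISS would replace the linear scan over the catalog.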

Technologies: Python, Generative Artificial Intelligence (GenAI), Stable Diffusion, Deep Learning, Computer Vision, Natural Language Processing (NLP), PyTorch, TensorFlow, Docker, LangChain, Generative Pre-trained Transformers (GPT), AWS IoT, Git, Generative Adversarial Networks (GANs), Artificial Intelligence (AI), OCR, Google Cloud Platform (GCP), Convolutional Neural Networks (CNN), ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Search Engine Optimization (SEO), OpenCV, Machine Learning Operations (MLOps), Amazon Web Services (AWS), Product Matching, LoRA, Large Language Models (LLMs), Diffusion Models, NLU, Deep Neural Networks, Language Models, MySQL, Machine Learning, Statistical Analysis, Data Analysis, Image Analysis, Data Science, MongoDB, Image Generation, Chatbots, LlamaIndex, Django, Pandas, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Video & Audio Processing, OpenAI, Notion, APIs, Haystack, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, JavaScript, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure Machine Learning, Azure DevOps, Text to Speech (TTS), Whisper, Generative AI

AI Engineer 3

2022 - 2023

Avatarin Inc

Created a system to assist human Kanji writing through imitation learning and OpenCV, using videos of Kanji writing to generate Kanji images and predict poses for robotic arms.
Automated health records and invoices for Yale University, leveraging OCR and OpenCV to extract text from diverse health documents and convert them to digital formats.
Implemented a model that detects suspicious activity at airports using VideoMAE. It prioritized high accuracy, low latency, and efficient deployment on the client's Linux server.
Built a shot detection system for table tennis (TT) games for the World Table Tennis Organization, using YOLOv5 and OpenCV for object detection and VideoMAE for shot recognition.

Technologies: 3D Reconstruction, Python, Computer Vision, OCR, Natural Language Processing (NLP), Object Detection, Image Processing, Benchmarking, OpenCV, Amazon Web Services (AWS), Text to Image, Large Language Models (LLMs), Diffusion Models, Deep Neural Networks, ChatGPT, OpenAI GPT-4 API, Language Models, MySQL, Machine Learning, Statistical Analysis, Data Analysis, Image Analysis, Data Science, MongoDB, Image Generation, Chatbots, LangChain, LlamaIndex, Django, Pandas, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Text Analytics, Video & Audio Processing, OpenAI, HubSpot, Notion, APIs, HubSpot CRM, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure Machine Learning, Azure DevOps, Text to Speech (TTS), Generative AI

Senior AI Engineer

2020 - 2022

AlphaICs

Implemented a motion transfer system using a first-order model, achieving high-quality motion transfer between faces while preserving the identity and facial expressions of the target face.
Built a quantization software development kit (SDK) for 4-bit and 8-bit quantization, enabling the efficient implementation and optimization of deep learning models on Edge (CPU-based) hardware, which enhanced performance and capabilities.
Benchmarked different computer vision and generative models with a custom quantization and optimization SDK for IoT and custom Edge devices.
Worked on brain image segmentation using deep learning, training neural networks to accurately identify and classify structures in brain images linked to Alzheimer's disease.
Rolled out a 3D object detection and tracking system for autonomous vehicles using lidar data and the VoxelNet algorithm, enhancing the vehicles' perception and tracking capabilities in a 3D environment.
Developed an infrared object detection system using the You Only Look Once (YOLO) architecture, achieving high accuracy in detecting objects in infrared images and providing reliable identification and tracking capabilities.
Created a satellite image segmentation system for detecting agricultural fields using a cascade of U-Net and Mask R-CNN models, improving agricultural analysis and decision-making processes.

Technologies: Python, Deep Learning, Quantization, Computer Vision, NVIDIA TensorRT, Continuous Integration (CI), Continuous Development (CD), Models, PyTorch, TensorFlow, Keras, FastAPI, Fast.ai, GPT, You Only Look Once (YOLO), Artificial Intelligence (AI), Google Cloud Platform (GCP), Convolutional Neural Networks (CNN), Image Processing, Benchmarking, Amazon Web Services (AWS), Large Language Models (LLMs), Text to Image, Diffusion Models, Deep Neural Networks, Language Models, MySQL, Machine Learning, ETL, Statistical Analysis, Data Analysis, Image Analysis, Data Science, OpenCV, iOS, Image Generation, Chatbots, Pandas, Generative Pre-trained Transformers (GPT), Text Analytics, Video & Audio Processing, OpenAI, HubSpot, Notion, APIs, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, JavaScript, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure DevOps, Text to Speech (TTS), Generative AI

Machine Learning Engineer

2019 - 2020

UnrealAI

Developed and deployed real-time yoga pose estimation on Android using OpenPifPaf, achieving accurate results for Indian yoga poses. Optimized inference speed and converted the model into TensorFlow Lite format for seamless integration.
Created a topic modeling model, utilizing LDA and NMF algorithms for latent topic extraction from text corpora, and applied clustering algorithms to group similar topics, providing a better understanding and organization of the text documents.
Built a computer vision system for accurately detecting items in the kitchen, with high accuracy and low latency. The system was optimized for real-time performance on mobile devices.
Detected income tax fraud using an ensemble of supervised anomaly detection, unsupervised clustering, and rule-based backtracking.

Technologies: Computer Vision, PyTorch, TensorFlow, TensorFlow Lite, Continuous Integration (CI), Continuous Development (CD), Flask, Deep Learning, Pose Estimation, Open Neural Network Exchange (ONNX), Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Convolutional Neural Networks (CNN), Deep Neural Networks, Language Models, MySQL, ETL, Statistical Analysis, Data Analysis, Image Analysis, Python, Large Language Models (LLMs), Data Science, MongoDB, OpenCV, iOS, Image Generation, Django, Pandas, Text Analytics, Video & Audio Processing, Notion, APIs, HubSpot CRM, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, Reinforcement Learning, Falcon, 2D, JavaScript, Google Speech-to-Text API, Point Clouds, Point Cloud Data, Text to Speech (TTS), Generative AI

Experience

Legal Law Chatbot with RAG, Pinecone Integration, Streamlit UI, and GPT-4

In this project, we developed a legal chatbot using OpenAI's GPT-4, LangChain, and retrieval-augmented generation (RAG), integrated with a Pinecone database and a Streamlit user interface, all built on a scalable Azure architecture. The chatbot is designed to provide precise and context-sensitive legal advice, using the Azure OpenAI GPT-4 series, GPT-35-Turbo series, and embeddings series models for natural language understanding, and LangChain for the conversational flow. We fine-tuned the model in Azure AI Studio and extended its capabilities by connecting the LLMs with other Azure services, such as Azure AI Search.

Personalized Art Generation Bot

Developed a bot to help users generate custom art based on their interactions with the bot and the images they provide. A large language model (LLM), specifically GPT-3.5, served as the basis for the bot. A soft prompt pipeline was also implemented, taking the users' prior interactions into account to capture each user's tone accurately. Notably, the system was able to handle user-specific data, including NSFW and adult content, while maintaining strict user privacy. For image generation, Stable Diffusion 2.1 was fine-tuned using LoRA, incorporating themes and prompts recommended by the LLM.

NFT Image to Immersive 3D

Developed a system that converts 2D images of NFTs into immersive 3D models using a combination of selective 3D inpainting via Stable Diffusion and depth estimation techniques.

Selective 3D inpainting involves the advanced process of filling in missing or damaged regions in the 2D images, resulting in a complete and visually appealing 3D representation. This technique helps to enhance the overall quality and realism of the generated 3D models.

Depth estimation is another critical component of the system as it enables the determination of the spatial depth information from 2D images. This depth information is essential for creating a sense of depth and perspective in the resulting 3D models.

Stable Diffusion drives the inpainting step, helping the system deliver high-quality 3D representations of the NFTs from their 2D counterparts. The resulting 3D models can significantly enrich users' viewing and interaction experience in various applications, ranging from virtual galleries to augmented reality environments.

News to Infographics

Delivered a generative AI model built on Stable Diffusion and LLMs. The model animates stories sourced from news articles and helped the client secure funding from Y Combinator.

The process begins by summarizing news articles with GPT-3.5 Turbo and Davinci through LangChain. Videos are then generated with a fine-tuned Stable Diffusion 2.1 model, resulting in engaging and dynamic visual representations of the news stories.

Yoga Pose Correction

Developed and deployed a real-time yoga pose estimation and correction system for the Android platform, using the OpenPifPaf model. The primary objective was precise and reliable recognition of various Indian yoga poses. A major focus was optimizing the system's inference speed to ensure real-time performance during live yoga sessions.

The trained model was quantized and converted to TensorFlow Lite format to simplify integration. This conversion made it easy to incorporate the model into Android applications, providing a user-friendly tool for yoga enthusiasts to refine their practice and gain a deeper understanding of different postures.

System and Method for Integer-only Quantization-aware Training for Edge

Developed a system for integer-only quantization-aware training. The system enhances the speed and performance of deep learning networks on low-precision devices.
I developed the pseudo-cross entropy loss function and designed the quantization scheme for integer-only quantization-aware training. Additionally, an SDK was developed that enables the utilization of this system on low-power edge compute devices. The SDK has been successfully used to quantize models on Jetson and the vendors' custom hardware.
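
As background for the quantization work described above, the sketch below shows plain symmetric quantization of floating-point values to signed n-bit integers and back. It is a generic textbook illustration only: it is not the integer-only quantization-aware training scheme or the pseudo-cross entropy loss mentioned here, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, num_bits=8):
    """Symmetric quantization: map floats to signed integers in [-qmax, qmax]."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer step, then clamp to the representable range.
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [1.0, -0.25, 0.0]
q8, scale8 = quantize(weights, num_bits=8)
q4, scale4 = quantize(weights, num_bits=4)
restored8 = dequantize(q8, scale8)
```

Lower bit widths use a coarser step size (`scale`), so the reconstruction error grows as the number of bits shrinks; quantization-aware training exists to let a network adapt to that error during training.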

Fake News Classification

Developed a system to detect and classify fake news articles in India using machine learning and natural language processing techniques.

The project involved preprocessing text data, employing the SetFit model and LSTM, and developing an ensemble of SetFit and LSTM to identify fake news accurately.

Additionally, k-means clustering was used to group the fake news by type. The end goal was to create a reliable tool to combat the spread of misinformation. The environment used for this project included Linux, TensorFlow, k-means clustering, scikit-learn, Python, and SetFit.
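
The profile does not say how the SetFit and LSTM predictions were combined. One common option is weighted soft voting over the two models' fake-news probabilities, sketched below under that assumption; `soft_vote`, `weight_a`, and `threshold` are hypothetical names, and the probabilities are made-up inputs rather than real model outputs.

```python
def soft_vote(prob_a, prob_b, weight_a=0.5, threshold=0.5):
    """Combine two classifiers' 'fake' probabilities with a weighted average.

    Returns the blended probability and whether it crosses the threshold.
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    combined = weight_a * prob_a + (1 - weight_a) * prob_b
    return combined, combined >= threshold


# Hypothetical per-article probabilities from the two models.
score_1, is_fake_1 = soft_vote(0.9, 0.4)   # models disagree, average decides
score_2, is_fake_2 = soft_vote(0.2, 0.3)   # both models lean toward 'real'
```

The weight and threshold would normally be tuned on a held-out validation set.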

Text-to-video Generation for Mathematical Equations

Developed a diffusion model that interprets English text descriptions of mathematical equations and generates accurate, coherent video representations. I built a tool that can assist in educational settings, giving students and educators a visual aid to better understand and communicate complex mathematical concepts. My work also included implementing optimization techniques to reduce the model's latency and memory footprint, making it more efficient and accessible for real-time applications.

Education

2016 - 2020

Bachelor of Technology Degree in Computer Science

University School of Information, Communication and Technology - Dwarka, Delhi, India

Skills

Libraries/APIs

PyTorch, TensorFlow, Scikit-learn, SpaCy, OpenCV, Pandas, LSTM, Google Speech-to-Text API, Keras, Fast.ai

Tools

You Only Look Once (YOLO), Git, ChatGPT, Notion, Haystack, Azure Machine Learning, Whisper, Amazon SageMaker, Google Bard

Frameworks

Flask, LlamaIndex, Django, Streamlit

Languages

Python, C++, Falcon, JavaScript, Bash Script

Paradigms

Data Science, ETL, Azure DevOps, Continuous Development (CD), Continuous Integration (CI), Search Engine Optimization (SEO)

Platforms

Docker, AWS IoT, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2, HubSpot, iOS, Linux, Amazon Web Services (AWS), Civitai, Azure

Storage

MySQL, MongoDB, Databases

Other

Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Quantization, Models, TensorFlow Lite, Machine Learning, LangChain, Statistics, Depth Estimation, Time Series, Hugging Face, Detectron, Generative Pre-trained Transformers (GPT), GPT, Large Language Models (LLMs), Artificial Intelligence (AI), OCR, Convolutional Neural Networks (CNN), Image Processing, OpenAI GPT-4 API, OpenAI GPT-3 API, Text to Image, Diffusion Models, NLU, Deep Neural Networks, Language Models, Statistical Analysis, Data Analysis, Image Analysis, Image Generation, Chatbots, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Model Development, Video & Audio Processing, OpenAI, APIs, HubSpot CRM, Retrieval-augmented Generation (RAG), Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, BERT, Reinforcement Learning, PEFT, 2D, Speech to Text, Point Clouds, Point Cloud Data, Text to Speech (TTS), Generative AI, NVIDIA TensorRT, FastAPI, Pose Estimation, 3D Reconstruction, DreamBooth, LoRA, Generative Adversarial Networks (GANs), K-means Clustering, Edge AI, Open Neural Network Exchange (ONNX), Pruning, Benchmarking, Object Detection, Machine Learning Operations (MLOps), Product Matching, Prompt Engineering, ControlNet, Gradio, Videos, Conversational AI
