Shivam Garg, Developer in Delhi, India

Shivam Garg

Verified Expert in Engineering

Bio

Shivam is a senior AI engineer with 5+ years of hands-on experience in deep learning and artificial intelligence. Proficient in deep learning frameworks such as PyTorch, Hugging Face, TensorFlow, and Keras, he excels in generative AI, Stable Diffusion, and large language models (LLMs). Shivam also has extensive expertise in classical computer vision.

Portfolio

Self-Employed
AI Agents, Python 3, PyTorch, Large Language Models (LLMs), CrewAI, LangGraph...
ContractPodAi
Natural Language Processing (NLP), Data Science, LSTM, Machine Learning...
IBM
CI/CD Pipelines, GitHub Actions, Large Language Models (LLMs)...

Experience

Availability

Full-time

Preferred Environment

Python, TensorFlow, Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Large Language Models (LLMs), LangChain, Machine Learning, AI Agents, OpenAI GPT-4 API, English, Amazon Web Services (AWS), Retrieval-augmented Generation (RAG), LangGraph, TypeScript, Vector Databases, Hugging Face, Llama 3, Full-stack, Kubernetes, Minimum Viable Product (MVP), Team Leadership, Video Processing, Azure Databricks, Data Modeling, AWS Lambda, Model Tuning, Serverless, Amazon Bedrock, Node.js, SQL, Snowflake, Google Colaboratory (Colab), Jupiter, Kubeflow, Data Annotation, Keras, Amazon RDS, Data Cleansing, GraphDB, Generative Adversarial Networks (GANs), Text to Image AI, DreamBooth, SpaCy, Google Vision API, Neural Networks, Transformers, API Integration, Data Engineering

The most amazing...

...generative AI model I've delivered uses Stable Diffusion and LLMs to animate stories from news articles. It helped secure Y Combinator funding.

Work Experience

AI Engineer 3

2024 - PRESENT
Self-Employed
  • Developed a travel assistant using retrieval-augmented generation (RAG) and agentic workflows. Implemented chain-of-thought prompting and context-aware templates for trip planning, flight bookings, and Q&A.
  • Engineered a dynamic lead scoring system for a cloud-based project management SaaS. Utilized an RLHF-tuned Llama 2 with few-shot learning to analyze prospect interactions and prioritize high-value leads.
  • Developed a virtual house agent via CrewAI and fine-tuned Llama 3 via LoRA to reduce home maintenance costs by 30% and improve service efficiency. The AI processes homeowner requests, then generates quick estimates and schedules professionals.
  • Implemented a personal finance assistant using LangChain and PEFT fine-tuned Mistral AI to increase savings by 15-20%. AI categorizes expenses, creates budgets, and optimizes investment portfolios based on risk profiles and market conditions.
  • Developed an AI supply chain optimizer using AutoGen, GPT-4, and RAG. Enhanced inventory management and disruption prediction, reducing costs by 20% and improving delivery efficiency by 15%.
  • Fine-tuned Llama 3 8B using instruction tuning, preference alignment (ORPO), and hyperparameter optimization for improved recipe recommendations.
  • Created an audit bot for clinical trials at J&J using chain-of-thought prompting, data-driven prompting, and ElasticDB-based retrieval-augmented generation (RAG).
  • Created an AI-powered language exchange platform using React, FastAPI, and NLP models for real-time translation and analysis. Increased employee language proficiency by 45% and improved cross-cultural communication efficiency by 30%.
  • Developed an AI-powered language learning platform using transformer models and SpaCy. Implemented adaptive difficulty and personalized feedback, boosting engagement by 55% and retention by 40%. Integrated 20+ language corpora for diverse content.
  • Developed a real-time social media sentiment analyzer using RoBERTa and Kafka. Achieved 92% accuracy in multilingual emotion detection, reducing customer response time by 33% for major brands.
Technologies: AI Agents, Python 3, PyTorch, Large Language Models (LLMs), CrewAI, LangGraph, LangChain, Retrieval-augmented Generation (RAG), Llama 3, OpenAI GPT-4 API, Amazon Bedrock, VLLM, Python, GNU Autogen, Qdrant, Neo4j, Natural Language Processing (NLP), Vector Databases, FastAPI, Machine Learning Operations (MLOps), React, Sentiment Analysis, BERT, RoBERTa, Apache Kafka, AI Prompts, DALL-E, Hand Tracking, Sign Language, Unity, Open-source LLMs, SpaCy, Google Vision API, Handwriting Recognition, Regex, Amazon Textract, Tesseract, Diffusion-based AI Models, Neural Networks, Transformers, API Integration, Make, Firebase, Zapier, Deep Reinforcement Learning, Data Engineering

Senior LLM Engineer

2024 - 2024
ContractPodAi
  • Collaborated with a sales intelligence platform to build a RAG system using LangGraph. Optimized Llama 3 8B models, reducing size by 75% and latency by 50% while maintaining 0.95 recall and a BLEU score of 30.
  • Developed a personalized sales and marketing bot using LangChain with few-shot and chain-of-thought prompting. Improved lead qualification by 40% and content personalization by 25%.
  • Developed an end-to-end multi-agent contract management tool for extracting, drafting, and managing legal contracts. Utilized a Llama 3 model fine-tuned and quantized with QLoRA, along with Graph RAG with Neo4j via LangChain. It was deployed using vLLM.
  • Developed a legal copilot using a multi-level RAG approach with MongoDB Atlas and the OpenAI Assistants API. The system queries over 100,000 documents for efficient and accurate contract insights, achieving a BLEU score of 0.95.
  • Led a team of five developers to architect a multi-tenant AI back end, supporting over 10,000 concurrent users. Set up an MLOps pipeline, reducing deployment time from two hours to one hour, and a CI/CD pipeline for faster software delivery.
  • Designed a system to automate legal contract drafting and review by fine-tuning Llama 3 with LoRA. It generates templates, suggests clauses, and detects inconsistencies, reducing drafting time by 50% and errors by 35% compared to traditional methods.
  • Worked with the solutions and sales teams to rapidly prototype and build AI proofs of concept (POCs) and responses to requests for proposals (RFPs). Successful POCs and RFP responses include finance assistants, trading bots, and legal advisors.
  • Created an insurance copilot using a fine-tuned OpenAI GPT-4 API. This enhanced customer interactions by providing accurate, real-time responses, automating routine tasks, and ensuring a seamless user experience.
  • Implemented security measures such as AES-256 encryption, TLS 1.3, RBAC, MFA, and data privacy controls, and provided detailed documentation and training for effective system use, ensuring robust data security and user efficiency.
  • Partnered with a financial advisory firm to create an AI assistant using GPT-4. Developed prompt templates with constrained generation for tailored investment advice and market insights.
Technologies: Natural Language Processing (NLP), Data Science, LSTM, Machine Learning, TensorFlow, BERT, Python, OpenAI, Reinforcement Learning, Large Language Models (LLMs), Language Models, ChatGPT, Falcon, PEFT, AI Agents, Llama 3, OpenAI Assistants API, MongoDB, FastAPI, Python 3, JSON Web Tokens (JWT), Machine Learning Operations (MLOps), Prompt Engineering, ChatGPT Prompts, English, Gemini, Claude, DSPy, Outbound Marketing, Amazon Web Services (AWS), Retrieval-augmented Generation (RAG), LangGraph, Neo4j, Scalable Vector Databases, TypeScript, Vector Databases, Agile, Mistral AI, Jira, Google Cloud, Hugging Face, Full-stack, Full-stack Development, React, Rapid Prototyping, Multi-agent Systems, Reinforcement Learning from Human Feedback (RLHF), Data Analytics, Sentiment Analysis, Computer Vision Algorithms, Kubernetes, Azure AI Studio, Minimum Viable Product (MVP), Team Leadership, Plugins, Azure Data Factory, Azure Databricks, Data Modeling, LangSmith, Ollama, AWS Lambda, Model Tuning, Serverless, Amazon Bedrock, SQL, Google Colaboratory (Colab), Jupiter, Modeling, Kubeflow, Medical Imaging, Speech to Text AI, Speech Analytics, Voice Analysis, Data Annotation, Microsoft Power BI, Keras, Amazon RDS, OpenAI API, CrewAI, Data Cleansing, Graph Databases, GraphDB, Technical Leadership, Human Resources (HR), Estimations, PostgreSQL, Google Sheets, Jupyter Notebook, Generative Pre-trained Transformer 4 (GPT-4), Generative Adversarial Networks (GANs), LoRa, Text to Image AI, DreamBooth, Web Scraping, OpenAI GPT-4 API, AI Prompts, DALL-E, Sign Language, Open-source LLMs, SpaCy, Google Vision API, Handwriting Recognition, Regex, Amazon Textract, Tesseract, Diffusion-based AI Models, Neural Networks, Transformers, API Integration, Make, Firebase, Deep Reinforcement Learning, Data Engineering

Senior Machine Learning Engineer

2023 - 2024
IBM
  • Built a weather forecasting system for The Weather Company (IBM) using Mistral. The LLM enhanced real-time weather insights, improving forecast accuracy to 85% and reducing alert processing time by 35%, ensuring timely and precise weather updates.
  • Created a fashion sales forecasting system for an eCommerce platform using CLIP embeddings and GPT-3. Analyzed product images and descriptions to predict trends, improving inventory management and boosting quarterly sales by 18%.
  • Created a tool to search for similar patents using OpenAI Ada embeddings via LangChain and Pinecone for improved indexing. This resulted in a 90% precision rate for relevant patent identification.
  • Utilized AWS for cloud computing, cutting model deployment time by 50% and improving resource efficiency by 40%. Enhanced model streaming and observability, boosting real-time performance tracking by 30% and reducing operational overhead by 25%.
  • Implemented keystroke dynamics for eCommerce authentication with Python for feature extraction and TensorFlow with LSTM networks for deep learning. Achieved 95% accuracy, cut authentication time by 30%, and reduced security breaches by 45%.
  • Implemented and managed MLOps practices to ensure efficient lifecycle management of ML models. Worked with GitHub Actions to enhance the code integration and deployment process automation for CI/CD.
  • Developed and managed AI/ML solutions with skills in ML, MLOps, cloud computing, and CI/CD practices.
  • Contributed to the development of LLMs to advance natural language processing and AI. Applied technologies and methodologies to drive innovation and enhance ML capabilities.
  • Developed the automated Personality Assessment and Reporting System (PARS) using LLMs and retrieval-augmented generation (RAG). Used Mistral fine-tuned via PEFT, together with RAG built on LangChain and Pinecone, for data extraction and cleaning.
  • Utilized OpenAI GPT-4 with chain-of-thought prompting and LangChain with Pinecone to compare traits and create visual reports. The model was refined with BitFit, cross-validation, and A/B testing, achieving 95% accuracy.
Technologies: CI/CD Pipelines, GitHub Actions, Large Language Models (LLMs), Large Language Model Operations (LLMOps), AWS Serverless Application Model (SAM), Amazon SageMaker, Biometrics, AWS Lambda, Amazon Web Services (AWS), Model Tuning, Serverless, Amazon Bedrock, Node.js, SQL, Snowflake, Google Colaboratory (Colab), Jupiter, Trading, Binary Option Trading, Futures & Options, Option Pricing, Options Trading, Modeling, Kubeflow, Speech to Text AI, Speech Analytics, Voice Analysis, Data Annotation, Microsoft Power BI, Amazon Athena, Keras, Amazon RDS, OpenAI API, CrewAI, Data Cleansing, Graph Databases, GraphDB, Technical Leadership, Human Resources (HR), Estimations, PostgreSQL, Google Sheets, Jupyter Notebook, Generative Pre-trained Transformer 4 (GPT-4), Generative Adversarial Networks (GANs), LoRa, Text to Image AI, DreamBooth, Web Scraping, OpenAI GPT-4 API, AI Prompts, DALL-E, Open-source LLMs, SpaCy, Google Vision API, Handwriting Recognition, Regex, Amazon Textract, Tesseract, Diffusion-based AI Models, Neural Networks, Transformers, API Integration, Make, Firebase, Zapier, Data Engineering

AI Engineer 3

2022 - 2023
Avatarin Inc
  • Created a system to assist human Kanji writing through imitation learning and OpenCV, using Kanji videos to generate Kanji images and predict poses for robotic arms.
  • Implemented a model that detects suspicious activity at airports using VideoMAE. It prioritized high accuracy, low latency, and efficient deployment on the client's Linux server.
  • Built a shot detection system for table tennis games for the World Table Tennis Organization, using YOLOv5 and OpenCV for object detection and VideoMAE for shot recognition.
  • Integrated an infrared object detection system using the YOLO architecture with DeepSORT, achieving high accuracy with an mAP of 0.88 for detecting and tracking vehicles.
  • Created a satellite image segmentation system using U-Net and Mask R-CNN models, achieving a Dice score of 0.94, greatly enhancing agricultural field detection and analysis.
Technologies: 3D Reconstruction, Python, Computer Vision, OCR, Natural Language Processing (NLP), Object Detection, Image Processing, Benchmarking, OpenCV, Amazon Web Services (AWS), Text to Image, Large Language Models (LLMs), Diffusion Models, Deep Neural Networks (DNNs), ChatGPT, Language Models, MySQL, Machine Learning, Statistical Analysis, Data Analysis, Image Analysis, Data Science, MongoDB, Image Generation, Chatbots, LangChain, LlamaIndex, Django, Pandas, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Text Analytics, Video & Audio Processing, OpenAI, HubSpot, Notion, APIs, HubSpot CRM, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure Machine Learning, Azure DevOps, Text to Speech (TTS), Generative Artificial Intelligence (GenAI), NumPy, Prompt Engineering, English, TypeScript, Agile, Jira, Google Cloud, Hugging Face, Llama 3, Full-stack, Full-stack Development, HIPAA Compliance, Rapid Prototyping, Data Analytics, Machine Learning Operations (MLOps), Sentiment Analysis, Computer Vision Algorithms, Kubernetes, Minimum Viable Product (MVP), Team Leadership, Plugins, Video Processing, Azure Data Factory, Azure Databricks, Data Modeling, Biometrics, Object Tracking, DeepSORT, AWS Lambda, Model Tuning, Serverless, Node.js, SQL, Snowflake, Google Colaboratory (Colab), Jupiter, Modeling, Kubeflow, Medical Imaging, Speech to Text AI, Speech Analytics, Voice Analysis, Data Annotation, Microsoft Power BI, Keras, Amazon RDS, Data Cleansing, Graph Databases, GraphDB, Technical Leadership, Estimations, PostgreSQL, Google Sheets, Jupyter Notebook, Generative Adversarial Networks (GANs), LoRa, Text to Image AI, DreamBooth, DALL-E, Unity, Handwriting Recognition, Regex, Tesseract, Diffusion-based AI Models, Neural Networks, 
Transformers, API Integration, Make, Data Engineering

Senior AI Engineer

2020 - 2022
AlphaICs
  • Implemented a motion transfer system using a first-order model, achieving high-quality motion transfer between faces while preserving the identity and facial expressions of the target face.
  • Built a quantization software development kit (SDK) for 4-bit and 8-bit quantization, enabling the efficient implementation and optimization of deep learning models on Edge (CPU-based) hardware, which enhanced performance and capabilities.
  • Benchmarked different computer vision and generative models with a custom quantization and optimization SDK for IoT and custom Edge devices.
  • Worked on brain image segmentation using deep learning, training neural networks to identify and classify structures in brain images linked to Alzheimer's disease using segmentation and computer vision techniques.
  • Rolled out a 3D object detection and tracking system for autonomous vehicles using lidar data and the VoxelNet algorithm, enhancing the vehicles' perception and tracking capabilities in a 3D environment.
  • Developed an infrared object detection system using the You Only Look Once (YOLO) architecture, achieving high accuracy in detecting objects in infrared images and providing reliable identification and tracking capabilities.
  • Created a satellite image segmentation system for detecting agricultural fields using a cascade of U-Net and Mask R-CNN models, improving agricultural analysis and decision-making processes.
Technologies: Python, Deep Learning, Quantization, Computer Vision, NVIDIA TensorRT, Continuous Integration (CI), Continuous Development (CD), Models, PyTorch, TensorFlow, Keras, FastAPI, Fast.ai, You Only Look Once (YOLO), Artificial Intelligence (AI), Google Cloud Platform (GCP), Convolutional Neural Networks (CNNs), Image Processing, Benchmarking, Amazon Web Services (AWS), Large Language Models (LLMs), Text to Image, Diffusion Models, Deep Neural Networks (DNNs), Language Models, MySQL, Machine Learning, ETL, Statistical Analysis, Data Analysis, Image Analysis, Data Science, OpenCV, iOS, Image Generation, Chatbots, Pandas, Generative Pre-trained Transformers (GPT), Text Analytics, Video & Audio Processing, OpenAI, HubSpot, Notion, APIs, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, JavaScript, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure DevOps, Text to Speech (TTS), Generative Artificial Intelligence (GenAI), NumPy, Azure, English, Agile, Jira, Google Cloud, Llama 3, Full-stack, Full-stack Development, Rapid Prototyping, Data Analytics, Machine Learning Operations (MLOps), Sentiment Analysis, Computer Vision Algorithms, Kubernetes, Azure AI Studio, Minimum Viable Product (MVP), Team Leadership, Video Processing, Biometrics, AWS Lambda, Model Tuning, Serverless, Node.js, SQL, Snowflake, Google Colaboratory (Colab), Jupiter, Trading, Binary Option Trading, Futures & Options, Option Pricing, Options Trading, Modeling, Kubeflow, Medical Imaging, Speech Analytics, Voice Analysis, Data Annotation, Microsoft Power BI, Amazon Athena, Amazon RDS, Data Cleansing, Technical Leadership, Jupyter Notebook, Generative Adversarial Networks (GANs), LoRa, Text to Image AI, DALL-E, Hand Tracking, Handwriting Recognition, Regex, Tesseract, Neural Networks, API Integration, Firebase, Deep Reinforcement Learning, Data Engineering

Machine Learning Engineer

2019 - 2020
UnrealAI
  • Built a real-time yoga pose estimation system for Android and iOS using OpenPifPaf. It achieves 95% accuracy and offers detailed feedback to refine users' yoga practice.
  • Created a topic model using LDA and NMF algorithms for latent topic extraction from text corpora, and applied clustering algorithms to group similar topics, providing a better understanding and organization of the text documents.
  • Built a computer vision system for accurately detecting items in the kitchen, with high accuracy and low latency. The system was optimized for real-time performance on mobile devices.
  • Detected income tax fraud using an ensemble of supervised anomaly detection, unsupervised clustering, and rule-based backtracking.
Technologies: Computer Vision, PyTorch, TensorFlow, TensorFlow Lite, Continuous Integration (CI), Continuous Development (CD), Flask, Deep Learning, Pose Estimation, Open Neural Network Exchange (ONNX), Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Convolutional Neural Networks (CNNs), Deep Neural Networks (DNNs), Language Models, MySQL, ETL, Statistical Analysis, Data Analysis, Image Analysis, Python, Large Language Models (LLMs), Data Science, MongoDB, OpenCV, iOS, Image Generation, Django, Pandas, Text Analytics, Video & Audio Processing, Notion, APIs, HubSpot CRM, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, Reinforcement Learning, Falcon, 2D, JavaScript, Google Speech-to-Text API, Point Clouds, Point Cloud Data, Text to Speech (TTS), Generative Artificial Intelligence (GenAI), NumPy, English, TypeScript, Agile, Machine Learning Operations (MLOps), Sentiment Analysis, Kubernetes, Minimum Viable Product (MVP), Video Processing, Biometrics, Model Tuning, Serverless, Node.js, SQL, Google Colaboratory (Colab), Jupiter, Trading, Binary Option Trading, Futures & Options, Option Pricing, Options Trading, Modeling, Medical Imaging, Speech to Text AI, Data Annotation, Amazon Athena, Data Cleansing, PostgreSQL, Jupyter Notebook, Hand Tracking, SpaCy, Regex, Neural Networks, Firebase, Zapier, Deep Reinforcement Learning

Text-to-video Generation for Mathematical Equations

Developed a robust diffusion model capable of interpreting English text descriptions of mathematical equations and generating accurate, coherent video representations. I built a tool that can assist in educational settings, providing students and educators with a visual aid to better understand and communicate complex mathematical concepts. My work also included implementing advanced optimization techniques to improve the model's performance in terms of latency and memory footprint, as well as making it more efficient and accessible for real-time applications.

Legal Law Chatbot with RAG, Pinecone Integration, Streamlit UI, and GPT-4

In this project, we developed a legal chatbot leveraging OpenAI's GPT-4, LangChain, and the retrieval-augmented generation (RAG) model, integrated with a Pinecone database and developed using Streamlit for the user interface, all built on a scalable Azure architecture. This chatbot is designed to provide precise and context-sensitive legal advice, utilizing the Azure OpenAI GPT-4 series, GPT-35-Turbo series, embeddings series models for natural language understanding, and LangChain for seamless conversational AI. We fine-tuned the model on Azure AI Studio and enhanced the model capability by connecting the LLMs with other Azure services, like Azure AI Search.

Text to Video

https://drive.google.com/drive/folders/15vLHASESD4HWIQ5nY3OtxrvzSsYb_IDe
Implemented a text-to-video system for a film-making client using MORA, a multi-agent framework for video generation, and the I2VGen-XL model fine-tuned with LoRA, a parameter-efficient adaptation technique. This system converts text descriptions into short video reels while preserving character consistency across scenes. It achieved a CLIP score of 0.9, indicating strong alignment between text and generated video content.
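
One common way to read a text-to-video alignment score like the one above is as the mean cosine similarity between the prompt embedding and each frame embedding. The sketch below illustrates only that arithmetic. It assumes the embeddings were already produced by a CLIP-style encoder, the function names are hypothetical, and the exact scoring formula used in the project is not stated here.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def clip_style_score(text_emb: Sequence[float],
                     frame_embs: List[Sequence[float]]) -> float:
    """Average text-to-frame cosine similarity over all frames (assumed metric)."""
    return sum(cosine(text_emb, f) for f in frame_embs) / len(frame_embs)


# Toy 2-D embeddings: one frame matches the prompt, one is orthogonal to it.
score = clip_style_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

A score near 1.0 means every frame embedding points in almost the same direction as the prompt embedding. Averaging over frames penalizes videos that drift away from the prompt partway through.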

System and Method for On-device Edge Learning

https://patents.justia.com/patent/20230386194
I developed and patented a system for on-device edge learning.

The method includes training an artificial intelligence (AI) model for extracting the visual embeddings with pre-trained visual deployment networks; checking the performance of the AI model by feeding real-time data and by performing an inference; initiating edge learning; extracting visual embeddings with pre-trained visual deployment networks; performing the inference and adding a text image embedding; obtaining the text embeddings using text embedders; and converting the text to image embeddings to generate augmented image embeddings and adding text embeddings.

News to Infographics

Delivered a generative AI model that uses Stable Diffusion and LLMs to animate stories sourced from news articles. It helped the client secure funding from Y Combinator.

The process begins with news articles being summarized using GPT-3.5 Turbo and Davinci, facilitated by LangChain. Videos are then generated using a fine-tuned Stable Diffusion 2.1 model, resulting in engaging and dynamic visual representations of the news stories.

System and Method for Integer-only Quantization-aware Training for Edge

https://patents.justia.com/patent/20230342613
I developed and patented a system for integer-only quantization-aware training. The system enhances the speed and performance of deep learning networks on low-precision devices.

I developed the pseudo-cross entropy loss function and designed the quantization scheme for integer-only quantization-aware training. Additionally, an SDK was developed to enable the utilization of this system on low-power edge computing devices. The SDK has been successfully used to quantize models on Jetson and the vendors' custom hardware.
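
For readers unfamiliar with integer quantization, the sketch below shows a generic affine int8 scheme, where each float is mapped to an 8-bit code through a scale and a zero point. It is a textbook illustration only. It is not the patented quantization scheme or the pseudo-cross entropy loss described above, and the function names are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float, int]:
    """Map floats to int8 codes using an affine (scale, zero_point) scheme."""
    # The representable range must include 0.0 so that zero maps to an exact code.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 when all inputs are zero
    zero_point = round(-128 - lo / scale)
    # Round to the nearest code and clamp into the int8 range [-128, 127].
    codes = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return codes, scale, zero_point


def dequantize(codes: List[int], scale: float, zero_point: int) -> List[float]:
    """Recover approximate floats from int8 codes."""
    return [(c - zero_point) * scale for c in codes]


weights = [-1.0, 0.0, 0.5, 1.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
```

Each restored value differs from its original by at most one quantization step (`scale`), which is the precision traded for the smaller integer representation.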

Personalized Art Generation Bot

Developed a bot to assist users in generating custom art based on their interactions with the bot and the images they provide. A large language model (LLM), specifically GPT-3.5, was employed as the basis for the bot. A soft prompt pipeline was implemented that considers the user's prior interactions to capture the user's tone accurately. Notably, the system demonstrated the capability to handle user-specific data, including NSFW and adult content, while maintaining strict user privacy. For image generation, Stable Diffusion 2.1 was fine-tuned using LoRA, incorporating themes and prompts recommended by the LLM.

Interactive NFT (2D-3D)

Developed a system that converts 2D images of NFTs into immersive 3D models using a combination of selective 3D inpainting via Stable Diffusion and depth estimation techniques.

Selective 3D inpainting involves the advanced process of filling in missing or damaged regions in the 2D images, resulting in a complete and visually appealing 3D representation. This technique helps to enhance the overall quality and realism of the generated 3D models.

Depth estimation is another critical component of the system as it enables the determination of the spatial depth information from 2D images. This depth information is essential for creating a sense of depth and perspective in the resulting 3D models.

By leveraging Stable Diffusion, the system ensures a stable and consistent generation process, delivering high-quality and accurate 3D representations of the NFTs from their 2D counterparts. The resulting 3D models can significantly enrich users' viewing and interaction experience in various applications, ranging from virtual galleries to augmented reality environments.
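
To illustrate how per-pixel depth can lift a 2D image into 3D, the sketch below back-projects a depth map through a pinhole camera model. This is an assumption about one possible geometry step, not a description of the delivered system. The intrinsics (`fx`, `fy`, `cx`, `cy`) and the function name are hypothetical, and monocular depth estimators often return relative rather than metric depth.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[float]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Convert a depth map (rows of per-pixel depths) into 3D points."""
    points: List[Point3D] = []
    for v, row in enumerate(depth):        # v is the pixel row (image y)
        for u, z in enumerate(row):        # u is the pixel column (image x)
            if z <= 0:                     # skip pixels with no valid depth
                continue
            x = (u - cx) * z / fx          # pinhole model: X = (u - cx) * Z / fx
            y = (v - cy) * z / fy          # pinhole model: Y = (v - cy) * Z / fy
            points.append((x, y, z))
    return points


# A 2x2 toy depth map with one invalid pixel.
cloud = backproject([[2.0, 2.0], [0.0, 4.0]], fx=2.0, fy=2.0, cx=0.5, cy=0.5)
```

The resulting point list can be meshed or rendered from new viewpoints, which is where inpainting becomes useful for filling regions hidden in the original 2D view.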

Yoga Pose Correction

Developed and deployed a real-time yoga pose estimation and correction system for the Android platform, utilizing the OpenPifPaf model. The primary objective was to achieve precise and reliable recognition of various Indian yoga poses. A major focus was dedicated to optimizing the system's inference speed to ensure seamless and real-time performance during live yoga sessions.

The trained model was thoughtfully quantized and converted to a TensorFlow Lite format to enhance usability and integration. This conversion facilitated the easy incorporation of the model into Android applications, providing a user-friendly tool for yoga enthusiasts to refine their practice and gain a deeper understanding of different postures.

Toonify Pets

Developed a system to convert animal images into animated cartoons by training a GAN on unpaired images. Built on the StyleGAN architecture, the system was enhanced with CLIP for better alignment with textual descriptions and a feature extractor for refined details. This approach generated high-quality cartoons, achieving an FID score of 12 and demonstrating strong visual fidelity and consistency in the animated outputs.

Real-time Table Tennis Tracking and Shot Detection System

https://drive.google.com/file/d/1Mic7692BmlfOSGTcIbefIBM9ElFPZoBd/view?usp=sharing
Developed a system for the World Table Tennis Organisation using OpenCV for ball detection and depth estimation via OpenMiDaS, achieving a smooth 60 frames per second (fps). This project involved precise camera calibration and the integration of depth estimation techniques to accurately trace the ball's trajectory. Additionally, I implemented a deep learning approach using VideoMAE for shot detection, achieving 95% accuracy in predicting forehand and backhand shots. The client also used this system for television broadcasts.

Fake News Classification

Developed a system to detect and classify fake news articles in India using machine learning and natural language processing techniques.

The project involved preprocessing text data, employing the SetFit model and LSTM, and developing an ensemble of SetFit and LSTM to identify fake news accurately.

Additionally, k-means clustering was used to cluster the type of fake news. The end goal was to create a reliable tool to combat the spread of misinformation. The environment used for this project included Linux, TensorFlow, k-means clustering, scikit-learn, Python, and SetFit.
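
One simple way to combine two classifiers like the ones above is soft voting, where each model's predicted probability of "fake" is averaged before thresholding. The sketch below assumes that approach for illustration. The weight, threshold, and function name are hypothetical, and the actual ensembling method used in the project is not specified here.

```python
from typing import List


def ensemble_predict(p_setfit: List[float], p_lstm: List[float],
                     w_setfit: float = 0.5, threshold: float = 0.5) -> List[bool]:
    """Soft-voting ensemble: weighted average of two models' fake-news probabilities."""
    combined = [w_setfit * a + (1.0 - w_setfit) * b
                for a, b in zip(p_setfit, p_lstm)]
    # An article is flagged as fake when the combined probability reaches the threshold.
    return [p >= threshold for p in combined]


# Toy per-article probabilities from each model (illustrative values, not real results).
flags = ensemble_predict([0.9, 0.2, 0.6], [0.7, 0.1, 0.3])
```

Averaging probabilities lets a confident model outvote an uncertain one, which tends to be more stable than a majority vote when only two models are involved.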

Text and Voice Assistant for Students with Dyslexia

Developed a text and voice-based assistant for dyslexic students at an edtech startup as an MVP.

We fine-tuned the Mixtral-8x7B-Instruct model using reinforcement learning from human feedback (RLHF), which significantly improved the system's ability to manage atypical spelling and grammar. Azure DevOps handled CI/CD and deployment, and document ingestion ran on Azure. We implemented RAG to enhance data extraction and query responses, further improving comprehension of unusual queries. We also integrated text-to-speech functionality to aid students with reading difficulties, ensuring a more accessible and supportive learning experience.

ASL-to-Speech for Healthcare

Developed an advanced system that translates American Sign Language (ASL) gestures into natural-sounding speech using a combination of computer vision and deep learning techniques.

The system employs MediaPipe for precise hand tracking, extracting key landmarks from video input. Custom convolutional neural networks (CNNs) process these landmarks, capturing spatial relationships crucial for sign language interpretation. A bidirectional long short-term memory (LSTM) network models the temporal sequences of gestures, enabling context-aware translation. The translated text is then converted to speech using Tacotron 2 for mel-spectrogram generation and WaveNet as a vocoder, producing high-quality, natural-sounding audio output. This end-to-end pipeline achieves 92% accuracy in ASL translation, significantly enhancing communication between deaf individuals and hearing individuals in healthcare settings.

The system's modular architecture allows for easy updates and improvements to individual components, ensuring scalability and adaptability to different sign languages or use cases.

Education

2016 - 2020

Bachelor of Technology Degree in Computer Science

University School of Information, Communication and Technology - Dwarka, Delhi, India

Libraries/APIs

PyTorch, TensorFlow, Scikit-learn, SpaCy, OpenCV, Pandas, LSTM, Google Speech-to-Text API, NumPy, React, Node.js, OpenAI API, Google Vision API, Keras, Fast.ai, Gradio, OpenAI Assistants API

Tools

You Only Look Once (YOLO), Git, ChatGPT, Notion, Haystack, Azure Machine Learning, Whisper, Jira, Microsoft Power BI, Amazon Athena, Google Sheets, AI Prompts, Amazon Textract, Make, Zapier, Amazon SageMaker, Google Bard, Open Neural Network Exchange (ONNX), GNU Autogen

Languages

Python, C++, Falcon, JavaScript, TypeScript, SQL, Snowflake, Regex, Bash Script, Python 3

Frameworks

TensorFlow Lite, Flask, LlamaIndex, Django, DSPy, LangGraph, Unity, Streamlit, JSON Web Tokens (JWT), AWS Serverless Application Model (SAM)

Paradigms

ETL, Azure DevOps, Agile, Rapid Prototyping, Continuous Development (CD), Continuous Integration (CI), HIPAA Compliance, Search Engine Optimization (SEO)

Platforms

Docker, AWS IoT, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2, Amazon Web Services (AWS), HubSpot, Azure, Kubernetes, Azure AI Studio, Kubeflow, CrewAI, Jupyter Notebook, Firebase, iOS, Linux, Civitai, Vae, LangSmith, Ollama, Apache Kafka

Storage

MySQL, MongoDB, Neo4j, Google Cloud, Graph Databases, PostgreSQL, Databases

Other

Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Quantization, Models, Machine Learning, LangChain, Statistics, Data Science, Depth Estimation, DreamBooth, Time Series, Hugging Face, Detectron, Generative Pre-trained Transformers (GPT), Generative Adversarial Networks (GANs), Large Language Models (LLMs), Artificial Intelligence (AI), OCR, Convolutional Neural Networks (CNNs), Image Processing, OpenAI GPT-4 API, OpenAI GPT-3 API, Machine Learning Operations (MLOps), LoRa, Text to Image, Diffusion Models, NLU, Deep Neural Networks (DNNs), Language Models, Statistical Analysis, Data Analysis, Image Analysis, Image Generation, Chatbots, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Model Development, Video & Audio Processing, Prompt Engineering, OpenAI, APIs, HubSpot CRM, Retrieval-augmented Generation (RAG), Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, BERT, Reinforcement Learning, PEFT, 2D, Speech to Text, Point Clouds, Point Cloud Data, Text to Speech (TTS), Speech to Text AI, AI Agents, Llama 3, English, Gemini, Claude, Scalable Vector Databases, Vector Databases, Mistral AI, Full-stack, Full-stack Development, Multi-agent Systems, Reinforcement Learning from Human Feedback (RLHF), Data Analytics, Sentiment Analysis, Computer Vision Algorithms, Minimum Viable Product (MVP), Team Leadership, Plugins, Video Processing, Azure Data Factory, Azure Databricks, Data Modeling, Biometrics, Model Tuning, Serverless, Amazon Bedrock, Google Colaboratory (Colab), Jupiter, Modeling, Medical Imaging, Speech Analytics, Voice Analysis, Data Annotation, Amazon RDS, Data Cleansing, GraphDB, Technical Leadership, Human Resources (HR), Generative Pre-trained Transformer 4 (GPT-4), Text to Image AI, Web Scraping, AI Art Visualization, DALL-E, Hand Tracking, Open-source LLMs, Handwriting Recognition, Tesseract, Diffusion-based AI Models, 
Neural Networks, Transformers, API Integration, Deep Reinforcement Learning, Data Engineering, NVIDIA TensorRT, FastAPI, Pose Estimation, 3D Reconstruction, Outbound Marketing, Trading, Binary Option Trading, Futures & Options, Option Pricing, Options Trading, Sign Language, K-means Clustering, Edge AI, Pruning, Benchmarking, Object Detection, Product Matching, ControlNet, Videos, Conversational AI, Multimodal GenAI, ChatGPT Prompts, CI/CD Pipelines, GitHub Actions, Large Language Model Operations (LLMOps), MORA, Object Tracking, DeepSORT, StyleGAN, VideoMAE, OpenMidas, Mixtral, Estimations, VLLM, Qdrant, RoBERTa
