Rahul Singh Inda, Developer in Bengaluru, Karnataka, India

Rahul Singh Inda

Verified Expert in Engineering

Data Scientist and AI Developer

Bengaluru, Karnataka, India

Toptal member since November 24, 2021

Bio

Rahul is a data scientist with three years of professional experience and an engineer's degree focused on big data analytics. He has delivered several edtech solutions, and his areas of expertise are NLP, including state-of-the-art attention-based models, and computer vision using both deep learning and classical machine learning techniques.

Portfolio

Giotto.ai
Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP)...
Skuad
APIs, Data Science, Docker, Generative Pre-trained Transformers (GPT)...
Embibe
PyTorch, Generative Pre-trained Transformers (GPT)...

Experience

  • Python 3 - 5 years
  • Generative Pre-trained Transformers (GPT) - 5 years
  • PyTorch - 5 years
  • Natural Language Processing (NLP) - 5 years
  • Large Language Models (LLMs) - 5 years
  • OpenAI - 5 years
  • Retrieval-augmented Generation (RAG) - 4 years
  • LangChain - 2 years

Availability

Part-time

Preferred Environment

Ubuntu, Visual Studio Code (VS Code), Python 3, PyTorch, Python, Data Science, Machine Learning, Deep Learning, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Computer Vision

The most amazing...

...thing I've achieved so far was ranking #241 among data scientists worldwide in Kaggle competitions.

Work Experience

NLP Engineer

2021 - PRESENT
Giotto.ai
  • Architected retrieval-augmented generation (RAG) systems with LLMs such as Gemini to improve internal question answering over medical documents.
  • Created text classification models using semantic similarity to classify documents into 100+ label categories.
  • Built question-answering (QA) models using BERT and deployed them on GPUs on Google Cloud Platform (GCP).
  • Managed models in production, including logging and error handling using Google Cloud, Docker, and Grafana.
Technologies: Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Natural Language Toolkit (NLTK), Docker, Data Science, Large Language Models (LLMs), FastAPI, Gemini, OpenAI, LangChain, Google Cloud, Retrieval-augmented Generation (RAG), Artificial Intelligence (AI), ChatGPT, Prompt Engineering
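
The semantic-similarity classification mentioned above can be sketched roughly as follows. This is a minimal illustration, not the production system: the bag-of-words `embed` function is a toy stand-in for a real sentence-embedding model, and the label names and descriptions in `LABELS` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(document: str, label_descriptions: Dict[str, str]) -> str:
    """Return the label whose description is most similar to the document."""
    doc_vec = embed(document)
    return max(
        label_descriptions,
        key=lambda label: cosine(doc_vec, embed(label_descriptions[label])),
    )


# Hypothetical labels; a real system would have 100+ of these.
LABELS = {
    "cardiology": "heart cardiac blood pressure artery",
    "radiology": "x-ray scan imaging mri ct",
    "pharmacy": "drug dose prescription tablet medication",
}

predicted = classify("patient prescription lists the drug dose per tablet", LABELS)
print(predicted)
```

Because the labels are represented by text rather than by trained output units, adding a new category in this setup only requires a new description, which is one reason the approach scales to many labels.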

Lead Data Scientist

2021 - 2022
Skuad
  • Built a neural search engine to answer users' queries using deep learning and FAISS, improving the CTR by 8%.
  • Deployed the search model to production on Google Cloud and optimized it to serve around 65,000 to 80,000 daily queries, improving query auto-solves by 25%.
  • Implemented a pipeline to group user data based on topics and a deduplication pipeline for content and queries.
Technologies: APIs, Data Science, Docker, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Python, PyTorch, Deep Learning, Machine Learning, Retrieval-augmented Generation (RAG), Large Language Models (LLMs), Artificial Intelligence (AI), Prompt Engineering, ChatGPT, Google Cloud
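
The core of a neural search engine like the one described above is a nearest-neighbor lookup over embedding vectors. The sketch below shows that idea with an exact, brute-force inner-product search in plain Python. It is an assumption-laden illustration: `BruteForceIndex` is a hypothetical class standing in for a FAISS index, and the two-dimensional vectors stand in for embeddings produced by a deep learning model.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so inner product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


class BruteForceIndex:
    """Exact inner-product search; a tiny stand-in for a vector index."""

    def __init__(self) -> None:
        self._vectors: List[List[float]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        """Store unit-normalized copies of the given vectors."""
        self._vectors.extend(normalize(v) for v in vectors)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[float, int]]:
        """Return the k best (score, vector_id) pairs, highest score first."""
        q = normalize(query)
        scored = (
            (sum(a * b for a, b in zip(q, v)), i)
            for i, v in enumerate(self._vectors)
        )
        return heapq.nlargest(k, scored)


index = BruteForceIndex()
index.add([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # hypothetical document embeddings
hits = index.search([0.9, 0.1], k=2)             # hypothetical query embedding
top_ids = [doc_id for _, doc_id in hits]
print(top_ids)
```

A brute-force scan compares the query with every stored vector, which is fine for a demonstration; at tens of thousands of daily queries over a large corpus, a dedicated library such as FAISS is the usual choice.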

Data Scientist

2019 - 2021
Embibe
  • Implemented metadata tagging of academic content with graph nodes for downstream consumers, saving hundreds of person-hours.
  • Built an NLP model to tag 10,000+ concepts and derive learning entities, using vector-based inference to maximize GPU utilization and reduce response time.
  • Developed a knowledge-tracing algorithm that models students' knowledge using graph embeddings, with the goal of accurately predicting how students will perform in future interactions based on their learning activities. The algorithm improved accuracy by 12%.
  • Developed the process and led two junior employees in the launch of a doubt-resolution product for students. A vector-based search algorithm returns top-matched questions to users based on text and images.
  • Built a pipeline for concept tagging YouTube videos. It can download videos, fetch their transcripts using a speech-to-text API, and create a classification model.
  • Built a distributed architecture on Google Cloud Run for scalable deployment of deep learning models. Deployed the models with a logging and monitoring dashboard for real-time and batch inference modes.
Technologies: PyTorch, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPU Computing, Computer Vision Algorithms, Scikit-learn, NumPy, Pandas, PySpark, Artificial Intelligence (AI)

Experience

Cornell Birdcall Identification

https://www.kaggle.com/rsinda/training-efficientnet-model
Participated in a Kaggle competition to identify a wide variety of bird vocalizations in natural landscape recordings. A recording can include multiple bird species, and the task was to predict all the species in the recordings. The audio clips contained noise from the surrounding landscape, and I built a robust model for this competition.

You can read the full description at https://www.kaggle.com/c/birdsong-recognition.

Product Classification API

https://github.com/rsinda/product-classification
A multi-class classifier to classify product categories from a given set of features. To build this, I trained a random forest model, used TfidfVectorizer to build textual features, dealt with class imbalance, dockerized the entire application, and developed an API.
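
Two of the ingredients named above, TF-IDF text features and class-imbalance handling, can be illustrated in plain Python. This is a sketch under stated assumptions rather than the repository's code: the product titles and categories are made up, the description does not say how class imbalance was handled, and inverse-frequency class weighting is just one common option (the same heuristic scikit-learn uses for `class_weight="balanced"`). The IDF formula follows the smoothed variant that `TfidfVectorizer` applies by default.

```python
import math
from collections import Counter
from typing import Dict, List


def balanced_class_weights(labels: List[str]) -> Dict[str, float]:
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


def tfidf(docs: List[str]) -> List[Dict[str, float]]:
    """Compute L2-normalized TF-IDF rows with smoothed IDF: ln((1+n)/(1+df)) + 1."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        raw = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_counts.items()
        }
        norm = math.sqrt(sum(w * w for w in raw.values())) or 1.0
        rows.append({term: w / norm for term, w in raw.items()})
    return rows


# Hypothetical training data with an under-represented "kitchen" class.
titles = ["red cotton shirt", "blue cotton shirt", "steel kitchen knife"]
categories = ["apparel", "apparel", "kitchen"]

weights = balanced_class_weights(categories)
features = tfidf(titles)
print(weights)
print(features[0])
```

In this toy data, the rare class receives the larger weight, and within the first title the term "red" scores higher than "cotton" because it appears in fewer documents.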

Identify Placement of Tubes in Chest X-rays

https://www.kaggle.com/rsinda/38th-place-solution-0-972-single-model-5-fold
A Kaggle competition to classify the presence and correct placement of tubes on chest x-rays to save lives. The dataset contained 40,000 images, and the task was to identify tubes that were poorly placed. I won a silver medal in this competition.

Education

2015 - 2019

Bachelor's Degree in Computer Science

U. V. Patel College of Engineering - Ahmedabad, Gujarat, India

Certifications

FEBRUARY 2020 - PRESENT

CutShort Certified Deep Learning - Advanced

CutShort

SEPTEMBER 2019 - PRESENT

Convolutional Neural Networks

Coursera

AUGUST 2019 - PRESENT

Neural Networks and Deep Learning

Coursera

Skills

Libraries/APIs

PyTorch, NumPy, Pandas, Natural Language Toolkit (NLTK), Scikit-learn, PySpark

Tools

ChatGPT

Languages

Python 3, Python

Platforms

Ubuntu, Docker

Storage

MongoDB, Google Cloud

Other

Random Forests, Data Science, Machine Learning, Computer Vision, Retrieval-augmented Generation (RAG), Artificial Intelligence (AI), Vector Databases, Natural Language Processing (NLP), GPU Computing, Computer Vision Algorithms, Deep Learning, Neural Networks, Long Short-term Memory (LSTM), Recurrent Neural Networks (RNNs), FastAPI, APIs, Big Data, Generative Pre-trained Transformers (GPT), Large Language Models (LLMs), Gemini, OpenAI, LangChain, Prompt Engineering, Speech Recognition
