Shashank Gupta, Developer in New York, NY, United States

Shashank Gupta

Verified Expert in Engineering

Data Scientist and Developer

New York, NY, United States

Toptal member since September 30, 2024

Bio

Shashank is a senior data scientist with five years of experience developing data-driven solutions across the oil and gas, hospitality, pharma, and healthcare industries. He is highly skilled in Python, R, and SQL and has a strong foundation in machine learning and data mining. Shashank holds a master's degree from Rutgers University and a bachelor's from the Indian Institute of Technology, Kanpur.

Portfolio

Sanofi
Python 3, SQL, Azure Databricks, MLflow, PySpark, Data Science...
LTIMindtree
Python 3, Requirements Analysis, Machine Learning, Predictive Maintenance...

Experience

  • Machine Learning - 6 years
  • Data Science - 5 years
  • Jupyter Notebook - 5 years
  • Git - 5 years
  • Python 3 - 5 years
  • Azure Databricks - 4 years
  • Generative Artificial Intelligence (GenAI) - 2 years
  • Open-source LLMs - 2 years

Availability

Part-time

Preferred Environment

Open-source LLMs, Python 3, R, SQL, PostgreSQL, Rust, Azure Databricks, Power BI Desktop, Git, Jupyter Notebook

The most amazing...

...things I've achieved are ranking in the top 5% of data scientists on the Kaggle platform and ranking within the top 1% in the IIT JEE Advanced exam.

Work Experience

Data Scientist II

2023 - 2024
Sanofi
  • Reduced root mean squared error (RMSE) by 25% and cut manual modeling time 16-fold by developing and deploying a scalable AutoML app for bioprocess manufacturing using Python and Streamlit, significantly lowering operational costs.
  • Boosted the product yield KPI by 15% and achieved scalability across 2-liter to 10,000-liter bioreactors by streamlining manufacturing with scikit-learn-based cross-scale ML models, reducing scale-to-scale variability.
  • Increased processing efficiency by 40% and enhanced data-driven decision-making by designing and developing data pipelines with Azure Databricks, Azure Data Lake Storage, Azure Data Factory, Python, and SQL to generate drug trial analytics.
Technologies: Python 3, SQL, Azure Databricks, MLflow, PySpark, Data Science, Machine Learning, Chemometrics, Bayesian Statistics, Open-source LLMs, Jupyter Notebook, Generative Artificial Intelligence (GenAI), Git, Machine Learning Operations (MLOps), R Programming, Azure Data Lake Storage

Senior Data Scientist

2019 - 2022
LTIMindtree
  • Reduced maintenance costs by 40% and improved the mean time between failures (MTBF) KPI by 20% by designing and deploying an ML-based predictive maintenance model that uses logistic regression to predict steam generator failures.
  • Teamed with three members to devise a scalable real-time pump health monitoring solution to plan the pumps' preventive maintenance, leveraging R programming for remaining useful life (RUL) modeling and Power BI for visualization and dashboarding.
  • Developed a data solution that automates equipment name extraction from industrial CAD drawings, employing PyTorch, Python-Tesseract OCR, and OpenCV (cv2) and eliminating manual intervention.
  • Managed a team of three graduate engineer trainees and helped set goals and objectives, providing feedback and support.
Technologies: Python 3, Requirements Analysis, Machine Learning, Predictive Maintenance, Predictive Modeling, MLflow, SQL, Jupyter Notebook, R, Git, Machine Learning Operations (MLOps), R Programming, Azure Databricks, Data Science

Experience

NASA Turbojet Engine Failure Prediction

https://github.com/Sha661nk/NASA-Jet-Engine-Failure-Prediction
A project focusing on reducing unplanned equipment downtimes and associated maintenance costs by leveraging predictive maintenance.

Ensuring optimal performance and avoiding unexpected failures are essential in critical applications like NASA Turbojets. This project utilizes historical engine data to develop a predictive model that can forecast potential failures, enabling timely maintenance and increasing operational reliability.
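One common first step in this kind of failure forecasting is turning run-to-failure histories into training labels. The sketch below is illustrative only: it assumes each engine is recorded until it fails, and the field names (`unit`, `cycle`, `rul`, `fails_soon`) and the `horizon` parameter are hypothetical rather than taken from the repository.

```python
from collections import defaultdict


def add_failure_labels(rows, horizon=30):
    """Label each (unit, cycle) row with remaining useful life and a near-failure flag.

    rows: iterable of (unit_id, cycle) pairs from run-to-failure histories.
    Assumption: the last recorded cycle of each unit is its failure point.
    """
    rows = list(rows)

    # Find the final (failure) cycle observed for every engine unit.
    last_cycle = defaultdict(int)
    for unit, cycle in rows:
        last_cycle[unit] = max(last_cycle[unit], cycle)

    labeled = []
    for unit, cycle in rows:
        rul = last_cycle[unit] - cycle  # cycles left before failure
        labeled.append(
            {"unit": unit, "cycle": cycle, "rul": rul, "fails_soon": rul <= horizon}
        )
    return labeled


# Two short, made-up engine histories.
history = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
labeled = add_failure_labels(history, horizon=1)
```

The `fails_soon` flag could serve as the target for a classifier and `rul` as the target for a regressor. Which of the two the project uses is not stated here.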

Document Query Bot

https://github.com/Sha661nk/DocQueryBot
A Streamlit-based web application that allows users to engage in conversational interactions with the content of uploaded PDF documents.

This tool simplifies document review by enabling users to ask questions and receive direct responses based on the information within the PDFs. The application uses retrieval-augmented generation: it retrieves relevant passages stored in a vector database and passes them to a generative AI model, so answers stay precise and grounded in the uploaded documents.
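The retrieval step of retrieval-augmented generation can be sketched with the standard library alone. This is a toy stand-in, not the project's implementation: word-count vectors and cosine similarity replace learned embeddings and a vector database, no model is called, and all function names and sample text are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q_vec = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q_vec, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the grounded prompt that would be sent to a generative model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


# Made-up document chunks standing in for text extracted from a PDF.
chunks = [
    "The warranty period for the pump is 24 months.",
    "Invoices are payable within 30 days of receipt.",
    "The pump must be serviced every 6 months.",
]
question = "How long is the pump warranty?"
top_chunks = retrieve(question, chunks, k=1)
prompt = build_prompt(question, top_chunks)
```

Restricting the prompt to retrieved passages is what keeps the generated answer tied to the uploaded document rather than to the model's general knowledge.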

KYC Automation System

https://github.com/Sha661nk/KYC-Automation
A system designed to streamline Know Your Customer (KYC) validation by automating key aspects of customer onboarding.

This system allows users to submit their personal details, photographs, and Aadhaar card scans through a user-friendly web form. It uses facial recognition to match the user's photo with the Aadhaar card and employs Tesseract OCR to extract and validate information from the card against the Unique Identification Authority of India (UIDAI) database. The system ensures that customers are onboarded only after all data points are successfully verified, reducing manual intervention and improving accuracy and efficiency.
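The "onboard only after all data points are verified" rule can be sketched as a set of independent checks combined with a logical AND. The example is a minimal sketch under stated assumptions: the face-similarity score is assumed to come from some face matcher and to lie in [0, 1], the thresholds are arbitrary, the UIDAI lookup is not modeled, and every name and sample value is hypothetical.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class KycSubmission:
    form_name: str          # name typed into the web form
    form_dob: str           # date of birth typed into the form
    ocr_name: str           # name extracted from the card scan by OCR
    ocr_dob: str            # date of birth extracted by OCR
    face_similarity: float  # assumed score in [0, 1] from a face matcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so spacing noise from OCR is ignored."""
    return " ".join(text.lower().split())


def names_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy comparison that tolerates a small number of OCR character errors."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold


def verify(sub: KycSubmission, face_threshold: float = 0.8) -> dict[str, bool]:
    """Run each check independently so failures can be reported per field."""
    return {
        "name": names_match(sub.form_name, sub.ocr_name),
        "dob": normalize(sub.form_dob) == normalize(sub.ocr_dob),
        "face": sub.face_similarity >= face_threshold,
    }


def can_onboard(sub: KycSubmission) -> bool:
    """Onboard only when every check passes."""
    return all(verify(sub).values())


# Fictional sample submissions.
good = KycSubmission("Asha Verma", "1995-04-12", "ASHA  VERMA", "1995-04-12", 0.93)
bad_face = KycSubmission("Asha Verma", "1995-04-12", "ASHA  VERMA", "1995-04-12", 0.50)
```

Returning a per-field result from `verify` makes it possible to tell the user which check failed instead of rejecting the submission without explanation.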

Education

2022 - 2023

Master's Degree in Business Analytics

Rutgers University - New Brunswick, NJ, USA

2015 - 2019

Bachelor's Degree in Electrical Engineering

Indian Institute of Technology Kanpur - Kanpur, India

Skills

Libraries/APIs

PySpark, PyTorch

Tools

Git, Power BI Desktop, Postman

Languages

Python 3, R, SQL, Rust, C++

Platforms

Jupyter Notebook, Azure Data Lake Storage

Storage

PostgreSQL

Frameworks

Streamlit

Paradigms

Requirements Analysis

Other

Open-source LLMs, Azure Databricks, Data Science, Machine Learning, Generative Artificial Intelligence (GenAI), Machine Learning Operations (MLOps), Predictive Maintenance, Neural Networks, Retrieval-augmented Generation (RAG), AIOps, R Programming, MLflow, Chemometrics, Bayesian Statistics, Predictive Modeling, Optical Character Recognition (OCR)
