
Shashank Gupta
Verified Expert in Engineering
Data Scientist and Developer
New York, NY, United States
Toptal member since September 30, 2024
Shashank is a senior data scientist with five years of experience developing data-driven solutions across the oil and gas, hospitality, pharma, and healthcare industries. He is highly skilled in Python, R, and SQL and has a strong foundation in machine learning and data mining. Shashank holds a master's degree from Rutgers University and a bachelor's from the Indian Institute of Technology, Kanpur.
Experience
- Machine Learning - 6 years
- Data Science - 5 years
- Jupyter Notebook - 5 years
- Git - 5 years
- Python 3 - 5 years
- Azure Databricks - 4 years
- Generative Artificial Intelligence (GenAI) - 2 years
- Open-source LLMs - 2 years
Preferred Environment
Open-source LLMs, Python 3, R, SQL, PostgreSQL, Rust, Azure Databricks, Power BI Desktop, Git, Jupyter Notebook
The most amazing things I've achieved are ranking in the top 5% of data scientists on the Kaggle platform and ranking within the top 1% in the IIT JEE Advanced exam.
Work Experience
Data Scientist II
Sanofi
- Reduced root mean squared error (RMSE) by 25% and cut manual modeling time 16-fold by developing and deploying a scalable AutoML app for bioprocess manufacturing using Python and Streamlit, significantly lowering operational costs.
- Boosted the product yield KPI by 15% and achieved scalability across 2- to 10,000-liter bioreactors by streamlining manufacturing with Scikit-learn-based cross-scale ML models, reducing scale-to-scale variability.
- Increased processing efficiency by 40% and enhanced data-driven decision-making by designing and developing data pipelines that generate drug trial analytics, using Azure services (Databricks, Data Lake Storage, and Data Factory), Python, and SQL.
Senior Data Scientist
LTIMindtree
- Reduced maintenance costs by 40% and improved the mean time between failures (MTBF) KPI by 20% by designing and deploying an ML-based predictive maintenance model that uses logistic regression to predict steam generator failures.
- Worked with three team members to devise a scalable real-time pump health monitoring solution for planning preventive pump maintenance, using R for remaining useful life (RUL) modeling and Power BI for visualization and dashboarding.
- Developed a data solution that automates equipment name extraction from industrial CAD drawings using PyTorch, Python-Tesseract OCR, and OpenCV (cv2), eliminating manual intervention.
- Managed a team of three graduate engineer trainees and helped set goals and objectives, providing feedback and support.
Experience
NASA Turbojet Engine Failure Prediction
https://github.com/Sha661nk/NASA-Jet-Engine-Failure-Prediction
Ensuring optimal performance and avoiding unexpected failures are essential in critical applications like NASA turbojets. This project uses historical engine data to develop a predictive model that can forecast potential failures, enabling timely maintenance and increasing operational reliability.
Document Query Bot
https://github.com/Sha661nk/DocQueryBot
This tool simplifies document review by letting users ask questions and receive direct responses based on the information within the PDFs. The application uses retrieval-augmented generation and generative AI models, drawing on information stored in vector databases to produce precise, contextually relevant answers from the uploaded documents.
KYC Automation System
https://github.com/Sha661nk/KYC-Automation
This system allows users to submit their personal details, photographs, and Aadhaar card scans through a user-friendly web form. It uses facial recognition to match the user's photo with the Aadhaar card and employs Tesseract OCR to extract and validate information from the card against the Unique Identification Authority of India (UIDAI) database. The system ensures that customers are onboarded only after all data points are successfully verified, reducing manual intervention and improving accuracy and efficiency.
Education
Master's Degree in Business Analytics
Rutgers University - New Brunswick, NJ, USA
Bachelor's Degree in Electrical Engineering
Indian Institute of Technology Kanpur - Kanpur, India
Skills
Libraries/APIs
PySpark, PyTorch
Tools
Git, Power BI Desktop, Postman
Languages
Python 3, R, SQL, Rust, C++
Platforms
Jupyter Notebook, Azure Data Lake Storage
Storage
PostgreSQL
Frameworks
Streamlit
Paradigms
Requirements Analysis
Other
Open-source LLMs, Azure Databricks, Data Science, Machine Learning, Generative Artificial Intelligence (GenAI), Machine Learning Operations (MLOps), Predictive Maintenance, Neural Networks, Retrieval-augmented Generation (RAG), AIOps, R Programming, MLflow, Chemometrics, Bayesian Statistics, Predictive Modeling, Optical Character Recognition (OCR)