Sanket Gupta, Data Scientist and Developer in New York, NY, United States

Member since January 28, 2019
Sanket has worked on several high-impact data science and machine learning projects, keeping business impact and product needs in mind. He is also a thought leader with a popular blog and a podcast. He is skilled in Python, Pandas, SQL, AWS, Keras, building APIs, and deploying machine learning models to the cloud. He is also proficient in statistical thinking and A/B testing techniques, and he holds an AWS certification.







Preferred Environment

macOS, GitHub, PyCharm, Jupyter

The most amazing...

...project I've built helped large fashion retailers build machine learning models that give online shoppers accurate size recommendations.


  • Data Scientist

    2019 - PRESENT
    Bold Metrics (via Toptal)
    • Built machine learning models to help retailers in the fashion industry predict accurate size recommendations for online shoppers.
    • Built data analysis, visualization, modeling, training, and testing pipelines using Python, TensorFlow, Keras, Scikit-learn, and Pandas.
    • Deployed and helped maintain models in the cloud using AWS SageMaker.
    • Used AWS S3, SageMaker, EC2, and CloudWatch with TensorFlow Serving for end-to-end model deployment at scale.
    Technologies: Python, TensorFlow, Keras, Scikit-learn, Pandas
  • Data Scientist

    2016 - PRESENT
    Y Combinator Company
    • Created a spelling correction system for the search experience, which resulted in $800k in incremental annual revenue.
    • Implemented a machine learning system to automatically categorize 3M+ products based on their descriptions.
    • Built an anomaly detection system to identify anomalies in pricing data using statistical methods.
    • Used Elasticsearch features such as spelling correction, aggregations, filtering, matching, and edge n-grams. Helped get the most out of the system and suggested search improvements.
    • Developed category intent prediction algorithm.
    • Cultivated an A/B testing practice, including the use of hypothesis testing.
    • Analyzed user activity data including search logs and click data to build analytics tooling.
    • Developed a recommendation engine to find alternate products that are cheaper and better.
    • Built a Python Flask web app to collect training data for search relevance. This system helped improve product ranking and click-through rates.
    • Built systems for marketing analytics including cohort analysis.
    Technologies: Data Analysis, Python, Pandas, SQL, Scikit-learn, Golang, Vue.js, Statistics
  • Design Engineer

    2011 - 2014
    Marvell Technology Group
    • Created multiple data analysis tools that analyzed data from circuits and systems.
    • Built systems to predict the performance of different circuits and systems.
    • Used statistical thinking to analyze system failures and implement ideas on how to fix them.
    • Presented design ideas to large audiences.
    • Developed product-thinking skills and learned to consider the needs of a large user base.
    Technologies: Custom Data Analysis and Data Science Tools
  • Financial Data Analyst Intern

    2010 - 2010
    Credit Suisse
    • Analyzed large financial datasets covering stock market and dividend performance.
    • Supported portfolio performance reporting for some of Credit Suisse's large customers.
    Technologies: Data Analysis, Excel, VBA
  • Marketing Data Analyst Intern

    2009 - 2010
    Exxon Mobil
    • Analyzed marketing data for an Exxon Mobil division and supported the team in making business decisions.
    • Presented to management on how Exxon Mobil could target customers.
    Technologies: C++, VBA


  • Large Data Analysis Projects (Development)

    Used SQL and Python Pandas to analyze large datasets such as traffic police data and TED talks data.

  • Host of The Data Life Podcast (Other amazing things)

    Host a data science podcast that reaches international audiences. Episodes have covered natural language processing, A/B testing, statistics, machine learning tips, text classifiers, data analysis, and more.

  • Calorie Tracker Single Page Responsive Web App (Development)

    Built a calorie tracker as a single-page web app using SQL, Python, Flask, and Vue.js, with REST APIs powering the front end.

  • Mining Twitter Data for Sentiment Analysis of Events (Development)

    Built a sentiment analysis system to detect the live sentiment of events as they unfold. Used the Twitter Streaming API to load data in Python, saved it to an SQLite database, and plotted the sentiment with Pandas and Matplotlib.
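
    A minimal sketch of the store-and-aggregate step in a pipeline like this one, assuming the tweets have already been received from the stream. The `tweets` table, the tiny word lists, and the `score_text` helper are hypothetical, and the word-count scoring is only a stand-in for whatever sentiment model the project used. Plain SQL aggregation takes the place of the Pandas and Matplotlib step so the example has no dependencies.

```python
import sqlite3

# Hypothetical toy lexicon; the project's actual sentiment model is not described.
POSITIVE = {"great", "love", "win", "amazing"}
NEGATIVE = {"bad", "hate", "lose", "awful"}


def score_text(text):
    """Return (number of positive words) - (number of negative words)."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def store_tweets(conn, tweets):
    """Save (minute, text) pairs with their sentiment score into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (minute TEXT, text TEXT, score INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tweets VALUES (?, ?, ?)",
        [(minute, text, score_text(text)) for minute, text in tweets],
    )
    conn.commit()


def sentiment_by_minute(conn):
    """Average sentiment per minute, ordered by time, ready for plotting."""
    rows = conn.execute(
        "SELECT minute, AVG(score) FROM tweets GROUP BY minute ORDER BY minute"
    )
    return rows.fetchall()


# Made-up sample data standing in for the live stream.
sample = [
    ("12:00", "What a great game, love it"),
    ("12:00", "This is bad"),
    ("12:01", "Awful call, hate this"),
]
conn = sqlite3.connect(":memory:")
store_tweets(conn, sample)
series = sentiment_by_minute(conn)
```

    The resulting list of (minute, average score) pairs could be loaded into a Pandas DataFrame and plotted as a time series.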

  • Machine Learning Language Classifier from Written Scripts (Development)

    Built a multi-class machine learning classifier to predict the language of a written script. The classifier can detect up to 56 languages, including Chinese, Korean, English, Italian, and Hindi, with an average F1 score of 90%.

    It uses a cascade: a linear classifier followed by a neural network. The first stage detects whether the text is in a Roman script, which determines the character n-grams the system builds for training. The neural network then identifies the exact language based on features from the first stage.
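
    The feature-building part of a cascade like this can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: a simple Unicode-name rule stands in for the first-stage linear classifier, the n-gram sizes (3 for Roman scripts, 1 otherwise) are assumed, and the neural network stage is omitted. The function names are hypothetical.

```python
import unicodedata
from collections import Counter


def is_roman_script(text, threshold=0.5):
    """Stage 1 stand-in: True if most letters are Latin-script characters."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    latin = sum("LATIN" in unicodedata.name(ch, "") for ch in letters)
    return latin / len(letters) >= threshold


def char_ngrams(text, n):
    """Count overlapping character n-grams in the text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def extract_features(text):
    """Choose the n-gram size from stage 1, then build the feature counts."""
    roman = is_roman_script(text)
    n = 3 if roman else 1  # assumed sizes, not taken from the project
    return roman, char_ngrams(text.lower(), n)


roman, feats = extract_features("ciao mondo")
non_roman, hindi_feats = extract_features("नमस्ते")
```

    The n-gram counts would then be vectorized and passed to a second-stage model that predicts the language.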

  • Recommendation Engine Using Collaborative Filtering and SVD (Development)

    Built a recommendation engine using collaborative filtering to recommend movies based on users' tastes. Used movie ratings from different users to build vectors and calculate the cosine similarity between items. Also used SVD to build a lower-dimensional representation of the user-movie matrix to speed up recommendations.
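
    A minimal sketch of the two ideas named above, item-to-item cosine similarity and a truncated SVD of the user-movie matrix, using NumPy. The ratings matrix is made up for illustration, and the helper names are hypothetical rather than taken from the project.

```python
import numpy as np

# Made-up ratings: rows are users, columns are movies, 0 means unrated.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])


def item_cosine_similarity(matrix):
    """Cosine similarity between every pair of columns (movies)."""
    norms = np.linalg.norm(matrix, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for unrated movies
    normalized = matrix / norms
    return normalized.T @ normalized


def low_rank_item_factors(matrix, k):
    """Truncated SVD: represent each movie with k latent dimensions."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # shape: (num_movies, k)


def most_similar(similarity, item):
    """Index of the movie most similar to `item`, excluding itself."""
    scores = similarity[item].copy()
    scores[item] = -np.inf
    return int(np.argmax(scores))


sim = item_cosine_similarity(ratings)
factors = low_rank_item_factors(ratings, k=2)
```

    Here `most_similar(sim, 0)` picks the movie whose rating column points in the most similar direction to movie 0. Computing similarities on `factors` instead of the full matrix is one way the lower-dimensional representation can reduce the work per lookup.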

  • Image Classification System using CNNs (Development)

    Built an image classification system using convolutional neural networks to classify CIFAR-10 and SVHN images. Built the entire system in Keras and Python.

  • Video Course on Fundamentals of Data Science (Other amazing things)

    Built a video course on data science concepts, covering building a tweet bot, analyzing movie reviews, building a movie review system using collaborative filtering, and time series analysis.


  • Languages

    Python, SQL, Go, GraphQL
  • Frameworks

  • Libraries/APIs

    Pandas, Keras, NLTK, REST APIs, Scikit-learn, Vue.js, NumPy, React
  • Tools

    PyCharm, Amazon SageMaker, ELK (Elastic Stack)
  • Paradigms

    Data Science, Object-oriented Programming (OOP)
  • Platforms

    Jupyter Notebook, macOS, Amazon Web Services (AWS), Linux, AWS EC2
  • Storage

    PostgreSQL, SQLite, MySQL, AWS S3, Elasticsearch
  • Other

    Machine Learning, Deep Learning, Recurrent Neural Networks, Data Analysis, Data Mining, Data Analyst, Artificial Intelligence (AI), Convolutional Neural Networks


  • Master's degree in Engineering
    2014 - 2015
    Columbia University - New York
  • Bachelor's degree in Engineering
    2007 - 2011
    Nanyang Technological University - Singapore


  • AWS Certified Cloud Practitioner
    APRIL 2020 - APRIL 2023
  • Natural Language Processing Certificate
    APRIL 2019 - PRESENT
  • Deep Learning Certificate
