Sanket Gupta, Developer in New York, NY, United States

Sanket Gupta

Verified Expert in Engineering

Data Scientist and Developer

Location
New York, NY, United States
Toptal Member Since
June 7, 2019

Sanket has worked on several high-impact data science and machine learning projects. He works with business impact and product needs in mind. He is also a thought leader with a popular blog and podcast. He is skilled in Python, Pandas, SQL, AWS, Keras, building APIs, and deploying machine learning models to the cloud, and he is proficient in statistical thinking and A/B testing. He is a Certified AWS Developer.

Portfolio

Y Combinator Company
Statistics, Vue, GoLand, Go, Scikit-learn, SQL, Pandas, Python, Data Analysis
Credit Suisse
Visual Basic for Applications (VBA), Microsoft Excel, Data Analysis
Exxon Mobil
Visual Basic for Applications (VBA), C++

Availability

Part-time

Preferred Environment

Jupyter, PyCharm, GitHub, MacOS

The most amazing...

...project I've built was helping large fashion retailers build machine learning models that give accurate size recommendations to customers shopping online.

Work Experience

Data Scientist

2016 - PRESENT
Y Combinator Company
  • Created a spelling correction system for the search experience that generated $800k in incremental annual revenue.
  • Implemented a machine learning system to automatically categorize 3M+ products based on their descriptions.
  • Built an anomaly detection system to identify anomalies in pricing data using statistical methods.
  • Used Elasticsearch features such as spelling correction, aggregations, filtering, matching, and edge n-grams. Helped get the most out of the system and suggested search improvements.
  • Developed a category intent prediction algorithm.
  • Established an A/B testing practice, including the use of hypothesis testing.
  • Analyzed user activity data including search logs and click data to build analytics tooling.
  • Developed a recommendation engine to find alternate products that are cheaper and better.
  • Built a Python Flask web app to collect training data for search relevance; this system helped improve product ranking and click-through rates.
  • Built systems for marketing analytics including cohort analysis.
Technologies: Statistics, Vue, GoLand, Go, Scikit-learn, SQL, Pandas, Python, Data Analysis
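The pricing anomaly detection above is described only as using "statistical methods." As one illustration of such a method, here is a minimal sketch of the median/MAD-based modified z-score. The function name, the 3.5 threshold, and the sample prices are assumptions for illustration, not details of the original system.

```python
from statistics import median


def flag_price_anomalies(prices, threshold=3.5):
    """Return indices whose modified z-score exceeds `threshold`.

    Hypothetical helper: uses the median and the median absolute deviation
    (MAD), which are less distorted by the outliers being searched for than
    the mean and standard deviation.
    """
    med = median(prices)
    mad = median(abs(p - med) for p in prices)
    if mad == 0:
        # Typical prices are identical: treat any differing price as anomalous.
        return [i for i, p in enumerate(prices) if p != med]
    return [
        i for i, p in enumerate(prices)
        if abs(0.6745 * (p - med) / mad) > threshold
    ]


prices = [19.99, 20.49, 19.79, 20.10, 199.0, 20.25, 2.0]
print(flag_price_anomalies(prices))  # [4, 6]
```

The median-based statistics keep a single extreme price (such as 199.0) from inflating the spread estimate and hiding itself.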

Design Engineer

2011 - 2014
Marvell Technology Group
  • Created multiple data analysis tools to analyze circuit and system data.
  • Built systems to predict the performance of different circuits and systems.
  • Used statistical thinking to analyze system failures and implemented fixes for them.
  • Presented design ideas to large audiences.
  • Developed product thinking skills and learned to consider the needs of a large user base.

Financial Data Analyst Intern

2010 - 2010
Credit Suisse
  • Analyzed large financial datasets on stock market and dividend performance.
  • Supported some of Credit Suisse's large customers with portfolio performance reporting.
Technologies: Visual Basic for Applications (VBA), Microsoft Excel, Data Analysis

Marketing Data Analyst Intern

2009 - 2010
Exxon Mobil
  • Analyzed marketing data for an Exxon Mobil division and supported the team in making business decisions.
  • Presented to management on how Exxon Mobil could direct and target customers.
Technologies: Visual Basic for Applications (VBA), C++

Large Data Analysis Projects

Used SQL and Python Pandas to analyze large datasets such as traffic police data and TED talks data.

Host of The Data Life Podcast

https://podcasts.apple.com/us/podcast/the-data-life-podcast/id1453716761
Host a data science podcast and speak as a thought leader to international audiences. Topics have included natural language processing, A/B testing, statistics, machine learning tips, text classifiers, data analysis, and more.

Calorie Tracker Single Page Responsive Web App

Built a calorie tracker for healthy eating, implemented as a single-page web app with SQL, Python, Flask, and Vue.js, with REST APIs powering the front end.

Mining Twitter Data for Sentiment Analysis of Events

Built a sentiment analysis system to detect the live sentiment of events as they unfold. Used the Twitter Streaming API to load data into Python, saved it to an SQLite database, and used Pandas and Matplotlib to plot the sentiment.
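A minimal sketch of the storage and aggregation step, assuming tweets have already been pulled from the stream as (timestamp, text) pairs. The Twitter client is omitted. The tiny word lists and the `score` function are hypothetical stand-ins for a real sentiment model, and the table layout is an assumption.

```python
import sqlite3

# Hypothetical word lists standing in for a real sentiment model.
POSITIVE = {"great", "goal", "amazing", "win", "love"}
NEGATIVE = {"awful", "miss", "bad", "lose", "hate"}


def score(text: str) -> int:
    """Toy lexicon score: positive word count minus negative word count."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def store_tweets(conn: sqlite3.Connection, tweets) -> None:
    """Persist (unix_seconds, text) pairs together with their sentiment score."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets (ts INTEGER, text TEXT, sentiment INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tweets (ts, text, sentiment) VALUES (?, ?, ?)",
        [(ts, text, score(text)) for ts, text in tweets],
    )
    conn.commit()


def sentiment_per_minute(conn: sqlite3.Connection):
    """Average sentiment per one-minute bucket (integer division of seconds)."""
    return conn.execute(
        "SELECT ts / 60 AS minute, AVG(sentiment) FROM tweets "
        "GROUP BY minute ORDER BY minute"
    ).fetchall()


conn = sqlite3.connect(":memory:")
store_tweets(conn, [
    (0, "What a great goal!"),
    (30, "Amazing win"),
    (65, "Awful miss, bad defending"),
    (90, "I love this game"),
])
print(sentiment_per_minute(conn))  # [(0, 2.0), (1, -1.0)]
```

For plotting, `pandas.read_sql_query(sql, conn)` can load the same query result into a DataFrame that Matplotlib can chart.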

Machine Learning Language Classifier from Written Scripts

https://github.com/sanketg10/language-identifier-nlp
Built a multi-class machine learning classifier to predict the language of a written script. The classifier detects up to 56 languages, including Chinese, Korean, English, Italian, and Hindi, with an average F1 score of 90%.

It uses a cascade: a linear classifier followed by a neural network. The first stage (the linear classifier) detects whether the text uses a Roman script, which determines the character n-grams the system builds for training. The neural network then identifies the exact language of the written script, using features from the first stage.
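The cascade idea can be sketched with the Python standard library. This is not the project's code. A Unicode-range heuristic stands in for the first-stage linear classifier, and a nearest-profile cosine match over character n-grams stands in for the neural network. Using trigrams for Roman scripts and unigrams for other scripts is also an assumption, as are all names and the toy training sentences.

```python
import math
from collections import Counter


def is_roman_script(text: str, threshold: float = 0.5) -> bool:
    """Stage 1 stand-in: True if most letters are Latin (code points below U+0250)."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return True
    latin = sum(1 for ch in letters if ord(ch) < 0x0250)
    return latin / len(letters) >= threshold


def char_ngrams(text: str, roman: bool) -> Counter:
    """Character n-grams whose order depends on the stage-1 script decision."""
    n = 3 if roman else 1  # assumption: trigrams for Roman, unigrams otherwise
    padded = f" {text.lower()} "
    grams = (padded[i:i + n] for i in range(len(padded) - n + 1))
    return Counter(g for g in grams if g.strip())  # drop whitespace-only grams


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class CascadeLanguageID:
    """Two-stage sketch: route by script type, then match n-gram profiles."""

    def __init__(self):
        self.profiles = {}  # (is_roman, language) -> aggregated n-gram counts

    def fit(self, samples):
        for text, lang in samples:
            roman = is_roman_script(text)
            key = (roman, lang)
            self.profiles.setdefault(key, Counter()).update(char_ngrams(text, roman))
        return self

    def predict(self, text):
        roman = is_roman_script(text)
        features = char_ngrams(text, roman)
        # Stage 2 only compares against languages seen with the same script type.
        candidates = {lang: p for (r, lang), p in self.profiles.items() if r == roman}
        if not candidates:
            return None
        return max(candidates, key=lambda lang: cosine(features, candidates[lang]))


samples = [
    ("the quick brown fox jumps over the lazy dog", "English"),
    ("this is a short english sentence about the weather", "English"),
    ("il gatto dorme sulla sedia della cucina", "Italian"),
    ("questa è una frase italiana molto semplice", "Italian"),
    ("안녕하세요 만나서 반갑습니다", "Korean"),
    ("你好很高兴认识你", "Chinese"),
    ("नमस्ते आपसे मिलकर खुशी हुई", "Hindi"),
]
model = CascadeLanguageID().fit(samples)
print(model.predict("the lazy dog sleeps"))  # English
```

Routing on script type first means the second stage only compares a text against languages written in the same family of scripts. That smaller candidate set is the main benefit the cascade design is meant to provide.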

Recommendation Engine Using Collaborative Filtering and SVD

Built a recommendation engine using collaborative filtering to recommend movies based on users' tastes. Used movie ratings from different users to build item vectors and computed the cosine similarity between items. Also used SVD to build a lower-dimensional representation of the user-movie matrix, which speeds up recommendations.
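A compact NumPy sketch of item-based collaborative filtering with an optional SVD step. The toy rating matrix, the function names, and the scoring rule (a similarity-weighted sum of the user's existing ratings) are illustrative assumptions, not the project's implementation.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors
    unit = vectors / norms
    return unit @ unit.T


def svd_item_factors(ratings: np.ndarray, k: int) -> np.ndarray:
    """Rank-k movie vectors from R = U S V^T: rows of V_k scaled by S_k."""
    _, s, vt = np.linalg.svd(ratings, full_matrices=False)
    return vt[:k].T * s[:k]


def recommend(ratings: np.ndarray, user: int, top_n: int = 1, k=None) -> list:
    """Rank movies the user has not rated by similarity to movies they rated."""
    item_vectors = ratings.T if k is None else svd_item_factors(ratings, k)
    sim = cosine_similarity_matrix(item_vectors)
    user_ratings = ratings[user]
    scores = sim @ user_ratings  # similarity-weighted sum of the user's ratings
    scores[user_ratings > 0] = -np.inf  # skip movies already rated
    return [int(i) for i in np.argsort(scores)[::-1][:top_n]]


# Rows are users, columns are movies; 0 means "not rated".
ratings = np.array([
    [5, 4, 1, 0, 5],
    [4, 5, 0, 1, 4],
    [1, 0, 5, 4, 1],
    [0, 1, 4, 5, 0],
    [5, 5, 0, 0, 0],
], dtype=float)

print(recommend(ratings, user=4))       # [4]
print(recommend(ratings, user=4, k=2))  # [4]
```

Passing `k` replaces each movie's full column of user ratings with a k-dimensional factor vector. Similarity computations then scale with k instead of with the number of users.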

Image Classification System using CNNs

Built an image classification system using convolutional neural networks to classify CIFAR-10 and SVHN images. Built the entire system in Keras and Python.

Video Course on Fundamentals of Data Science

Built a video course on data science concepts, covering building a tweet bot, analyzing movie reviews, building a movie review system using collaborative filtering, and time series analysis.

Languages

Python, SQL, Go, GraphQL, Visual Basic for Applications (VBA), C++

Frameworks

Flask

Libraries/APIs

Pandas, Keras, Natural Language Toolkit (NLTK), REST APIs, Scikit-learn, Vue, NumPy, TensorFlow, React

Tools

PyCharm, Amazon SageMaker, ELK (Elastic Stack), GitHub, Jupyter, GoLand, Microsoft Excel, Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Container Registry (ECR)

Paradigms

Data Science, Object-oriented Programming (OOP)

Platforms

Jupyter Notebook, MacOS, Amazon Web Services (AWS), Linux, Amazon EC2, AWS Lambda

Storage

PostgreSQL, SQLite, MySQL, Amazon S3 (AWS S3), Elasticsearch

Other

Statistics, Machine Learning, Deep Learning, Recurrent Neural Networks (RNNs), Data Analysis, Data Mining, Artificial Intelligence (AI), Convolutional Neural Networks (CNN), Amazon Comprehend, Amazon API Gateway

Education

2014 - 2015

Master's Degree in Engineering

Columbia University - New York

2007 - 2011

Bachelor's Degree in Engineering

Nanyang Technological University - Singapore

Certifications

APRIL 2020 - APRIL 2023

AWS Certified Cloud Practitioner

Amazon

APRIL 2019 - PRESENT

Natural Language Processing Certificate

Udacity

FEBRUARY 2018 - PRESENT

Deep Learning Certificate

Coursera
