Sanket Gupta, Data Analysis Developer in New York, NY, United States

Member since January 28, 2019
Sanket has worked on several high-impact data analysis, data science, and statistics projects. He is proficient in statistical thinking and A/B testing techniques, and he works with business impact and product needs in mind. He also runs a popular blog and a podcast. He is skilled in Python, Pandas, SQL, AWS, Vue.js, building APIs, and data science architectures.

Experience

  • Python, 5 years
  • Data Science, 4 years
  • SQL, 4 years
  • Data Analysis, 4 years
  • Flask, 4 years
  • Pandas, 4 years
  • Machine Learning, 4 years

Location

New York, NY, United States

Availability

Part-time

Preferred Environment

macOS, GitHub, PyCharm, Jupyter

The most amazing...

...project I have built is a calorie tracker app that supports nutritional living. It was implemented using Vue.js, Flask, REST APIs, and data analysis techniques.

Employment

  • Data Scientist

    2016 - PRESENT
    Y Combinator Startup
    • Built a hierarchical multi-class text classification system using product descriptions; this system improved categorization and resulted in higher conversions.
    • Built a Python Flask web app to collect training data for search relevance; this system helped improve product ranking and click-through rates.
    • Developed a recommendation engine to find alternative products that are cheaper and better.
    • Built an anomaly detection system that uses statistical methods to identify anomalies in pricing data.
    • Built full-stack web apps with Vue.js, Golang, Python, HTML, and CSS.
    • Built information retrieval systems for search to improve precision and recall.
    Technologies: Data Analysis, Python, Pandas, SQL, Scikit-learn, Golang, Vue.js, Statistics
  • Design Engineer

    2011 - 2014
    Marvell Technology Group
    • Created multiple data analysis tools that analyzed data from circuits and systems.
    • Built systems to predict the performance of different circuits and systems.
    • Used statistical thinking to analyze system failures and propose ways to fix them.
    • Presented design ideas to large audiences.
    • Developed product-thinking skills and learned to consider the needs of a large user base.
    Technologies: Custom Data Analysis and Data Science Tools
  • Financial Data Analyst Intern

    2010 - 2010
    Credit Suisse
    • Analyzed large financial datasets on stock market and dividend performance.
    • Supported some of Credit Suisse's large customers with their portfolio performance reporting.
    Technologies: Data Analysis, Excel, VBA
  • Marketing Data Analyst Intern

    2009 - 2010
    Exxon Mobil
    • Analyzed marketing data for an Exxon Mobil division and supported the team in making business decisions.
    • Presented to management on how Exxon Mobil can target customers.
    Technologies: C++, VBA

Experience

  • Large Data Analysis Projects (Development)

    Used SQL and Python Pandas to analyze large datasets such as traffic police data and TED talks data.

  • Host of The Data Life Podcast (Other amazing things)
    https://podcasts.apple.com/us/podcast/the-data-life-podcast/id1453716761

    I host a data science podcast that reaches an international audience. Episodes have covered natural language processing, A/B testing, statistics, machine learning tips, text classifiers, data analysis, and more.

  • Calorie Tracker Single Page Responsive Web App (Development)

    Built a calorie tracker for nutritional living as a single-page web app using SQL, Python, Flask, and Vue.js, with REST APIs powering the front end.

  • Mining Twitter Data for Sentiment Analysis of Events (Development)
    https://towardsdatascience.com/mining-live-twitter-data-for-sentiment-analysis-of-events-d69aa2d136a1

    Built a sentiment analysis system to detect the live sentiment of events as they unfold. Used Twitter's Streaming API to load data in Python and save it into an SQLite database, then used Pandas and Matplotlib to plot the sentiment.
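
    The store-then-aggregate step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the word lists, the `tweets` table layout, and the function names are all hypothetical, the Twitter stream is replaced by a hard-coded list, and a plain SQL `GROUP BY` stands in for the Pandas and Matplotlib step.

```python
import sqlite3

# Hypothetical toy lexicon; a real system would use a proper sentiment model.
POSITIVE = {"great", "love", "win"}
NEGATIVE = {"bad", "hate", "lose"}


def score(text):
    """Count positive words minus negative words in a tweet."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def store(conn, tweets):
    """Save (minute, text) pairs with their sentiment score into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tweets "
        "(minute TEXT, text TEXT, sentiment INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tweets VALUES (?, ?, ?)",
        [(minute, text, score(text)) for minute, text in tweets],
    )
    conn.commit()


def sentiment_by_minute(conn):
    """Average sentiment per minute, ordered by time, ready for plotting."""
    rows = conn.execute(
        "SELECT minute, AVG(sentiment) FROM tweets "
        "GROUP BY minute ORDER BY minute"
    )
    return rows.fetchall()
```

    The list returned by `sentiment_by_minute` has one (minute, average sentiment) pair per time bucket, which is the shape a line plot of sentiment over time would need.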

  • Machine Learning Language Classifier from Written Scripts (Development)
    https://github.com/sanketg10/language-identifier-nlp

    Built a multi-class machine learning classifier to predict the language of a written script. This classifier can detect Chinese, Korean, English, Italian, Hindi, and more, up to 56 different languages. This system has an average F1 score of 90%.

    It uses a cascade of a linear classifier followed by a neural network. The first stage detects whether the language uses a Roman script; this determines the character n-grams that the system builds for training. The neural network then identifies the exact language of the written script based on features from the first stage.
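
    The first-stage idea (decide whether the text is in a Roman script, then pick the character n-grams accordingly) might look something like the sketch below. It is an assumption-laden illustration rather than the repository's implementation: a simple Unicode-name heuristic stands in for the linear classifier, and the function names, the 0.5 threshold, and the n-gram sizes (3 for Roman, 1 otherwise) are hypothetical choices.

```python
import unicodedata
from collections import Counter


def is_roman_script(text, threshold=0.5):
    """Heuristic stand-in for stage one: are most letters Latin?"""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters) >= threshold


def char_ngrams(text, n):
    """Count overlapping character n-grams of length n."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def extract_features(text):
    """Choose the n-gram size from the script check (assumed sizes)."""
    n = 3 if is_roman_script(text) else 1
    return char_ngrams(text, n)
```

    The resulting n-gram counts are the kind of features a second-stage classifier could be trained on.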

  • Recommendation Engine Using Collaborative Filtering and SVD (Development)
    https://github.com/sanketg10/the-data-life-podcast/blob/master/Overview%20of%20Recommendation%20Engines.ipynb

    Built a recommendation engine using collaborative filtering to recommend movies based on users' tastes. Used movie ratings from different users to build item vectors and calculate the cosine similarity between items. I also used SVD to build a lower-dimensional representation of the user-movie matrix so that recommendations can be computed faster.
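
    As a rough illustration of the item-to-item cosine similarity step, here is a minimal sketch in plain Python. The ratings, user names, and function names are made up for the example, and the SVD step is not covered.

```python
from math import sqrt

# Hypothetical ratings: user -> {movie: rating}. Illustrative data only.
RATINGS = {
    "ann": {"Alien": 5, "Blade Runner": 4, "Amelie": 1},
    "bob": {"Alien": 4, "Blade Runner": 5},
    "cat": {"Amelie": 5, "Alien": 1},
    "dan": {"Blade Runner": 4, "Amelie": 2},
}


def item_vectors(ratings):
    """Pivot user -> movie ratings into movie -> {user: rating} vectors."""
    vectors = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            vectors.setdefault(movie, {})[user] = rating
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(movie, ratings, top_n=2):
    """Rank the other movies by cosine similarity to the given movie."""
    vectors = item_vectors(ratings)
    target = vectors[movie]
    scores = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != movie
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

    With this toy data, `most_similar("Alien", RATINGS)` ranks "Blade Runner" ahead of "Amelie", because the users who rated "Alien" highly also rated "Blade Runner" highly.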

  • Image Classification System using CNNs (Development)

    Built an image classification system using convolutional neural networks to classify CIFAR-10 and SVHN images. Built the entire system in Keras and Python.

  • Video Course on Fundamentals of Data Science (Other amazing things)
    https://www.packtpub.com/big-data-and-business-intelligence/hands-fundamentals-data-science-go-video

    Built a video course on data science concepts, including building a tweet bot, analyzing movie reviews, building a movie review system using collaborative filtering, and time series analysis.

Skills

  • Languages

    Python, SQL, Go
  • Frameworks

    Flask
  • Libraries/APIs

    Pandas, Keras, NLTK, REST APIs, Scikit-learn, Vue.js, NumPy
  • Tools

    PyCharm
  • Paradigms

    Data Science, Object-oriented Programming (OOP)
  • Platforms

    Jupyter Notebook, macOS, Amazon Web Services (AWS), Linux
  • Storage

    PostgreSQL, SQLite, MySQL, Elasticsearch
  • Other

    Machine Learning, Deep Learning, Recurrent Neural Networks, Data Analysis, Data Mining, Artificial Intelligence (AI), Convolutional Neural Networks

Education

  • Master's degree in Engineering
    2014 - 2015
    Columbia University - New York
  • Bachelor's degree in Engineering
    2007 - 2011
    Nanyang Technological University - Singapore

Certifications

  • Natural Language Processing Certificate
    APRIL 2019 - PRESENT
    Udacity
  • Deep Learning Certificate
    FEBRUARY 2018 - PRESENT
    Coursera