Dev Sharma, Data Scientist and AI Developer in New York, NY, United States
Member since November 25, 2020
Dev is a versatile data scientist and developer who specializes in building highly accurate predictive AI models. He uses statistics, deep learning, and data engineering to strategize and optimize the role of data within organizations. Dev's expertise and hands-on experience are backed by a master's degree in applied analytics from Columbia University in New York City, where he also teaches almost all facets of data science at the graduate level.

Portfolio

  • Columbia University
    Natural Language Processing (NLP), Python, Deep Learning, Machine Learning...
  • Insight Data Science
    Amazon Web Services (AWS), Python, PyTorch, Scikit-learn, Word Embedding...
  • Dotin
    LSTM, Python, PyTorch, Classification, Neural Networks, Model Optimization...

Location

New York, NY, United States

Availability

Part-time

Preferred Environment

Teams, Linux, PyCharm, Visual Studio Code (VS Code), Slack, Jupyter Notebook, macOS

The most amazing project I've been proud to put my name on was working with Infosys and Stanford Labs to land on the global leaderboard of natural language understanding models.

Employment

  • Instructor

    2020 - PRESENT
    Columbia University
    • Instructed graduate students in programming, statistics, databases, front-end development, business intelligence tools, hypothesis testing, machine learning, and other analytical skills.
    • Led and established a collaborative culture where each member of our four-person instructional staff is fully committed to the success of each student.
    • Consistently achieved high student satisfaction scores (4.5+/5).
    Technologies: Natural Language Processing (NLP), Python, Deep Learning, Machine Learning, Hypothesis Testing, Front-end Development, Databases, Statistics, Programming
  • Artificial Intelligence Researcher

    2020 - 2020
    Insight Data Science
    • Built an intelligent search product for textbooks that uses ALBERT, a lightweight deep learning model, to translate students' search queries into results 100x faster than traditional table-of-contents methods. I was the sole developer.
    • Served the model and information retriever by building a containerized web app (textbookqa.com) in Docker and AWS.
    • Delivered an MVP within the four-week deadline and presented the product to stakeholders.
    Technologies: Amazon Web Services (AWS), Python, PyTorch, Scikit-learn, Word Embedding, Neural Networks, Learning to Rank, Natural Language Processing (NLP), Deep Learning, Model Tuning, Model Optimization, Models, Machine Learning
  • Data Scientist (Capstone)

    2019 - 2020
    Dotin
    • Predicted the validity of paid surveys with an accuracy of around 76% by building a long short-term memory (LSTM)-based architecture to use survey recipients’ mouse movements to help identify and recoup unjust survey costs.
    • Achieved a peer-reviewed publication for our team’s research on validating survey responses (arxiv.org/abs/2006.14054). Commercialization of the survey validation product is in progress.
    • Worked within an Agile framework in a team of eight.
    Technologies: LSTM, Python, PyTorch, Classification, Neural Networks, Model Optimization, Models, Machine Learning, Consulting, Data Science
  • Machine Learning Intern

    2019 - 2019
    Infosys
    • Integrated a state-of-the-art NLP model (RoBERTa) with Stanford’s slicing functionalities to achieve top results on Stanford’s SuperGLUE, a leading NLP benchmark for evaluating general natural language understanding models.
    • Placed as the first runner-up out of 32 teams in the Annual InStep Hackathon, personalizing the user’s learning journey by implementing an innovative sequential recommender system for educational content.
    • Detected fraudulent healthcare providers with an accuracy of 95% and recall of 90% by implementing a neural network architecture (PyTorch), outperforming the firm’s existing rule-based classifier by around 46%.
    Technologies: Transformers, Python, NLTK, PyTorch, Word Embedding, Classification, Neural Networks, Natural Language Generation (NLG), Natural Language Processing (NLP), Deep Learning, Models, Machine Learning
  • Data Science Intern

    2018 - 2019
    Byteflow Dynamics
    • Built machine learning models to use news with time-series data to classify future stock price performance with 61% accuracy.
    • Developed a Python crawler to extract around 5,500 financial news articles on a weekly basis for 100 tickers.
    • Performed sentiment analysis of stocks by cleaning raw data using Regex and utilizing rule-based financial lexicons.
    Technologies: NLTK, PyTorch, Neural Networks, Data Science, Consulting, Regex, Python, Models, Machine Learning
  • Co-founder | Vice President

    2016 - 2018
    Ummid A Hope Foundation
    • Raised $75,000+ to benefit abandoned girls in Udaipur, India, helping to build the core team and a global network of 1,000+ donors.
    • Coordinated team meetings and the team technology stack to facilitate the organization's global outreach.
    • Organized several local fundraising events to retain existing donors and attract new ones.
    Technologies: Nonprofits, Business Management
  • Business Analyst

    2014 - 2018
    Zodiac21 Solutions
    • Managed datasets with SQL, Excel, and Tableau to track KPIs, present dashboards, and discover actionable insights.
    • Increased the average customer retention rate from 35% to 64% by leading a cross-functional, five-member team to develop web and kiosk applications for instantaneous customer-to-staff feedback.
    • Implemented and trained 50+ staff members in using the latest tools for automation to enable digital reporting, cloud-based time tracking, and task management.
    Technologies: Tableau, Microsoft Excel, SQL

Experience

  • AskAi
    https://github.com/devkosal/askai

    A complete question answering application for extracting answers from textbooks. Modern information retrieval techniques are successful in retrieving information from smaller documents. However, when it comes to larger documents, current options fall short.

    This repository attempts to solve the problem of performing question answering on large documents. This requires a two-part approach. In one part, ALBERT is trained on the Stanford Question Answering Dataset (SQuAD). In the other, we fragment a textbook into multiple sections using a rule-based approach. We can then compare user question embeddings to the embeddings of the sections to find the most relevant section(s).

    I am the sole contributor—from product conceptualization to deployment—and the repository is currently in an MVP state.
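
    The section-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the bag-of-words `embed` function is a stand-in assumption for learned embeddings, and the function names and sample sections are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_sections(question, sections, k=1):
    """Return the titles of the k sections most similar to the question."""
    q = embed(question)
    ranked = sorted(
        sections,
        key=lambda title: cosine(q, embed(sections[title])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical textbook fragments, keyed by section title.
sections = {
    "Photosynthesis": "Plants convert light energy into chemical energy stored in glucose.",
    "Cell Division": "Mitosis splits one cell into two genetically identical daughter cells.",
    "Genetics": "Genes are inherited units of DNA that determine traits.",
}

print(top_sections("How do plants turn light into energy?", sections))
```

    In a fuller pipeline, the selected section(s) would then be passed to the question answering model as context, so the model only reads a short passage rather than the whole textbook.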

  • RoBERTa with Fast.ai
    https://medium.com/analytics-vidhya/using-roberta-with-fastai-for-nlp-7ed3fed21f6c

    Implementing the current state-of-the-art NLP model in fast.ai. The concept of transfer learning is still somewhat new to NLP and one that is growing at a very rapid pace. A model such as RoBERTa performs incredibly well on the SuperGLUE benchmark across several varying NLP tasks. This project facilitates the usage of RoBERTa with fast.ai.

    I am the sole developer—from conceptualization to complete cross-integration—and the integrated model is available for use.

  • Survey Validation With Mouse Movements
    https://github.com/dachosen1/Dotin-Columbia-Capstone-Team-Alpha-

    Thirty percent of users fill out psychometric surveys falsely. This is a problem for organizations that expect valid survey results for their survey expense. This project creates multiple models to find an approach that outperforms current validation methods. In the end, the aim is to reduce survey costs by at least 30%.

    This project was built by a team of eight. I took ownership of building the complete pipeline for our LSTM approach, which yielded 80% accuracy and an F1 score of 0.76 on the validation set. The end deliverables are model weights that can be used locally to test predictions. A future goal for this project is to create an API for the LSTM model that accepts requests and identifies false survey responses.

  • Fight Detection
    https://github.com/devkosal/fight_detection

    A deep learning computer vision model to detect fights in videos. Using five to ten frames from a two-second sample, we extract features with a residual neural network (ResNet) model and then pass the extracted features to an LSTM (trained from scratch) to classify whether a fight is occurring in the video. My model predicts whether a fight is occurring with 90%+ accuracy on a balanced dataset. The accuracy is 71% on surveillance camera footage.

    I am the sole contributor. The core development phase is complete and the next step is deployment.
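
    The frame-selection step can be illustrated with a small sketch. It assumes the frames are evenly spaced across the two-second clip, which the description does not state; the function name, the frame rate, and the default of eight frames are hypothetical. Feature extraction with ResNet and classification with the LSTM are not shown.

```python
def sample_frame_indices(fps, clip_seconds=2.0, num_frames=8):
    """Pick evenly spaced frame indices from a short clip.

    Assumption: frames are taken at the midpoint of equal-width windows.
    """
    total = int(round(fps * clip_seconds))
    if num_frames <= 0 or total <= 0:
        return []
    if num_frames >= total:
        # Clip is shorter than the requested sample: use every frame.
        return list(range(total))
    step = total / num_frames
    return [int(i * step + step / 2) for i in range(num_frames)]


# A 30 fps clip of two seconds has 60 frames; take 6 of them.
print(sample_frame_indices(30, clip_seconds=2.0, num_frames=6))
```

    Each selected frame would then be run through the feature extractor, and the resulting sequence of feature vectors fed to the LSTM in order.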

  • Text Generator Web App

    A text generator web app built in under 50 lines of Python, using PyTorch. The transformers library is used to import the pre-trained GPT-2 model, and the PyViz Panel library is used to build the complete web app using just Python.

    I am the sole contributor to this app. It is complete and intended to educate others on building complete text generation applications.

Skills

  • Languages

    Python, JavaScript, R, Visual Basic for Applications (VBA), SQL, HTML
  • Frameworks

    Selenium
  • Libraries/APIs

    PyTorch, Pandas, Matplotlib, SQLAlchemy, Beautiful Soup, Node.js, React, Scikit-learn, NLTK, LSTM, Fast.ai
  • Tools

    NGINX, Tableau
  • Platforms

    Google Cloud Platform (GCP), Docker, Amazon Web Services (AWS)
  • Other

    Regular Expressions, Gunicorn, Version Control, Neural Networks, Transformers, BERT, Recurrent Neural Networks (RNN), Convolutional Neural Networks, Regression, Clustering, SVMs, Models, Model Optimization, Model Tuning, Deep Learning, Natural Language Processing (NLP), Learning to Rank, Classification, Word Embedding, Natural Language Generation (NLG), Computer Vision, Computer Science, Business Management, Nonprofits, Teams, Consulting, Machine Learning
  • Paradigms

    Business Intelligence (BI), Data Science

Education

  • Master's Degree in Applied Analytics
    2018 - 2020
    Columbia University - New York, NY, USA
  • Bachelor's Degree in Business Administration
    2009 - 2013
    University of Memphis - Memphis, TN, USA

Certifications

  • SQL Aptitude Test (https://app.testdome.com/cert/6a938ba738ac4fd587aa1808cc2de863)
    SEPTEMBER 2020 - PRESENT
    TestDome
  • Python Aptitude Test (https://app.testdome.com/cert/98109584b10e44f68312e8114cdad0fd)
    SEPTEMBER 2020 - PRESENT
    TestDome
  • Introduction to Computer Science and Programming Using Python
    AUGUST 2018 - PRESENT
    Massachusetts Institute of Technology | via edX
