Verified Expert in Engineering
Data Scientist and AI Developer
Dev is a versatile data scientist and developer who specializes in building predictive AI models that are exceptionally accurate. He focuses on using statistics, deep learning, and data engineering to strategize and optimize the role of data within organizations. Dev's expertise and hands-on experience are backed by a master's degree in applied analytics from Columbia University in New York City, where he also teaches almost all facets of data science at the graduate level.
Teams, Linux, PyCharm, Visual Studio Code (VS Code), Slack, Jupyter Notebook, macOS
The most amazing project I've been proud to put my name on was working with Infosys and Stanford Labs to land on the global leaderboard of natural language understanding models.
- Instructed graduate students in programming, statistics, databases, front-end development, business intelligence tools, hypothesis testing, machine learning, and other analytical skills.
- Led and established a collaborative culture where each member of our four-person instructional staff is fully committed to the success of each student.
- Consistently achieved high student satisfaction scores (4.5+/5).
Artificial Intelligence Researcher
Insight Data Science
- Built an intelligent search product for textbooks that uses ALBERT, a lightweight deep learning model, to translate students' search queries into results 100x faster than traditional table-of-contents methods. I was the sole developer.
- Served the model and information retriever by building a containerized web app (textbookqa.com) in Docker and AWS.
- Delivered an MVP within the four-week deadline and presented the product to stakeholders.
Data Scientist (Capstone)
- Predicted the validity of paid surveys with an accuracy of around 76% by building a long short-term memory (LSTM)-based architecture that uses survey recipients’ mouse movements, helping to identify and recoup unjust survey costs.
- Achieved a peer-reviewed publication for our team’s research on validating survey responses (arxiv.org/abs/2006.14054). Commercialization of the survey validation product is in progress.
- Worked within an Agile framework in a team of eight.
Machine Learning Intern
- Integrated a state-of-the-art NLP model (RoBERTa) with Stanford’s slicing functionalities to achieve top results on Stanford’s SuperGLUE, a leading NLP benchmark for evaluating general natural language understanding models.
- Placed as the first runner-up out of 32 teams in the Annual InStep Hackathon by implementing an innovative sequential recommender system for educational content that personalizes the user’s learning journey.
- Detected fraudulent healthcare providers with an accuracy of 95% and recall of 90% by implementing a neural network architecture (PyTorch), outperforming the firm’s existing rule-based classifier by around 46%.
Data Science Intern
- Built machine learning models that combine news with time-series data to classify future stock price performance with 61% accuracy.
- Developed a Python crawler to extract around 5,500 financial news articles on a weekly basis for 100 tickers.
- Performed sentiment analysis of stocks by cleaning raw data using Regex and utilizing rule-based financial lexicons.
Co-founder | Vice President
Ummid A Hope Foundation
- Raised $75,000+ to benefit abandoned girls in Udaipur, India, helping to build the core team and a global network of 1,000+ donors.
- Coordinated team meetings and the team technology stack to facilitate the organization's global outreach.
- Organized several local fundraising events to retain existing donors and attract new ones.
- Managed datasets with SQL, Excel, and Tableau to track KPIs, present dashboards, and discover actionable insights.
- Increased the average customer retention rate from 35% to 64% by leading a cross-functional, five-member team to develop web and kiosk applications for instantaneous customer-to-staff feedback.
- Implemented the latest automation tools and trained 50+ staff members to use them, enabling digital reporting, cloud-based time tracking, and task management.
This repository addresses the problem of question answering on large documents, which requires a two-part approach. In one part, ALBERT is trained on the Stanford Question Answering Dataset (SQuAD). In the other, we fragment a textbook into multiple sections using a rule-based approach. We can then compare user question embeddings to the embeddings of the sections to find the most relevant section(s).
I am the sole contributor—from product conceptualization to deployment—and the repository is currently in an MVP state.
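The section-matching step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `embed` function is a toy bag-of-words stand-in for learned embeddings, and the section titles, section texts, and function names are all hypothetical. In practice, the top-ranked section would then be passed to the question answering model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sections(question, sections, top_k=1):
    """Return the top_k (title, score) pairs most similar to the question."""
    q = embed(question)
    scored = [(title, cosine(q, embed(body))) for title, body in sections.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical textbook sections, invented for illustration only.
SECTIONS = {
    "Photosynthesis": "Plants convert light energy into chemical energy stored in glucose.",
    "Cell Division": "Mitosis splits one cell into two genetically identical daughter cells.",
    "Genetics": "Genes are inherited units of DNA that encode traits.",
}

best = rank_sections("How do plants turn light into energy?", SECTIONS)
print(best[0][0])
```

Swapping `embed` for a sentence-level neural encoder would leave the ranking logic unchanged, since only the vector representation differs.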
RoBERTa with Fast.ai (https://medium.com/analytics-vidhya/using-roberta-with-fastai-for-nlp-7ed3fed21f6c)
I am the sole developer—from conceptualization to complete cross-integration—and the integrated model is available for use.
Survey Validation With Mouse Movements (https://github.com/dachosen1/Dotin-Columbia-Capstone-Team-Alpha-)
This project was built by a team of eight. I took ownership of building the complete pipeline for our LSTM approach, which yielded 80% accuracy and an F1 score of 0.76 on the validation set. The end deliverables are model weights that can be used locally to test predictions. A future goal for this project is an API for the LSTM model that accepts requests and identifies false survey responses.
I am the sole contributor. The core development phase is complete and the next step is deployment.
Text Generator Web App
I am the sole contributor to this app. It is complete and intended to educate others on building complete text generation applications.
PyTorch, Pandas, Matplotlib, SQLAlchemy, Beautiful Soup, Node.js, React, Scikit-learn, Natural Language Toolkit (NLTK), LSTM, Fast.ai
Google Cloud Platform (GCP), Docker, Amazon Web Services (AWS)
Regular Expressions, Gunicorn, Version Control, Neural Networks, Transformers, BERT, Recurrent Neural Networks (RNN), Convolutional Neural Networks, Regression, Clustering, SVMs, Models, Model Tuning, Deep Learning, Natural Language Processing (NLP), Learning to Rank, Classification, Word Embedding, Natural Language Generation (NLG), Computer Vision, Computer Science, Business Management, Nonprofits, Teams, Consulting, Machine Learning, Generative Pre-trained Transformers (GPT)
Business Intelligence (BI), Data Science
Master's Degree in Applied Analytics
Columbia University - New York, NY, USA
Bachelor's Degree in Business Administration
University of Memphis - Memphis, TN, USA
SQL Aptitude Test (https://app.testdome.com/cert/6a938ba738ac4fd587aa1808cc2de863)
Python Aptitude Test (https://app.testdome.com/cert/98109584b10e44f68312e8114cdad0fd)
Introduction to Computer Science and Programming Using Python
Massachusetts Institute of Technology | via edX