Data Scientist / Machine Learning Engineer (Contractor) | 2019 - Present | Parabolica Labs
Technologies: XGBoost, NLTK, spaCy, Dask, Databricks, Hadoop, Scikit-learn, Pandas, NumPy, SQL, Cython, Python
- Provided machine learning and data science solutions, specializing in natural language processing (NLP), time series analysis, machine learning on sensor data, deep learning, and Bayesian statistical modeling.
- Developed a production-grade conversational AI chatbot (Rasa, spaCy) for a Fortune 500 healthcare company that allowed users to access their account and plan features through conversation.
- Performed in-depth analysis of large-scale text datasets (spaCy, NLTK, Textacy) and created data visualizations, extracting semantics, keywords, phrases, intents, and other linguistic features for product development and market exploration.
- Developed production models that classify user cohorts using matrix factorization and prior behavioral data (Cython, Databricks, SciPy, scikit-learn, XGBoost).
- Developed distributed data pipelines for processing and transforming large sets of financial time series data using Cython, C++, Dask, and Parquet.
Machine Learning Engineer | 2018 - 2019 | Twosense, Inc.
Technologies: Amazon Web Services (AWS), Git, SciPy, Scikit-learn, Pandas, NumPy, SQL, Cython, Python
- Researched, developed, and deployed a suite of machine learning models (NumPy, SciPy, scikit-learn, XGBoost, Cython) that authenticated users based on behavioral biometrics collected from sensors on phones and computers.
- Wrote large-scale data processing scripts that consumed real-time biometric data for model training and testing using NumPy, AWS Redshift, and Pandas.
- Produced Jupyter notebooks visualizing model validation metrics, data transformations, and key data analyses.
- Hired as the first member of the machine learning and data science team; established best practices, led technical sessions, collaborated on project specifications, and wrote extensive research documentation.
- Wrote suites of unit tests for data processing, feature extraction, and model validation.
- Extracted features from large, disparate datasets for model development.
Senior Data Scientist | 2017 - 2018 | Skedaddle
Technologies: SciPy, Scikit-learn, NumPy, Pandas, Snowflake, SQL, Flask, Python
- Developed a production-grade API for pricing algorithms using NumPy, scikit-learn, Lambda, and API Gateway.
- Built and maintained an end-to-end data pipeline platform that ingested data from public APIs (EC2, Lambda, Snowflake) to feed time series models predicting product demand.
- Wrote a serverless web app displaying data visualizations and real-time monitors of key metrics using Flask, Zappa, and D3.js.
- Provided ad hoc analysis across all domains of the organization and guided other team members in their analyses.
Senior Data Scientist | 2016 - 2017 | Whoop, Inc.
Technologies: Amazon Web Services (AWS), SciPy, NumPy, Pandas, Scikit-learn, Redshift, TensorFlow, PyTorch, SQL, Python
- Led team code reviews and collaborated on the direction of quarterly team goals and projects.
- Researched, developed, and deployed novel deep convolutional neural networks (PyTorch) that classify activities from multidimensional sensor data.
- Developed and maintained data pipelines consuming from Redshift and PostgreSQL databases.
- Built a real-time activity detection algorithm for biometric time series data using NumPy and SciPy.
- Researched and developed convolutional autoencoder models for compressive sensing using PyTorch.
- Wrote a real-time algorithm to detect how the user wears a sensor based on accelerometer and biometric profiles using NumPy, SciPy, and scikit-learn.
Associate Data Scientist | 2014 - 2016 | Cogo Labs
Technologies: Scikit-learn, NumPy, SciPy, Pandas, Keras, TensorFlow, SQL, Python
- Implemented a Python library for A/B testing using Bayesian statistics and other measurement tools.
- Developed statistical methods to mine URLs from user clickstream data (Presto) and built neural networks (TensorFlow, Keras) to model user-level characteristics from browsing history.
- Applied NLP techniques (scikit-learn, NumPy) to cluster ad campaigns by content similarity.
- Wrote algorithms (NumPy, SciPy, Pandas, scikit-learn) for user-campaign selection and developed models based on clickstream data, market intent, and demographics.
- Built tools to monitor critical metrics and score production models daily using Python, MySQL, PostgreSQL, Presto, and MapReduce.
Full-stack Developer | 2013 - 2014 | Microsoft Project Users Group
- Maintained the existing infrastructure and developed new back-end features and APIs on the legacy LAMP (PHP) stack.
- Wrote Python scripts to automate data processing and reporting from third-party APIs and internal database sources.