Data Scientist | Machine Learning Engineer Contractor, 2019 - PRESENT, Parabolica Labs
Technologies: XGBoost, NLTK, spaCy, Dask, Databricks, Hadoop, Scikit-learn, Pandas, NumPy, SQL, Cython, Python
- Provided machine learning and data science solutions with a specialization in natural language processing (NLP), time series analysis problems, applications of machine learning to sensor data, deep learning models, and Bayesian statistical models.
- Developed a production-grade conversational AI/chatbot for a Fortune 500 healthcare company (Rasa, spaCy) that allowed users to interact with their account and plan features through conversation.
- Performed in-depth analysis (spaCy, NLTK, textacy) and created data visualizations on large-scale text datasets to extract semantics, keywords, phrases, intents, and other linguistic features for product development and market exploration.
- Developed production models for classifying cohorts of users (Cython, Databricks, SciPy, scikit-learn, XGBoost) using matrix factorization and prior behavioral data.
- Developed distributed data pipelines for processing and transforming large sets of financial time series data using Cython, Dask, and Parquet.
Machine Learning Engineer, 2018 - 2019, Twosense, Inc.
Technologies: Amazon Web Services (AWS), Git, SciPy, Scikit-learn, Pandas, NumPy, SQL, Cython, Python
- Researched, developed, and deployed a suite of machine learning models (NumPy, SciPy, scikit-learn, XGBoost, Cython) that authenticated users based on behavioral biometrics collected from sensors on phones and computers.
- Wrote large-scale data processing scripts that consumed real-time biometric data for model training and testing using NumPy, AWS Redshift, and Pandas.
- Produced Jupyter notebooks visualizing model validation metrics, data transformations, and critical data analysis.
- Hired as the first member of the machine learning and data science team; guided best practices, led technical sessions, collaborated on project specifications, and wrote extensive research documentation.
- Wrote suites of unit tests for data processing, feature extraction, and model validation.
- Completed feature extraction tasks from large, disparate datasets for model development.
Senior Data Scientist, 2017 - 2018, Skedaddle
Technologies: SciPy, Scikit-learn, NumPy, Pandas, Snowflake, SQL, Flask, Python
- Developed a production-grade API for pricing algorithms using NumPy, scikit-learn, Lambda, and API Gateway.
- Built and maintained a complete data pipeline platform that read from public APIs (EC2, Lambda, Snowflake) and fed time series models predicting product demand.
- Wrote a serverless web app displaying data visualizations and real-time monitors of key metrics using Flask, Zappa, and D3.js.
- Provided ad hoc analysis for all domains within the organization, and guided other team members in their analyses.
Senior Data Scientist, 2016 - 2017, Whoop, Inc.
Technologies: Amazon Web Services (AWS), SciPy, NumPy, Pandas, Scikit-learn, Redshift, TensorFlow, PyTorch, SQL, Python
- Led team code reviews and collaborated on the direction of quarterly team goals and projects.
- Researched, developed, and deployed novel deep convolutional neural networks for classifying activities based on multidimensional sensor data using PyTorch.
- Developed and maintained data pipelines consuming from Redshift and PostgreSQL databases.
- Built a real-time activity detection algorithm for biometric time series data using NumPy and SciPy.
- Researched and developed convolutional autoencoder models for compressive sensing using PyTorch.
- Wrote a real-time algorithm to detect how the user wears a sensor based on accelerometer and biometric profiles using NumPy, SciPy, and scikit-learn.
Associate Data Scientist, 2014 - 2016, Cogo Labs
Technologies: Scikit-learn, NumPy, SciPy, Pandas, Keras, TensorFlow, SQL, Python
- Implemented a Python library for A/B testing with Bayesian statistics and other measurement tools.
- Developed statistical methods to mine URLs from user clickstream data (Presto) and developed neural networks (TensorFlow, Keras) to model user-level characteristics from browsing history.
- Applied NLP techniques (scikit-learn, NumPy) to cluster ad campaigns based on content similarity.
- Wrote algorithms (NumPy, SciPy, Pandas, scikit-learn) for user-campaign selection and developed models based on clickstream data, market intent, and demographics.
- Built tools to monitor critical metrics and score production models daily using Python, MySQL, PostgreSQL, Presto, and MapReduce.
Full-stack Developer, 2013 - 2014, Microsoft Project Users Group
- Maintained the existing infrastructure and developed new back-end features and APIs on the legacy LAMP (PHP) stack.
- Wrote Python scripts to automate data processing and reporting from third-party APIs and internal database sources.