Lead Data Scientist | 2021 - 2021 | Logic20/20
Technologies: Time Series Analysis, Azure DevOps, Data Science, Luigi, Python, Computer Vision, Convolutional Neural Networks, Machine Learning, Amazon Web Services (AWS), Presentations, Pandas, Scikit-learn, PyTorch, SQL, Deep Learning, Keras, Technical Project Management, Docker, Project Management, Git, Continuous Integration (CI), Image Processing, Python 3, TensorFlow, Artificial Intelligence (AI), Databases, Data Analysis
- Led a new data science practice within San Diego Gas & Electric’s (SDG&E) asset management group.
- Served as the lead data scientist and ML engineer for Pacific Gas & Electric’s (PG&E) AI-assisted inspection team.
- Trained engineers, analysts, and data scientists in the full data science lifecycle, including project scoping, EDA, data pipelining, code testing, model training/validation, and deployment.
- Built the client's first ML app at SDG&E, making daily predictions about failures on 200,000 devices in the distribution grid using CNNs, LSTMs, and self-supervised embeddings. Built their first continuous integration pipeline using Azure DevOps.
- Trained and productionized deep computer vision models at scale to prioritize and assist PG&E’s inspection of millions of drone-captured images.
- Enabled real-time, automated assistance in the inspection of more than 100,000 aerial images via four object detection and classification pipelines.
- Restructured a database containing millions of AI-detected components. Reduced query execution time on the DB by more than 50x.
- Replaced manual inspection form questions with AI predictions, reducing manual labor for tens of thousands of inspections. Demonstrated accuracy of over 90% across seven classes.
- Trained and productionized new iterations of a component classification model, adding new classes and improving the precision of existing classes by 3% on average.
- Deployed existing model pipelines to GPUs, resulting in a roughly 5x speed-up in response time and eliminating crashes on Kubernetes pods.
Senior Machine Learning Scientist | 2019 - 2020 | System1 Biosciences
Technologies: SQL, Deep Learning, Signal Processing, Image Processing, Experimental Research, Experimental Design, Continuous Integration (CI), Docker, Git, Project Management, Data Visualization, Statistics, Presentations, Amazon Web Services (AWS), Machine Learning, Convolutional Neural Networks, Computer Vision, PyTorch, Scikit-learn, Pandas, NumPy, SciPy, Python, Data Science, Time Series Analysis, Keras, Technical Project Management, Computational Biology, Scientific Computing, Python 3, Artificial Intelligence (AI), Data Analysis
- Led the video microscopy data pipeline team, which included biology, robotics, software, and data science members. Deployed a 12-step processing DAG in AWS on 500+ videos (over 10 TB). Reduced the failure rate of QC-ed videos by 75% and increased the frame rate 10x.
- Built and productionized CNN-based image segmentation for automated quantification of tissue protein expression. Deployed in AWS on over 1,000 scanned images (more than 1PB).
- Demonstrated the effects of lab protocols on tissue quality; the results were used for patents and investor demos.
- Created an advanced analytics pipeline to measure and describe neuronal network activity. It was used to demonstrate the significant and distinct effects of three different neuromodulatory drugs and validate new lab protocols.
- Built an analytics pipeline to assay hierarchical effects of experimental variables. Created novel, statistically rigorous methods for demonstrating disease effects.
- Served as a technical lead for the neurodegenerative disease program. Planned and executed scientific roadmaps and prepared company and investor presentations while coordinating experimental designs, data pipelines, ML, and analytics.
Senior Data Scientist, Machine Learning | 2017 - 2019 | Intuit, Inc.
Technologies: A/B Testing, Git, Python, Pandas, Amazon Web Services (AWS), Docker, Technical Project Management, Keras, Deep Learning, Hadoop, PySpark, SQL, Natural Language Processing (NLP), SciPy, NumPy, Machine Learning, Data Science, Scikit-learn, Data Visualization, Python 3, TensorFlow, Artificial Intelligence (AI), Data Analysis
- Acted as a technical lead for QuickBooks Online's self-help recommendation algorithm, which required a multi-team collaboration. Expanded its use to all customer segments and submitted multiple patents for its back-end ML algorithms.
- Trained, productionized, and A/B tested the first real-time deep learning models (RNN and LSTM) in QuickBooks. Boosted customer engagement by 55%, reduced customer support call rates by 10%, and reduced direct annual costs by at least $900,000.
- Transformed data from millions of users and billions of clickstream events using distributed computing tools such as Spark to create embedded representations of online user activity and improve multiple existing ML services.
- Trained interns and led exploratory machine learning and NLP research for customer success. Projects included an API service to anonymize customer chat data and a predictive customer support call intent model.
Visiting Scientist | 2015 - 2017 | Oregon Health & Science University
Technologies: Scientific Computing, Linux, Experimental Research, 3D Image Processing, Signal Processing, Experimental Design, Factor Analysis, Python, Data Visualization, Statistics, Computer Vision, Graph Theory, Machine Learning, Data Science, Data Analysis
- Led two research projects on a six-member data team composed of graduate students, postdoctoral scientists, and research staff, resulting in three publications and multiple conference presentations.
- Built multilinear regression models explaining more than 60% of the variance in the correlational structure of fMRI time-series data, using anatomical and gene expression data as features.
- Trained students and research staff in structural and functional MRI, signal processing, and data analysis.
Graduate Student Researcher | 2012 - 2017 | UC Davis Center for Neuroscience
Technologies: Signal Processing, 3D Image Processing, Linux, Experimental Design, Experimental Research, Data Visualization, Statistics, Data Science, Git, Time Series Analysis, Data Analysis
- Developed data analysis strategies independently. Selected for a two-year Autism Speaks research fellowship award for this work.
- Produced results that were instrumental in securing a federal grant worth over $1.5 million.
- Published 12 peer-reviewed studies with over 700 citations, covering advanced statistical and computational techniques for processing multimodal brain MRI data and characterizing typical and atypical brain organization.