David Grayson, Data Scientist and Machine Learning Developer in Oakland, CA, United States

Member since October 18, 2020
David is an experienced data and ML scientist with a PhD and demonstrated success at large and small companies. He has published 12+ papers on computational neuroimaging and designed and built real-time ML apps for QuickBooks that improved the product experience for over a million users. He also led multiple initiatives at a biotech startup to predict neurological disease using novel computer vision, analytics, and ML methods. David is passionate about helping clients leverage data and AI to maximize their impact.

Location

Oakland, CA, United States

Availability

Part-time

Preferred Environment

Linux, Slack, Python 3, MacOS

The most amazing...

...ML product I've built is a recommender system for QuickBooks Online users needing help, based on real-time user activity and powered by deep learning.

Employment

  • Senior Machine Learning Scientist

    2019 - 2020
    System1 Biosciences
    • Led the video microscopy data pipeline team with biology, robotics, software, and data science members. Deployed a 12-step processing DAG in AWS on 500+ videos (over 10TB). Reduced the failure rate of QC-ed videos by 75% and increased frame rate 10x.
    • Built and productionized CNN-based image segmentation for automated quantification of tissue protein expression. Deployed in AWS on over 1,000 scanned images (more than 1PB).
    • Demonstrated effects of lab protocols on tissue quality, used for patents and investor demos.
    • Created an advanced analytics pipeline to measure and describe neuronal network activity. It was used to demonstrate the significant and distinct effects of three different neuromodulatory drugs and validate new lab protocols.
    • Built an analytics pipeline to assay hierarchical effects of experimental variables. Created novel, statistically rigorous methods for demonstrating disease effects.
    • Served as a technical lead for the neurodegenerative disease program. Planned and executed scientific roadmaps and company and investor presentations while coordinating experimental designs, data pipelines, ML, and analytics.
    Technologies: SQL, Deep Learning, Signal Processing, Image Processing, Experimental Research, Experimental Design, Continuous Integration (CI), Docker, Git, Project Management, Data Visualization, Statistics, Presentations, Amazon Web Services (AWS), Machine Learning, Convolutional Neural Networks, Computer Vision, PyTorch, Scikit-learn, Pandas, NumPy, SciPy, Python
  • Senior Data Scientist—Machine Learning

    2017 - 2019
    Intuit, Inc.
    • Acted as a technical lead for QuickBooks Online's self-help recommendation algorithm, which required a multi-team collaboration. Expanded its use to all customer segments and submitted multiple patents for its backend ML algorithms.
    • Trained, productionized, and A/B tested the first real-time deep learning models (RNN and LSTM) in QuickBooks. Boosted customer engagement by 55%, reduced customer support call rates by 10%, and reduced direct annual costs by at least $900,000.
    • Transformed data from millions of users and billions of clickstream events using distributed computing frameworks such as Spark to create embedded representations of online user activity and improve multiple existing ML services.
    • Trained interns and led exploratory machine learning and NLP research for customer success. Projects included an API service to anonymize customer chat data and a predictive customer support call intent model.
    Technologies: A/B Testing, Git, Python, Pandas, Amazon Web Services (AWS), Docker, Technical Project Management, Keras, Deep Learning, Hadoop, PySpark, SQL, Natural Language Processing (NLP), SciPy, NumPy, Machine Learning
  • Visiting Scientist

    2015 - 2017
    Oregon Health and Science University
    • Led two research projects on a six-member data team composed of graduate students, postdoctoral scientists, and research staff, resulting in three publications and multiple conference presentations.
    • Built multilinear regression models explaining more than 60% of the variance in the correlational structure of fMRI time-series data, using anatomical and gene expression data as features.
    • Trained students and research staff in structural and functional MRI, signal processing, and data analysis.
    Technologies: Scientific Computing, Linux, Experimental Research, 3D Image Processing, Signal Processing, Experimental Design, Factor Analysis, Python, Data Visualization, Statistics, Computer Vision, Graph Theory, Machine Learning
  • Graduate Student Researcher

    2012 - 2017
    UC Davis Center for Neuroscience
    • Developed data analysis strategies independently. Selected for a two-year Autism Speaks research fellowship award for my work.
    • Produced results that were instrumental in securing a federal grant worth over $1.5 million.
    • Published 12 peer-reviewed studies with over 700 citations, covering advanced statistical and computational techniques for processing multimodal brain MRI data and characterizing typical and atypical brain organization.
    Technologies: Signal Processing, 3D Image Processing, Linux, Experimental Design, Experimental Research, Data Visualization, Statistics

Experience

  • QuickBooks Online In-product Help Recommender (Development)

    Acted as a lead data scientist for QuickBooks Online's self-help recommendation app, which required a multi-team collaboration. The goal was to surface the most relevant help articles to customers and enable them to resolve their problems from within the product. My role was to build and integrate the ML engine.

    For data exploration, extraction, and feature engineering, I liaised with data science and data engineering teams to understand the multiple sources of relevant data. I wrote efficient PySpark code to ingest and transform high volumes of clickstream (billions of rows), customer profile data, and help article databases.

    For model training, I employed a novel deep learning approach consisting of shared layers, LSTMs, and merging temporal sequences with static features.

    To productionize the model, I led a team consisting of other data science contributors, front-end and back-end developers, and members of the performance testing and A/B testing teams. Together, we integrated the model with the existing click data streams, built I/O specs, ensured adequate stability and response latency, and measured significant improvements in customer engagement (55% higher clickthrough on articles) and support metrics (10% lower call rates).

  • Disease Classification from Neuronal Network Activity (Development)

    Served as a lead data scientist and technical lead on a Scrum team analyzing videos of high-resolution neural tissue microscopy data, with members from biology, robotics, software, and data science. The goal was to build, validate, and enable CI/CD for the pipeline, and use it to measure the effects of pharmacological perturbations, lab protocols, and genetic modifications on the activity of networks of artificially grown neurons.

    The key challenge was representing extremely high-dimensional data (high spatiotemporal resolution) via low-dimensional, biologically interpretable metrics. We built a 12-module semi-automated data processing graph (DAG) that included supervised and unsupervised CV methods constrained by biological priors to clean and standardize the data, along with intermediary QC steps that were triggered automatically and, in turn, triggered the final stages.

    Deployed as a streaming app in AWS on over 10TB of data, the pipeline reduced the failure rate of QC-ed videos by 75% and enabled us to increase the temporal resolution 10x.
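
    As an illustration only (not the production code), a QC-gated processing graph of the kind described above could be sketched in Python with the standard library's graphlib module. The stage names, the toy frame data, and the run_pipeline helper are all hypothetical; the real pipeline had 12 modules and ran in AWS.

```python
from graphlib import TopologicalSorter


def run_pipeline(dependencies, stages, video):
    """Run stages in dependency order, skipping anything downstream of a failure.

    dependencies maps each stage name to the set of stages it depends on.
    stages maps each stage name to a function that returns True on success.
    """
    blocked = set()   # stages that failed, or that depend on a failed stage
    completed = []
    for name in TopologicalSorter(dependencies).static_order():
        if any(parent in blocked for parent in dependencies.get(name, ())):
            blocked.add(name)   # an upstream QC gate failed, so skip this stage
            continue
        if stages[name](video):
            completed.append(name)
        else:
            blocked.add(name)
    return completed


# Hypothetical stages operating on a toy "video" (a dict holding frame values).
def denoise(video):
    video["frames"] = [max(f, 0) for f in video["frames"]]
    return True


def qc_min_frames(video):
    # QC gate: the final stage only runs if enough frames survive.
    return len(video["frames"]) >= 3


def summarize(video):
    video["mean_intensity"] = sum(video["frames"]) / len(video["frames"])
    return True


DEPENDENCIES = {
    "denoise": set(),
    "qc_min_frames": {"denoise"},
    "summarize": {"qc_min_frames"},
}
STAGES = {
    "denoise": denoise,
    "qc_min_frames": qc_min_frames,
    "summarize": summarize,
}

print(run_pipeline(DEPENDENCIES, STAGES, {"frames": [1, -2, 3, 4]}))
print(run_pipeline(DEPENDENCIES, STAGES, {"frames": [5]}))
```

    In this sketch, a video that passes the QC gate runs through every stage, while one that fails it stops before the final summary, which mirrors the idea of QC steps triggering the later stages.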

    For analytics, I designed two novel ML-based methods to deconfound experimental variables and produced several critical endpoints, including distinct effects of three different drugs and significant accuracy in predicting disease.

Skills

  • Languages

    Python, SQL, Python 3
  • Paradigms

    Data Science, Continuous Integration (CI)
  • Other

    Machine Learning, Presentations, Experimental Design, Experimental Research, Computer Vision, Natural Language Processing (NLP), Deep Learning, Technical Project Management, Statistics, Data Visualization, Mathematics, Probability Theory, Signal Processing, 3D Image Processing, A/B Testing, Scientific Computing, Image Processing, Convolutional Neural Networks, Graph Theory, Network Science, Cognitive Science, Computational Biology, Factor Analysis
  • Libraries/APIs

    SciPy, NumPy, Pandas, Scikit-learn, PySpark, Keras, PyTorch
  • Tools

    Git, PyCharm, Slack
  • Platforms

    Linux, MacOS, Jupyter Notebook, Amazon Web Services (AWS), Docker
  • Frameworks

    Hadoop
  • Industry Expertise

    Project Management

Education

  • Doctoral Degree in Neuroscience (Computational Neuroimaging)
    2012 - 2017
    University of California, Davis - Davis, California
  • Bachelor's Degree in Computational Neuroscience
    2008 - 2012
    Cornell University - Ithaca, NY
