Brian Todd, Developer in New York, NY, United States

Brian Todd

Verified Expert in Engineering

Bio

Brian is an experienced data scientist and machine learning engineer with a track record of researching and deploying a wide range of models, including natural language processing (NLP) models, computer vision algorithms, classical machine learning algorithms, Bayesian statistical models, time series analysis models, and large-scale data mining algorithms.

Portfolio

Parabolica Labs
XGBoost, Natural Language Toolkit (NLTK), SpaCy, Dask, Databricks, Hadoop...
Twosense, Inc.
Amazon Web Services (AWS), Git, SciPy, Scikit-learn, Pandas, NumPy, SQL, Cython...
Skedaddle
SciPy, Scikit-learn, NumPy, Pandas, Snowflake, SQL, Flask, Python

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Linux, Scikit-learn, Pandas, SciPy, NumPy, SQL, C++, Cython, Python

The most amazing...

...project I've developed is a patented, real-time deep learning pipeline that detected and classified user actions based on sensor data from a wrist-worn device.

Work Experience

Data Scientist | Machine Learning Engineer Contractor

2019 - PRESENT
Parabolica Labs
  • Provided machine learning and data science solutions with a specialization in natural language processing (NLP), time series analysis problems, applications of machine learning to sensor data, deep learning models, and Bayesian statistical models.
  • Developed a production-grade conversational AI/chatbot for a Fortune 500 healthcare company (Rasa, spaCy) that allowed users to interact with their account and plan features through conversation.
  • Performed in-depth analysis (spaCy, NLTK, Textacy) and created data visualizations of large-scale text datasets to extract semantics, keywords, phrases, intents, and other linguistic features for product development and market exploration.
  • Developed production models for classifying cohorts of users (Cython, Databricks, SciPy, Scikit-learn, XGBoost) using matrix factorization and prior behavioral data.
  • Developed distributed data pipelines for processing and transforming large sets of financial time series data using Cython, C++, Dask, and Parquet.
Technologies: XGBoost, Natural Language Toolkit (NLTK), SpaCy, Dask, Databricks, Hadoop, Scikit-learn, Pandas, NumPy, SQL, Cython, Python

Machine Learning Engineer

2018 - 2019
Twosense, Inc.
  • Researched, developed, and deployed a suite of machine learning models (NumPy, SciPy, scikit-learn, XGBoost, Cython) that authenticated users based on behavioral biometrics collected from sensors on phones and computers.
  • Wrote large-scale data processing scripts that consumed real-time biometric data for model training and testing using NumPy, AWS Redshift, and Pandas.
  • Produced Jupyter notebooks visualizing model validation metrics, data transformations, and critical data analysis.
  • Joined as the first member of the machine learning and data science team; guided best practices, led technical sessions, collaborated on project specifications, and wrote extensive research documentation.
  • Wrote suites of unit tests for data processing, feature extraction, and model validation.
  • Completed feature extraction tasks from large, disparate datasets for model development.
Technologies: Amazon Web Services (AWS), Git, SciPy, Scikit-learn, Pandas, NumPy, SQL, Cython, Python

Senior Data Scientist

2017 - 2018
Skedaddle
  • Developed a production-grade API for pricing algorithms using NumPy, scikit-learn, Lambda, and API Gateway.
  • Built and maintained a complete data pipeline platform that ingested data from public APIs (EC2, Lambda, Snowflake) to support time series models predicting product demand.
  • Wrote a serverless web app displaying data visualizations and real-time monitors of key metrics using Flask, Zappa, and D3.js.
  • Provided ad hoc analysis for all domains within the organization, and guided other team members in their analyses.
Technologies: SciPy, Scikit-learn, NumPy, Pandas, Snowflake, SQL, Flask, Python

Senior Data Scientist

2016 - 2017
Whoop, Inc.
  • Led team code reviews and collaborated on the direction of quarterly team goals and projects.
  • Researched, developed, and deployed novel deep convolutional neural networks for classifying activities based on multidimensional sensor data using PyTorch.
  • Developed and maintained data pipelines consuming from Redshift and PostgreSQL databases.
  • Built a real-time activity detection algorithm for biometric time series data using NumPy and SciPy.
  • Researched and developed convolutional autoencoder models for compressive sensing using PyTorch.
  • Wrote a real-time algorithm to detect how the user wears a sensor based on accelerometer and biometric profiles using NumPy, SciPy, and scikit-learn.
Technologies: Amazon Web Services (AWS), SciPy, NumPy, Pandas, Scikit-learn, Redshift, TensorFlow, PyTorch, SQL, Python

Associate Data Scientist

2014 - 2016
Cogo Labs
  • Implemented a Python library for A/B testing with Bayesian statistics and other measurement tools.
  • Developed statistical methods to mine URLs from user clickstream data (Presto) and developed neural networks (TensorFlow, Keras) to model user-level characteristics from browsing history.
  • Applied NLP techniques (scikit-learn, NumPy) to cluster ad campaigns based on content similarity.
  • Wrote algorithms (NumPy, SciPy, Pandas, scikit-learn) for user-campaign selection and developed models based on clickstream data, market intent, and demographics.
  • Built tools to monitor critical metrics and score production models daily using Python, MySQL, PostgreSQL, Presto, and MapReduce.
Technologies: Scikit-learn, NumPy, SciPy, Pandas, Keras, TensorFlow, SQL, Python

Full-stack Developer

2013 - 2014
Microsoft Project Users Group
  • Maintained the existing infrastructure and developed new back-end features and APIs on the legacy LAMP (PHP) stack.
  • Performed front-end work using HTML5, CSS, and JavaScript to improve user experience and fix legacy bugs.
  • Wrote Python scripts to automate data processing and reporting from third-party APIs and internal database sources.
Technologies: JavaScript, Python, PHP, MySQL, Apache, Linux

ABayesian

https://github.com/brian-todd/ABayesian
Python package for Bayesian hypothesis testing.
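ABayesian's own interface is not described in this profile, so the following is only a generic sketch of the kind of Bayesian hypothesis test such a package performs. It assumes a Beta-Binomial model with uniform Beta(1, 1) priors on two conversion rates and estimates P(p_B > p_A) by Monte Carlo sampling from the two posteriors. The function name `prob_b_beats_a` and its parameters are hypothetical, not ABayesian's API.

```python
import random


def prob_b_beats_a(conv_a, n_a, conv_b, n_b, samples=100_000, seed=0):
    """Monte Carlo estimate of P(p_B > p_A) under independent Beta(1, 1) priors.

    Hypothetical helper for illustration; not the ABayesian package's interface.
    """
    if not (0 <= conv_a <= n_a and 0 <= conv_b <= n_b):
        raise ValueError("conversions must be between 0 and the number of trials")
    rng = random.Random(seed)  # fixed seed for reproducible estimates
    wins = 0
    for _ in range(samples):
        # Conjugate posterior for each variant: Beta(1 + successes, 1 + failures)
        p_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        p_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += p_b > p_a
    return wins / samples
```

For example, `prob_b_beats_a(120, 1000, 150, 1000)` returns a value well above 0.9, indicating strong posterior evidence that variant B converts better, while identical counts give an estimate near 0.5. Unlike a p-value, this quantity can be read directly as a probability that one variant is better.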

CyDTW

https://github.com/twosense/CyDTW
High-performance dynamic time warping (DTW) library written in Cython for Python 3.x.
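CyDTW's API is not shown in this profile, so the following is a plain-Python sketch of the classic dynamic time warping recurrence that a Cython DTW library would accelerate. It assumes one-dimensional numeric sequences and an absolute-difference local cost; the function name `dtw_distance` is hypothetical and is not CyDTW's interface.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic O(len(a) * len(b)) DTW distance with |x - y| as the local cost.

    Hypothetical helper for illustration; not CyDTW's interface.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both (diagonal step)
            )
    return cost[n][m]
```

Because warping lets one point align with several, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` is 0 even though the sequences differ in length. The quadratic double loop is the kind of tight numeric code that typically benefits from compiling with Cython.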

FX-Data-Loader

https://github.com/brian-todd/fx-data-loader
This project serves as an extensible framework for processing and cleaning historical FX data from Dukascopy's data feed.

CFBBot

https://github.com/brian-todd/CFBBot
Chatbot built to answer questions about several college football teams. Built using Rasa, spaCy, and requests.

Patented Activity Recognition Algorithm

https://patents.google.com/patent/US10548513B2/
A variety of techniques are used to automate the collection and classification of workout data gathered by a wearable physiological monitor. The classification process is staged in order to correctly and efficiently characterize a workout type. Initially, a generalized workout event is detected using motion and heart rate data. Then a location of the monitor on a user is determined. An artificial intelligence engine can then be conditionally applied (if a workout is occurring and a suitable device location is detected) to identify the type of workout. In addition to improved speed and accuracy, a workout detection process implemented in this manner can be realized with a sufficiently small computational footprint for deployment on a wearable physiological monitor.
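The staged structure described in the abstract (cheap event detection, then device-location check, then a conditionally applied classifier) can be outlined as below. This is an illustrative sketch, not the patented implementation: the `Window` type, the thresholds, the set of suitable locations, and the injected `locate` and `classify` callables are all hypothetical placeholders.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Optional, Sequence


@dataclass
class Window:
    accel_magnitude: Sequence[float]  # accelerometer magnitude samples (hypothetical units: g)
    heart_rate: Sequence[float]       # heart-rate samples in bpm


# Hypothetical thresholds and locations, chosen only for illustration.
MOTION_STD_THRESHOLD = 0.3
HR_THRESHOLD_BPM = 110.0
SUITABLE_LOCATIONS = {"wrist", "bicep"}


def is_workout_event(w: Window) -> bool:
    """Stage 1: cheap check combining motion variability and elevated heart rate."""
    return (pstdev(w.accel_magnitude) > MOTION_STD_THRESHOLD
            and mean(w.heart_rate) > HR_THRESHOLD_BPM)


def classify_workout(
    w: Window,
    locate: Callable[[Window], str],          # stage 2: hypothetical location estimator
    classify: Callable[[Window, str], str],   # stage 3: hypothetical workout classifier
) -> Optional[str]:
    """Run the stages in order, skipping the expensive classifier when possible."""
    if not is_workout_event(w):
        return None  # no workout detected: stop early
    location = locate(w)
    if location not in SUITABLE_LOCATIONS:
        return None  # device worn somewhere the classifier is not suited for
    return classify(w, location)
```

The point of the staging is that the inexpensive checks filter out most windows, so the costly classifier runs only when a workout is likely and the device placement is suitable.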

US Patent US10548513
2009 - 2013

Bachelor of Science Degree in Mathematics

University of Michigan - Ann Arbor, Michigan, USA

Libraries/APIs

LSTM, NumPy, SciPy, Pandas, Scikit-learn, PyTorch, XGBoost, Matplotlib, SpaCy, Natural Language Toolkit (NLTK), Dask, Keras, TensorFlow, OpenCV, D3.js

Tools

GitHub, Jupyter, Git, Seaborn, Rasa.ai, Sublime Text, Apache, Gensim, Apache Impala

Languages

Python, SQL, C, PHP, JavaScript, C++, Bash, Snowflake

Platforms

Jupyter Notebook, Linux, Amazon Web Services (AWS), Databricks, Docker, Unix, Google Cloud Platform (GCP), NVIDIA CUDA

Storage

PostgreSQL, MySQL, Redshift, SQLite, Apache Hive

Paradigms

Unit Testing, MapReduce

Frameworks

Flask, Hadoop, Presto, Spark

Other

Data Analysis, Data Visualization, Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNNs), Gated Recurrent Unit (GRU), Data Analytics, Scientific Computing, Numerical Methods, Numerical Modeling, Data Science, Machine Learning, Computer Vision, Natural Language Processing (NLP), A/B Testing, Mathematics, Statistics, Deep Learning, Data Mining, Generative Pre-trained Transformers (GPT), Generative Artificial Intelligence (GenAI), Artificial Intelligence (AI), Cython, Graphics Processing Unit (GPU), GPU Computing, Google BigQuery
