Chris Seal, Developer in Cincinnati, OH, United States

Chris Seal

Verified Expert in Engineering

Machine Learning Developer

Location
Cincinnati, OH, United States
Toptal Member Since
September 20, 2018

Chris is a data scientist with over five years of experience working independently and in government subcontracting with a leading data analytics firm. His well-rounded education and work history include a magna cum laude undergraduate degree in physics, a master's in music composition, an advanced degree from Galvanize Data Science Immersive, and a deep learning specialization from DeepLearning.AI.

Portfolio

WhiteTower Capital Management
Data Science, Time Series, SQL, Predictive Learning, PostgreSQL, Flask, Dash...
Toptal
SpaCy, Matplotlib, XGBoost, Object-oriented Programming (OOP)...
Fantasy Outliers
Matplotlib, XGBoost, Object-oriented Programming (OOP), Data Visualization...

Experience

Availability

Part-time

Preferred Environment

GitHub, IPython, Atom, Sublime Text, Linux

The most amazing...

...result of a project is that I beat ESPN's fantasy football projections and tied Vegas's game-winners using raw, unadjusted machine learning.

Work Experience

Data Scientist

2021 - PRESENT
WhiteTower Capital Management
  • Built the company's back-end and front-end infrastructure from scratch, including several dozen table models and data sources, analytics scripts, machine learning models, and data visualizations.
  • Created an "Is the total crypto market cap going to be higher or lower in 24h?" model that achieved 80% accuracy over roughly 100 days of live testing.
  • Queried blockchain smart contracts directly to track, understand, and manage often complex transactions and holdings. Calculated metrics such as daily portfolio position contribution and APY across various income mechanisms.
  • Managed a couple of contractors (a data scientist and a full-stack Python developer) for a couple of years and counting.
  • Presided over (and was the primary contributor to) a codebase of roughly 85,000 lines of active code (around 32,000 back end and around 53,000 front end).
  • Created machine learning models for a couple dozen time series data streams, each with a few output variables forecast from 1 to 365 days into the future.
Technologies: Data Science, Time Series, SQL, Predictive Learning, PostgreSQL, Flask, Dash, Plotly, HTML, Pandas, Data Analytics, Amazon Web Services (AWS), Data Engineering, Financial Modeling
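The position-contribution and APY calculations mentioned above can be illustrated with a minimal sketch. The function names, the sample holdings, and the choice of daily compounding are assumptions made for illustration; they are not taken from the actual codebase.

```python
def apy_from_periodic_rate(periodic_rate: float, periods_per_year: int = 365) -> float:
    """Annualize a per-period yield, assuming rewards compound every period."""
    return (1.0 + periodic_rate) ** periods_per_year - 1.0


def position_contributions(start_values: dict, end_values: dict) -> dict:
    """Return each position's contribution to the portfolio's return for one day.

    Contribution = (change in position value) / (total portfolio value at start).
    The contributions sum to the portfolio's overall return for the day.
    """
    total_start = sum(start_values.values())
    return {
        name: (end_values[name] - start_values[name]) / total_start
        for name in start_values
    }


# Hypothetical holdings in USD at the start and end of one day.
start = {"ETH": 6000.0, "BTC": 3000.0, "stablecoin_pool": 1000.0}
end = {"ETH": 6120.0, "BTC": 2970.0, "stablecoin_pool": 1000.3}

contrib = position_contributions(start, end)
portfolio_return = sum(contrib.values())

# Daily yield of the hypothetical pool position, annualized.
pool_apy = apy_from_periodic_rate(0.3 / 1000.0)
```

Expressing each contribution relative to the starting portfolio value keeps the per-position figures additive, so they can be summed into the portfolio's daily return.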

Data Scientist | Owner

2018 - PRESENT
Toptal
  • Provided end-to-end automated data-as-a-service solutions involving data acquisition, database setup and maintenance, exploratory analysis, dashboards/data visualizations, machine learning for predictive and unsupervised modeling, and web apps.
  • Created a detailed plan of action for a global IT consulting company that served as end-to-end instructions for integrating a new service into their existing platform. Built a prototype to demonstrate functionality to stakeholders.
  • Mapped out a new database system from an existing operational schema for analysts at a leading collections agency to use, which simplified their work and led to more robust analyses (SQL, Airflow).
  • Built a Flask app for a publicly traded healthcare company that improves efficiency and accuracy when preparing compliance reports. Incorporated human-in-the-loop report initialization, automated querying, task assignment, PDF generation, and more.
  • Built a parameterized keyword extraction API for the US Government to assist in summarizing and searching a large number of documents using variations of many popular NLP techniques and case-specific customization.
  • Converted an unpoliced, Excel-based database and reporting process for an investment firm to a scalable, verifiable, and flexible database schema. Created an automated PDF summary report with a range of visualizations visible to key stakeholders.
  • Built an automated report for a leading retail investment company that provided extensive data visualizations, giving insight into all aspects of the sales pipeline.
  • Conducted a literature review for a start-up e-learning platform that resulted in a prioritized data collection and modeling plan of action from launch onward.
  • Built a database integration mechanism (merging data from various APIs) and a Dash-Plotly web application for a cryptocurrency fund to help investors analyze coin and portfolio metrics.
Technologies: SpaCy, Matplotlib, XGBoost, Object-oriented Programming (OOP), Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Data Visualization, Amazon Web Services (AWS), GitHub, Deep Learning, Pandas, MongoDB, Flask, JavaScript, Data Science, Python 3, Machine Learning, NumPy, Apache Airflow, SQL, TensorFlow, Keras, Scikit-learn, Python

Lead Data Scientist | Owner

2016 - PRESENT
Fantasy Outliers
  • Scraped publicly available historical fantasy football data from thousands of leagues, and created an automated data scraping+wrangling process that obtained and merged NFL data from a variety of sources.
  • Beat ESPN's weekly projections in a head-to-head comparison during Weeks 6-16 of 2017 - https://medium.com/fantasy-outliers/how-artificial-intelligence-ai-beat-espn-in-fantasy-football-204f4c05e1c9.
  • Predicted several key underrated players going into the 2017 season (Russell Wilson, Zach Ertz, Mark Ingram), and QB projections beat expert consensus rankings - https://bit.ly/2LPpxOa.
  • Built an interactive website using HTML, CSS, D3.js, and JavaScript, with automated Python scripts, that explores what actually happened in competitive leagues based on web-scraped data from public leagues (fantasyoutliers.com).
  • Tied Vegas's up-to-kickoff game-winner projections using automated predictions based on data available Tuesday morning the week prior with no manual adjustments for injuries - https://bit.ly/35GimyG.
  • Developed an automated lineup optimizer that showed promise in initial results in 50/50 ball at DraftKings Daily Fantasy Sports (winning most lineups in the last three weeks of 2018).
  • Used an automated pipeline of preprocessing and predictive algorithms to iterate through meta parameters, model parameters, and features to find the most predictive models for QB, WR, RB, TE, K, and D/ST for rookies, second-year players, and veterans.
Technologies: Matplotlib, XGBoost, Object-oriented Programming (OOP), Data Visualization, GitHub, Data Science, Python 3, Machine Learning, Amazon Web Services (AWS), Scikit-learn, Pandas, NumPy, CSS, HTML, D3.js, R, Python

Senior Data Scientist

2020 - 2021
Homee
  • Developed a data warehouse pipeline that used Python scripts to transform the transactional database into Snowflake, then built Looker dashboards with LookML for company-wide business intelligence.
  • Built an alert system to help operations focus their efforts using a parameterized pipeline of feature engineering, XGBClassifiers and XGBRegressors, and an Alerter module that updates models and thresholds weekly.
  • Saved the company tens of thousands of dollars per year by manually integrating HubSpot data with Snowflake and the rest of the company's database.
  • Created an NLP tagging tool from raw text that helped the company track BI metrics in a more granular way than previously possible. Previous attempts at this task from other developers had failed.
Technologies: LookML, SpaCy, Object-oriented Programming (OOP), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), GPT, Data Visualization, Amazon Web Services (AWS), GitHub, Pandas, Data Science, Python 3, Machine Learning, Matplotlib, XGBoost, Looker, Snowflake, MySQL, Python

Senior Data Scientist

2019 - 2019
Clarigent Health
  • Improved on a published, patented suicide ideation classification model based on therapist transcripts by 12%, as measured by leave-one-out validation; the modeling approach also performed better on a new dataset.
  • Expanded the scope of what the company thought was possible to predict by building successful models in areas it had not previously considered feasible.
  • Built a parameterized pipeline that includes version-controlled, advanced NLP feature engineering, dynamic dimensionality reduction, concurrent hyperparameter search and feature selection, model explainability, and insights across multiple models.
Technologies: Matplotlib, Object-oriented Programming (OOP), Data Visualization, GitHub, Pandas, Data Science, Python 3, Machine Learning, Scikit-learn, GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), SpaCy, XGBoost, Azure, SQL, Python

Data Science Researcher

2016 - 2018
Georgia Tech Research Institute
  • Analyzed team cohesion in League of Legends matches. Implemented an automated data-collection pipeline in MongoDB with more than 3TB of match data. Used PCA, K-Means clustering, network density, and other techniques to develop non-skill-based features from a psychological perspective that discriminated between wins and losses. Trained a Gradient Boosting Classifier to predict the game winner based on historical, non-skill-based psychological dimensions across the team, with some success (AUC 0.58-0.68).
  • Automated the acquisition, cleaning, merging, and visualization of various publicly available data breach sources, creating a more reliable and complete data source. Created an automated engine using web scraping and NLP to gather and search SEC filings for language with a high probability of containing data breach cost disclosures.
  • Built a compliance risk metric for government facilities using multiple auto-trained and aggregated XGBoost models to help prioritize government resources (NLP, NNMF). Built an automated, cross-document named entity analysis pipeline, using spaCy and Python, for count-based association analysis.
  • Implemented software that takes log data and a system definition as input and outputs an interactive system visualization that changes dynamically as the user steps through time (mxGraph, JavaScript, Python, HTML/CSS). Used to understand complex, nested systems and debug issues within them.
  • Built software, inspired by continuous integration platforms, that builds, runs, and assesses granularized performance of a script across all function calls (Python). Links to a git repository and runs with every commit, comparing performance to the previous commit, and raises alerts if performance dips below user-defined thresholds. Visualizes performance history in a dashboard (Flask, SQLAlchemy).
Technologies: Matplotlib, Speech Analytics, MxGraph, XGBoost, Object-oriented Programming (OOP), Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), Data Visualization, GitHub, Pandas, MongoDB, JavaScript, Data Science, Python 3, Machine Learning, Scikit-learn, SpaCy, SQL, Flask, R, Python
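The per-function performance tracking described above (timing every function call, then comparing one commit against the previous one) can be sketched roughly as follows. This is a minimal illustration using only the standard library; the `timed` decorator, `find_regressions`, the 1.2x threshold, and the sample timings are all hypothetical and are not the original tool's API.

```python
import time
from collections import defaultdict
from functools import wraps

# Accumulated wall-clock seconds per function name for the current run.
timings = defaultdict(float)


def timed(func):
    """Record the wall-clock time spent in each call to `func`."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings[func.__name__] += time.perf_counter() - start
    return wrapper


def find_regressions(previous: dict, current: dict, max_slowdown: float = 1.2) -> list:
    """Return names of functions whose time grew by more than `max_slowdown`x."""
    return sorted(
        name
        for name, seconds in current.items()
        if name in previous and seconds > previous[name] * max_slowdown
    )


@timed
def parse(n):
    """Stand-in for a function in the script under test."""
    return [str(i) for i in range(n)]


parse(1000)  # populates timings["parse"] for this run

# Hypothetical per-function timings (seconds) for two consecutive commits.
previous_commit = {"parse": 0.10, "train": 2.00}
current_commit = {"parse": 0.11, "train": 3.10}
alerts = find_regressions(previous_commit, current_commit)  # only "train" exceeds 1.2x
```

In a setup like the one described, the `timings` dictionary would be stored per commit (for example in a database) so that each new commit can be compared with the one before it.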

Data Scientist Contractor

2015 - 2018
Self-employed (remote)
  • Built an automated information extraction engine for unstructured financial statements using a unique pipeline of tree-based ensemble classifiers. Enabled the company to engage in more complex historical analyses. Decreased data entry time and increased accuracy. Displayed results of classification models in an interactive website where users are pointed to areas of low confidence. The system started with a small data set and is built so that models can be retrained from scratch at the click of a button when new data has been validated (Python, Flask).
  • Created a Monte-Carlo-based pricing simulator that provides insight into both portfolio-wide and individual client pricing strategies with very little information about the customer. Simulated expected-profit distributions combined with visualizations helped the pricing team understand probabilistic expectations for a given customer, which led to better client relationships. Built an automated system for forecasting eligible assets, which led to higher profits.
  • Implemented a first-of-its-kind program that analyzed signal rate data using a sequence of Random Forest Classifiers and logic to attribute signal load to individual devices and analyze results. Continued work on the capstone project through prototype completion.
Technologies: SpaCy, Matplotlib, XGBoost, Object-oriented Programming (OOP), GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Visualization, GitHub, Pandas, JavaScript, Data Science, Python 3, Machine Learning, SQL, MongoDB, R, CSS, HTML, Flask, Python
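The idea behind a Monte-Carlo-based pricing simulator, as mentioned above, can be shown with a minimal sketch. The uncertainty model here (triangular volume, normally distributed unit cost) and every number in it are assumptions chosen for illustration; the original simulator's inputs are not described in this profile.

```python
import random
import statistics


def simulate_profit(price, n_trials=10_000, seed=0):
    """Simulate a distribution of profit for one client at a given unit price.

    Assumed (hypothetical) uncertainty model: volume is triangular between
    50 and 200 units with a mode of 100, and unit cost is normal around 6.0.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    profits = []
    for _ in range(n_trials):
        volume = rng.triangular(50, 200, 100)
        unit_cost = rng.gauss(6.0, 0.5)
        profits.append(volume * (price - unit_cost))
    return profits


def summarize(profits):
    """Summarize the simulated distribution with a mean and a 5%-95% range."""
    cuts = statistics.quantiles(profits, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "mean": statistics.fmean(profits),
        "p05": cuts[0],
        "p95": cuts[-1],
        "prob_loss": sum(p < 0 for p in profits) / len(profits),
    }


summary_low = summarize(simulate_profit(price=6.5))
summary_high = summarize(simulate_profit(price=8.0))
```

Reporting a range and a probability of loss, rather than a single expected profit, is what lets a pricing team reason about a customer probabilistically when little is known about them.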

Outbound Business Development + Operations

2014 - 2015
Connect First
  • Created foundational methodologies for a new lead generation department, which led to better sales and more internal funding for our department.
Technologies: Microsoft Excel

Composer, Founder

2010 - 2015
Tuneplant
  • Developed project management and relationship-building skills with clients, maintaining a profitable, repeat-customer business and a 5-star rating.
Technologies: Music Composition

Business Development and Music Production

2012 - 2014
alcheh&hunt
  • Grew list from ~100 to 900+ organically developed, active contacts in 12 months through introductory meeting generation with top-tier advertising agencies.
Technologies: Sales, Music Composition

Senior Diagnostic Consultant / Database Analyst

2005 - 2008
The Nielsen Company
  • Worked with VPs and C-level executives to create and implement a comprehensive quantitative and qualitative framework describing the consumer adoption process.
  • Used Excel and SPSS to craft data-driven responses to inquiries regarding the historical database and to conduct research, which resulted in an internal recognition-of-achievement award.
Technologies: SPSS, Microsoft Excel

Fantasy Football Predictive Models Beat ESPN, Tied Vegas

https://medium.com/fantasy-outliers
Last year, Fantasy Outliers’ predictive models helped a disproportionate number of users win their leagues, spotted free-agent pickups a week or two before others started talking about them, and gave good start/sit direction. When compared to ESPN's projections, yearly overall rankings were more accurate than ESPN’s 72% of the time and were directionally accurate 84% of the time for quarterbacks. Weekly projections were more accurate than ESPN's 57% of the time and directionally accurate 64% of the time for quarterbacks who were likely starters. Other positions were less accurate, but still often better than ESPN.

In 2018, we implemented a game-winner prediction model that predicted NFL game winners using information available Tuesday morning and ended up tying Vegas's predictions, which used information available up until kickoff.

Full write-ups include "How Artificial Intelligence (AI) beat ESPN in Fantasy Football" (https://medium.com/fantasy-outliers/how-artificial-intelligence-ai-beat-espn-in-fantasy-football-204f4c05e1c9) and "Can machine learning help improve your fantasy football draft?" (https://medium.com/fantasy-outliers/can-machine-learning-can-help-improve-your-fantasy-football-draft-4ceea1f1b2bd).
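Figures such as "more accurate than ESPN's 57% of the time" come from comparing two sets of projections against actual results, case by case. A minimal sketch of one way to compute such rates is below. The definition of "directionally accurate" used here is an assumption (the profile does not define it), and the sample numbers are made up, not real projections.

```python
def head_to_head_rate(model, baseline, actual):
    """Share of cases where the model's absolute error is smaller than the baseline's."""
    wins = sum(
        abs(m - a) < abs(b - a)
        for m, b, a in zip(model, baseline, actual)
    )
    return wins / len(actual)


def directional_rate(model, baseline, actual):
    """Share of cases where the actual result fell on the model's side of the baseline.

    Assumed definition: if the model projected higher (lower) than the baseline,
    the actual score also came in higher (lower) than the baseline. Cases where
    the model and baseline agree exactly are skipped.
    """
    pairs = [(m, b, a) for m, b, a in zip(model, baseline, actual) if m != b]
    hits = sum((m > b) == (a > b) for m, b, a in pairs)
    return hits / len(pairs)


# Made-up weekly fantasy points for five quarterbacks (not real data).
model_proj = [18.0, 22.5, 15.0, 20.0, 17.0]
espn_proj = [20.0, 21.0, 16.5, 19.0, 17.5]
actual_pts = [17.0, 25.0, 17.0, 23.0, 14.0]

accuracy = head_to_head_rate(model_proj, espn_proj, actual_pts)
direction = directional_rate(model_proj, espn_proj, actual_pts)
```

In this made-up sample the model is closer than the baseline in four of five cases, and the actual score lands on the model's side of the baseline in four of five cases.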

Attributing Flowrate Signal to Devices Using Data Sensors

For a capstone project at Galvanize, built a system that uses data from sensors to analyze energy efficiency. The system can determine what devices or appliances are currently turned on and the resource demands attributed to each device, allowing for further usage optimization downstream.

Languages

Python 2, Python 3, Python, SQL, JavaScript, HTML, CSS, Snowflake, R, CSS3

Frameworks

Flask, Spark

Libraries/APIs

Scikit-learn, XGBoost, Matplotlib, Pandas, NumPy, TensorFlow, Keras, D3.js, Spark Streaming, SpaCy, jQuery

Tools

NLPP, MxGraph, Git, GitHub, Kafka Streams, Spark SQL, Sublime Text, Atom, IPython, Apache Airflow, Microsoft Excel, SPSS, Looker, Plotly

Paradigms

Data Science, Object-oriented Programming (OOP), Agile, Anomaly Detection

Other

Data Visualization, Machine Learning, Natural Language Processing (NLP), Speech Analytics, Algorithms, Data Mining, Data Analytics, Software Development, GPT, Generative Pre-trained Transformers (GPT), Deep Learning, Agile Data Science, Convolutional Neural Networks (CNN), Time Series Analysis, Sentiment Analysis, Data Scraping, Music Composition, Sales, Faust, LookML, Psychology, Time Series, Predictive Learning, Dash, Data Engineering, Financial Modeling

Platforms

Apache Kafka, Linux, Amazon Web Services (AWS), Azure, Windows

Storage

MongoDB, NoSQL, PostgreSQL, Amazon S3 (AWS S3), Redshift, MySQL

Education

2005 - 2007

Master's Degree in Music Composition

University of Louisville - Louisville, KY

2000 - 2004

Bachelor's Degree in Physics, Music, Psychology (minor)

Wake Forest University - Winston-Salem, NC

Certifications

APRIL 2020 - PRESENT

Data Streaming Nanodegree

Udacity

MARCH 2020 - PRESENT

Data Engineering Nanodegree

Udacity

JANUARY 2019 - PRESENT

Deep Learning Specialization

Coursera

APRIL 2016 - PRESENT

Data Analyst

Udacity

SEPTEMBER 2015 - PRESENT

Data Science Immersive Bootcamp

Galvanize
