Matthew Alhonte

Verified Expert in Engineering

Data Scientist and Developer

Location

New York, NY, United States

Toptal Member Since

August 21, 2018

Matt has officially worked as a Python-based data scientist for the past six years; however, he's spent the last ten at the intersection of stats and programming (before the term data scientist had caught on). He combines strong technical skills with a rigorous background in experiment design and statistical inference. More recently, he's been focusing on machine learning, including some natural language processing and computer vision.

Portfolio

Ophidian Scientific

Amazon Web Services (AWS), PostgreSQL, Keras, XGBoost, Random Forests, Spark...

Birch Infrastructure

Google Cloud Platform (GCP), BigQuery, Data Build Tool (dbt), Prefect, Python...

The University of Colorado — Office of Data Analytics

Amazon Web Services (AWS), XGBoost, Random Forests, Experimental Design...

Experience

Statistics - 11 years
Data Visualization - 11 years
Python - 6 years
Machine Learning - 5 years
Pandas - 5 years
SQL - 5 years
Functional Programming - 4 years
Scikit-learn - 3 years

Availability

Part-time

Preferred Environment

PyCharm, Git, Spacemacs, Visual Studio Code (VS Code), Jupyter

The most amazing...

...thing I've done is to reverse-engineer an undocumented file format containing electrophysiology readings.

Work Experience

Data Science Consultant

2013 - PRESENT

Ophidian Scientific

Assisted numerous small clients with data-related work, ranging from data science and analysis to data engineering and machine learning engineering.
Designed and built ETL pipelines in Python, Dask, and Prefect.
Oversaw migrations between Google Sheets and Airtable. The Airtable automation was executed in Python.
Used operations research libraries in Python to optimize teams for the sports betting website FanDuel.
Built a natural language processing (NLP) classifier for an article archive of a finance-based publication.

Technologies: Amazon Web Services (AWS), PostgreSQL, Keras, XGBoost, Random Forests, Spark, Database Design, Experimental Design, Clojure, Docker, Jupyter, Time Series, Pandas, SQL, Machine Learning, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Operations Research, Data Visualization, ETL, Scientific Data Analysis, Data Engineering, Data Science, Python

Data Scientist & Data Architect

2021 - 2021

Birch Infrastructure

Helped architect the data infrastructure for a utility-scale renewable energy and data center company.
Created data pipelines with Prefect, mostly stitching together Google Cloud Functions and Cloud Run jobs.
Managed a BigQuery data warehouse with dbt, including table schemas and transformations.
Set up data infrastructure (including Prefect and dbt).

Technologies: Google Cloud Platform (GCP), BigQuery, Data Build Tool (dbt), Prefect, Python, Serverless

Senior Data Scientist

2018 - 2019

The University of Colorado — Office of Data Analytics

Performed statistical analyses and modeling to support student success.
Created and presented findings and visualizations to high-level administrators with Jupyter and Zeppelin.
Developed a Monte Carlo simulation-based model to predict semester-by-semester student retention.
Built a Bayesian model of re-offense after student misconduct.
Modeled the effects of different kinds of financial aid with XGBoost.
Created a model to predict student GPAs with scikit-learn and Keras.
Helped establish practices during a restructuring of the university’s office of data analytics.

Technologies: Amazon Web Services (AWS), XGBoost, Random Forests, Experimental Design, Data Visualization, Time Series, SQL, Data Science, Machine Learning, Oracle Database, Zeppelin, Jupyter, Keras, PySpark, Scikit-learn, Pandas, Python

Data Engineer

2017 - 2018

NOMI Beauty

Designed and built the data infrastructure for a startup that made it easier to book hair and makeup appointments in hotel rooms.
Architected a big data pipeline with Spark, Kafka, and Cassandra.
Built data dashboards in Tableau for the operations team.
Designed an ETL for survey data from Typeform's API into MySQL.
Created reports in Jupyter notebooks with data visualizations in Python with Altair and Seaborn.
Designed and implemented a database schema in MySQL.
Designed and supported ETL from Couchbase to MySQL using Python.

Technologies: Amazon Web Services (AWS), Spark, Database Design, Data Visualization, SQL, Jupyter, Simulations, Cassandra, Apache Kafka, PySpark, MySQL, Pandas, Python

Data Science and Blockchain Integration Consultant

2017 - 2017

Tanktwo, Inc.

Architected a blockchain-based solution for managing IoT devices and the data they generate.
Created a demo of a potential network using Hyperledger.
Simulated a private blockchain network in action, using Python.
Helped present a demo to the venture capitalists who were looking to invest.
Conducted research on optimal blockchain implementation to suit business needs.

Technologies: Amazon Web Services (AWS), Jupyter, Data Visualization, Time Series, Hyperledger, Pandas, Python

Data Science Consultant

2014 - 2017

Hospital for Special Surgery

Worked on data science topics in a neurology lab that investigated intraoperative neurophysiological monitoring (IONM)—monitoring muscles and nerves during surgery to prevent damage.
Reverse-engineered an undocumented file format containing biosignal data.
Attempted to classify nerve conduction readings as indicating injury or anesthesia response using Scikit-learn.
Visualized biosignal data with Plotly and presented findings.
Investigated using Higuchi Fractal Dimension of nerve conduction readings taken during surgery as a means of assessing potential damage.
Analyzed biosignal data with a Python data suite (NumPy, Pandas, and SciPy).

Technologies: Experimental Design, Data Visualization, Time Series, Data Science, Machine Learning, Scikit-learn, PyEEG, Jupyter, Plotly, SciPy, Pandas, NumPy, Python

Natural Language Processing Consultant

2015 - 2015

New York City Department of Administrative Services

Scraped PDFs with Python to help digitize the back catalog for a publication, The City Record.
Helped design a schema for entries (such as extracting addresses).
Created data cleaning regimens to standardize entries from over a hundred city agencies reported in different formats.
Used Python and NLTK to perform exploratory natural language processing (NLP) on a century-long corpus of publications.
Worked to integrate a pipeline into their MS Access database.

Technologies: Jupyter, Data Visualization, Data Science, Machine Learning, Python, Natural Language Toolkit (NLTK)

Integration and Development Consultant

2013 - 2014

Broadband Technologies Group

Provided computer vision-based assistance for digitizing video archives.
Used OpenCV and Python to tag damaged video areas.
Wrote Python code to automatically fix certain types of damaged videos.
Helped architect an Android application to deliver simultaneous subtitles for live performances.
Prepared presentations with Jupyter.

Technologies: Jupyter, Data Visualization, OpenCV, Python

Research Assistant

2008 - 2013

Hunter College

Designed and validated a novel psychometric scale.
Analyzed survey data in SPSS.
Presented findings at research conferences.
Maintained relationships with the lab after graduation, eventually moving from data analysis to Python.
Worked on the publication of older data.

Technologies: Experimental Design, Data Visualization, Data Science, SciPy, Python, SPSS

Summer Research Assistant

2009 - 2010

Yale School of Medicine

Designed and piloted a small study investigating psychopathic traits and behavior during an ultimatum game.
Analyzed GSR data.
Ran research participants through computer-based tasks in Presentation and DMDX.
Analyzed data from surveys and computer-based tasks.
Built and maintained a database of participants.

Technologies: Experimental Design, Data Visualization, Data Science, DMDX, SPSS

Experience

Spring 2018 Complexity Challenge

https://github.com/mattalhonte/sfi-challenge

My entry in the Spring 2018 Complexity Challenge by the Santa Fe Institute.

Graph Theory Notes

This is some code that I wrote to help me to understand the graph theory section of an online course on algorithmic information theory.

Binary Grid Search

https://hackersandslackers.com/tuning-machine-learning-hyperparameters-with-binary-search/

Here I am experimenting with using a binary search to tune hyperparameters for machine learning models in Scikit-learn.
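
The sketch below is not the code from the linked post. It is a minimal illustration of how a binary-style search over one integer hyperparameter might look, assuming the score has a single peak across the range. The function name `binary_search_param` and the toy scoring function are hypothetical. In practice the score would be something like a cross-validated metric from a Scikit-learn model.

```python
from typing import Callable


def binary_search_param(score: Callable[[int], float], lo: int, hi: int) -> int:
    """Narrow an integer range by repeatedly keeping the better-scoring half.

    `score` is a hypothetical callable that returns a higher value for a
    better hyperparameter setting (for example, a cross-validated accuracy).
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lower_probe = (lo + mid) // 2       # representative of the lower half
        upper_probe = (mid + hi + 1) // 2   # representative of the upper half
        if score(lower_probe) >= score(upper_probe):
            hi = mid   # keep the lower half
        else:
            lo = mid   # keep the upper half
    # Two candidates remain; return whichever scores better.
    return lo if score(lo) >= score(hi) else hi


# Toy stand-in for a model score: peaks when the parameter equals 37.
def toy_score(value: int) -> float:
    return -float((value - 37) ** 2)


best = binary_search_param(toy_score, 1, 100)
```

Each pass halves the range, so the number of model fits grows with the logarithm of the range size instead of linearly, as it would in an exhaustive grid search. The trade-off is that a noisy or multi-peaked score can send the search into the wrong half.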

Recasting Low-cardinality Columns as Categoricals

https://hackersandslackers.com/recasting-low-cardinality-columns-as-categoricals-2

A short tutorial on saving memory in Pandas by using categorical variables. It includes a code snippet to take a data frame and recast the low-cardinality columns as categoricals.
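
As a rough sketch of the technique (not the snippet from the tutorial itself), the function below recasts string columns whose share of unique values falls under a threshold. The name `recast_low_cardinality`, the `max_ratio` parameter, and its default of 0.5 are assumptions made for illustration.

```python
import pandas as pd
from pandas.api.types import is_string_dtype


def recast_low_cardinality(df: pd.DataFrame, max_ratio: float = 0.5) -> pd.DataFrame:
    """Return a copy with low-cardinality string columns stored as categoricals.

    `max_ratio` is a hypothetical cutoff: a column is recast when
    (number of unique values / number of rows) is below it.
    """
    out = df.copy()
    if len(out) == 0:
        return out
    for col in out.columns:
        if not is_string_dtype(out[col]):
            continue  # leave numeric and other non-string columns alone
        if out[col].nunique() / len(out) < max_ratio:
            out[col] = out[col].astype("category")
    return out


df = pd.DataFrame({
    "city": ["NYC", "LA", "NYC", "LA", "NYC", "LA"],        # 2 unique values
    "order_id": ["a1", "b2", "c3", "d4", "e5", "f6"],       # all unique
})
slim = recast_low_cardinality(df)
```

Here `city` becomes a categorical while `order_id` is left alone, since a column of all-unique values gains nothing from the conversion. Comparing `df.memory_usage(deep=True)` with `slim.memory_usage(deep=True)` on real data shows whether the change pays off.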

Removing Duplicate Columns in Pandas

https://hackersandslackers.com/remove-duplicate-columns-in-pandas

A short tutorial on finding and removing duplicate columns in Pandas.
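
One common way to do this, shown here as a sketch and not necessarily the approach the tutorial takes, is to transpose the data frame and use `DataFrame.duplicated` to flag columns whose values repeat an earlier column. The helper name `drop_duplicate_columns` is hypothetical.

```python
import pandas as pd


def drop_duplicate_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Drop columns whose values exactly repeat an earlier column."""
    # After transposing, each original column is a row, so duplicated()
    # marks every column that matches one appearing before it.
    is_repeat = df.T.duplicated()
    return df.loc[:, ~is_repeat.to_numpy()]


df = pd.DataFrame({
    "a": [1, 2, 3],
    "b": [1, 2, 3],   # same values as "a"
    "c": [4, 5, 6],
})
deduped = drop_duplicate_columns(df)
```

Transposing copies the data, so on very wide or very large frames this can be slow. For columns that only share a name, `df.loc[:, ~df.columns.duplicated()]` is the cheaper check.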

Downcast Numerical Data Types with Pandas

https://hackersandslackers.com/downcast-numerical-columns-python-pandas/

A short tutorial on saving memory by downcasting Pandas columns into the smallest possible numerical representation.
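
A minimal sketch of the idea, assuming `pd.to_numeric` with its `downcast` argument is an acceptable way to pick the smaller type (the tutorial may do it differently). The helper name `downcast_numeric` is hypothetical.

```python
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype


def downcast_numeric(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric columns stored in smaller dtypes where possible."""
    out = df.copy()
    for col in out.columns:
        if is_integer_dtype(out[col]):
            # Picks the smallest signed integer type that holds every value.
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif is_float_dtype(out[col]):
            # Floats go no smaller than float32.
            out[col] = pd.to_numeric(out[col], downcast="float")
    return out


df = pd.DataFrame({"count": [1, 2, 3], "price": [1.5, 2.25, 3.0]})
small = downcast_numeric(df)
```

Downcasting floats to `float32` reduces precision, so it is worth checking that downstream calculations tolerate the change before applying it across a whole data set.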

Sentiment Analysis With AWS SageMaker

https://github.com/mattalhonte/sagemaker-deployment/tree/master/Project

Classifying movie reviews as positive or negative, using SageMaker's version of XGBoost.

Epilepsy Classifier

https://github.com/mattalhonte/epilepsy-classifier

A capstone project for Udacity's machine learning engineer nanodegree.

Python to Rust

A short walkthrough of training a machine learning model in Python, exporting a model artifact, and serving predictions in Rust. It was accepted as official documentation for a relevant Rust crate called "tract."

Splitting Columns With Pandas

https://hackersandslackers.com/splitting-columns-with-pandas/

I wrote a tutorial on splitting up Pandas columns with nested data.
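
As an illustration only, and assuming "nested data" means a column whose cells each hold a list of the same length, the sketch below expands such a column into one column per element. The helper name `split_list_column` and the example column names are hypothetical.

```python
import pandas as pd


def split_list_column(df: pd.DataFrame, column: str, new_names: list) -> pd.DataFrame:
    """Replace a list-valued column with one new column per list element."""
    # Build a frame from the lists, keeping the original index so rows line up.
    parts = pd.DataFrame(df[column].tolist(), index=df.index, columns=new_names)
    return df.drop(columns=[column]).join(parts)


df = pd.DataFrame({"id": [1, 2], "point": [[10, 20], [30, 40]]})
flat = split_list_column(df, "point", ["x", "y"])
```

For delimited strings instead of lists, `Series.str.split` with `expand=True` does a similar job.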

Skills

Languages

Python 3, Python, SQL, Snowflake, Clojure, Rust

Libraries/APIs

Pandas, Scikit-learn, TensorFlow Deep Learning Library (TFLearn), XGBoost, NumPy, Keras, Dask, SciPy, OpenCV, Natural Language Toolkit (NLTK), PySpark, TensorFlow

Tools

DataViz, Jupyter, Spacemacs, PyCharm, SPSS, Plotly, DMDX, Git, Amazon SageMaker, BigQuery

Paradigms

Data Science, Database Design, Agile, Functional Programming, ETL

Platforms

Jupyter Notebook, Amazon Web Services (AWS), Docker, Hyperledger, Oracle Database, Linux, Zeppelin, Apache Kafka, Google Cloud Platform (GCP), Visual Studio Code (VS Code)

Other

Data, Statistical Data Analysis, Exploratory Data Analysis, Unstructured Data Analysis, Complex Data Analysis, Statistical Methods, Statistical Modeling, Statistical Forecasting, Statistical Analysis, Statistical Significance, Random Forests, Random Forest Regression, Experimental Design, Time Series, Machine Learning, Predictive Modeling, Data Visualization, Data Analysis, Data Analytics, Statistics, Computational Statistics, Bayesian Statistics, Statistical Programming, Amazon Machine Learning, Tf-idf, Convolutional Neural Networks (CNN), Analysis of Variance (ANOVA), Dashboards, Analytical Dashboards, Data Build Tool (dbt), Deep Learning, Natural Language Processing (NLP), Mathematical Modeling, Data Engineering, Generative Pre-trained Transformers (GPT), Operations Research, Simulations, PyEEG, Scientific Data Analysis, Prefect, Serverless

Storage

Databases, NoSQL, Cassandra, PostgreSQL, MySQL

Frameworks

Spark

Education

2006 - 2012

Bachelor of Arts Degree in Psychology

Hunter College - New York City, NY, USA

Certifications

JANUARY 2020 - PRESENT

Machine Learning Engineer Nanodegree

Udacity
