Matthew Alhonte, Statistics Developer in New York, NY, United States

Member since June 20, 2018
Matt has officially worked as a Python-based data scientist for the past six years; however, he's spent the last ten at the intersection of stats and programming (before the term data scientist had caught on). He combines strong technical skills with a rigorous background in experiment design and statistical inference. More recently, he's been focusing on machine learning, including some natural language processing and computer vision.



  • Experimental Design, 11 years
  • Data Visualization, 11 years
  • Statistics, 11 years
  • Python, 6 years
  • Pandas, 5 years
  • Machine Learning, 5 years
  • SQL, 5 years
  • Functional Programming, 4 years





Preferred Environment

Jupyter, VS Code, Spacemacs, Git, PyCharm

The most amazing...

...thing I've done is to reverse-engineer an undocumented file format containing electrophysiology readings.


  • Data Scientist

    2018 - 2019
    The University of Colorado — Office of Data Analytics
    • Performed statistical analyses and modeling in support of student success.
    • Presented findings and visualizations, built in Jupyter and Zeppelin, to high-level administrators.
    • Developed a Monte Carlo simulation-based model to predict semester-by-semester student retention.
    • Built a Bayesian model of reoffense after student misconduct.
    • Modeled the effects of different kinds of Financial Aid with XGBoost.
    • Created a model to predict student GPAs with Scikit-learn and Keras.
    Technologies: Python, Pandas, Scikit-learn, PySpark, Keras, Jupyter, Zeppelin, Oracle Database
  • Data Engineer

    2017 - 2018
    NOMI Beauty
    • Designed and supported ETL from Couchbase to MySQL using Python.
    • Architected a big data pipeline with Spark, Kafka, and Cassandra.
    • Built data dashboards in Tableau for the operations team.
    • Designed an ETL for survey data from Typeform's API into MySQL.
    • Created reports in Jupyter notebooks with data visualizations in Python with Altair and Seaborn.
    • Designed and implemented a database schema in MySQL.
    Technologies: Python, Pandas, MySQL, PySpark, Kafka, Cassandra, AWS, Altair, Jupyter
  • Data Science and Blockchain Integration Consultant

    2017 - 2017
    Tanktwo, Inc.
    • Architected a Blockchain-based solution for managing IoT devices and the data they generate.
    • Created a demo of a potential network using Hyperledger.
    • Simulated a private blockchain network in action using Python.
    • Helped present a demo to the VCs.
    • Conducted research on the optimal Blockchain implementation to suit business needs.
    Technologies: Python, Pandas, Hyperledger, AWS
  • Data Science Consultant

    2014 - 2017
    Hospital for Special Surgery
    • Analyzed biosignal data with a Python data suite (NumPy, Pandas, and SciPy).
    • Reverse-engineered an undocumented file format containing biosignal data.
    • Extracted data from an undocumented file format to CSVs.
    • Visualized biosignal data with Plotly.
    • Investigated using Higuchi Fractal Dimension of nerve conduction readings taken during surgery as a means of assessing potential damage.
    • Attempted to classify nerve conduction readings as indicating injury or anesthesia response using Scikit-learn.
    • Used Scikit-learn to classify nerve-stimulation trials, including feature engineering and hyperparameter optimization with grid search and random search.
    • Looked at feature distribution of different types of nerve readings taken during surgery to discriminate injuries from healthy responses to anesthesia.
    Technologies: Python, NumPy, Pandas, SciPy, Plotly, Jupyter, PyEEG, Scikit-learn
  • Natural Language Processing Consultant

    2015 - 2015
    New York City Department of Administrative Services
    • Scraped PDFs with Python in order to help digitize the back catalog for a publication, The City Record.
    • Helped design a schema for entries (such as extracting addresses).
    • Created data cleaning regimens to standardize entries from over a hundred city agencies that all reported in different formats.
    • Used Python and NLTK to perform exploratory Natural Language Processing on a century-long corpus of publications.
    • Worked to integrate this pipeline into MS Access.
    Technologies: NLTK, Python
  • Integration and Development Consultant

    2013 - 2014
    Broadband Technologies Group
    • Provided computer vision-based assistance for digitizing video archives.
    • Used OpenCV and Python to tag damaged video areas.
    • Wrote Python code to automatically repair certain types of damaged videos.
    • Helped architect an Android application to deliver simultaneous subtitles for live performances.
    • Prepared presentations with Jupyter.
    Technologies: Python, OpenCV
  • Research Assistant

    2008 - 2013
    Hunter College
    • Designed and validated a novel psychometric scale.
    • Analyzed survey data in SPSS.
    • Presented findings at research conferences.
    • Maintained relationships with the lab after graduation, eventually moving from data analysis to Python.
    • Worked on the publication of older data.
    Technologies: SPSS, Python, Pandas, SciPy
  • Summer Research Assistant

    2009 - 2010
    Yale School of Medicine
    • Designed and piloted a small study investigating psychopathic traits and behavior during an ultimatum game.
    • Analyzed GSR data.
    • Ran research participants through computer-based tasks in Presentation and DMDX.
    • Analyzed data from surveys and computer-based tasks.
    • Built and maintained a database of participants.
    Technologies: SPSS, Presentation, DMDX



  • Languages

    Python, SQL, Clojure, Rust
  • Libraries/APIs

    Pandas, NumPy, Scikit-learn, PySpark, Keras, TensorFlow
  • Paradigms

    Data Science, Functional Programming
  • Other

    Experimental Design, Time Series, Machine Learning, Predictive Modeling, Data Visualization, Data Analysis, Data Analytics, Statistics, Deep Learning, Natural Language Processing (NLP), Mathematical Modeling, Data Engineering, Deep Neural Networks, Neural Networks
  • Tools

    Jupyter, Git, AWS CLI, Amazon SageMaker
  • Platforms

    Linux, AWS EC2, AWS Lambda, Zeppelin, Apache Kafka
  • Storage

    AWS S3, AWS RDS, Cassandra, PostgreSQL, MySQL


  • Bachelor of Arts degree in Psychology
    2006 - 2012
    Hunter College - New York City, NY, USA
  • Machine Learning Engineer Nanodegree
