Matthew Alhonte, Statistics Developer in New York, NY, United States
Member since June 20, 2018
Matt has officially worked as a Python-based data scientist for the past six years; however, he's spent the last ten at the intersection of stats and programming (before the term data scientist had caught on). He combines strong technical skills with a rigorous background in experiment design and statistical inference. More recently, he's been focusing on machine learning, including some natural language processing and computer vision.

Experience

  • Experimental Design, 11 years
  • Data Visualization, 11 years
  • Statistics, 11 years
  • Python, 6 years
  • Pandas, 5 years
  • Machine Learning, 5 years
  • SQL, 5 years
  • Functional Programming, 4 years

Location

New York, NY, United States

Availability

Full-time

Preferred Environment

Jupyter, VS Code, Spacemacs, Git, PyCharm

The most amazing...

...thing I've done is to reverse-engineer an undocumented file format containing electrophysiology readings.

Employment

  • Data Scientist

    2018 - 2019
    The University of Colorado — Office of Data Analytics
    • Performed statistical analyses and modeling in support of student success.
    • Created and presented findings and visualizations to high-level administrators with Jupyter and Zeppelin.
    • Developed a Monte Carlo simulation-based model to predict semester-by-semester student retention.
    • Built a Bayesian model of reoffense after student misconduct.
    • Modeled the effects of different kinds of financial aid with XGBoost.
    • Created a model to predict student GPAs with Scikit-learn and Keras.
    Technologies: Python, Pandas, Scikit-learn, PySpark, Keras, Jupyter, Zeppelin, Oracle Database
  • Data Engineer

    2017 - 2018
    NOMI Beauty
    • Designed and supported ETL from Couchbase to MySQL using Python.
    • Architected a big data pipeline with Spark, Kafka, and Cassandra.
    • Built data dashboards in Tableau for the operations team.
    • Designed an ETL for survey data from Typeform's API into MySQL.
    • Created reports in Jupyter notebooks with data visualizations in Python with Altair and Seaborn.
    • Designed and implemented a database schema in MySQL.
    Technologies: Python, Pandas, MySQL, PySpark, Kafka, Cassandra, AWS, Altair, Jupyter
  • Data Science and Blockchain Integration Consultant

    2017 - 2017
    Tanktwo, Inc.
    • Architected a blockchain-based solution for managing IoT devices and the data they generate.
    • Created a demo of a potential network using Hyperledger.
    • Simulated a private blockchain network in action using Python.
    • Helped present a demo to the VCs.
    • Conducted research on the optimal blockchain implementation to suit business needs.
    Technologies: Python, Pandas, Hyperledger, AWS
  • Data Science Consultant

    2014 - 2017
    Hospital for Special Surgery
    • Analyzed biosignal data with a Python data suite (NumPy, Pandas, and SciPy).
    • Reverse-engineered an undocumented file format containing biosignal data.
    • Extracted data from an undocumented file format to CSVs.
    • Visualized biosignal data with Plotly.
    • Investigated using Higuchi Fractal Dimension of nerve conduction readings taken during surgery as a means of assessing potential damage.
    • Attempted to classify nerve conduction readings as indicating injury or anesthesia response using Scikit-learn.
    • Used Scikit-learn to classify nerve-stimulation trials, including feature engineering and hyperparameter optimization with grid search and random search.
    • Looked at feature distribution of different types of nerve readings taken during surgery to discriminate injuries from healthy responses to anesthesia.
    Technologies: Python, NumPy, Pandas, SciPy, Plotly, Jupyter, PyEEG, Scikit-learn
  • Natural Language Processing Consultant

    2015 - 2015
    New York City Department of Administrative Services
    • Scraped PDFs with Python in order to help digitize the back catalog for a publication, The City Record.
    • Helped design a schema for entries (such as extracting addresses).
    • Created data cleaning regimens to standardize entries from over a hundred city agencies that all reported in different formats.
    • Used Python and NLTK to perform exploratory Natural Language Processing on a century-long corpus of publications.
    • Worked to integrate this pipeline into MS Access.
    Technologies: NLTK, Python
  • Integration and Development Consultant

    2013 - 2014
    Broadband Technologies Group
    • Provided computer vision-based assistance for digitizing video archives.
    • Used OpenCV and Python to tag damaged video areas.
    • Wrote Python scripts to automatically fix certain types of damaged videos.
    • Helped architect an Android application to deliver simultaneous subtitles for live performances.
    • Prepared presentations with Jupyter.
    Technologies: Python, OpenCV
  • Research Assistant

    2008 - 2013
    Hunter College
    • Designed and validated a novel psychometric scale.
    • Analyzed survey data in SPSS.
    • Presented findings at research conferences.
    • Maintained relationships with the lab after graduation, eventually moving from data analysis to Python.
    • Worked on the publication of older data.
    Technologies: SPSS, Python, Pandas, SciPy
  • Summer Research Assistant

    2009 - 2010
    Yale School of Medicine
    • Designed and piloted a small study investigating psychopathic traits and behavior during an ultimatum game.
    • Analyzed GSR data.
    • Ran research participants through computer-based tasks in Presentation and DMDX.
    • Analyzed data from surveys and computer-based tasks.
    • Built and maintained a database of participants.
    Technologies: SPSS, Presentation, DMDX

Skills

  • Languages

    Python, SQL, Clojure, Rust
  • Libraries/APIs

    Pandas, NumPy, Scikit-learn, PySpark, Keras, TensorFlow
  • Paradigms

    Data Science, Functional Programming
  • Other

    Experimental Design, Time Series, Machine Learning, Predictive Modeling, Data Visualization, Data Analysis, Data Analytics, Statistics, Scikit-learn, PySpark, Deep Learning, Natural Language Processing (NLP), Mathematical Modeling, Data Engineering, Deep Neural Networks, Neural Networks, Data Engineer
  • Tools

    Jupyter, Git, AWS CLI, Amazon SageMaker
  • Frameworks

    ClojureScript
  • Platforms

    Linux, AWS EC2, AWS Lambda, Zeppelin, Apache Kafka
  • Storage

    AWS S3, AWS RDS, Cassandra, PostgreSQL, MySQL

Education

  • Bachelor of Arts degree in Psychology
    2006 - 2012
    Hunter College - New York City, NY, USA

Certifications

  • Machine Learning Engineer Nanodegree
    January 2020 - Present
    Udacity