Ryan Zotti

Machine Learning Developer in San Francisco, CA, United States

Member since April 4, 2019
A data scientist and data engineer, Ryan can take projects from concept to deployment. He is a machine learning expert with a knack for impactful data visualizations. Ryan has collaborated with innovators in diverse fields, presented at tech conferences, and co-authored groundbreaking research papers in respected journals such as The Lancet. He is also the author of a popular autonomous vehicle project on GitHub.








Preferred Environment

Command Line Interface (CLI), DataGrip, PyCharm, macOS, Linux

The most amazing...

...public project I’ve authored is a machine learning autonomous vehicle. The project is in the 99.6th percentile of popularity among Python repos on GitHub worldwide.


  • Senior Data Scientist

2017 - PRESENT
LSQ
    • Brought in as the first data scientist hire of a leading US invoice finance (factoring) firm processing $3.2 billion in receivables in 2018.
• Frequently challenged conventional wisdom with data-driven insights through detailed exploratory analysis. Saved the firm approximately $1 million by identifying material financial weaknesses in a key market initiative through analysis of client attrition, acquisition costs, and loss rate data. These recommendations led to the cancellation of the program and catalyzed significant process changes in marketing, underwriting, and account management.
    • Questioned the prevailing management view that post-recession financial risk is primarily driven by debtors. Determined that deleterious client behavior is a principal risk factor, and identified client CEO personal credit score as a leading predictor of client default. Advocated systematic tracking and evaluation of the client’s ability to pay before extending funds in excess of invoice collateral.
    • Identified target industries with the most attractive economics, market segments where LSQ could increase prices without affecting client attrition, and incentives to increase client longevity and lifetime value.
• Reduced risk and streamlined operations with machine learning models and advanced feature engineering. Led an initiative to optimize outbound communication and improve data tracking to reduce invoice delinquency and increase collection rates. Enhanced data-driven decisions by building an invoice risk model to predict non-payment and inform funding choices.
• Identified anomalous client behavior patterns signaling increased risk. Calculated debtor-centric days-to-pay standard deviations above the norm to detect non-payment risk many weeks earlier than the legacy process, particularly when extreme early payers began deviating from past behavior before becoming delinquent.
• Built a framework to fully automate machine learning and model training. Applied evolutionary algorithms to optimize model tuning parameters (e.g., number of trees and learning rate), as well as model input selection. Leveraged inexpensive elastic compute power on AWS to train tens of thousands of candidate models.
    • Transformed the company’s data infrastructure. Vastly improved data quality through relentless data cleaning initiatives. Cached frequently used metrics and model features in historical daily snapshot tables, dramatically reducing time to prototype new data projects. Enabled the department to quickly scale headcount by making data more intuitive and accessible, shortening onboarding time for new employees by six months.
• Automated extract, transform, load (ETL) processes on AWS with Apache Airflow. Migrated compute- and data-intensive tasks to large, elastically sized EMR Spark clusters on AWS using the EC2 Spot Market for a three- to five-times speedup at a fraction of the cost of dedicated hardware.
    • Led multi-phase department-wide training to enforce fundamental software best practices, including source control (GitHub), unit testing, and containerization (Docker).
    Technologies: SQL, Python, Docker, ECS, AWS EC2, AWS S3, Microsoft SQL Server, MySQL, PostgreSQL, Redshift, Matplotlib, NumPy, Pandas, Scikit-learn, XGBoost, Amazon Web Services (AWS), Spark
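The debtor-centric days-to-pay check described above amounts to a per-debtor z-score computation: flag invoices whose payment timing sits several standard deviations above that debtor's own historical norm. A minimal pandas sketch, with hypothetical column names (`debtor_id`, `days_to_pay`) and an arbitrary threshold, not the production logic:

```python
import pandas as pd

def flag_payment_anomalies(invoices: pd.DataFrame, z_threshold: float = 3.0) -> pd.DataFrame:
    """Flag invoices whose days-to-pay deviates sharply from the debtor's own history.

    Expects columns 'debtor_id' and 'days_to_pay' (hypothetical names).
    """
    # Per-debtor baseline: mean and standard deviation of historical days-to-pay.
    stats = (invoices.groupby("debtor_id")["days_to_pay"]
             .agg(["mean", "std"])
             .reset_index())
    merged = invoices.merge(stats, on="debtor_id")
    # Standard deviations above the debtor's own norm; a debtor with a single
    # invoice has NaN std and is never flagged.
    merged["z"] = (merged["days_to_pay"] - merged["mean"]) / merged["std"]
    merged["at_risk"] = merged["z"] > z_threshold
    return merged
```

Because the baseline is per debtor, an extreme early payer who slips toward merely average payment timing can be flagged long before the invoice is formally delinquent.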
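The evolutionary tuning loop described above (score a population of candidate hyperparameter sets, keep the fittest, and mutate them into the next generation) can be sketched roughly as follows. The parameter ranges, mutation rule, and `train_and_score` fitness function are hypothetical stand-ins, not the actual framework:

```python
import random

def evolve_hyperparameters(train_and_score, generations=10, population_size=20):
    """Toy evolutionary search over two gradient-boosting hyperparameters.

    `train_and_score(params) -> float` is a user-supplied fitness function,
    e.g. cross-validated AUC of an XGBoost model trained with `params`.
    """
    def random_params():
        return {"n_trees": random.randint(50, 1000),
                "learning_rate": random.uniform(0.01, 0.3)}

    def mutate(parent):
        # Perturb each parameter by up to 25% while keeping it in a sane range.
        child = dict(parent)
        child["n_trees"] = max(10, int(child["n_trees"] * random.uniform(0.8, 1.25)))
        child["learning_rate"] = min(0.5, max(0.001,
                                     child["learning_rate"] * random.uniform(0.8, 1.25)))
        return child

    population = [random_params() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=train_and_score, reverse=True)
        survivors = ranked[: population_size // 2]  # selection: keep the top half
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(population_size - len(survivors))]
    return max(population, key=train_and_score)
```

In practice each fitness evaluation is an independent model fit, which is why the approach pairs naturally with cheap elastic compute: candidates in a generation can be trained in parallel across spot instances.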
  • Principal Data Engineer

    2012 - 2017
    Capital One
• Built a targeted online acquisition platform using AWS, H2O, and Spark to grow the customer base and increase revenue. Presented "Building Real-time Targeting Capabilities on AWS" at the H2O Open Tour 2016 in New York, discussing H2O and Apache Spark-based machine learning techniques for improving customer acquisition rates on Capital One's website. Explained how to build models at scale on AWS and how to conduct automated daily model refits and model deployments to reactive production systems.
    • Built a fully automated cloud infrastructure for a large-scale credit risk model hosted in Amazon's Elastic Compute Cloud (EC2) service using Cloud Formation and Chef. Utilized AWS services and technologies such as Simple Storage Service (S3), Relational Database Service (RDS), Auto Scaling, and Elastic Load Balancing.
• Built and maintained massive-scale machine learning algorithms in production for operational credit risk minimization and marketing on a 240-node Hadoop cluster leveraging half a petabyte of data. Primarily used Python Hadoop Streaming (MapReduce), Hive, Parquet, and Avro. Tuned performance of Hadoop jobs using YARN and various tools provided by Cloudera's CDH distribution.
    • Experimented with various big data technologies including Apache Spark, Apache Storm, and Apache Kafka. Taught numerous legacy internal Capital One teams how to use the latest Hadoop technologies.
    • Maintained and productionized more than 50 machine learning predictive models for a diverse range of use cases such as credit risk, marketing, fraud, and customer experience. Worked across multiple lines of business: credit card, auto finance, mortgage, and retail bank.
    Technologies: Java, R, SQL, Python, AWS EC2, AWS S3, Apache Kafka, Storm, Hadoop, Amazon Web Services (AWS), Spark
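The Python Hadoop Streaming (MapReduce) pattern mentioned above boils down to a mapper that emits tab-delimited key/value lines and a reducer that aggregates runs of equal keys from input Hadoop has already sorted. A toy count-by-key sketch (the field layout is illustrative, not the actual production jobs):

```python
def mapper(lines):
    """Emit (key, 1) for each record, keyed on the first tab-delimited field."""
    for line in lines:
        key = line.rstrip("\n").split("\t")[0]
        yield f"{key}\t1"

def reducer(lines):
    """Sum counts for each run of equal keys (input must be sorted by key,
    which the Hadoop shuffle phase guarantees)."""
    current, count = None, 0
    for line in lines:
        key, value = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                yield f"{current}\t{count}"
            current, count = key, 0
        count += int(value)
    if current is not None:
        yield f"{current}\t{count}"
```

In a real Streaming job these functions would read `sys.stdin` and print to stdout, with the streaming JAR wiring the mapper and reducer together across the cluster.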


  • Self-Driving Car (Development)

    In 2016, I built a miniaturized self-driving car (https://github.com/RyanZotti/Self-Driving-Car) using Python, machine learning (TensorFlow), OpenCV, and a Raspberry Pi. GitHub shows the repo is in the 99.6th percentile of popularity for Python projects worldwide when measured by stars (the number of times a repo is marked as a favorite by other GitHub users). I also gave a 90-minute tech talk on the subject, and the video recording on YouTube has nearly 60,000 views: https://www.youtube.com/watch?v=QbbOxrR0zdA&t=10s

  • Lancet Publication (Development)

    Working with Yale neuroscience Ph.D. candidate Adam Chekroud (now assistant professor adjunct, Yale School of Medicine), I used machine learning techniques to identify which patients would respond favorably to various antidepressants. The Lancet is one of the world's oldest, most prestigious, and best-known general medical journals.


  • Languages

    Python 3, SQL, R, Java
  • Libraries/APIs

    XGBoost, Scikit-learn, Matplotlib, TensorFlow, OpenCV, Pandas, NumPy, Flask-RESTful, Ggplot2, Caret
  • Platforms

    AWS EC2, Docker, Linux, macOS, Amazon Web Services (AWS), Apache Kafka, RStudio, Databricks
  • Storage

    PostgreSQL, AWS S3, Redshift, MySQL, Microsoft SQL Server, Neo4j
  • Other

    Machine Learning, Command Line Interface (CLI), ECS
  • Frameworks

    Apache Spark, Hadoop, AWS EMR, Storm
  • Tools

    Apache Airflow, PyCharm, DataGrip, AWS CloudFormation, AWS ECS, Seaborn
  • Paradigms

    Agile Software Development, Pair Programming, Unit Testing


  • Bachelor's degree in Business Management
    2008 - 2012
    University of Illinois at Urbana-Champaign, Illinois
