Ryan Zotti

Verified Expert in Engineering

Machine Learning Developer

Location

San Francisco, CA, United States

Toptal Member Since

May 1, 2019

A data scientist and data engineer, Ryan can take projects from concept to deployment. He is a machine learning expert with a knack for impactful data visualizations. Ryan has collaborated with innovators in diverse fields, presented at tech conferences, and co-authored research papers in respected journals such as The Lancet. He is also the author of a popular autonomous vehicle project on GitHub.

Machine Learning, Python 3, SQL, Amazon EC2, Linux, PostgreSQL, Docker, MacOS, Hadoop, Amazon S3 (AWS S3), Apache Spark, Agile Software Development, MySQL, R, TensorFlow

Portfolio

LSQ

SQL, Python, Docker, ECS, Amazon EC2, Amazon S3 (AWS S3), Microsoft SQL Server...

Capital One

Java, R, SQL, Python, Amazon EC2, Amazon S3 (AWS S3), Apache Kafka, Storm...

Experience

Machine Learning - 9 years, Python 3 - 8 years, SQL - 7 years, Apache Spark - 5 years, Docker - 4 years, TensorFlow - 4 years, OpenCV - 3 years

Availability

Part-time

Preferred Environment

Command-line Interface (CLI), DataGrip, PyCharm, MacOS, Linux

The most amazing...

...public project I’ve authored is a machine learning autonomous vehicle. The project is in the 99.6th percentile of popularity among Python repos on GitHub worldwide.

Work Experience

Senior Data Scientist

2017 - PRESENT

LSQ

Brought in as the first data scientist hire of a leading US invoice finance (factoring) firm processing $3.2 billion in receivables in 2018.
Frequently challenged conventional wisdom with data-driven insights through detailed exploratory analysis. Saved the firm approximately $1 million by identifying material financial weaknesses in a key market initiative through analysis of client attrition, acquisition costs, and loss rate data. My recommendations led to the cancellation of the program and catalyzed significant process changes in marketing, underwriting, and account management.
Questioned the prevailing management view that post-recession financial risk is primarily driven by debtors. Determined that deleterious client behavior is a principal risk factor, and identified client CEO personal credit score as a leading predictor of client default. Advocated systematic tracking and evaluation of the client’s ability to pay before extending funds in excess of invoice collateral.
Identified target industries with the most attractive economics, market segments where LSQ could increase prices without affecting client attrition, and incentives to increase client longevity and lifetime value.
Reduced risk and streamlined operations with machine learning models and advanced feature engineering. Led an initiative to optimize outbound communication and improve data tracking to reduce invoice delinquency and increase collection rates. Enhanced data-driven decisions by building an invoice risk model to predict non-payment and inform funding choices.
Identified anomalous client behavior patterns signaling increased risk. Measured each debtor's days to pay in standard deviations above that debtor's own norm, detecting non-payment risk many weeks earlier than the legacy process, particularly when extremely early payers start to deviate from past behavior while not yet delinquent.
Built a framework to fully automate machine learning and model training. Applied evolutionary algorithms to optimize model tuning parameters (e.g., number of trees and learning rate), as well as model input selection. Leveraged inexpensive elastic compute power on AWS to train tens of thousands of candidate models.
Transformed the company’s data infrastructure. Vastly improved data quality through relentless data cleaning initiatives. Cached frequently used metrics and model features in historical daily snapshot tables, dramatically reducing time to prototype new data projects. Enabled the department to quickly scale headcount by making data more intuitive and accessible, shortening onboarding time for new employees by six months.
Automated extract, transform, and load (ETL) processes on AWS with Apache Airflow. Migrated compute- and data-intensive tasks to large, elastically sized EMR Spark clusters on AWS, using the EC2 Spot Market for a three- to five-fold speedup at a fraction of the cost of dedicated hardware.
Led multi-phase department-wide training to enforce fundamental software best practices, including source control (GitHub), unit testing, and containerization (Docker).
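
As an illustration only (not the production code), the debtor-centric days-to-pay signal described above could be sketched roughly as follows. The function names, the sample data, and the threshold of 2.0 standard deviations are all assumptions made for this example.

```python
from statistics import mean, stdev


def days_to_pay_zscore(history, latest):
    """Standard deviations by which `latest` exceeds this debtor's own mean."""
    if len(history) < 2:
        return 0.0  # too little history to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly regular payer: any change at all is an extreme deviation.
        if latest == mu:
            return 0.0
        return float("inf") if latest > mu else float("-inf")
    return (latest - mu) / sigma


def flag_debtors(payments, threshold=2.0):
    """payments maps a debtor id to its days-to-pay values, oldest first.

    The threshold is a hypothetical cut-off chosen for illustration.
    """
    flagged = {}
    for debtor, days in payments.items():
        if len(days) < 3:
            continue  # need at least two past payments plus the latest one
        *history, latest = days
        z = days_to_pay_zscore(history, latest)
        if z > threshold:
            flagged[debtor] = z
    return flagged


# Hypothetical data: an early payer who slips to 15 days is flagged even
# though 15 days would not be delinquent on typical 30-day terms.
sample = {
    "early_payer": [5, 6, 5, 7, 6, 15],
    "steady": [30, 31, 29, 30, 31, 30],
}
flagged = flag_debtors(sample)
```

Comparing each debtor with its own history, rather than with a portfolio-wide average, is what lets a habitually early payer stand out before any invoice is actually late.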

Technologies: SQL, Python, Docker, ECS, Amazon EC2, Amazon S3 (AWS S3), Microsoft SQL Server, MySQL, PostgreSQL, Redshift, Matplotlib, NumPy, Pandas, Scikit-learn, XGBoost, Amazon Web Services (AWS), Spark
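
For readers unfamiliar with evolutionary tuning of parameters such as number of trees and learning rate, here is a minimal sketch of the general idea. It is not the framework described above: the parameter ranges, the `toy_fitness` function (a stand-in for a real validation score), and every name in it are hypothetical, and it omits model input selection and the parallel training on AWS.

```python
import random


def evolve(fitness, generations=20, population_size=20, seed=0):
    """Evolve {"n_trees", "learning_rate"} candidates toward higher fitness."""
    rng = random.Random(seed)

    def random_candidate():
        return {
            "n_trees": rng.randint(10, 500),
            "learning_rate": rng.uniform(0.01, 0.5),
        }

    def crossover(a, b):
        # Each parameter is inherited from one of the two parents.
        return {
            "n_trees": rng.choice([a, b])["n_trees"],
            "learning_rate": rng.choice([a, b])["learning_rate"],
        }

    def mutate(c):
        # Small random perturbation, clamped to the allowed ranges.
        n_trees = c["n_trees"] + rng.randint(-50, 50)
        rate = c["learning_rate"] * rng.uniform(0.7, 1.3)
        return {
            "n_trees": min(500, max(10, n_trees)),
            "learning_rate": min(0.5, max(0.01, rate)),
        }

    population = [random_candidate() for _ in range(population_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        # Elitism: the top quarter survives unchanged into the next round.
        survivors = population[: max(1, population_size // 4)]
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors)))
            for _ in range(population_size - len(survivors))
        ]
        population = survivors + children
    return max(population, key=fitness)


def toy_fitness(candidate):
    """Hypothetical score that peaks at 200 trees and a 0.1 learning rate."""
    trees_term = ((candidate["n_trees"] - 200) / 100) ** 2
    rate_term = ((candidate["learning_rate"] - 0.1) * 10) ** 2
    return -(trees_term + rate_term)


best = evolve(toy_fitness)
```

In a real setting, `fitness` would train a model with the candidate's parameters and return a validation metric. Because each evaluation is independent, the candidates within a generation can be trained in parallel.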

Principal Data Engineer

2012 - 2017

Capital One

Built a targeted online acquisition platform using AWS, H2O, and Spark to grow the customer base and increase revenue. Presented "Building Real-time Targeting Capabilities on AWS" at the H2O Open Tour 2016 in New York. Discussed H2O and Apache Spark-based machine learning techniques for improving customer acquisition rates on Capital One's website. Explained how to build models at scale on AWS and how to conduct automated daily model refits and model deployments to reactive production systems.
Built a fully automated cloud infrastructure for a large-scale credit risk model hosted in Amazon's Elastic Compute Cloud (EC2) service using CloudFormation and Chef. Utilized AWS services and technologies such as Simple Storage Service (S3), Relational Database Service (RDS), Auto Scaling, and Elastic Load Balancing.
Built and maintained massive-scale machine learning algorithms in production for operational credit risk minimization and marketing on a 240-node Hadoop cluster leveraging half a petabyte of data. Primarily used Python Hadoop Streaming (MapReduce), Hive, Parquet, and Avro. Tuned performance of Hadoop jobs using YARN and various tools provided by Cloudera's CDH distribution.
Experimented with various big data technologies including Apache Spark, Apache Storm, and Apache Kafka. Taught numerous legacy internal Capital One teams how to use the latest Hadoop technologies.
Maintained and productionized more than 50 machine learning predictive models for a diverse range of use cases such as credit risk, marketing, fraud, and customer experience. Worked across multiple lines of business: credit card, auto finance, mortgage, and retail bank.

Technologies: Java, R, SQL, Python, Amazon EC2, Amazon S3 (AWS S3), Apache Kafka, Storm, Hadoop, Amazon Web Services (AWS), Spark

Experience

Self Driving Car

https://github.com/RyanZotti/Self-Driving-Car

In 2016, I built a miniaturized self-driving car using Python, machine learning (TensorFlow), OpenCV, and a Raspberry Pi. GitHub shows my repo is in the 99.6th percentile of popularity for Python projects worldwide when measured by stars (the number of times it is marked as a favorite by other GitHub users). I also gave a 90-minute tech talk on the subject, and the video recording on YouTube has nearly 60,000 views: https://www.youtube.com/watch?v=QbbOxrR0zdA&t=10s

Lancet Publication

Working with Yale neuroscience Ph.D. candidate Adam Chekroud (now assistant professor adjunct, Yale School of Medicine), I used machine learning techniques to identify which patients would respond favorably to various antidepressants. The Lancet is one of the world's oldest, most prestigious, and best known general medical journals.

Skills

Languages

Python 3, SQL, R, Python, Java

Libraries/APIs

XGBoost, Scikit-learn, Matplotlib, TensorFlow, OpenCV, Pandas, NumPy, Flask-RESTful, Ggplot2, Caret

Platforms

Amazon EC2, Docker, Linux, MacOS, Amazon Web Services (AWS), Apache Kafka, RStudio, Databricks

Storage

PostgreSQL, Amazon S3 (AWS S3), Redshift, MySQL, Microsoft SQL Server, Neo4j

Other

Machine Learning, Command-line Interface (CLI), ECS

Frameworks

Apache Spark, Hadoop, Storm

Tools

Amazon Elastic MapReduce (EMR), Apache Airflow, PyCharm, DataGrip, AWS CloudFormation, Amazon Elastic Container Service (Amazon ECS), Seaborn

Paradigms

Agile Software Development, Pair Programming, Unit Testing

Education

2008 - 2012

Bachelor's Degree in Business Management

University of Illinois - Urbana-Champaign, Illinois
