Alex Risman, Developer in Chicago, United States

Alex Risman

Verified Expert in Engineering

Software Developer

Location
Chicago, United States
Toptal Member Since
September 6, 2018

In his current role, Alex uses artificial intelligence to automatically detect diseases in 2D and 3D medical images, with some algorithms achieving superhuman performance. Previously, he worked as a data scientist at an eCommerce company, where he built and deployed a deep-learning-based product search engine.

Portfolio

Realize
Amazon Web Services (AWS), Spark, DICOM, Python, Docker, Kubernetes, Keras...
Virtual/Augmented Reality Consulting Firm
C++, Python, PyTorch, Torch, OpenCV, Amazon SageMaker, Object Detection...
Stealth Healthcare Startup
Python, Databricks, XGBoost, NumPy, Pandas, JSON API, JSON, Confluence...

Experience

Availability

Part-time

Preferred Environment

Unix, Git, Jupyter Notebook

The most amazing...

...software I've developed is a tool for detecting 14 different diseases in chest X-rays.

Work Experience

CTO

2016 - PRESENT
Realize
  • Earned multiple US patents for combining convolutional and recurrent neural networks to automatically detect diseases in CT scans and MRIs, an approach that is the current state of the art.
  • Developed an AI system for the world's largest radiology group, deployed as a containerized RESTful API, including an NLP system for extracting diagnoses from radiology reports with over 95% accuracy.
  • Created an algorithm that detects tuberculosis in chest X-rays with world-class accuracy (greater than 0.9 AUC), as determined by multiple third-party evaluations.
  • Assembled and led the founding team, including a marketer and an MD/PhD oncologist, as the CEO until our 2018 merger with a leading African radiology IT firm. The merger valued the company at more than 30 times its paid-in capital.
  • Advised governmental and NGO officials on AI healthcare applications.
Technologies: Amazon Web Services (AWS), Spark, DICOM, Python, Docker, Kubernetes, Keras, PyTorch, Matplotlib, Seaborn, Image Recognition, TensorFlow, APIs, REST APIs, RESTful Development, Twisted, Open Data, OpenCV, Architecture, Integration, DevOps, Neural Networks, CTO, Microservices

Computer Vision Developer

2021 - 2022
Virtual/Augmented Reality Consulting Firm
  • Developed a "universal green screen" application that removes a moving background from behind a human figure in real time, so that a video of just that person can be superimposed onto a virtual environment (e.g., a video game).
  • Prototyped new features using Python and ported them to C++ and OpenCV for real-time performance.
  • Worked with various stakeholders to ensure an appropriate balance of segmentation quality, speed, and hardware usage.
Technologies: C++, Python, PyTorch, Torch, OpenCV, Amazon SageMaker, Object Detection, Computer Vision Algorithms, Computer Vision

Head of Data and AI

2021 - 2022
Stealth Healthcare Startup
  • Led a team of data scientists, data engineers, and machine learning engineers in developing systems to detect potential errors in medical insurance claims.
  • Negotiated data purchasing and licensing agreements.
  • Drove the company's decision-making around third-party software vendor selection and buy versus build discussions.
Technologies: Python, Databricks, XGBoost, NumPy, Pandas, JSON API, JSON, Confluence, Analytics, Business Intelligence (BI), Software Design, API Integration, Machine Learning Operations (MLOps), Software Architecture

Interim CTO

2021 - 2021
Blockchain Startup (via Toptal)
  • Led the engineering team in developing a React and Django app, enabling users to create, customize, and share infographics about the crypto market based on a curated set of data sources.
  • Defined product requirements and oversaw their execution.
  • Conducted first-hand market research at the 2021 Miami Bitcoin conference.
Technologies: React, Django, Amazon Web Services (AWS), REST APIs, Leadership, Product Management, IT Project Management, CTO

Python Developer

2020 - 2020
Confidential (MBB Consulting Firm)
  • Productionized a machine learning prototype that my client had built for its own client (a Fortune 500 pharmaceutical firm), reducing the codebase by thousands of lines, adding modularity, and vastly simplifying the logic while preserving the original output.
  • Enabled the deployment of new marketing campaigns by configuration rather than a code change.
  • Wrote unit tests for all refactored modules and an automated end-to-end test for the entire system.
Technologies: Python, Pytest, Unit Testing, Refactoring, NumPy, Pandas, Azure, Tableau, Azure Data Lake

Data Engineering Architect

2018 - 2020
Confidential (Major US Pharmacy Chain)
  • Created systems, including deep chains of complex Spark SQL queries and machine learning models, to identify gaps in more than 100 million patients' vaccination histories based on CDC guidelines and generate personalized vaccine recommendations daily.
  • Developed a PySpark method for adding a unique 18-digit ID to a DataFrame without merging to a single partition, removing a department-wide bottleneck.
  • Scaled the existing system for notifying patients that their prescriptions were ready from a single-node, on-premises SQL database to distributed Spark SQL in Azure.
  • Conducted hiring of data scientists and data engineers.
Technologies: Databricks, Spark, PySpark, Spark SQL, Spark ML, Apache Airflow, SQL, Jira, Agile, Python, Azure, NumPy, Pandas, Scikit-learn, Unit Testing, Big Data, Big Data Architecture, Data Pipelines, Architecture, Integration, Databases, CSV, Legacy Code, Legacy Software, Data Analysis, Data Analytics, Data
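
One way to assign unique IDs without collapsing a DataFrame to a single partition is to combine each partition's index with a row counter that is local to that partition. The sketch below simulates the idea in plain Python, with lists standing in for Spark partitions. The function name `assign_ids`, the 6-digit prefix, and the 12-digit row counter are assumptions made for illustration, not the method used on this project.

```python
from typing import Dict, List

# Assumed layout of the 18-digit ID: 6-digit partition prefix + 12-digit row counter.
PREFIX_OFFSET = 100_000          # keeps the first digit non-zero, so IDs always have 18 digits
MAX_PARTITIONS = 900_000         # prefixes run from 100000 to 999999
ROWS_PER_PARTITION = 10 ** 12    # room for a 12-digit local row counter


def assign_ids(partitions: List[List[Dict]]) -> List[List[Dict]]:
    """Add an 18-digit integer "id" to every row, one partition at a time.

    Each partition only needs its own index and a local counter, so no
    global ordering or single-partition shuffle is required for uniqueness.
    """
    result = []
    for p_idx, rows in enumerate(partitions):
        if p_idx >= MAX_PARTITIONS or len(rows) > ROWS_PER_PARTITION:
            raise ValueError("partition index or row count exceeds the ID layout")
        base = (PREFIX_OFFSET + p_idx) * ROWS_PER_PARTITION
        result.append([{**row, "id": base + r_idx} for r_idx, row in enumerate(rows)])
    return result


# Toy data: two "partitions" of hypothetical rows.
partitions = [[{"name": "a"}, {"name": "b"}], [{"name": "c"}]]
with_ids = assign_ids(partitions)
all_ids = [row["id"] for part in with_ids for row in part]
```

In PySpark, a similar per-partition scheme could be expressed with `RDD.mapPartitionsWithIndex`, which passes each partition's index to the mapping function. The largest ID in this layout stays below 10^18, so it fits in a signed 64-bit integer.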

Spark Consultant

2018 - 2018
FLYR
  • Optimized existing YARN-managed PySpark jobs running on GCP, cutting runtimes and costs by over 80%.
  • Trained client staff in best practices for Spark and data engineering.
  • Used Agile methodology to manage my work, including daily scrums and sprint planning with Jira.
Technologies: Google Cloud Platform (GCP), Google Cloud Dataproc, Spark, PySpark, Spark ML, BigQuery, Kubernetes, YARN, Agile, Jira

Data Scientist

2013 - 2017
McMaster-Carr Supply
  • Conceived and developed a deep-learning-based eCommerce search engine that trained NLP models using recurrent neural networks on millions of customer searches, increasing the probability a given search would end with an "add to order" by 1.07%.
  • Estimated and visualized the causal effect of “punch-out” purchasing software on sales with R/ggplot2, using a panel dataset of monthly sales figures from 30 customers over two years before and after activation.
  • Built systems for tracking and analyzing A/B tests using a Neo4j graph database and R with methods for verifying assumptions and estimating treatment effects in superiority and non-inferiority trials.
  • Developed a machine learning model to decide if non-catalog products sourced for customers required hazard handling based on supplier description, achieving 0.99 AUC, 98% accuracy, and no false negatives in testing.
  • Designed the above machine learning model in Python using Scikit-learn and Pandas.
  • Implemented a Random Forest algorithm in C# on top of Accord, the most popular .NET ML framework, for production; Random Forest pull request to Accord accepted to master branch.
  • Prototyped the above machine learning model in R using Random Forest; the implementation is pending production.
Technologies: Theano, Keras, Scikit-learn, NumPy, Pandas, Python, C#.NET, Neo4j, Splunk, Time Series, Time Series Analysis, Forecasting, Supply Chain Management, Supply Chain Optimization, Recommendation Systems, C#, Cypher, .NET, eCommerce, HTML, Elasticsearch, Solr, Scalability, Search Engines, Data Visualization
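
As an illustration of the non-inferiority analysis mentioned above, here is a minimal sketch of a textbook one-sided Wald z-test on the difference of two conversion rates. The original work was done in R; this Python version, the function name `non_inferiority_z`, and the example counts are all assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def non_inferiority_z(conv_t: int, n_t: int, conv_c: int, n_c: int,
                      margin: float) -> Tuple[float, float]:
    """One-sided non-inferiority test for two proportions.

    H0: p_treatment - p_control <= -margin  (treatment is worse by at least the margin)
    H1: p_treatment - p_control >  -margin  (treatment is non-inferior)
    Returns the z statistic and the one-sided p-value.
    """
    p_t, p_c = conv_t / n_t, conv_c / n_c
    # Unpooled (Wald) standard error of the difference in proportions.
    se = sqrt(p_t * (1 - p_t) / n_t + p_c * (1 - p_c) / n_c)
    z = (p_t - p_c + margin) / se
    p_value = 1.0 - NormalDist().cdf(z)
    return z, p_value


# Hypothetical counts: 480/1000 conversions for treatment, 500/1000 for control,
# with a non-inferiority margin of 5 percentage points.
z_stat, p_val = non_inferiority_z(480, 1000, 500, 1000, margin=0.05)
```

A small p-value supports the claim that the treatment is no worse than the control by more than the chosen margin. Setting `margin=0` turns this into an ordinary one-sided superiority test.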

Anomaly Detection in Volumetric Images Using Sequential Convolutional and Recurrent Neural Networks

https://patents.google.com/patent/US10347010B2/en
I created what is now the state-of-the-art deep learning architecture for analyzing CT scans. Computer-implemented methods and apparatuses for anomaly detection in volumetric images are provided. A two-dimensional convolutional neural network (CNN) is used to encode slices within a volumetric image, such as a CT scan. The CNN may be trained using an output layer that is subsequently omitted during the use of the CNN as an encoder. The CNN encoder output is applied to a recurrent neural network (RNN), such as a long short-term memory network. The RNN may output various indications of the presence, probability, and/or location of anomalies within the volumetric image.
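
The data flow described above (encode each slice with a 2D CNN, then feed the sequence of encodings to a recurrent network) can be sketched in plain Python. This is a structural illustration only: all function names are hypothetical, the weights are random and untrained, the encoder is a single convolution layer with global average pooling, and a simple tanh recurrent cell stands in for the long short-term memory network named in the patent.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # one 2D slice or one convolution kernel


def conv2d_relu(img: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2D cross-correlation of img with a square kernel, followed by ReLU."""
    k = len(kernel)
    h, w = len(img) - k + 1, len(img[0]) - k + 1
    return [[max(0.0, sum(img[i + a][j + b] * kernel[a][b]
                          for a in range(k) for b in range(k)))
             for j in range(w)] for i in range(h)]


def encode_slice(img: Matrix, kernels: List[Matrix]) -> List[float]:
    """CNN encoder: one feature per kernel via global average pooling.

    No classification layer is applied here; only the feature vector is passed on.
    """
    feats = []
    for kernel in kernels:
        fmap = conv2d_relu(img, kernel)
        feats.append(sum(map(sum, fmap)) / (len(fmap) * len(fmap[0])))
    return feats


def rnn_anomaly_probability(seq: List[List[float]], w_in: Matrix,
                            w_rec: Matrix, w_out: List[float]) -> float:
    """Run a tanh recurrent cell over the slice encodings, then a sigmoid read-out."""
    hidden = [0.0] * len(w_rec)
    for x in seq:
        hidden = [math.tanh(sum(wi * xi for wi, xi in zip(w_in[u], x))
                            + sum(wr * hj for wr, hj in zip(w_rec[u], hidden)))
                  for u in range(len(hidden))]
    logit = sum(wo * hj for wo, hj in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)


def rand_matrix(rows: int, cols: int) -> Matrix:
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


n_features, n_hidden = 4, 3
kernels = [rand_matrix(3, 3) for _ in range(n_features)]
w_in, w_rec = rand_matrix(n_hidden, n_features), rand_matrix(n_hidden, n_hidden)
w_out = [rng.uniform(-1, 1) for _ in range(n_hidden)]

volume = [rand_matrix(8, 8) for _ in range(5)]           # toy volume: 5 slices of 8x8 values
features = [encode_slice(s, kernels) for s in volume]    # one encoding per slice
prob = rnn_anomaly_probability(features, w_in, w_rec, w_out)
```

Because the recurrent state is carried from slice to slice, the final output can depend on context across neighboring slices rather than on each slice in isolation.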

CT Lung Nodule Detection

https://www.youtube.com/watch?v=X_8bpuL0G3Q
I developed AI software that automatically detects lung nodules in CT scans. Such nodules are often missed by radiologists and can portend cancer. The linked video demonstrates the AI results and their integration into the radiology workflow.

Scaling Up Music | Master's Project at UC Berkeley

I deployed and used a Spark cluster to predict the genre of songs in The Echo Nest’s Million Song Dataset using data on volume, tempo, pitch, and “danceability”. I also wrote the code to train models using Spark’s MLlib.

Evaluation of Multiple Open-source Deep Learning Models for Detecting COVID-19 on Chest X-rays

https://pubmed.ncbi.nlm.nih.gov/35005058/
I was the primary investigator and first author of an international study on using AI to detect COVID-19, published in a peer-reviewed medical journal.

Abstract
Purpose: In the context of the COVID-19 pandemic, rapid triage of cases and exclusion of other pathologies with artificial intelligence (AI) can assist over-stretched radiology departments. We aim to validate three open-source AI models on an external test set.

Approach:
We tested three open-source deep learning models, COVID-Net, COVIDNet-S-GEO, and CheXNet, for their ability to detect COVID-19 pneumonia and to determine its severity using 129 chest x-rays from two different vendors.

Results:
All three models detected COVID-19 pneumonia. Only the COVIDNet-S-GEO and CheXNet models performed well on severity scoring; COVID-Net only performed well at either task on images taken with a Philips machine (AUC 0.735) and not an Agfa machine (AUC 0.598).

Conclusions:
Chest x-ray triage using existing machine learning models for COVID-19 pneumonia can be successfully implemented using open-source AI models. Evaluation of the model using local x-ray machines and protocols is highly recommended before implementation to avoid vendor or protocol-dependent bias.
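
The recommendation to evaluate a model per vendor can be illustrated with a short sketch that computes AUC separately for each group of images. The function names and the toy records below are made up for illustration and are not data from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_vendor(records: Iterable[Tuple[str, int, float]]) -> Dict[str, float]:
    """Group (vendor, label, score) records by vendor and compute AUC for each group."""
    groups = defaultdict(lambda: ([], []))
    for vendor, label, score in records:
        groups[vendor][0].append(label)
        groups[vendor][1].append(score)
    return {vendor: auc(lbls, scrs) for vendor, (lbls, scrs) in groups.items()}


# Toy records: (vendor, true label, model score). Values are invented.
records = [
    ("vendor_a", 1, 0.9), ("vendor_a", 0, 0.2), ("vendor_a", 1, 0.7), ("vendor_a", 0, 0.4),
    ("vendor_b", 1, 0.6), ("vendor_b", 0, 0.7), ("vendor_b", 1, 0.8), ("vendor_b", 0, 0.3),
]
result = auc_by_vendor(records)
```

A large gap between the per-vendor values, as in the toy data here, is the kind of vendor-dependent bias that a single pooled AUC can hide.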

Capturing and Analyzing Sentiment Data of SEC 10K Filing’s Management’s Discussion and Analysis

https://s3-us-west-2.amazonaws.com/riteshsoni/papers/MDA_Analysis.pdf
Abstract: The Securities and Exchange Commission (SEC) regulates US financial markets. One requirement for securities market participants is to provide disclosure to the public. SEC’s EDGAR (Electronic Data Gathering, Analysis and Retrieval) database aggregates and disseminates the public disclosure data. There are more than 100 types of forms that market participants fill out and electronically file with EDGAR. The project is focused on a very important filing type, the 10-K. Publicly traded companies disclose comprehensive information about the company operations regularly. This project demonstrates the data collection, manipulation, and analysis (sentiment based on NLP) of the 10-K filings leveraging the Hadoop data processing framework for rapid data analysis.

Languages

Python, R, SQL, C#, Cypher, C++, HTML, C#.NET, JavaScript, Scala

Frameworks

Spark, Apache Spark, .NET, Hadoop, YARN, Twisted, Django

Libraries/APIs

PyTorch, OpenCV, PySpark, Pandas, Scikit-learn, TensorFlow, Keras, NumPy, SciPy, Theano, MLlib, Spark ML, Matplotlib, React, REST APIs, XGBoost, JSON API

Tools

Spark SQL, Apache Airflow, Jira, Tableau, Jupyter, Git, Google Cloud Dataproc, BigQuery, Pytest, Seaborn, Splunk, Collada, CAD, Amazon SageMaker, Confluence, Solr

Paradigms

Data Science, ETL, DevOps, RESTful Development, Distributed Computing, Agile, Parallel Computing, Unit Testing, Refactoring, Business Intelligence (BI), Microservices

Platforms

Databricks, Amazon Web Services (AWS), Amazon EC2, Kubernetes, Docker, NVIDIA CUDA, Ubuntu, Linux, Azure, Jupyter Notebook, Unix, Google Cloud Platform (GCP), Apache Kafka

Storage

Amazon S3 (AWS S3), Neo4j, PostgreSQL, Data Pipelines, Cassandra, Google Cloud, Databases, JSON, Elasticsearch

Other

APIs, Image Analysis, 3D Image Processing, Image Processing, Machine Learning, Data Engineering, Big Data, Big Data Architecture, Architecture, Integration, Neural Networks, Time Series, Time Series Analysis, Artificial Intelligence (AI), Deep Learning, Random Forests, Computer Vision, Natural Language Processing (NLP), Convolutional Neural Networks, Recurrent Neural Networks (RNN), GPU Computing, Graphics Processing Unit (GPU), Software Design, API Integration, Recommendation Systems, Data Analysis, Data Analytics, Data, eCommerce, Machine Learning Operations (MLOps), Statistical Modeling, Mathematical Modeling, Statistical Methods, Object Detection, Algorithms, Computer Vision Algorithms, GPT, Generative Pre-trained Transformers (GPT), Data Modeling, Object Tracking, OCR, Video Analysis, Legacy Code, Legacy Software, DICOM, Economics, Image Recognition, Apache Cassandra, Open Data, Forecasting, 3D CAD, Leadership, Product Management, IT Project Management, Torch, CTO, Azure Data Lake, CSV, Supply Chain Management, Supply Chain Optimization, Technical Writing, Writing & Editing, Statistics, Point Clouds, Analytics, Generative Adversarial Networks (GANs), Software Architecture, Scalability, Search Engines, Data Visualization

2014 - 2015

Master's Degree in Information and Data Science

University of California, Berkeley - Berkeley, CA, USA

2009 - 2013

Bachelor's Degree in Mathematical Methods in the Social Sciences, Economics

Northwestern University - Evanston, IL, USA