Alex Risman, Software Developer in Chicago, United States
Member since September 6, 2018
In his current role, Alex uses artificial intelligence to automatically detect diseases in 2D and 3D medical images, with some algorithms achieving superhuman performance. Previously, he worked as a data scientist at an eCommerce company, where he built and deployed a deep-learning-based product search engine.
Alex is now available for hire

Location

Chicago, United States

Availability

Full-time

Preferred Environment

Unix, Git, Jupyter Notebook

The most amazing...

...software I've developed is a tool for detecting 14 different diseases in chest X-rays.

Employment

  • CTO

    2016 - PRESENT
    Realize
    • Earned multiple US patents for combining convolutional and recurrent neural networks to automatically detect diseases in CT scans and MRIs, an approach that is the current state of the art.
    • Developed an AI system for the world's largest radiology group, deployed as a containerized RESTful API, including an NLP system for extracting diagnoses from radiology reports with over 95% accuracy.
    • Created an algorithm that detects tuberculosis in chest X-rays with world-class accuracy (greater than 0.9 AUC), as determined by multiple third-party evaluations.
    • Assembled and led the founding team, including a marketer and an MD/Ph.D. oncologist, as the CEO until our 2018 merger with a leading African radiology IT firm. The merger valued the company at more than 30 times our paid-in capital.
    • Advised governmental and NGO officials on AI healthcare applications.
    Technologies: Amazon Web Services (AWS), AWS, Spark, DICOM, Python, Docker, Kubernetes, Keras, PyTorch, Matplotlib, Seaborn, Image Recognition, TensorFlow, APIs, RESTful Development, RESTful APIs, Twisted, Open Data, OpenCV, Architecture, Integration, DevOps, Neural Networks, CTO, Microservices
  • Computer Vision Developer

    2021 - 2022
    Virtual/Augmented Reality Consulting Firm
    • Developed a "universal green screen" application that removes a moving background from behind a human figure in real time and superimposes a video of just that human onto a virtual environment (e.g., a video game).
    • Prototyped new features using Python and ported them to C++ and OpenCV for real-time performance.
    • Worked with various stakeholders to ensure an appropriate balance of segmentation quality, speed, and hardware usage.
    Technologies: C++, Python, PyTorch, Torch, OpenCV, Amazon SageMaker, Object Detection, Computer Vision Algorithms, Computer Vision
  • Head of Data and AI

    2021 - 2022
    Stealth Healthcare Startup
    • Led a team of data scientists, data engineers, and machine learning engineers in developing systems to detect potential errors in medical insurance claims.
    • Negotiated data purchasing and licensing agreements.
    • Drove the company's decision-making around third-party software vendor selection and buy versus build discussions.
    Technologies: Python, Databricks, XGBoost, NumPy, Pandas, JSON API, JSON, Confluence, Analytics, Business Intelligence (BI), Software Design, API Integration, Machine Learning Operations (MLOps), Software Architecture
  • Interim CTO

    2021 - 2021
    Blockchain Startup (via Toptal)
    • Led the engineering team in developing a React and Django app, enabling users to create, customize, and share infographics about the crypto market based on a curated set of data sources.
    • Defined product requirements and oversaw their execution.
    • Conducted first-hand market research at the 2021 Miami Bitcoin conference.
    Technologies: React, Django, AWS, REST APIs, Leadership, Product Management, IT Project Management, CTO
  • Python Developer

    2020 - 2020
    Confidential (MBB Consulting Firm via Toptal)
    • Productionized a machine learning prototype that my client had built for its own client (a Fortune 500 pharmaceutical firm), reducing the codebase by thousands of lines, adding modularity, and vastly simplifying the logic while preserving the original output.
    • Enabled the deployment of new marketing campaigns by configuration rather than a code change.
    • Wrote unit tests for all refactored modules and an automatic end-to-end test for the entire system.
    Technologies: Python, Pytest, Unit Testing, Refactoring, NumPy, Pandas, Azure, Tableau, Azure Data Lake
  • Data Engineering Architect

    2018 - 2020
    Confidential (Major US Pharmacy Chain, via Toptal)
    • Created systems, including deep chains of complex Spark SQL queries and machine learning models, to identify gaps in more than 100 million patients' vaccination histories based on CDC guidelines and generate personalized vaccine recommendations daily.
    • Developed a PySpark method for adding a unique 18-digit ID to a DataFrame without merging to a single partition, removing a department-wide bottleneck.
    • Scaled the existing system for notifying patients that their prescriptions were ready from single-node, on-premises SQL to distributed Spark SQL in Azure.
    • Conducted hiring of data scientists and data engineers.
    Technologies: Databricks, Spark, PySpark, Spark SQL, Spark ML, Apache Airflow, SQL, Jira, Agile, Python, Azure, NumPy, Pandas, Scikit-learn, Unit Testing, Big Data, Big Data Architecture, Data Pipelines, Architecture, Integration, Databases, CSV, Legacy Code, Legacy Software, Data Analysis, Data Analytics, Data
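
The profile does not say how the PySpark ID method mentioned above works. One common way to get unique IDs without collapsing data to a single partition is to let each partition number its own rows and fold the partition index into the ID. The dependency-free Python sketch below illustrates that idea only; the digit layout, the limits (at most 10^6 partitions and 10^11 rows per partition), and the name `assign_ids` are assumptions made for illustration. In PySpark, a per-partition function of this shape could be applied with `RDD.mapPartitionsWithIndex`.

```python
from itertools import chain

# Assumed ID layout (hypothetical): 1 | 6-digit partition index | 11-digit row index
ROW_DIGITS = 11                                # up to 10**11 rows per partition
PARTITION_DIGITS = 6                           # up to 10**6 partitions
BASE = 10 ** (ROW_DIGITS + PARTITION_DIGITS)   # 10**17, so every ID has 18 digits


def assign_ids(partition_index, rows):
    """Yield (id, row) pairs for one partition using only local information."""
    if partition_index >= 10 ** PARTITION_DIGITS:
        raise ValueError("too many partitions for this ID layout")
    offset = BASE + partition_index * 10 ** ROW_DIGITS
    for local_index, row in enumerate(rows):
        if local_index >= 10 ** ROW_DIGITS:
            raise ValueError("too many rows in one partition")
        yield offset + local_index, row


# Four stand-in "partitions"; each one is numbered independently of the others.
partitions = [["a", "b", "c"], ["d"], [], ["e", "f"]]
tagged = list(
    chain.from_iterable(assign_ids(i, rows) for i, rows in enumerate(partitions))
)
ids = [row_id for row_id, _ in tagged]
```

Because the partition index occupies its own digits, two partitions can never produce the same ID, and no partition needs to know how many rows the others hold.
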
  • Spark Consultant

    2018 - 2018
    FLYR
    • Optimized existing YARN-managed PySpark jobs running on GCP, cutting runtimes and costs by over 80%.
    • Trained client staff in best practices for Spark and data engineering.
    • Used Agile methodology to manage my work, including daily scrums and sprint planning with Jira.
    Technologies: Google Cloud Platform (GCP), Google Cloud Dataproc, Spark, PySpark, Spark ML, BigQuery, Kubernetes, YARN, Agile, Jira
  • Data Scientist

    2013 - 2017
    McMaster-Carr Supply
    • Conceived and developed a deep-learning-based eCommerce search engine that trained NLP models using recurrent neural networks on millions of customer searches, increasing the probability a given search would end with an "add to order" by 1.07%.
    • Estimated and visualized the causal effect of “punch-out” purchasing software on sales with R/ggplot2, using a panel dataset of monthly sales figures from 30 customers over two years before and after activation.
    • Built systems for tracking and analyzing A/B tests using a Neo4J graph database and R with methods for verifying assumptions and estimating treatment effects in superiority and non-inferiority trials.
    • Developed a machine learning model to decide if non-catalog products sourced for customers required hazard handling based on supplier description, achieving 0.99 AUC, 98% accuracy, and no false negatives in testing.
    • Designed the above machine learning model in Python using Scikit-learn and Pandas.
    • Implemented a Random Forest algorithm in C# on top of Accord, the most popular .NET ML framework, for production; Random Forest pull request to Accord accepted to master branch.
    • Prototyped the above machine learning model in R using Random Forest; the implementation is pending production.
    Technologies: Theano, Keras, Scikit-learn, NumPy, Pandas, Python, C#.NET, Neo4j, Splunk, Time Series, Time Series Analysis, Forecasting, Supply Chain Management, Supply Chain Optimization, Recommendation Systems, C#, Cypher, .NET, eCommerce, HTML, Elasticsearch, Solr, Scalability, Search Engines, Data Visualization

Experience

  • Anomaly Detection in Volumetric Images Using Sequential Convolutional and Recurrent Neural Networks
    https://patents.google.com/patent/US10347010B2/en

    I created what is now the state-of-the-art deep learning architecture for analyzing CT scans. Computer-implemented methods and apparatuses for anomaly detection in volumetric images are provided. A two-dimensional convolutional neural network (CNN) is used to encode slices within a volumetric image, such as a CT scan. The CNN may be trained using an output layer that is subsequently omitted during the use of the CNN as an encoder. The CNN encoder output is applied to a recurrent neural network (RNN), such as a long short-term memory network. The RNN may output various indications of the presence, probability, and/or location of anomalies within the volumetric image.
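
As a rough illustration of the data flow described above (and not the patented implementation), the following dependency-free Python sketch encodes each 2D slice with a tiny convolutional encoder and feeds the resulting feature vectors, in slice order, to a recurrent cell that produces one score for the whole volume. Everything here is an assumption made for illustration: the weights are random and untrained, a plain tanh recurrent cell stands in for the long short-term memory network, and all function names are hypothetical.

```python
import math
import random


def conv2d_relu(image, kernel):
    """Valid 2D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [max(0.0, sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw)))
         for j in range(out_w)]
        for i in range(out_h)
    ]


def encode_slice(image, kernels):
    """Encode one slice as a vector: one global-average-pooled value per kernel."""
    features = []
    for kernel in kernels:
        fmap = conv2d_relu(image, kernel)
        total = sum(sum(row) for row in fmap)
        features.append(total / (len(fmap) * len(fmap[0])))
    return features


def rnn_step(hidden, features, w_in, w_rec):
    """One tanh recurrent update: h' = tanh(W_in x + W_rec h)."""
    return [
        math.tanh(sum(w * x for w, x in zip(w_in[k], features))
                  + sum(w * h for w, h in zip(w_rec[k], hidden)))
        for k in range(len(hidden))
    ]


def score_volume(volume, kernels, w_in, w_rec, w_out):
    """Return a score in (0, 1) for a volume given as an ordered list of 2D slices."""
    hidden = [0.0] * len(w_rec)
    for image in volume:  # slices are consumed in order, like a sequence
        hidden = rnn_step(hidden, encode_slice(image, kernels), w_in, w_rec)
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))


def random_matrix(rng, rows, cols):
    """Build a rows x cols matrix of uniform random values in [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_kernels, n_hidden = 4, 3
kernels = [random_matrix(rng, 3, 3) for _ in range(n_kernels)]
w_in = random_matrix(rng, n_hidden, n_kernels)
w_rec = random_matrix(rng, n_hidden, n_hidden)
w_out = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]

# A stand-in "scan": 5 slices of 8x8 random pixels (not real image data).
volume = [random_matrix(rng, 8, 8) for _ in range(5)]
score = score_volume(volume, kernels, w_in, w_rec, w_out)
```

The encoder returns pooled features rather than a class prediction, which mirrors the description of a CNN whose output layer is omitted when it is used as an encoder.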

  • CT Lung Nodule Detection
    https://www.youtube.com/watch?v=X_8bpuL0G3Q

    I developed artificial intelligence software that automatically detects lung nodules in CT scans. Such nodules are often missed by radiologists and can portend cancer. The attached link leads to a video demonstrating the AI results and the integration into the radiology workflow.

  • Scaling Up Music | Master's Project at UC Berkeley

    I deployed and used a Spark cluster to predict the genre of songs in The Echo Nest’s Million Song Dataset using data on volume, tempo, pitch, and “danceability”. I also wrote the code to train models using Spark’s MLlib.

  • Evaluation of Multiple Open-source Deep Learning Models for Detecting COVID-19 on Chest X-rays
    https://pubmed.ncbi.nlm.nih.gov/35005058/

    I was the primary investigator and first author of an international study on using AI to detect COVID-19, published in a peer-reviewed medical journal.

    Abstract
    Purpose:
    In the context of the COVID-19 pandemic, rapid triage of cases and exclusion of other pathologies with artificial intelligence (AI) can assist over-stretched radiology departments. We aim to validate three open-source AI models on an external test set.

    Approach:
    We tested three open-source deep learning models, COVID-Net, COVIDNet-S-GEO, and CheXNet, for their ability to detect COVID-19 pneumonia and to determine its severity using 129 chest x-rays from two different vendors.

    Results:
    All three models detected COVID-19 pneumonia. Only the COVIDNet-S-GEO and CheXNet models performed well on severity scoring; COVID-Net performed well at either task only on images taken with a Philips machine (AUC 0.735) and not an Agfa machine (AUC 0.598).

    Conclusions:
    Chest x-ray triage using existing machine learning models for COVID-19 pneumonia can be successfully implemented using open-source AI models. Evaluation of the model using local x-ray machines and protocols is highly recommended before implementation to avoid vendor or protocol-dependent bias.
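
To make the per-vendor comparison above concrete, the short Python sketch below computes AUC separately for each scanner vendor. It is not the study's code: the records are made-up placeholder values, the vendor names are hypothetical, and AUC is computed directly from its pairwise definition (the probability that a positive case scores above a negative case, with ties counted as half). In practice, a library routine such as scikit-learn's `roc_auc_score` would typically be used instead.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_group(records):
    """Compute AUC per group from (group, label, score) records."""
    groups = {}
    for group, label, score in records:
        labels, scores = groups.setdefault(group, ([], []))
        labels.append(label)
        scores.append(score)
    return {group: auc(labels, scores) for group, (labels, scores) in groups.items()}


# Placeholder data: (vendor, true label, model score). Not taken from the study.
records = [
    ("vendor_a", 1, 0.9), ("vendor_a", 1, 0.8), ("vendor_a", 0, 0.3), ("vendor_a", 0, 0.1),
    ("vendor_b", 1, 0.6), ("vendor_b", 1, 0.4), ("vendor_b", 0, 0.5), ("vendor_b", 0, 0.2),
]
results = auc_by_group(records)
```

A gap between groups in such a breakdown is the kind of vendor-dependent difference that the conclusions recommend checking for before deployment.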

  • Capturing and Analyzing Sentiment Data of SEC 10K Filing’s Management’s Discussion and Analysis
    https://s3-us-west-2.amazonaws.com/riteshsoni/papers/MDA_Analysis.pdf

    Abstract: The Securities and Exchange Commission (SEC) regulates US financial markets. One requirement for securities market participants is to provide disclosure to the public. The SEC’s EDGAR (Electronic Data Gathering, Analysis, and Retrieval) database aggregates and disseminates the public disclosure data. There are more than 100 types of forms that market participants fill out and electronically file with EDGAR. The project focuses on a very important filing type, the 10-K, in which publicly traded companies regularly disclose comprehensive information about their operations. This project demonstrates the data collection, manipulation, and analysis (sentiment based on NLP) of 10-K filings, leveraging the Hadoop data processing framework for rapid data analysis.

Skills

  • Languages

    Python, R, SQL, C#, Cypher, C++, HTML, C#.NET, JavaScript, Scala
  • Frameworks

    Spark, Apache Spark, .NET, Hadoop, YARN, Twisted, Django
  • Libraries/APIs

    PyTorch, OpenCV, PySpark, Pandas, Scikit-learn, TensorFlow, Keras, NumPy, SciPy, Theano, MLlib, Spark ML, Matplotlib, React, REST APIs, XGBoost, JSON API
  • Tools

    Spark SQL, Apache Airflow, Jira, Tableau, Jupyter, Git, Google Cloud Dataproc, BigQuery, Pytest, Seaborn, Splunk, Collada, CAD, Amazon SageMaker, Confluence, Solr
  • Paradigms

    Data Science, ETL, DevOps, RESTful Development, Distributed Computing, Agile, Parallel Computing, Unit Testing, Refactoring, Business Intelligence (BI), Microservices
  • Platforms

    Databricks, Amazon Web Services (AWS), Amazon EC2 (Amazon Elastic Compute Cloud), Kubernetes, Docker, CUDA, Ubuntu, Linux, Azure, Jupyter Notebook, Unix, Google Cloud Platform (GCP), Apache Kafka
  • Storage

    Amazon S3 (AWS S3), Neo4j, PostgreSQL, Data Pipelines, Cassandra, Google Cloud, Databases, JSON, Elasticsearch
  • Other

    APIs, Image Analysis, 3D Image Processing, Image Processing, Machine Learning, Data Engineering, Big Data, Big Data Architecture, RESTful APIs, Architecture, Integration, Neural Networks, Time Series, Time Series Analysis, Artificial Intelligence (AI), Deep Learning, Random Forests, Computer Vision, Natural Language Processing (NLP), Convolutional Neural Networks, Recurrent Neural Networks, GPU Computing, Graphics Processing Unit (GPU), Software Design, API Integration, Recommendation Systems, Data Analysis, Data Analytics, Data, eCommerce, Machine Learning Operations (MLOps), Statistical Modeling, Mathematical Modeling, Statistical Methods, Object Detection, Algorithms, Computer Vision Algorithms, Data Modeling, Object Tracking, OCR, Video Analysis, Legacy Code, Legacy Software, DICOM, AWS, Economics, Image Recognition, Apache Cassandra, Open Data, Forecasting, 3D CAD, Leadership, Product Management, IT Project Management, Torch, CTO, Azure Data Lake, CSV, Supply Chain Management, Supply Chain Optimization, Technical Writing, Writing & Editing, Statistics, Point Clouds, Analytics, Generative Adversarial Networks (GANs), Software Architecture, Scalability, Search Engines, Data Visualization

Education

  • Master's Degree in Information and Data Science
    2014 - 2015
    University of California, Berkeley - Berkeley, CA, USA
  • Bachelor's Degree in Mathematical Methods in the Social Sciences, Economics
    2009 - 2013
    Northwestern University - Evanston, IL, USA
