Verified Expert in Engineering
In his current role, Alex uses artificial intelligence to automatically detect diseases in 2D and 3D medical images, with some algorithms achieving superhuman performance. Previously, he worked as a data scientist at an eCommerce company, where he built and deployed a deep-learning-based product search engine.
Unix, Git, Jupyter Notebook
The most amazing...
...software I've developed is a tool for detecting 14 different diseases in chest X-rays.
- Earned multiple US patents for combining convolutional and recurrent neural networks to automatically detect diseases in CT scans and MRIs, an approach that is the current state of the art.
- Developed an AI system for the world's largest radiology group, deployed as a containerized RESTful API, including an NLP system for extracting diagnoses from radiology reports with over 95% accuracy.
- Created an algorithm that detects tuberculosis in chest X-rays with world-class accuracy (greater than 0.9 AUC), as determined by multiple third-party evaluations.
- Assembled and led the founding team, including a marketer and an MD/Ph.D. oncologist, as the CEO until our 2018 merger with a leading African radiology IT firm. The merger valued the company at more than 30 times our paid-in capital.
- Advised governmental and NGO officials on AI healthcare applications.
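For readers unfamiliar with the AUC figure quoted for the tuberculosis detector above, the following is a minimal sketch of how the metric can be computed. It is not the evaluation code used in the third-party studies; the function name `roc_auc` and the toy labels and scores are hypothetical. AUC is treated here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 ground-truth values (1 = disease present).
    scores: iterable of model scores, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three diseased and three healthy images.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic and only suitable for small examples; production evaluations typically rely on a library routine such as scikit-learn's `roc_auc_score`.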
Computer Vision Developer
Virtual/Augmented Reality Consulting Firm
- Developed a "universal green screen" application that removes a moving background from behind a human figure in real time, so that video of just that person can be superimposed onto a virtual environment (e.g., a video game).
- Prototyped new features using Python and ported them to C++ and OpenCV for real-time performance.
- Worked with various stakeholders to ensure an appropriate balance of segmentation quality, speed, and hardware usage.
Head of Data and AI
Stealth Healthcare Startup
- Led a team of data scientists, data engineers, and machine learning engineers in developing systems to detect potential errors in medical insurance claims.
- Negotiated data purchasing and licensing agreements.
- Drove the company's decision-making around third-party software vendor selection and buy-versus-build discussions.
Blockchain Startup (via Toptal)
- Led the engineering team in developing a React and Django app, enabling users to create, customize, and share infographics about the crypto market based on a curated set of data sources.
- Defined product requirements and oversaw their execution.
- Conducted first-hand market research at the 2021 Miami Bitcoin conference.
Confidential (MBB Consulting Firm)
- Productionized a machine learning prototype that my client had built for its own client (a Fortune 500 pharmaceutical firm), reducing the codebase by thousands of lines, adding modularity, and vastly simplifying the logic while preserving the original output.
- Enabled the deployment of new marketing campaigns by configuration rather than a code change.
- Wrote unit tests for all refactored modules and an automated end-to-end test for the entire system.
Data Engineering Architect
Confidential (Major US Pharmacy Chain)
- Created systems, including deep chains of complex Spark SQL queries and machine learning models, to identify gaps in more than 100 million patients' vaccination histories based on CDC guidelines and generate personalized vaccine recommendations daily.
- Developed a PySpark method for adding a unique 18-digit ID to a DataFrame without merging to a single partition, removing a department-wide bottleneck.
- Scaled the existing system for notifying patients that their prescriptions were ready from a single-node, on-premises SQL database to distributed Spark SQL in Azure.
- Conducted hiring of data scientists and data engineers.
- Optimized existing YARN-managed PySpark jobs running on GCP, cutting runtimes and costs by over 80%.
- Trained client staff in best practices for Spark and data engineering.
- Used Agile methodology to manage my work, including daily scrums and sprint planning with Jira.
- Conceived and developed a deep-learning-based eCommerce search engine that trained NLP models using recurrent neural networks on millions of customer searches, increasing the probability a given search would end with an "add to order" by 1.07%.
- Estimated and visualized the causal effect of “punch-out” purchasing software on sales with R/ggplot2, using a panel dataset of monthly sales figures from 30 customers over two years before and after activation.
- Built systems for tracking and analyzing A/B tests using a Neo4j graph database and R, with methods for verifying assumptions and estimating treatment effects in superiority and non-inferiority trials.
- Developed a machine learning model to decide if non-catalog products sourced for customers required hazard handling based on supplier description, achieving 0.99 AUC, 98% accuracy, and no false negatives in testing.
- Designed the above machine learning model in Python using Scikit-learn and Pandas.
- Implemented a Random Forest algorithm in C# on top of Accord, the most popular .NET ML framework, for production; Random Forest pull request to Accord accepted to master branch.
- Prototyped the above machine learning model in R using Random Forest; the implementation is pending production.
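The bullet above about adding a unique 18-digit ID to a DataFrame without merging to a single partition does not say how it was done. One common way to get that property, sketched here in plain Python as an assumption rather than as the method actually used, is to build each ID from the partition index plus a row counter local to that partition, so no partition needs to know about any other. The digit split (6 for the partition, 12 for the row) and all names are hypothetical.

```python
ID_WIDTH = 18
PARTITION_DIGITS = 6                           # assumed split: 6 digits for the partition
ROW_DIGITS = ID_WIDTH - PARTITION_DIGITS       # and 12 digits for the row within it
PARTITION_BASE = 10 ** (PARTITION_DIGITS - 1)  # 100000, keeps the leading digit non-zero
MAX_PARTITIONS = 9 * PARTITION_BASE            # partition prefixes 100000..999999
MAX_ROWS = 10 ** ROW_DIGITS


def ids_for_partition(part_index, rows):
    """Yield (id, row) pairs using only information local to one partition."""
    if part_index >= MAX_PARTITIONS:
        raise ValueError("too many partitions for the chosen digit split")
    prefix = (PARTITION_BASE + part_index) * MAX_ROWS
    for row_index, row in enumerate(rows):
        if row_index >= MAX_ROWS:
            raise ValueError("too many rows in one partition")
        yield prefix + row_index, row


# Hypothetical data already split into three partitions.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
tagged = [
    pair
    for part_index, rows in enumerate(partitions)
    for pair in ids_for_partition(part_index, rows)
]
ids = [row_id for row_id, _ in tagged]
print(tagged)
```

In PySpark, the same per-partition function could plausibly be applied with `RDD.mapPartitionsWithIndex`, which passes each partition's index and row iterator to a user function. Spark's built-in `monotonically_increasing_id` follows a similar partition-plus-counter idea but yields 64-bit integers rather than fixed-width 18-digit IDs.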
Anomaly Detection in Volumetric Images Using Sequential Convolutional and Recurrent Neural Networks
https://patents.google.com/patent/US10347010B2/en
CT Lung Nodule Detection
https://www.youtube.com/watch?v=X_8bpuL0G3Q
Scaling Up Music | Master's Project at UC Berkeley
Evaluation of Multiple Open-source Deep Learning Models for Detecting COVID-19 on Chest X-rays
https://pubmed.ncbi.nlm.nih.gov/35005058/
Purpose: In the context of the COVID-19 pandemic, rapid triage of cases and exclusion of other pathologies with artificial intelligence (AI) can assist over-stretched radiology departments. We aim to validate three open-source AI models on an external test set.
Methods: We tested three open-source deep learning models, COVID-Net, COVIDNet-S-GEO, and CheXNet, for their ability to detect COVID-19 pneumonia and to determine its severity using 129 chest X-rays from two different vendors.
Results: All three models detected COVID-19 pneumonia. Only the COVIDNet-S-GEO and CheXNet models performed well on severity scoring; COVID-Net performed well at either task only on images taken with a Philips machine (AUC 0.735) and not on those from an Agfa machine (AUC 0.598).
Conclusion: Chest X-ray triage for COVID-19 pneumonia can be successfully implemented using existing open-source AI models. Evaluating a model on local X-ray machines and protocols is highly recommended before implementation to avoid vendor- or protocol-dependent bias.
Capturing and Analyzing Sentiment Data of SEC 10K Filing’s Management’s Discussion and Analysis
https://s3-us-west-2.amazonaws.com/riteshsoni/papers/MDA_Analysis.pdf
The SEC's EDGAR (Electronic Data Gathering, Analysis, and Retrieval) database aggregates and disseminates public disclosure data. Market participants fill out and electronically file more than 100 types of forms with EDGAR. This project focuses on one very important filing type, the 10K, in which publicly traded companies regularly disclose comprehensive information about their operations. The project demonstrates the collection, manipulation, and analysis (NLP-based sentiment) of 10K filings, using the Hadoop data processing framework for rapid data analysis.
Spark, Apache Spark, .NET, Hadoop, YARN, Twisted, Django
PyTorch, OpenCV, PySpark, Pandas, Scikit-learn, TensorFlow, Keras, NumPy, SciPy, Theano, MLlib, Spark ML, Matplotlib, React, REST APIs, XGBoost, JSON API
Spark SQL, Apache Airflow, Jira, Tableau, Jupyter, Git, Google Cloud Dataproc, BigQuery, Pytest, Seaborn, Splunk, Collada, CAD, Amazon SageMaker, Confluence, Solr
Data Science, ETL, DevOps, RESTful Development, Distributed Computing, Agile, Parallel Computing, Unit Testing, Refactoring, Business Intelligence (BI), Microservices
Databricks, Amazon Web Services (AWS), Amazon EC2, Kubernetes, Docker, NVIDIA CUDA, Ubuntu, Linux, Azure, Jupyter Notebook, Unix, Google Cloud Platform (GCP), Apache Kafka
Amazon S3 (AWS S3), Neo4j, PostgreSQL, Data Pipelines, Cassandra, Google Cloud, Databases, JSON, Elasticsearch
APIs, Image Analysis, 3D Image Processing, Image Processing, Machine Learning, Data Engineering, Big Data, Big Data Architecture, Architecture, Integration, Neural Networks, Time Series, Time Series Analysis, Artificial Intelligence (AI), Deep Learning, Random Forests, Computer Vision, Natural Language Processing (NLP), Convolutional Neural Networks, Recurrent Neural Networks (RNN), GPU Computing, Graphics Processing Unit (GPU), Software Design, API Integration, Recommendation Systems, Data Analysis, Data Analytics, Data, eCommerce, Machine Learning Operations (MLOps), Statistical Modeling, Mathematical Modeling, Statistical Methods, Object Detection, Algorithms, Computer Vision Algorithms, GPT, Generative Pre-trained Transformers (GPT), Data Modeling, Object Tracking, OCR, Video Analysis, Legacy Code, Legacy Software, DICOM, Economics, Image Recognition, Apache Cassandra, Open Data, Forecasting, 3D CAD, Leadership, Product Management, IT Project Management, Torch, CTO, Azure Data Lake, CSV, Supply Chain Management, Supply Chain Optimization, Technical Writing, Writing & Editing, Statistics, Point Clouds, Analytics, Generative Adversarial Networks (GANs), Software Architecture, Scalability, Search Engines, Data Visualization
Master's Degree in Information and Data Science
University of California, Berkeley - Berkeley, CA, USA
Bachelor's Degree in Mathematical Methods in the Social Sciences, Economics
Northwestern University - Evanston, IL, USA