CTO | 2016 - PRESENT | Realize
Technologies: Amazon Web Services (AWS), Spark, DICOM, Python, Docker, Kubernetes, Keras, PyTorch, Matplotlib, Seaborn, Image Recognition, TensorFlow, APIs, RESTful Development, RESTful APIs, Twisted, Open Data, OpenCV, Architecture, Integration, DevOps, Neural Networks
- Earned multiple US patents for combining convolutional and recurrent neural networks to automatically detect diseases in CT scans and MRIs, an approach that is the current state of the art.
- Developed an algorithm that detects tuberculosis in chest X-rays with world-class accuracy (>0.9 AUC), as determined by multiple third-party evaluations.
- Assembled and led the founding team, including a marketer and an MD/PhD oncologist, as CEO until our 2018 merger with a leading African radiology IT firm. The merger valued the company at >30x our paid-in capital.
- Developed an AI system for the world’s largest radiology group, deployed as a containerized RESTful API.
- Advised governmental and NGO officials on healthcare applications of AI.
Python Developer for Machine Learning Tools | 2020 - 2020 | Confidential (MBB Consulting Firm via Toptal)
Technologies: Python, Pytest, Unit Testing, Code Refactoring, NumPy, Pandas, Azure, Tableau
- Productionized a machine learning prototype that my client had built for its own client (a Fortune 500 pharmaceutical firm), reducing the codebase by thousands of lines, adding modularity, and vastly simplifying the logic while preserving the original output.
- Enabled the deployment of new marketing campaigns by configuration rather than a code change.
- Wrote unit tests for all refactored modules and an automated end-to-end test for the entire system.
Data Engineering Architect | 2018 - 2020 | Confidential (Major US Pharmacy Chain, via Toptal)
Technologies: Databricks, Spark, PySpark, Spark SQL, Spark ML, Apache Airflow, SQL, Jira, Agile, Python, Azure, NumPy, Pandas, Scikit-learn, Unit Testing, Big Data, Big Data Architecture, Data Pipelines, Architecture, Integration
- Created systems, including accurate ML models and deep chains of complex Spark SQL queries, to identify gaps in 100M+ patients' vaccination histories based on CDC guidelines and generate personalized vaccine recommendations daily.
- Developed a PySpark method for adding a unique 18-digit ID to a DataFrame without coalescing to a single partition, removing a department-wide bottleneck.
- Scaled the existing system for notifying patients that their prescriptions were ready from single-node, on-premises SQL to distributed Spark SQL in Azure.
- Conducted hiring of data scientists and data engineers.
Spark Consultant | 2018 - 2018 | FLYR
Technologies: Google Cloud Platform (GCP), Google Cloud Dataproc, Spark, PySpark, Spark ML, BigQuery, Kubernetes, YARN, Agile, Jira
- Optimized existing YARN-managed PySpark jobs running on GCP, cutting runtimes and costs by over 80%.
- Trained the client's staff in best practices for Spark and data engineering.
- Managed my work with Agile methodology, including daily scrums and sprint planning in Jira.
Data Scientist | 2013 - 2017 | McMaster-Carr Supply
Technologies: Theano, Keras, Scikit-learn, NumPy, Pandas, Python, C#.NET, Neo4j, Splunk, Time Series, Time Series Analysis
- Conceived, developed, and deployed a deep-learning-based eCommerce search engine that trained recurrent neural networks on millions of customer searches, increasing the probability that a given search would end with an "add to order" by 1.07%.
- Estimated and visualized the causal effect of “punch-out” purchasing software on sales with R/ggplot2, using a panel dataset of monthly sales figures from 30 customers (two years before and after activation).
- Built systems for tracking and analyzing A/B tests using a Neo4j graph database and R, with methods for verifying assumptions and estimating treatment effects in superiority and non-inferiority trials.
- Developed a machine learning model to decide if non-catalog products sourced for customers required hazards handling based on supplier and description, achieving 0.99 AUC, 98% accuracy, and no false negatives in testing.
- Prototyped the above machine learning model in Python using Scikit-learn and Pandas.
- Implemented a Random Forest algorithm in C# on top of Accord, the most popular .NET ML framework, for production; Random Forest pull request to Accord accepted to master branch.
- Prototyped the above machine learning model in R using Random Forest; the production implementation is pending.