CTO
2016 - PRESENT
Realize
- Earned multiple US patents for combining convolutional and recurrent neural networks to automatically detect diseases in CT scans and MRIs, an approach that is the current state of the art.
- Developed an AI system for the world's largest radiology group, deployed as a containerized RESTful API, including an NLP system for extracting diagnoses from radiology reports with over 95% accuracy.
- Created an algorithm that detects tuberculosis in chest X-rays with world-class accuracy (greater than 0.9 AUC), as determined by multiple third-party evaluations.
- Assembled and led the founding team, including a marketer and an MD/PhD oncologist, as CEO until our 2018 merger with a leading African radiology IT firm. The merger valued the company at more than 30 times our paid-in capital.
- Advised governmental and NGO officials on AI healthcare applications.
Technologies: Amazon Web Services (AWS), Spark, DICOM, Python, Docker, Kubernetes, Keras, PyTorch, Matplotlib, Seaborn, Image Recognition, TensorFlow, APIs, REST APIs, RESTful Development, Twisted, Open Data, OpenCV, Architecture, Integration, DevOps, Neural Networks, CTO, Microservices

Computer Vision Developer
2021 - 2022
Virtual/Augmented Reality Consulting Firm
- Developed a "universal green screen" application that removes a moving background from behind a human figure in real time, so that video of just that person can be superimposed onto a virtual environment (e.g., a video game).
- Prototyped new features using Python and ported them to C++ and OpenCV for real-time performance.
- Worked with various stakeholders to ensure an appropriate balance of segmentation quality, speed, and hardware usage.
Technologies: C++, Python, PyTorch, Torch, OpenCV, Amazon SageMaker, Object Detection, Computer Vision Algorithms, Computer Vision

Head of Data and AI
2021 - 2022
Stealth Healthcare Startup
- Led a team of data scientists, data engineers, and machine learning engineers in developing systems to detect potential errors in medical insurance claims.
- Negotiated data purchasing and licensing agreements.
- Drove the company's decisions on third-party software vendor selection and buy-versus-build questions.
Technologies: Python, Databricks, XGBoost, NumPy, Pandas, JSON API, JSON, Confluence, Analytics, Business Intelligence (BI), Software Design, API Integration, Machine Learning Operations (MLOps), Software Architecture

Interim CTO
2021 - 2021
Blockchain Startup (via Toptal)
- Led the engineering team in developing a React and Django app, enabling users to create, customize, and share infographics about the crypto market based on a curated set of data sources.
- Defined product requirements and oversaw their execution.
- Conducted first-hand market research at the 2021 Miami Bitcoin conference.
Technologies: React, Django, Amazon Web Services (AWS), REST APIs, Leadership, Product Management, IT Project Management, CTO

Python Developer
2020 - 2020
Confidential (MBB Consulting Firm via Toptal)
- Productionized a machine learning prototype that my client had built for their own client (a Fortune 500 pharmaceutical firm), reducing the codebase by thousands of lines, adding modularity, and vastly simplifying the logic while preserving the original output.
- Enabled the deployment of new marketing campaigns by configuration rather than a code change.
- Wrote unit tests for all refactored modules and an automated end-to-end test for the entire system.
Technologies: Python, Pytest, Unit Testing, Refactoring, NumPy, Pandas, Azure, Tableau, Azure Data Lake

Data Engineering Architect
2018 - 2020
Confidential (Major US Pharmacy Chain, via Toptal)
- Created systems, including deep chains of complex Spark SQL queries and machine learning models, to identify gaps in more than 100 million patients' vaccination histories based on CDC guidelines and generate personalized vaccine recommendations daily.
- Developed a PySpark method for adding a unique 18-digit ID to a DataFrame without merging to a single partition, removing a department-wide bottleneck.
- Scaled the existing system for notifying patients that their prescriptions were ready from single-node, on-premises SQL to distributed Spark SQL in Azure.
- Conducted hiring of data scientists and data engineers.
Technologies: Databricks, Spark, PySpark, Spark SQL, Spark ML, Apache Airflow, SQL, Jira, Agile, Python, Azure, NumPy, Pandas, Scikit-learn, Unit Testing, Big Data, Big Data Architecture, Data Pipelines, Architecture, Integration, Databases, CSV, Legacy Code, Legacy Software, Data Analysis, Data Analytics, Data

Spark Consultant
2018 - 2018
FLYR
- Optimized existing YARN-managed PySpark jobs running on GCP, cutting runtimes and costs by over 80%.
- Trained client staff in best practices for Spark and data engineering.
- Used Agile methodology to manage my work, including daily scrums and sprint planning with Jira.
Technologies: Google Cloud Platform (GCP), Google Cloud Dataproc, Spark, PySpark, Spark ML, BigQuery, Kubernetes, YARN, Agile, Jira

Data Scientist
2013 - 2017
McMaster-Carr Supply
- Conceived and developed a deep-learning-based eCommerce search engine powered by recurrent neural network NLP models trained on millions of customer searches, increasing the probability that a given search would end with an "add to order" by 1.07%.
- Estimated and visualized the causal effect of “punch-out” purchasing software on sales with R/ggplot2, using a panel dataset of monthly sales figures from 30 customers over two years before and after activation.
- Built systems for tracking and analyzing A/B tests using a Neo4j graph database and R, with methods for verifying assumptions and estimating treatment effects in superiority and non-inferiority trials.
- Developed a machine learning model to decide whether non-catalog products sourced for customers required hazard handling based on the supplier's description, achieving 0.99 AUC, 98% accuracy, and no false negatives in testing.
- Designed the above machine learning model in Python using Scikit-learn and Pandas.
- Implemented a Random Forest algorithm in C# on top of Accord, the most popular .NET ML framework, for production; the Random Forest pull request to Accord was accepted into the master branch.
- Prototyped the above machine learning model in R using Random Forest; the implementation is pending production.
Technologies: Theano, Keras, Scikit-learn, NumPy, Pandas, Python, C#.NET, Neo4j, Splunk, Time Series, Time Series Analysis, Forecasting, Supply Chain Management, Supply Chain Optimization, Recommendation Systems, C#, Cypher, .NET, eCommerce, HTML, Elasticsearch, Solr, Scalability, Search Engines, Data Visualization