R Developer in Chicago, IL, United States
Founder | Managing Principal, 2018 - PRESENT, PeriData
Technologies: Python, NumPy, Pandas, Scikit-learn, Keras, Docker, Kubernetes, AWS, Jupyter, Google Cloud
- Successfully completed AI and data science development and advisory engagements for dozens of clients.
- Led the development of a deep learning algorithm that converted 2D images into 3D CAD models using Theano, Docker, and Pycollada.
- Developed an educational math website using JupyterHub, Docker, and AWS.
- Built a platform that lets users spin up a new AWS server and run multi-GPU training of deep learning models with a single command, using Keras, Docker, and Kubernetes.
- Acted as a Spark consultant to FLYR, an airline revenue management firm. Improved the performance of the existing Spark processes running on Google Cloud Dataproc, cutting job runtimes by up to 80% and computing costs by up to 90%, saving the company $10,000/year.
CTO, 2016 - PRESENT, Realize
Technologies: Keras, Kubernetes, Docker, Python, DICOM, Spark, AWS
- Company acquired by IntriHEALTH, Africa's leading radiology IT provider, to bring AI-powered diagnostic solutions to the developing world. Developed an algorithm that detects 14 diseases in chest X-rays, currently undergoing a multisite clinical trial in South Africa.
- Spearheaded a strategic partnership with vRad, America’s largest radiology practice, to create and deploy algorithms for prioritizing CT scans in emergency work queues by the likelihood of a pulmonary embolism (a national first).
- Implemented massively parallel extraction and preprocessing jobs for a 10+ TB database of CT scans using PySpark. Because CT scans are typically several hundred MB each, I developed an extremely memory-efficient architecture, which eventually let us speed up jobs simply by scaling up the cluster.
- Co-invented a deep learning architecture combining CNNs and RNNs to analyze 3D images: Risman, Alexander; Chen, Sea. 2017. Anomaly Detection in Volumetric Images. US20180033144A1, filed September 26, 2017. Patent pending.
- Developed an algorithm that detects lung nodules in CT scans, with third-party testing finding a <20% miss rate at a clinically acceptable false positive level. For comparison, radiologist miss rates of over 50% have been documented.
Data Scientist, 2013 - 2017, McMaster-Carr
Technologies: C#.NET, Python, Pandas, NumPy, Scikit-learn, Keras, Theano
- Conceived, developed, and deployed a deep-learning-based eCommerce search engine that trained recurrent neural networks on millions of customer searches, increasing the probability that a given search would end with an "add to order" by 1.07%, as shown by A/B testing.
- Estimated and visualized the causal effect of “punch-out” purchasing software on sales with R/ggplot2, using a panel dataset of monthly sales figures from 30 customers (two years before and after activation).
- Built systems for tracking and analyzing A/B tests using a Neo4j graph database and R, with methods for verifying assumptions and estimating treatment effects in superiority and non-inferiority trials.
- Developed a machine learning model to decide whether non-catalog products sourced for customers required hazards handling, based on supplier and description, achieving 0.99 AUC, 98% accuracy, and no false negatives in testing.
- Prototyped the above machine learning model in Python using Scikit-learn and Pandas.
- Implemented a Random Forest algorithm in C# on top of Accord, the most popular .NET ML framework, for production; the Random Forest pull request to Accord was accepted into the master branch.
- Developed a machine learning model to sort product attributes in new faceted search panes by predicted popularity rank, correctly predicting the most popular attribute in 59% more existing panes than those in which our merchandising department had placed it at the top.
- Prototyped the above machine learning model in R using Random Forest; the production implementation is pending.
- CT Lung Nodule Detection (Development)
https://www.youtube.com/watch?v=X_8bpuL0G3Q
I developed artificial intelligence software to automatically detect lung nodules in CT scans; these nodules are often missed by radiologists and can portend cancer. The linked video demonstrates the AI results and their integration into the radiology workflow.
- Scaling Up Music | Master's Project at UC Berkeley (Development)
I deployed and used a Spark cluster to predict the genre of songs in The Echo Nest’s Million Song Dataset using data on volume, tempo, pitch, and “danceability”. I also wrote the code to train models using Spark’s MLlib.
Languages: Python, R, SQL
Libraries/APIs: Pandas, Scikit-learn, Keras, NumPy, PySpark, TensorFlow
Platforms: Amazon Web Services (AWS), AWS EC2, Kubernetes, Docker
Storage: AWS S3, Google Cloud
Other: Artificial Intelligence (AI), Deep Learning, Random Forests, Computer Vision, Natural Language Processing (NLP), Convolutional Neural Networks, Recurrent Neural Networks, 3D CAD
Tools: Jupyter, Collada, CAD
Paradigms: Parallel & Distributed Computing
- Master's degree in Information and Data Science, 2014 - 2015, University of California, Berkeley - Berkeley, CA, USA
- Bachelor's degree in Mathematical Methods in the Social Sciences, Economics, 2009 - 2013, Northwestern University - Evanston, IL, USA