Software Engineer
2018 - 2020
OM1
- Designed a Python platform on the AWS cloud to receive, de-identify, and normalize electronic medical records and insurance claims data from diverse sources on hundreds of millions of patients.
- Deployed the ingestion platform within one year to ingest gigabytes of data daily from 10 data partners.
- Constructed a data processing service in Scala to define cohorts of patients based on clinical disease criteria and enrich those cohorts with predictive metrics on disease outcomes and medical expenditures from machine learning models.
- Managed the data team as interim team lead, developing data transmission procedures with customers. Also planned the team’s roadmap and mentored junior engineers.
- Created SQL queries and workflows to manage complex ETL tasks on medical data including de-duplication, patient linking, and deriving clinically relevant metrics such as insurance histories and drug eras.
Technologies: Amazon Web Services (AWS), Claims, Amazon S3 (AWS S3), SQL, Python, Scala

Software Engineer and Forecasting Researcher
2012 - 2017
Institute for Health Metrics and Evaluation
- As the team’s lead Spark engineer, built a predictive modeling platform to forecast health scenarios worldwide and the potential impacts of specific policies on global health. Forecasted mortality from 205 causes in 195 countries.
- Developed a scientific software pipeline in Python used by dozens of modelers to run more than 20,000 models annually. Data and results were stored in a SQL database, and models ran on a Univa Grid Engine cluster.
- Created Python tools to support data analysts and researchers in modeling disease prevalence and economic drivers of health.
Technologies: STATA, SQL, Data Science, Data Analysis, R, Python