Principal Data Scientist | Co-founder
2018 - Present
Exago Machine Learning
- Created an hourly population flow model covering entire countries, built on mobile phone data and used by the State of New York and the UK government to track COVID-19 measures.
- Designed and implemented a deep learning model that segments users into roughly 50 groups, deployed to serve several thousand queries per second.
- Created and implemented a model that filters out unprofitable traffic early in an ad auction server's pipeline, reducing the client's cloud costs by roughly 20%.
Technologies: R, Python 3, TensorFlow, Keras, Spark, sparklyr, Bayesian Inference & Modeling, Google Cloud, Databricks, Data Science, Machine Learning, Algorithms, PostgreSQL, Python, Neural Networks, Ggplot2, Deep Neural Networks, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Google Cloud Platform (GCP), Data Analytics, Data Visualization, Bash, Pandas, PyTorch, SQL-99, ETL, Docker, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines
Senior Data Scientist
2016 - 2018
BEN Energy
- Created customer churn models based on custom neural networks trained on censored time-to-event data. These models predicted the time until a customer churns and could learn from the partial histories of customers who were still active.
- Developed a SaaS predictive dashboard that provided customers with churn alerts and cross-selling recommendations.
- Presented complex modeling results to over 20 energy utility companies in interactive workshops.
Technologies: R, Python 3, Ansible, SQL, Data Science, Machine Learning, Algorithms, MySQL, PostgreSQL, Python, Neural Networks, Ggplot2, Deep Neural Networks, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Data Reporting, Data Analytics, Data Visualization, Bash, Natural Language Processing (NLP), Pandas, SQL-99, ETL, Docker, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Product Development
Senior Data Scientist
2010 - 2015
Motorola Mobility
- Built a complex survival model that combined hardware properties with usage logs to investigate the high return rate of a newly released phone; the returns turned out to be driven by the target audience of this high-end model rather than by the hardware.
- Implemented an R library that assembled a concise device history from manufacturing, QA, and sales data, connecting sources in Oracle, Apache Hadoop, and BigQuery; the library informed multiple reporting and modeling tasks.
- Supported product launches with data on early product returns by building R Markdown templates that provided reports within days of a product coming to market.
Technologies: RStudio, R, RStudio Shiny, Python 3, Hadoop, Google BigQuery, Data Science, Machine Learning, Algorithms, Recommendation Systems, MySQL, PostgreSQL, Ggplot2, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Google Cloud Platform (GCP), BigQuery, Data Reporting, Data Analytics, Data Visualization, Bash, SQL-99, ETL, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines
Head of Analytics
2009 - 2010
Aloqa (acquired by Motorola Mobility)
- Developed an end-to-end big data analytics solution, from the mobile client through Hadoop to the web reporting front end.
- Created a randomized keep-alive algorithm that delivered instant push messages to mobile clients before Google and Apple offered APIs for push notifications.
- Developed an early microservice architecture that scaled from thousands to millions of users within weeks.
Technologies: R, Ruby, Java, SQL, Hadoop, Amazon Web Services (AWS), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Product Development
Lead Developer
2007 - 2008
MoDeST
- Coordinated the development of a full-stack cheminformatics framework, including fingerprint-based, graph-based, ligand-ligand superposition, and protein-ligand docking methods.
- Implemented novel shader-based OpenGL 3D visualizations of proteins, such as real-time ambient occlusion.
- Co-invented several novel techniques based on protein-ligand docking, e.g., inverting the usual process to search for the molecular targets of known drugs.
Technologies: Java, Ruby, OpenGL, R, Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, Data Analysis, Product Development
Research Assistant
1999 - 2007
Ludwig Maximilian University of Munich
- Developed machine learning methods for the automated diagnosis of vertigo-related disorders from accelerometer recordings of upright stance.
- Worked on text mining, NLP, extensions to profile-profile protein alignment, and statistical approaches for validating lattice-based inference of text topics.
- Contributed to novel methods and applications in protein-ligand docking.
Technologies: MATLAB, R, Ruby, Java, Artificial Intelligence (AI), Statistical Analysis, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, Data Analysis