Verified Expert in Engineering
Data Scientist and Developer
Simon is a data scientist with experience in machine learning, statistics, big data, and method development. Over his career, he has worked in various fields, including adtech, molecular biology, telecommunication networks, and hardware reliability. Simon has built predictive machine learning systems, reporting dashboards, and in-depth analytical reports, ranging from small datasets to systems operating in real time with thousands of requests per second.
Linux, RStudio, Python 3
The most amazing project I've worked on is a mobile phone data-based population mobility analysis that provided information to several governments during the COVID-19 pandemic.
Principal Data Scientist | Co-founder
Exago Machine Learning
- Created an hourly population flow model for entire countries based on mobile phone data; the State of New York and the UK government used it to track COVID-19 measures.
- Designed and implemented a deep learning-based user segmentation into around 50 groups, deployed at several thousand queries per second.
- Designed and implemented a model that filters unprofitable traffic in an ad auction server early in the pipeline, reducing the client's cloud cost by roughly 20%.
Senior Data Scientist
- Created customer churn models based on custom neural networks trained on censored time-to-event data. These models predicted the time until customer churn and could use partial information provided by active customers.
- Developed a SaaS predictive dashboard that provided customers with churn alerts and cross-selling recommendations.
- Presented complex modeling results to over 20 energy utility companies in interactive workshops.
Senior Data Scientist
- Built a complex survival model integrating hardware properties with usage logs to investigate a newly released phone's high return rates; the analysis attributed them to the high-end model's target audience, not the hardware.
- Implemented an R library that assembled a concise device history from manufacturing, QA, and sales data by connecting sources in Oracle, Apache Hadoop, and BigQuery. This history informed multiple reporting and modeling tasks.
- Supported product launches with data on early product returns by building R Markdown templates that provided reports within days of a product coming to market.
Head of Analytics
Aloqa (acquired by Motorola Mobility)
- Developed an end-to-end big data analytics solution from the mobile client through Hadoop to the web reporting front end.
- Created a randomized keep-alive algorithm to deliver instant push messages to mobile clients before Google and Apple provided APIs for this.
- Developed an early microservice architecture to scale from thousands to millions of users within weeks.
- Coordinated the development of a full-stack cheminformatics framework, including fingerprint, graph-based, ligand-ligand superpositioning, and protein/ligand docking methods.
- Implemented novel 3D visualizations for proteins based on OpenGL shaders, such as real-time ambient occlusion.
- Co-invented several novel techniques based on protein-ligand docking, e.g., inverting the normal process to look for molecular targets of known drugs.
Ludwig Maximilian University of Munich
- Developed machine learning-based methods for automated diagnosis of vertigo-related diseases based on accelerometer recordings of upright stance.
- Worked on text mining, NLP, extensions to profile-profile protein alignment, and statistical approaches to validating lattice-based inference of text topics.
- Contributed to novel methods and applications in protein-ligand docking.
Population Mobility and Its Effect on the COVID-19 Pandemic in the US
We used a deep learning model to augment the mobility data with user age information. The model had been built previously and measured at around 80% accuracy across five age group bins. The augmented data was then used in a Bayesian hierarchical model analysis to attribute infection spread to different age groups in each US state.
R, Python, SQL, SQL-99, Bash, Python 3, C, Ruby, Java
TensorFlow, Ggplot2, PyTorch, REST APIs, Keras, Pandas, OpenGL
sparklyr, Ansible, BigQuery, MATLAB
Data Science, ETL, Agile, Scrum, XP
RStudio, Linux, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Docker
Data Pipelines, Google Cloud, PostgreSQL, MySQL
Deep Learning, Neural Networks, Machine Learning, Large Data Sets, Data Analytics, Data Visualization, Artificial Intelligence (AI), Predictive Modeling, Models, Communication, Modeling, Data Analysis, Product Analytics, Geospatial Data, Convolutional Neural Networks, Deep Neural Networks, Algorithms, Computational Biology, Data Manipulation, Data Extraction, Data Engineering, Data Reporting, Statistical Analysis, Statistical Modeling, Version Control Systems, A/B Testing, Product Development, Computer Vision, Bayesian Inference & Modeling, Google BigQuery, Recommendation Systems, Biology, Molecular Biology, Natural Language Processing (NLP), Signal Processing, GPT, Generative Pre-trained Transformers (GPT)
Spark, RStudio Shiny, Hadoop
Master's Degree in Computational Biology
Ludwig Maximilian University of Munich - Munich, Germany
Certified SAFe 5 Agile Software Engineer
Scaled Agile, Inc.