Simon Tietze, Developer in Berlin, Germany

Simon Tietze

Verified Expert in Engineering

Data Scientist and Developer

Location
Berlin, Germany
Toptal Member Since
November 17, 2022

Simon is a data scientist with experience in deep learning, machine learning, statistics, big data, and method development. Over his career, he has worked in various fields, including adtech, molecular biology, telecommunication networks, and hardware reliability. Simon has built predictive machine learning systems, reporting dashboards, and in-depth analytical reports, at scales ranging from small datasets to real-time systems handling thousands of requests per second.

Portfolio

Exago Machine Learning
R, Python 3, TensorFlow, Keras, Spark, sparklyr, Bayesian Inference & Modeling...
BEN Energy
R, Python 3, Ansible, SQL, Data Science, Machine Learning, Algorithms, MySQL...
Motorola Mobility
RStudio, R, RStudio Shiny, Python 3, Hadoop, Google BigQuery, Data Science...

Experience

Availability

Part-time

Preferred Environment

Linux, RStudio, Python 3

The most amazing project I've worked on is a mobile phone data-based population mobility analysis that provided information to several governments during the COVID-19 pandemic.

Work Experience

Principal Data Scientist | Co-founder

2018 - PRESENT
Exago Machine Learning
  • Created an hourly population flow model for entire countries based on mobile phone data. The State of New York and the UK government used it to track COVID-19 measures.
  • Designed and implemented a deep learning-based user segmentation into around 50 groups, deployed at several thousand queries per second.
  • Created and implemented a model that filters unprofitable traffic in an ad auction server early in the pipeline, reducing the client's cloud cost by roughly 20%.
Technologies: R, Python 3, TensorFlow, Keras, Spark, sparklyr, Bayesian Inference & Modeling, Google Cloud, Databricks, Data Science, Machine Learning, Algorithms, PostgreSQL, Python, Neural Networks, Ggplot2, Deep Neural Networks, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Google Cloud Platform (GCP), Data Analytics, Data Visualization, Bash, Pandas, PyTorch, SQL-99, ETL, Docker, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Geospatial Data, REST APIs, Computer Vision, Convolutional Neural Networks (CNN)

Senior Data Scientist

2016 - 2018
BEN Energy
  • Created customer churn models based on custom neural networks trained on censored time-to-event data. These models predicted the time until customer churn and could use partial information provided by active customers.
  • Developed a SaaS predictive dashboard that provided customers with churn alerts and cross-selling recommendations.
  • Presented complex modeling results to over 20 energy utility companies in interactive workshops.
Technologies: R, Python 3, Ansible, SQL, Data Science, Machine Learning, Algorithms, MySQL, PostgreSQL, Python, Neural Networks, Ggplot2, Deep Neural Networks, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Data Reporting, Data Analytics, Data Visualization, Bash, GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Pandas, SQL-99, ETL, Docker, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Product Development, Geospatial Data, REST APIs, Convolutional Neural Networks (CNN)

Senior Data Scientist

2010 - 2015
Motorola Mobility
  • Built a complex survival model integrating hardware properties with usage logs to investigate a newly released phone's high return rates, which turned out to be due to the high-end model's target audience, not the hardware.
  • Implemented an R library that assembled a concise device history from manufacturing, QA, and sales data, connecting sources in Oracle, Apache Hadoop, and BigQuery. The resulting data was used to inform multiple reporting and modeling tasks.
  • Supported product launches with data on early product returns by building R Markdown templates that provided reports within days of a product coming to market.
Technologies: RStudio, R, RStudio Shiny, Python 3, Hadoop, Google BigQuery, Data Science, Machine Learning, Algorithms, Recommendation Systems, MySQL, PostgreSQL, Ggplot2, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Google Cloud Platform (GCP), BigQuery, Data Reporting, Data Analytics, Data Visualization, Bash, SQL-99, ETL, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Geospatial Data, REST APIs, Convolutional Neural Networks (CNN)

Head of Analytics

2009 - 2010
Aloqa (acquired by Motorola Mobility)
  • Developed an end-to-end big data analytics solution from the mobile client through Hadoop to the web reporting front end.
  • Created a randomized keep-alive algorithm to deliver instant push messages to mobile clients before Google and Apple created APIs that enable this.
  • Developed an early microservice architecture to scale from thousands to millions of users within weeks.
Technologies: R, Ruby, Java, SQL, Hadoop, Amazon Web Services (AWS), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Product Development, Geospatial Data, REST APIs, Convolutional Neural Networks (CNN)

Lead Developer

2007 - 2008
MoDeST
  • Coordinated the development of a full-stack cheminformatics framework, including fingerprint, graph-based, ligand-ligand superpositioning, and protein/ligand docking methods.
  • Implemented novel 3D visualizations for proteins based on OpenGL shaders, such as real-time ambient occlusion.
  • Co-invented several novel techniques based on protein-ligand docking, e.g., inverting the normal process to look for molecular targets of known drugs.
Technologies: Java, Ruby, OpenGL, R, Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, Data Analysis, Product Development, Convolutional Neural Networks (CNN)

Research Assistant

1999 - 2007
Ludwig Maximilian University of Munich
  • Developed machine learning-based methods for automated diagnosis of vertigo-related diseases based on accelerometer recordings of upright stance.
  • Worked on text mining, NLP, extensions to profile-profile protein alignment, and statistical approaches to validating lattice-based inference of text topics.
  • Contributed to novel methods and applications in protein-ligand docking.
Technologies: MATLAB, R, Ruby, Java, Artificial Intelligence (AI), Statistical Analysis, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, Data Analysis, Computer Vision, Convolutional Neural Networks (CNN)

Population Mobility and Its Effect on the COVID-19 Pandemic in the US

Collaborated with Imperial College London on a mobility trend analysis of data grouped by user age for the entire US. We developed a pipeline combining mobile carrier and adtech location data to verify user locations. The carrier data is reliable but only precise to the cell tower level, while adtech data contains precise locations but is often fraudulent.

We used a deep learning model to augment the mobility data with user age information. The model had been built earlier and measured at around 80% accuracy across five age group bins. The augmented data was then used in a Bayesian hierarchical model analysis to attribute infection spread to different age groups in each US state.
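
The location check described in this project can be illustrated with a minimal sketch. It is not the project's actual pipeline. It assumes that each user has one carrier-reported cell tower position and that an adtech ping counts as verified when it lies within a fixed radius of that tower. The names `Ping`, `verify_adtech_pings`, and `tower_by_user`, as well as the 5 km default radius, are hypothetical, and time matching between the two sources is left out for brevity.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass(frozen=True)
class Ping:
    """One adtech location record (hypothetical schema)."""
    user_id: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def verify_adtech_pings(adtech_pings, tower_by_user, max_km=5.0):
    """Keep only adtech pings that fall within max_km of the tower
    the carrier reports for the same user.

    tower_by_user maps user_id -> (lat, lon) of the serving cell tower.
    Pings from users without carrier data are dropped, since they
    cannot be checked.
    """
    verified = []
    for ping in adtech_pings:
        tower = tower_by_user.get(ping.user_id)
        if tower is None:
            continue
        if haversine_km(ping.lat, ping.lon, tower[0], tower[1]) <= max_km:
            verified.append(ping)
    return verified


if __name__ == "__main__":
    towers = {"u1": (40.7580, -73.9855), "u2": (40.7580, -73.9855)}
    pings = [
        Ping("u1", 40.7484, -73.9857),   # close to the tower
        Ping("u2", 34.0500, -118.2400),  # far from the tower
        Ping("u3", 40.7500, -73.9900),   # no carrier data for this user
    ]
    for ping in verify_adtech_pings(pings, towers):
        print(ping)
```

The idea is that the coarse but reliable source acts as a plausibility filter for the precise but unreliable one. A real pipeline would also need to align timestamps and handle users who move between towers.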

Languages

R, Python, SQL, SQL-99, Bash, Python 3, C, Ruby, Java

Libraries/APIs

TensorFlow, Ggplot2, PyTorch, REST APIs, Keras, Pandas, OpenGL

Tools

sparklyr, Ansible, BigQuery, MATLAB

Paradigms

Data Science, ETL, Agile, Scrum, XP

Platforms

RStudio, Linux, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Docker

Industry Expertise

Bioinformatics

Storage

Data Pipelines, Google Cloud, PostgreSQL, MySQL

Other

Deep Learning, Neural Networks, Machine Learning, Large Data Sets, Data Analytics, Data Visualization, Artificial Intelligence (AI), Predictive Modeling, Models, Communication, Modeling, Data Analysis, Product Analytics, Geospatial Data, Convolutional Neural Networks (CNN), Deep Neural Networks, Algorithms, Computational Biology, Data Manipulation, Data Extraction, Data Engineering, Data Reporting, Statistical Analysis, Statistical Modeling, Version Control Systems, A/B Testing, Product Development, Computer Vision, Bayesian Inference & Modeling, Google BigQuery, Recommendation Systems, Biology, Molecular Biology, Natural Language Processing (NLP), Signal Processing, GPT, Generative Pre-trained Transformers (GPT)

Frameworks

Spark, RStudio Shiny, Hadoop

2000 - 2006

Master's Degree in Computational Biology

Ludwig Maximilian University of Munich - Munich, Germany

DECEMBER 2022 - PRESENT

Certified SAFe 5 Agile Software Engineer

Scaled Agile, Inc.
