Simon Tietze, Data Scientist and Developer in Berlin, Germany
Member since November 17, 2022
Simon is a data scientist with experience in machine learning, statistics, big data, and method development. Over his career, he has worked in various fields, including adtech, molecular biology, telecommunication networks, and hardware reliability. Simon has built predictive machine learning systems, reporting dashboards, and in-depth analytical reports, ranging from small datasets to systems operating in real time with thousands of requests per second.

Portfolio

  • Exago Machine Learning
    R, Python 3, TensorFlow, Keras, Spark, sparklyr...
  • BEN Energy
    R, Python 3, Ansible, SQL, Data Science, Machine Learning, Algorithms, MySQL...
  • Motorola Mobility
    RStudio, R, RStudio Shiny, Python 3, Hadoop, Google BigQuery, Data Science...

Location

Berlin, Germany

Availability

Full-time

Preferred Environment

Linux, RStudio, Python 3

The most amazing...

...project I've worked on is a mobile phone data-based population mobility analysis that provided information to several governments during the COVID-19 pandemic.

Employment

  • Principal Data Scientist | Co-founder

    2018 - PRESENT
    Exago Machine Learning
    • Created an hourly population flow model for entire countries based on mobile phone data; the State of New York and the UK government used it to track COVID-19 measures.
    • Designed and implemented a deep learning-based user segmentation into around 50 groups, deployed at several thousand queries per second.
    • Created and implemented a model that filters unprofitable traffic in an ad auction server early in the pipeline, reducing the client's cloud cost by roughly 20%.
    Technologies: R, Python 3, TensorFlow, Keras, Spark, sparklyr, Bayesian Inference & Modeling, Google Cloud, Databricks, Data Science, Machine Learning, Algorithms, PostgreSQL, Python, Neural Networks, Ggplot2, Deep Neural Networks, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Google Cloud Platform (GCP), Data Analytics, Data Visualization, Bash, Pandas, PyTorch, SQL-99, ETL, Docker, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines
  • Senior Data Scientist

    2016 - 2018
    BEN Energy
    • Created customer churn models based on custom neural networks trained on censored time-to-event data. These models predicted the time until customer churn and could use partial information provided by active customers.
    • Developed a SaaS predictive dashboard that provided customers with churn alerts and cross-selling recommendations.
    • Presented complex modeling results to over 20 energy utility companies in interactive workshops.
    Technologies: R, Python 3, Ansible, SQL, Data Science, Machine Learning, Algorithms, MySQL, PostgreSQL, Python, Neural Networks, Ggplot2, Deep Neural Networks, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Data Reporting, Data Analytics, Data Visualization, Bash, Natural Language Processing (NLP), Pandas, SQL-99, ETL, Docker, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Product Development
  • Senior Data Scientist

    2010 - 2015
    Motorola Mobility
    • Built a complex survival model integrating hardware properties with usage logs to investigate a newly released phone's high return rates, which turned out to stem from the high-end model's target audience rather than the hardware.
    • Implemented an R library that assembled a concise device history from manufacturing, QA, and sales data to inform multiple reporting and modeling tasks, connecting sources in Oracle, Apache Hadoop, and BigQuery.
    • Supported product launches with data on early product returns by building R Markdown templates that provided reports within days of a product coming to market.
    Technologies: RStudio, R, RStudio Shiny, Python 3, Hadoop, Google BigQuery, Data Science, Machine Learning, Algorithms, Recommendation Systems, MySQL, PostgreSQL, Ggplot2, Data Manipulation, Data Extraction, Large Data Sets, Data Engineering, Google Cloud Platform (GCP), BigQuery, Data Reporting, Data Analytics, Data Visualization, Bash, SQL-99, ETL, Amazon Web Services (AWS), Artificial Intelligence (AI), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines
  • Head of Analytics

    2009 - 2010
    Aloqa (acquired by Motorola Mobility)
    • Developed an end-to-end big data analytics solution from the mobile client through Hadoop to the web reporting front end.
    • Created a randomized keep-alive algorithm to deliver instant push messages to mobile clients before Google and Apple offered APIs for this purpose.
    • Developed an early microservice architecture to scale from thousands to millions of users within weeks.
    Technologies: R, Ruby, Java, SQL, Hadoop, Amazon Web Services (AWS), Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, A/B Testing, Data Analysis, Product Analytics, Data Pipelines, Product Development
  • Lead Developer

    2007 - 2008
    MoDeST
    • Coordinated the development of a full-stack cheminformatics framework, including fingerprint, graph-based, ligand-ligand superpositioning, and protein/ligand docking methods.
    • Implemented novel 3D visualizations for proteins based on OpenGL shaders, such as real-time ambient occlusion.
    • Co-invented several novel techniques based on protein-ligand docking, e.g., inverting the normal process to look for molecular targets of known drugs.
    Technologies: Java, Ruby, OpenGL, R, Statistical Analysis, Statistical Modeling, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, Data Analysis, Product Development
  • Research Assistant

    1999 - 2007
    Ludwig Maximilian University of Munich
    • Developed machine learning-based methods for automated diagnosis of vertigo-related diseases based on accelerometer recordings of upright stance.
    • Worked on text mining, NLP, extensions of protein alignment to profile-profile alignment, and statistical approaches to validating lattice-based inference of text topics.
    • Contributed to novel methods and applications in protein-ligand docking.
    Technologies: MATLAB, R, Ruby, Java, Artificial Intelligence (AI), Statistical Analysis, Predictive Modeling, Models, Version Control Systems, Communication, Modeling, Data Analysis

Experience

  • Population Mobility and Its Effect on the COVID-19 Pandemic in the US

    Collaborated with Imperial College London on a mobility trend analysis of data grouped by user age for the entire US. We developed a pipeline combining mobile carrier and adtech location data to verify user locations. The carrier data is reliable but only precise to the cell tower level, while adtech data contains precise locations but is often fraudulent.

    We used a deep learning model to augment the mobility data with user age information. The model had been built earlier and measured at around 80% accuracy across five age group bins. The augmented data was then used in a Bayesian hierarchical model analysis to attribute infection spread to different age groups in each US state.
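
    As a rough illustration of how coarse but reliable carrier data can be used to check precise but untrusted adtech locations, the sketch below keeps an adtech point only if it lies near the cell tower that the carrier data associates with the same user in the same hour. The matching rule, the 5 km radius, the record layout, and the names `haversine_km` and `verify_adtech_points` are assumptions made for this example; the project's actual pipeline is not described here.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def verify_adtech_points(adtech_points, tower_by_user_hour, max_km=5.0):
    """Keep adtech points that are consistent with the carrier data.

    adtech_points: iterable of (user, hour, lat, lon) tuples (hypothetical layout).
    tower_by_user_hour: dict mapping (user, hour) to the (lat, lon) of the
        cell tower the carrier observed for that user (hypothetical layout).
    max_km: assumed tolerance around the tower; not a value from the project.
    """
    verified = []
    for user, hour, lat, lon in adtech_points:
        tower = tower_by_user_hour.get((user, hour))
        if tower is None:
            # No carrier observation to check against, so the point is dropped.
            continue
        if haversine_km(lat, lon, tower[0], tower[1]) <= max_km:
            verified.append((user, hour, lat, lon))
    return verified
```

    In this sketch, points without a matching carrier record are discarded rather than trusted, which trades coverage for reliability; a real pipeline might handle that case differently.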

Skills

  • Languages

    R, Python, SQL, SQL-99, Bash, Python 3, C, Ruby, Java
  • Libraries/APIs

    TensorFlow, Ggplot2, PyTorch, Keras, Pandas, OpenGL
  • Tools

    sparklyr, Ansible, BigQuery, MATLAB
  • Paradigms

    Data Science, ETL, Agile, Scrum, XP
  • Platforms

    RStudio, Linux, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Docker
  • Industry Expertise

    Bioinformatics
  • Storage

    Data Pipelines, Google Cloud, PostgreSQL, MySQL
  • Other

    Deep Learning, Neural Networks, Machine Learning, Large Data Sets, Data Analytics, Data Visualization, Artificial Intelligence (AI), Predictive Modeling, Models, Communication, Modeling, Data Analysis, Product Analytics, Deep Neural Networks, Algorithms, Computational Biology, Data Manipulation, Data Extraction, Data Engineering, Data Reporting, Statistical Analysis, Statistical Modeling, Version Control Systems, A/B Testing, Product Development, Bayesian Inference & Modeling, Google BigQuery, Recommendation Systems, Biology, Molecular Biology, Natural Language Processing (NLP), Signal Processing
  • Frameworks

    Spark, RStudio Shiny, Hadoop

Education

  • Master's Degree in Computational Biology
    2000 - 2006
    Ludwig Maximilian University of Munich - Munich, Germany

Certifications

  • Certified SAFe 5 Agile Software Engineer
    DECEMBER 2022 - PRESENT
    Scaled Agile, Inc.
