Nicolas Keller, Developer in Berlin, Germany

Nicolas Keller

Verified Expert in Engineering

Data Scientist and Developer

Location
Berlin, Germany
Toptal Member Since
January 21, 2020

With a strong mathematical background (a master's degree in mathematics), Nicolas is a passionate data scientist who brings to a project a combination of machine learning knowledge, practical programming skills, and a problem-solving, analytical mindset. He has a demonstrated history of transforming business problems into data-driven solutions and has recently worked as a data scientist at the global insurance company Allianz.

Portfolio

Pfizer
Python, Neo4j, Amazon EC2, Machine Learning, Large Language Models (LLMs)...
IPPMed GmbH
Dashboard Design, Selenium, Automation, Web Crawlers, Python, Reporting...
Focus Sensors Limited
Python, Algorithms, Apache Kafka, SciPy, Testing, Streaming Data...

Availability

Part-time

Preferred Environment

Jupyter Notebook, RStudio, Git, Linux

The most amazing...

...thing I've coded was an R package to predict life insurance claims based on individual characteristics. It is a novel approach going beyond the status quo.

Work Experience

Data Science Lead

2021 - PRESENT
Pfizer
  • Led the development of data science and data engineering projects within the domain of clinical trials.
  • Set up a cloud infrastructure for model training, deployment, and application prototyping, which increased the impact and visibility of our team within the organization.
  • Designed and administered a Neo4j graph database to centralize organizational datasets and leverage graph algorithms to answer complex business questions.
  • Developed an interface to the graph database, allowing non-technical users to ask questions in natural language. Trained a deep learning model to translate English into Cypher (Neo4j's graph query language).
  • Created an optimization algorithm to select the best possible sites for clinical trials and oversaw the roll-out and integration into the business operations.
Technologies: Python, Neo4j, Amazon EC2, Machine Learning, Large Language Models (LLMs), GPT, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Data Engineering, Project Leadership, Dataiku, Quantum Computing, Pandas, Data Science, Web Scraping, Web Crawlers, Microsoft Excel, Statistical Analysis, SQL, Linux, Business Intelligence (BI), NumPy, Scikit-learn, Git, Databases, Data Analysis, Dashboards, Dashboard Development, Data Visualization, Dashboard Design, Plotly, Process Automation, Jupyter Notebook

Data Engineer

2020 - PRESENT
IPPMed GmbH
  • Supported the external adjudication process of two medical studies.
  • Automated the process of combining, filling, and sending a large number of PDF forms using Python.
  • Kept track of the data exchange with the adjudicators via an automated Excel table and provided a dashboard to create progress reports.
  • Used Selenium to automate downloading and gathering PDF files from a website, which would have been weeks of manual work.
Technologies: Dashboard Design, Selenium, Automation, Web Crawlers, Python, Reporting, Database Management, Microsoft Office, Pandas, Data Science, Web Scraping, Microsoft Excel, Linux, Git, Databases, Data Analysis, Dashboards, Dashboard Development, Data Visualization, Plotly, Process Automation, GPT, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Jupyter Notebook

Data Scientist

2020 - 2021
Focus Sensors Limited
  • Examined the implementation and mathematical concepts behind an extensive codebase for an anomaly detection algorithm for sensor data.
  • Changed the core architecture from static data files to streaming data using Kafka.
  • Tested and optimized the new architecture in terms of processing time and output integrity.
Technologies: Python, Algorithms, Apache Kafka, SciPy, Testing, Streaming Data, Signal Processing, Docker, Pandas, Data Science, Linux, NumPy, Scikit-learn, Git, Data Analysis, Data Visualization, Jupyter Notebook

Data Scientist

2020 - 2021
Moneyhub
  • Implemented and productionized a personalized machine learning algorithm to classify transaction data using the AWS SageMaker and Lambda infrastructure.
  • Detected trends in the customers' behavior and created frequent reports to present the results, which have been published regularly on the company's website.
  • Completed various data analyses and POCs to answer requests from the business using a combination of SQL and Python for the back end and Jupyter Notebooks and Plotly to present findings.
Technologies: Financial Data, Applied Mathematics, XGBoost, LightGBM, Time Series Analysis, Reporting, Data Visualization, Data Analysis, MongoDB, Automation, Amazon Athena, Databases, Git, Scikit-learn, NumPy, Business Intelligence (BI), Linux, Statistical Analysis, Algorithms, Data Science, Data Reporting, Pandas, Data Analytics, Amazon Web Services (AWS), Natural Language Processing (NLP), Machine Learning, Plotly, Jupyter Notebook, SQL, Amazon SageMaker, Python, Redshift, Big Data, Dashboards, Dashboard Development, Dashboard Design

Data Scientist

2020 - 2020
TradeDepot
  • Analyzed the microloan data to identify the relevant features that have an impact on repayment behavior.
  • Implemented and tested a Python module that returns a credit risk score together with a detailed explanation.
  • Deployed that module on the AWS SageMaker and Lambda infrastructure to fully integrate it with the current system.
Technologies: Amazon Web Services (AWS), Financial Data, Software Engineering, Loans & Lending, Credit Risk, Amazon SageMaker, Python, Redshift, Pandas, Data Science, Statistical Analysis, NumPy, Scikit-learn, Data Analysis, Data Visualization, Plotly

Data Scientist

2019 - 2020
Sopra Steria España
  • Developed new methods to measure business success for a retail client and implemented them in Python.
  • Performed a post-analysis of retail promotions using SQL and Python and presented findings to stakeholders.
  • Optimized SQL queries to extract insights from large tables.
  • Restructured and optimized an internal Python package to extract and visualize statistics of large database tables.
Technologies: Reporting, Data Visualization, Data Analysis, SQL Server Management Studio (SSMS), Business Intelligence (BI), Pandas, Data Analytics, Microsoft Excel, Azure, Databricks, Python, SQL, Big Data, MySQL, Data Science, Statistical Analysis

Data Scientist (Master Thesis Student)

2019 - 2019
Allianz
  • Wrote my thesis on machine learning methods to model life tables.
  • Performed the preprocessing, analysis, and modeling of data with a size of over 100GB.
  • Built and tested an R package for internal usage in the actuarial department.
  • Delivered my final presentation in front of experts as part of the official training series.
  • Extensively optimized the performance of R code using vectorization, parallelization, and optimized packages.
Technologies: Applied Mathematics, Ggplot2, Data Analysis, Mathematics, SQL, Algorithms, Data Analytics, Plotly, Markdown, LaTeX, Python, R, Machine Learning, Data Science, Data Visualization

Data Scientist

2018 - 2019
Allianz
  • Implemented and supported extensive interactive data-driven dashboards in R-Shiny.
  • Developed a product recommendation system for corporate clients based on the clients' characteristics and product history.
  • Built an automated production system for the early detection of problems with products or business processes based on client complaint data.
  • Implemented the visualization of complex data and presentation of insights using Plotly, D3.js, and R Markdown.
  • Performed topic modeling and text mining of client complaint texts using LDA.
  • Prepared presentations on machine learning theory and programming packages.
  • Created internal programming packages to streamline and simplify frequently used data science tasks.
Technologies: Markdown, Microsoft PowerPoint, Natural Language Processing (NLP), XGBoost, LightGBM, Ggplot2, Financial Markets, Random Forests, Reporting, Dashboard Design, Data Visualization, Dashboard Development, Dashboards, Data Analysis, CSS, Databases, Business Intelligence (BI), Statistical Analysis, Data Science, Data Reporting, Machine Learning, Data Analytics, Microsoft Excel, Plotly, RStudio Shiny, SQL, Git, Python, RStudio, MySQL, Web Scraping, Web Crawlers

Researcher

2017 - 2018
Fraunhofer Institute for Industrial Mathematics ITWM
  • Worked on the SENRISK project (Senrisk.eu/), which predicted price fluctuations of corporate and sovereign bonds based on news sentiments.
  • Built recurrent neural networks in PyTorch to predict financial time series.
  • Developed statistical methods for fraud detection in the health insurance industry.
  • Implemented a Python package for financial time series prediction, including its integration with a web service.
  • Constructed a software prototype in R Shiny to visualize the impact of different sample sizes in the context of fraud detection.
Technologies: Financial Data, Applied Mathematics, Neural Networks, Time Series Analysis, Financial Markets, Optimization, Random Forests, Data Analysis, Keras, Scikit-learn, Statistical Analysis, Algorithms, Data Science, Machine Learning, Data Analytics, R, RStudio Shiny, PyTorch, Anaconda, Linux, Python, RStudio, Data Visualization

Intern

2016 - 2017
Universidad Técnica Federico Santa María
  • Implemented software in C# to evaluate financial options based on the Black-Scholes model.
  • Created a detailed report about the theoretical foundations of option price valuation.
  • Conducted research related to the Black-Scholes model and financial time series.
Technologies: Financial Markets, Data Analysis, Data Analytics, Microsoft Excel, R, C#
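
To illustrate the kind of valuation this role involved, here is a minimal sketch of the closed-form Black-Scholes price for European call and put options. It is written in Python rather than the C# used in the original work, and every function and parameter name is an illustrative assumption, not taken from that software.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def _d1_d2(spot, strike, rate, vol, maturity):
    """Return the d1 and d2 terms of the Black-Scholes formula."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return d1, d2


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Price of a European call on a non-dividend-paying asset."""
    d1, d2 = _d1_d2(spot, strike, rate, vol, maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


def black_scholes_put(spot, strike, rate, vol, maturity):
    """Price of a European put on a non-dividend-paying asset."""
    d1, d2 = _d1_d2(spot, strike, rate, vol, maturity)
    return strike * exp(-rate * maturity) * norm_cdf(-d2) - spot * norm_cdf(-d1)
```

For example, `black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)` evaluates to roughly 10.45. The sketch assumes constant volatility and interest rate, no dividends, and a strictly positive maturity.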

EU Project SENRISK

As a member of the Fraunhofer ITWM research institute, I participated in the EU-funded SENRISK project. The main goal of this project was to predict corporate and sovereign bond prices based on news sentiments.

My part was mainly the implementation of the prediction system. We used recurrent neural networks and boosting methods and built a Python package to streamline the whole process.

Analysis and Visualization of WhatsApp Chats

https://github.com/l47y/whatsappalytics
A Python toolkit to visualize WhatsApp chats. It offers some fun visualizations of single or group chats. Additionally, it has an interactive dashboard that can be used to navigate through the visualizations. It transforms the original text file into a handy data frame and handles different input formats, including those of different iOS and Android versions.
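
The text-to-table step described above could look roughly like the following sketch. It is not the code from the repository: the two line layouts are assumed examples of Android-style and iOS-style exports (real exports vary by locale and app version), and `PATTERNS` and `parse_chat` are hypothetical names. It uses only the standard library and returns a list of records, which could then be passed to `pandas.DataFrame` to obtain a data frame.

```python
import re

# Assumed example layouts (hypothetical; real exports differ by locale/version):
#   Android-style: "31/12/2020, 22:15 - Alice: Hello"
#   iOS-style:     "[31/12/2020, 22:15:03] Alice: Hello"
PATTERNS = [
    re.compile(
        r"^(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}) - "
        r"(?P<sender>[^:]+): (?P<text>.*)$"
    ),
    re.compile(
        r"^\[(?P<date>\d{1,2}/\d{1,2}/\d{2,4}), (?P<time>\d{1,2}:\d{2}):\d{2}\] "
        r"(?P<sender>[^:]+): (?P<text>.*)$"
    ),
]


def parse_chat(lines):
    """Turn exported chat lines into a list of message records (dicts)."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = None
        # Try each known layout until one fits the line.
        for pattern in PATTERNS:
            match = pattern.match(line)
            if match:
                break
        if match:
            records.append(match.groupdict())
        elif records:
            # A line without a timestamp continues the previous message.
            records[-1]["text"] += "\n" + line
    return records
```

Each record has the keys `date`, `time`, `sender`, and `text`. Supporting another export layout would only require adding one more pattern to the list.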

Machine Learning Demonstration Tool

https://github.com/l47y/ml_tool
This Shiny app serves as a small user interface to demonstrate some standard machine learning tasks. You can upload an example data set and edit, visualize, and model it. I used it for demonstration purposes, especially when showing basic ML concepts to non-technical users.

Android App Course Analyzer

https://github.com/l47y/SiCourses
An Android application to track given courses. It is used by a small group of people for a specific business use case. You can insert a course and specify the evaluations received.

On the main page, you have an overview of all courses, and you can export and import a list of the courses. Finally, you can see statistics of the evaluations, and a map shows the places where the courses have taken place, with some additional information about them. Currently, it is only available in Spanish.

Languages

Python, R, SQL, C#, Markdown, Kotlin, HTML, CSS

Frameworks

RStudio Shiny, LightGBM, Selenium

Libraries/APIs

Pandas, Ggplot2, XGBoost, Keras, Scikit-learn, Beautiful Soup, NumPy, PySpark, PyTorch, SciPy

Tools

Plotly, Amazon SageMaker, Amazon Athena, Git, LaTeX, Microsoft Excel, PyCharm, Microsoft PowerPoint

Paradigms

Data Science, Business Intelligence (BI), Automation, Testing

Platforms

Jupyter Notebook, RStudio, Amazon Web Services (AWS), Linux, Azure, Anaconda, Databricks, Apache Kafka, Docker, Amazon EC2, Dataiku

Other

Data Analysis, Dashboards, Data Analytics, Applied Mathematics, Mathematics, Dashboard Development, Data Visualization, Machine Learning, Natural Language Processing (NLP), Random Forests, Dashboard Design, Reporting, Data Reporting, GPT, Generative Pre-trained Transformers (GPT), Financial Markets, Financial Data, Big Data, Algorithms, Process Automation, Optimization, Web Scraping, Web Crawlers, Time Series Analysis, Statistical Analysis, Neural Networks, Credit Risk, Loans & Lending, Software Engineering, Data Engineering, Android Development, Streaming Data, Signal Processing, Large Language Models (LLMs), Project Leadership, Microsoft Office, Quantum Computing, Quantum Machine Learning

Storage

Redshift, Databases, SQL Server Management Studio (SSMS), MySQL, MongoDB, Neo4j, Database Management, Graph Databases

Education

2016 - 2019

Master of Science Degree in Financial and Actuarial Mathematics

Technical University Kaiserslautern - Kaiserslautern, Germany

2016 - 2016

Exchange Year in Financial Mathematics

Universidad Técnica Federico Santa María - Valparaíso, Chile

2013 - 2016

Bachelor of Science Degree in Mathematics

Technical University Kaiserslautern - Kaiserslautern, Germany

Certifications

JULY 2023 - PRESENT

Quantum Applications Lab

IBM

MARCH 2023 - PRESENT

Quantum Business Foundations

IBM

AUGUST 2022 - PRESENT

Neo4j Certified Professional

Neo4j

OCTOBER 2019 - PRESENT

Big Data Fundamentals with PySpark

DataCamp

OCTOBER 2019 - PRESENT

Applying SQL to Real-world Problems

DataCamp
