Samuel López Santamaría, Developer in Santa Cruz de Tenerife, Spain

Samuel López Santamaría

Verified Expert in Engineering

Data Scientist and Developer

Location
Santa Cruz de Tenerife, Spain
Toptal Member Since
September 23, 2022

Samuel is a seasoned data scientist experienced in building intelligent systems. He worked on solutions to predict aircraft delays for Lufthansa and upcoming graffiti on trains for Deutsche Bahn. He has also led an Agile team researching applications of reinforcement learning for autonomous vehicles. Samuel has deep expertise working with Python and has engaged in all stages of the data science pipeline, from data engineering to model training and deployment to the cloud (Azure, GCP).

Portfolio

qdive GmbH
Python, Azure, Docker, XGBoost, Deep Reinforcement Learning, Scikit-learn...
zeroG
Python, PySpark, Pandas, Kepler.gl, XGBoost, LightGBM, Tableau, SQL...
zeroG GmbH
Tableau, SQL, Python, Data Science, Communication

Availability

Part-time

Preferred Environment

PyCharm, Conda, Pandas, XGBoost, Docker, Pytest, Plotly, Scikit-learn, Matplotlib, Linux

The most amazing challenge I've solved was finding a WiFi password encoded as a low-dimensional manifold in a 500-dimensional space.

Work Experience

Senior Data Scientist

2020 - 2022
qdive GmbH
  • Developed machine learning models to predict the occurrence of graffiti on trains using geographic data. Built a dashboard to visualize graffiti hotspots. Identified challenges with data quality and led technical workshops to define countermeasures.
  • Led a team of six professionals researching and implementing applications of reinforcement learning for autonomous shuttles in public transportation.
  • Developed an algorithm to optimize maintenance operations on wind turbines, minimizing profit loss due to turbine downtime. Supported deployment as an Azure Function.
  • Built and published a reinforcement learning baseline for the Kaggle Kore 2022 competition, which became the most-voted code contribution and was used by many participants.
  • Received the second community prize for an observation builder in the NeurIPS Flatland Challenge 2020: Multi-agent Reinforcement Learning in Complex Train Networks.
  • Participated in numerous pitches, workshops, and talks, including "Machine Learning on Graphs: Hands-on Approach and Current Challenges" at Machine Learning Week Europe 2021.
  • Interviewed over 50 candidates for junior to lead data science positions and mentored five colleagues.
  • Organized the first company retrospective, which was successful enough to be repeated quarterly.
Technologies: Python, Azure, Docker, XGBoost, Deep Reinforcement Learning, Scikit-learn, Scrum, Git, Machine Learning, DevOps, Seaborn, Matplotlib, PyCharm, JetBrains, Azure DevOps, GitLab, GitLab CI/CD, Graphs, NoSQL, Azure Cosmos DB, Team Leadership, Remote Team Leadership, Jupyter Notebook, Julia, Data Analysis, Data Analytics, Conda, Pandas, Plotly, Reinforcement Learning, LightGBM, Artificial Intelligence (AI), Agile, Machine Learning Operations (MLOps), Data Science, Linear Optimization, Predictive Modeling, Models, Version Control Systems, Modeling, Communication, Google Colaboratory (Colab), Data Engineering, Microsoft SQL Server, Regression, Classification, Neural Networks, Statistics, Regression Modeling

Data Scientist

2019 - 2020
zeroG
  • Developed a machine learning model with PySpark and XGBoost to predict flight delays; it outperformed the model operating at the time.
  • Defined custom metrics to track and communicate model performance.
  • Performed data cleaning, feature engineering, and optimization of existing Python code.
Technologies: Python, PySpark, Pandas, Kepler.gl, XGBoost, LightGBM, Tableau, SQL, Machine Learning, Jupyter Notebook, Data Analysis, Data Analytics, Scikit-learn, Conda, Plotly, Git, Seaborn, Matplotlib, Artificial Intelligence (AI), Agile, Data Science, Predictive Modeling, Models, Version Control Systems, Modeling, Communication, Data Engineering, Regression, Classification, Neural Networks, Statistics, Regression Modeling

Business Data Analyst

2018 - 2019
zeroG GmbH
  • Performed ad hoc data analyses and reporting for requesting departments using Python and SQL.
  • Created interactive dashboards in Tableau to quantitatively track and visualize data quality.
  • Designed a "moonshot" business model involving blockchain, data sovereignty, and the future of online advertising.
Technologies: Tableau, SQL, Python, Data Science, Communication

LabVIEW Developer

2016 - 2017
NCLogics AG
  • Developed software powering the company's custom-made brain-computer interface to assess neurological disorders.
  • Built real-time graphical visualizations of patients' spatiotemporal EEG activity.
  • Implemented the company's proprietary signal-processing algorithms for quantitative assessment of the likelihood of neurological conditions.
Technologies: Brain-computer Interface, Neuroscience, Conda

Projects

Predicting Upcoming Graffiti on Trains

Graffiti is a problem in some regions of Germany, producing exorbitant costs for affected companies. The client, a regional train operator, had procedures in place to minimize recovery times, but none were preventive. I developed a machine learning model to predict upcoming graffiti on trains so that damage could be prevented before it happened.

Using geographic and historical data, I led the development and testing of several models, including XGBoost, LightGBM, and algorithms from scikit-learn. I also visualized the geographic distribution of graffiti hotspots with Kepler.gl and GeoPandas. The project further involved identifying key challenges with data quality and leading technical workshops to define countermeasures.

Predicting Flight Delays

Swift turnarounds are key to an airline's operations. Wasted minutes on the ground can propagate quickly, generating a cascade of expensive, negative consequences for other flights. I participated in the development of a machine-learning-based approach to predict flight delays.

My team trained a model on 3D geographic data using PySpark and XGBoost, which outperformed the client's existing delay prediction method. I also reduced the computation time for one feature from days to minutes.

Reinforcement Learning for Self-driving Buses

I led a team of six professionals, including data scientists, data engineers, and researchers. We used reinforcement learning to optimize the routing of self-driving buses in a village in Bavaria, Germany. I introduced an Agile workflow, coding guidelines, and other coding best practices. I also reviewed pull requests and guided development until a first performing agent was found and trained with Stable Baselines3 (SB3), and I defined a development path to further increase the agent's performance.

Optimization of Maintenance Operations to Minimize Loss of Profit

Most wind turbine maintenance operations require the turbine to be shut down. Since a stopped turbine cannot produce power, maintenance always implies lost profit. Ideally, maintenance would be carried out during low wind conditions to minimize this loss. However, scheduling is subject to a large number of constraints, so the task is not trivial.

I joined the team to develop a Python algorithm that computes optimized schedules. The underlying mixed-integer linear program was solved with Gurobi and Python-MIP. Schedules were saved to a NoSQL database (Azure Cosmos DB). Data was gathered from a variety of internal APIs, and code quality was ensured by unit and integration tests. The algorithm was deployed as an Azure Function.

I implemented user stories across all the steps described above, writing and reviewing code and coordinating development with the data engineering team.

Education

2011 - 2014

Master's Degree in Integrative Neuroscience

Otto-von-Guericke University Magdeburg - Magdeburg, Germany

2007 - 2010

Bachelor's Degree in Physics

Ludwig Maximilians University of Munich - Munich, Germany

Certifications

JANUARY 2020 - PRESENT

Deep Reinforcement Learning

Udacity

Skills

Languages

Python, SQL, Julia

Libraries/APIs

Pandas, Scikit-learn, XGBoost, Matplotlib, PySpark, Kepler.gl, PyTorch

Tools

Git, Seaborn, PyCharm, MATLAB, Pytest, Plotly, JetBrains, GitLab, GitLab CI/CD, Tableau

Paradigms

Data Science, Scrum, DevOps, Azure DevOps, Agile

Platforms

Jupyter Notebook, Azure, Docker, Google Cloud Platform (GCP), Vertex AI, Azure Functions, Linux

Other

Conda, Machine Learning, Artificial Intelligence (AI), Data Analysis, Data Analytics, Predictive Modeling, Models, Version Control Systems, Modeling, Regression Modeling, Communication, Regression, Classification, Statistics, Physics, Neuroscience, Brain-computer Interface, Reinforcement Learning, Deep Reinforcement Learning, Graphs, GeoPandas, Deep Learning, Team Leadership, Remote Team Leadership, Stable Baselines3 (SB3), Machine Learning Operations (MLOps), Linear Optimization, APIs, Google Colaboratory (Colab), Data Engineering, Neural Networks

Frameworks

LightGBM

Storage

NoSQL, Azure Cosmos DB, Microsoft SQL Server
