Samuel López Santamaría
Verified Expert in Engineering
Data Scientist and Developer
Santa Cruz de Tenerife, Spain
Toptal member since September 23, 2022
Samuel is a seasoned data scientist experienced in building intelligent systems. He worked on solutions to predict aircraft delays for Lufthansa and upcoming graffiti on trains for Deutsche Bahn. He has also led an Agile team researching applications of reinforcement learning for autonomous vehicles. Samuel has deep expertise working with Python and has engaged in all stages of the data science pipeline, from data engineering to model training and deployment to the cloud (Azure, GCP).
Experience
- Data Science - 4 years
- Python - 4 years
- Artificial Intelligence (AI) - 4 years
- Scikit-learn - 4 years
- Machine Learning - 4 years
- XGBoost - 2 years
- Reinforcement Learning - 1 year
- DevOps - 1 year
Preferred Environment
PyCharm, Conda, Pandas, XGBoost, Docker, Pytest, Plotly, Scikit-learn, Matplotlib, Linux
The most amazing challenge I've solved was finding a WiFi password encoded as a low-dimensional manifold in a 500-dimensional space.
Work Experience
Senior Data Scientist (Freelance)
Self-employed
- Helped a German e-learning company use NLP and NER to extract key insights from online advertisements. Led the implementation of the machine learning solution in Azure. The company selected this project as one of its five most innovative projects.
- Taught a Python for Data Science course at HTW Berlin (Hochschule für Technik und Wissenschaft).
- Acted as a sparring partner for the data science team of a Canadian radiology company, providing expert insights and guidance.
- Built financial dashboards with Google Looker Studio for a UK-based aviation client.
Senior Data Scientist
qdive GmbH
- Developed machine learning models to predict the occurrence of graffiti on trains using geographic data. Built a dashboard to visualize graffiti hotspots. Identified challenges with data quality and led technical workshops to define countermeasures.
- Led a team of six professionals researching and implementing applications of reinforcement learning to autonomous shuttles in public transportation.
- Developed an algorithm to optimize maintenance operations on wind turbines, minimizing profit loss due to turbine downtime. Supported deployment as an Azure Function.
- Built and published a reinforcement learning baseline for the Kaggle Kore 2022 competition, which became the most-voted code contribution and was used by many participants.
- Received the second community prize for an observation builder in the NeurIPS Flatland Challenge 2020: Multi-agent Reinforcement Learning in Complex Train Networks.
- Participated in numerous pitches, workshops, and talks, including the talk "Machine Learning on Graphs–Hands-on Approach and Current Challenges" at Machine Learning Week Europe 2021.
- Interviewed over 50 candidates for junior to lead data science positions and mentored five colleagues.
- Organized the first company retrospective, which was so successful that it was repeated on a quarterly basis.
Data Scientist
zeroG
- Developed a machine learning model with PySpark and XGBoost to predict flight delays that outperformed the model operating at the time.
- Defined custom metrics to track and communicate model performance.
- Performed data cleaning, feature engineering, and optimization of existing Python code.
Business Data Analyst
zeroG GmbH
- Prepared reports and performed ad hoc data analyses for requesting departments using Python and SQL.
- Created interactive dashboards in Tableau to quantitatively track and visualize data quality.
- Designed a "moonshot" business model involving blockchain, data sovereignty, and the future of online advertisement.
LabVIEW Developer
NCLogics AG
- Developed software powering the company's custom-made brain-computer interface to assess neurological disorders.
- Built real-time graphical visualizations of patients' spatiotemporal EEG activity.
- Implemented the company's proprietary signal-processing algorithms for quantitative assessment of the likelihood of neurological conditions.
Experience
Predicting Upcoming Graffiti on Trains
Using geographic and historical data, I led the development and testing of models built with XGBoost, LightGBM, and scikit-learn. I also visualized the geographic distribution of graffiti hotspots with Kepler.gl and GeoPandas. The project further involved identifying key data quality challenges and leading technical workshops to define countermeasures.
Predicting Flight Delays
Using PySpark and XGBoost, my team trained a model on 3D geographic data that outperformed the client's existing delay prediction method. I also reduced the computation time for one feature from days to minutes.
Reinforcement Learning for Self-driving Buses
Optimization of Maintenance Operations to Minimize Loss of Profit
I joined the team to develop a Python algorithm that computes optimized maintenance schedules. The underlying mixed-integer linear program was solved with Gurobi and Python-MIP. Schedules were saved to a NoSQL database, Azure Cosmos DB. Data was gathered from a variety of internal APIs, and code quality was ensured by unit and integration tests. The algorithm was deployed as an Azure Function.
I implemented user stories across all of these steps, writing and reviewing code and coordinating development with the data engineering team.
Education
Master's Degree in Integrative Neuroscience
Otto-von-Guericke University Magdeburg - Magdeburg, Germany
Bachelor's Degree in Physics
Ludwig Maximilians University of Munich - Munich, Germany
Certifications
Deep Reinforcement Learning
Udacity
Skills
Libraries/APIs
Pandas, Scikit-learn, XGBoost, Matplotlib, PySpark, Kepler.gl, PyTorch
Tools
Git, Seaborn, PyCharm, MATLAB, Pytest, Plotly, JetBrains, GitLab, GitLab CI/CD, Tableau
Languages
Python, SQL, Julia
Platforms
Jupyter Notebook, Azure, Docker, Google Cloud Platform (GCP), Vertex AI, Azure Functions, Linux
Paradigms
Scrum, DevOps, Azure DevOps, Agile
Frameworks
LightGBM
Storage
NoSQL, Azure Cosmos DB, Microsoft SQL Server
Other
Conda, Machine Learning, Artificial Intelligence (AI), Data Analysis, Data Analytics, Data Science, Predictive Modeling, Models, Version Control Systems, Modeling, Regression Modeling, Communication, Regression, Classification, Statistics, Physics, Neuroscience, Brain-computer Interface, Reinforcement Learning, Deep Reinforcement Learning, Graphs, GeoPandas, Deep Learning, Team Leadership, Remote Team Leadership, Stable Baselines3 (SB3), Machine Learning Operations (MLOps), Linear Optimization, APIs, Google Colaboratory (Colab), Data Engineering, Neural Networks, Natural Language Processing (NLP)