Demetrio Rodriguez, Developer in Vienna, Austria

Demetrio Rodriguez

Verified Expert in Engineering

Data Scientist and Developer

Location
Vienna, Austria
Toptal Member Since
May 25, 2021

Demetrio is an experienced data scientist who's comfortable with the entire data science stack. He excels at developing sophisticated machine learning models and leveraging tractable, robust statistical methods. Beyond his technical and statistical expertise, Demetrio's presentation style and powerful visualizations effortlessly get the key takeaways of highly technical results across to any audience.

Portfolio

Fortune 100 Foods & Beverages company
Python, Data Science, SQL, Machine Learning, Pandas, NumPy, Matplotlib, Seaborn...
Sclable Academy
Python, Git, Docker, Discrete Optimization, IFC, Open Cascade Technology (OCCT)...
Parkbob
R, Satellite Images, Geospatial Data, Geospatial Analytics, Spatial Analysis...

Experience

Availability

Part-time

Preferred Environment

Linux, PyCharm, Slack, Git, GitHub, Jupyter Notebook, Agile Data Science, Agile Workflow, Python, Rapid Prototyping

The most amazing...

...experience was leading a small team in tackling an unsolved problem in construction planning: reusability detection of preassembled formwork elements.

Work Experience

Senior Data Scientist

2021 - 2022
Fortune 100 Foods & Beverages company
  • Scoped the project from a greenfield start: interviewed stakeholders and validated their problems using data.
  • Identified multiple opportunities for an ML solution to alleviate some out-of-stock burdens.
  • Built an end-to-end ML solution to forecast out-of-stock inventory for a major eCommerce retailer.
  • Led a project initiative to forecast drop-ship volumes.
Technologies: Python, Data Science, SQL, Machine Learning, Pandas, NumPy, Matplotlib, Seaborn, XGBoost, Time Series, Time Series Analysis, Forecasting, ETL, Docker, Modeling, Git, GitHub, Code Review, Geospatial Data, Geospatial Analytics, GIS, Mathematics, Statistics, TensorFlow, Spatial Analysis, Keras, Deep Learning, Data Engineering, Scikit-learn, Statistical Analysis, Supervised Machine Learning, Jupyter, Dashboards, Statistical Data Analysis, Predictive Modeling, Data Analysis, Classification, Explainable Artificial Intelligence (XAI), Models, Communication, Version Control Systems, Google Colaboratory (Colab), Data Modeling, Exploratory Data Analysis, Neural Networks, Regression, Artificial Neural Networks (ANN), Snowflake, Linear Regression, Data Pipelines, Model Development, Amazon Web Services (AWS)

Data Scientist | Project Tech Lead

2019 - 2020
Sclable Academy
  • Supervised other data scientists regarding their tasks on the project.
  • Created technical requirements—Jira stories and tasks—for the upcoming features.
  • Conducted extensive code reviews and established code standards.
  • Managed the relationship with the development team regarding the integration and deployment of the solution.
  • Worked closely with the project manager on the direction of the project, timelines, team performance, and so on.
  • Built a data pipeline that processed 3D models of buildings into graphs.
  • Translated product requirements into a formal graph optimization problem.
  • Researched and implemented optimization techniques.
  • Visualized intermediate and final model results in a customer-friendly way.
  • Built an uncertainty prediction model for wholesales using a mixture density network.
Technologies: Python, Git, Docker, Discrete Optimization, IFC, Open Cascade Technology (OCCT), NetworkX, API Integration, Code Review, TensorFlow, Agile Project Management, Graphs, Algorithms, Building Information Modeling (BIM), Keras, Deep Learning, Machine Learning, PredictionIO, Linux, Data Visualization, Data Science, Supervised Machine Learning, Bash, Matplotlib, Scikit-learn, Pandas, Statistical Analysis, Jupyter, PyCharm, Slack, GitHub, Jupyter Notebook, Mathematics, Statistics, Modeling, Technical Writing, Optimization, Data Engineering, Automation, Agile Workflow, Agile Sprints, Agile Data Science, Rapid Prototyping, NumPy, Cloud Computing, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Agile, Google Cloud Platform (GCP), Seaborn, Models, Communication, Version Control Systems, Google Colaboratory (Colab), Data Modeling, Exploratory Data Analysis, Neural Networks, Regression, Artificial Neural Networks (ANN), Data Pipelines, Model Development, Amazon Web Services (AWS)

Data Scientist

2018 - 2019
Parkbob
  • Developed an NLP solution based on a bidirectional LSTM with attention that simplified traffic sign texts to a simple machine-readable format.
  • Developed multiple NLP prototypes that supported the GIS department's workflow.
  • Extended a prototype solution of extracting parking availability from satellite imagery and used it in the first production-level scenario.
  • Supervised a junior data scientist in developing a prototype for car-sharing fleet efficiency.
  • Took over the scaling and deployment of that prototype in new markets.
  • Presented model improvements and results in new markets to the customers.
  • Participated in the hiring process and supervised interns.
Technologies: R, Satellite Images, Geospatial Data, Geospatial Analytics, Spatial Analysis, QGIS, GIS, Python, Natural Language Processing (NLP), GPT, Generative Pre-trained Transformers (GPT), TensorFlow, Keras, Fleet Management, Mobility, Linux, Data Visualization, Data Science, Spatial Statistics, Supervised Machine Learning, Bash, Matplotlib, Scikit-learn, Pandas, Statistical Analysis, PyCharm, Slack, Git, Jupyter Notebook, Mathematics, Statistics, Modeling, Technical Writing, Code Review, Deep Learning, Machine Learning, Data Engineering, LaTeX, Automation, Agile Workflow, Agile Sprints, Agile Data Science, Rapid Prototyping, NumPy, Dashboards, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Agile, ETL, Time Series, Time Series Analysis, Seaborn, Forecasting, Classification, XGBoost, Models, Communication, Version Control Systems, Data Modeling, LSTM, Exploratory Data Analysis, Neural Networks, Regression, Artificial Neural Networks (ANN), Linear Regression, Data Pipelines, Model Development

Data Scientist

2016 - 2017
Record Evolution
  • Worked on a 30TB analytics-oriented data warehouse project.
  • Took over responsibility for the analytics layer of the solution.
  • Translated all existing analyses for a new data segment including heavy performance optimization, new requirements, interpretation, and visualization.
  • Cooperated closely with the client regarding enhancements in the analytics layer.
  • Launched systematic quality assurance of the aggregated data, leading to the discovery of crucial inconsistencies that had gone unnoticed for years.
  • Performed various adjustments in the ETL process and services.
  • Developed an IoT prototype that collected sensor data from Raspberry Pis and uploaded it to the cloud.
Technologies: SQL, PostgreSQL, Python, Business Intelligence (BI), Risk Modeling, Data Engineering, Continuous Integration (CI), Docker, Kubernetes, Linux, Data Visualization, Data Science, Bash, Matplotlib, Pandas, Statistical Analysis, Slack, Git, GitHub, Mathematics, Statistics, Technical Writing, Code Review, Automation, Cloud Computing, Dashboards, Statistical Data Analysis, Predictive Modeling, Data Analysis, Google Cloud Platform (GCP), ETL, Time Series, Time Series Analysis, Forecasting, Classification, Models, Communication, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Data Pipelines

Junior Researcher

2016 - 2016
The SAFE-FDZ
  • Refactored an economic model's existing numerical solution.
  • Extended the model analytically and extensively enhanced the numerical solution with a focus on algorithmic efficiency.
  • Contributed substantially to a working paper by finding and correcting mathematical errors.
Technologies: MATLAB, Numerical Methods, Algorithms, Dynamic Programming, Optimization, Linux, Git, Mathematics, Modeling, Scientific Data Analysis, Technical Writing, LaTeX, Research, Dynamic Systems Modeling, Models, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Model Development

Research Assistant

2015 - 2015
Deutsche Bundesbank, Research Centre
  • Constructed a unique multi-country dataset regarding inflation targeting by central banks.
  • Developed an analytical and numerical solution for a DSGE economic model addressing agents' expectations and inflation dynamics.
  • Automated model-mining and generation of structured reports.
  • Visualized, documented, interpreted, and presented the outcomes of our research.
Technologies: MATLAB, LaTeX, Numerical Methods, Research, Dynamic Systems Modeling, Linux, Data Visualization, Data Science, Statistical Analysis, Git, Mathematics, Modeling, Scientific Data Analysis, Technical Writing, Optimization, Dynamic Programming, Automation, Rapid Prototyping, NumPy, Dashboards, Data Analysis, Time Series, Time Series Analysis, Forecasting, Models, Communication, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Linear Regression, Data Pipelines, Model Development

Research Assistant (Part-time)

2014 - 2015
Center for European Economic Research
  • Prepared a large scientific dataset (approximately 39 million entries) with only remote and restricted access.
  • Performed statistical data analysis and presented the findings to the research team.
  • Developed a standardized results-generating pipeline using a combination of Stata and Python.
  • Assisted the research team with the model implementation in Python which included visualizing simulation outputs, writing unit tests, and optimizing numerical procedures.
Technologies: Statistical Analysis, Research, Automation, Python, Data Visualization, Data Science, Matplotlib, Pandas, Git, Mathematics, Statistics, Scientific Data Analysis, Technical Writing, LaTeX, NumPy, Cloud Computing, Dashboards, Statistical Data Analysis, Data Analysis, Time Series, Time Series Analysis, Forecasting, ETL, Models, Communication, Version Control Systems, Data Modeling, Exploratory Data Analysis, Regression, Linear Regression, Data Pipelines

Staying Ahead of an eCommerce Platform as a Manufacturer

A major manufacturer of foods and beverages saw a great shift in consumer purchase behavior toward online retail. Most of its eCommerce revenue comes from selling its products on a well-established platform.

However, the transition was somewhat unsteady: some products were declared out-of-stock and taken off the platform, creating massive revenue losses. Yet, only a small portion of those products was experiencing supply-chain shortages. For most, it was a combination of missed metrics like "delivery window," "weeks of cover," "past orders fill rate," etc. The eCommerce platform did not share the inner workings of its algorithms.

To facilitate weekly planning, I developed a machine learning model to forecast this out-of-stock behavior two weeks in advance. I combined the metrics reported by the eCommerce platform, internal supply-chain data, the marketing planning calendar, and more. The problem was formulated as a time-series classification task and solved using gradient-boosted trees, with the inputs being various weekly aggregates of the last ten weeks combined with known future static factors (e.g., holidays and promotions). I automated the output into a dashboard and delivered it to the stakeholders every Monday.
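As a rough sketch of that setup (the client's actual feature set is confidential, so column names such as weeks_of_cover, is_promo_week, and out_of_stock below are illustrative placeholders), the approach boils down to lagged weekly features plus known future factors feeding a gradient-boosting classifier:

```python
import pandas as pd
import xgboost as xgb

# One row per (item, week); column names are illustrative placeholders.
df = pd.read_csv("weekly_item_metrics.csv").sort_values(["item_id", "week"])

# Weekly aggregates of the last ten weeks as lag features.
lag_cols = ["weeks_of_cover", "fill_rate", "units_ordered"]
for col in lag_cols:
    for lag in range(1, 11):
        df[f"{col}_lag{lag}"] = df.groupby("item_id")[col].shift(lag)

# Known-in-advance factors for the target week (e.g., holidays and promotions).
future_cols = ["is_holiday_week", "is_promo_week"]

# Target: is the item flagged out-of-stock two weeks ahead?
df["target"] = df.groupby("item_id")["out_of_stock"].shift(-2)
df = df.dropna(subset=["target"])

features = [c for c in df.columns if "_lag" in c] + future_cols
model = xgb.XGBClassifier(n_estimators=300, max_depth=5, learning_rate=0.05)
model.fit(df[features], df["target"].astype(int))
```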

NLP: Text Simplification | Information Retrieval

Traffic signs come in all shapes and sizes. Very often, the most important part of a traffic sign is the text below it, especially if the text says when the sign is valid, e.g., "MON 6 PM-8 PM." Those texts are supposed to be roughly standardized and structured, so our development team approached the problem of converting them into strict rules by creating comprehensive regular expressions. This worked very well for a while but slowly became unmaintainable, so a scalable approach became necessary.
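A single rule in such a parser might look roughly like the sketch below; the actual rule set is not part of this profile, so the pattern is purely illustrative:

```python
import re

# Illustrative rule: a weekday abbreviation followed by a time window,
# e.g., "MON 6 PM-8 PM".
RULE = re.compile(
    r"(?P<day>MON|TUE|WED|THU|FRI|SAT|SUN)\s+"
    r"(?P<start>\d{1,2}\s*(?:AM|PM))\s*-\s*(?P<end>\d{1,2}\s*(?:AM|PM))",
    re.IGNORECASE,
)

print(RULE.search("MON 6 PM-8 PM").groupdict())
# {'day': 'MON', 'start': '6 PM', 'end': '8 PM'}
```

Multiplied across every spelling variant, abbreviation, and city-specific convention, rules like this grow quickly, which is what made the pure-regex approach hard to maintain.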

As our team already had a very comprehensive regex-based parser, my suggestion was not to train an end-to-end system but a text simplifier. It is almost a machine translation task: all of "MON," "MND," and "Mondays" would become "Monday"; "Noon-3 PM" would be translated as "12 PM-3 PM"; and "No Littering!" would be ignored.

For this problem, I trained a state-of-the-art (at the time) NLP model—a bidirectional LSTM with attention. After just two months of development, it achieved a reasonable accuracy (92%) and was suitable for a human-in-the-loop deployment. Additionally, we requested a research grant to scale the solution further.
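The exact architecture and vocabulary aren't spelled out here, but a minimal Keras sketch of the general idea, a bidirectional-LSTM encoder with dot-product attention over its states, trained with teacher forcing on (raw text, simplified text) pairs, could look like this (all sizes are placeholder assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers

vocab_size, emb_dim, units = 500, 64, 128   # placeholder sizes
max_src_len, max_tgt_len = 40, 40           # placeholder sequence lengths

# Encoder: bidirectional LSTM over the raw sign text.
src = layers.Input(shape=(max_src_len,), dtype="int32")
enc_emb = layers.Embedding(vocab_size, emb_dim)(src)
enc_out = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(enc_emb)

# Decoder: LSTM over the simplified text, shifted right (teacher forcing).
tgt = layers.Input(shape=(max_tgt_len,), dtype="int32")
dec_emb = layers.Embedding(vocab_size, emb_dim)(tgt)
dec_out = layers.LSTM(2 * units, return_sequences=True)(dec_emb)

# Dot-product attention of decoder states over encoder states.
context = layers.Attention()([dec_out, enc_out])
logits = layers.Dense(vocab_size)(layers.Concatenate()([dec_out, context]))

model = tf.keras.Model([src, tgt], logits)
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
# Training: model.fit([raw_tokens, simplified_tokens_in], simplified_tokens_out, ...)
# Padding masks and inference-time decoding are omitted for brevity.
```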

Wholesales Forecast with Uncertainty

Oftentimes, machine learning solutions focus on predicting a single number. In wholesale, this is not always very useful, as actual daily sales can vary quite a bit. When that variation is not addressed, the result is either overfilled storage or empty shelves. To perform capacity planning effectively, a manager should know the range of outcomes that could occur with some degree of certainty.

To address this challenge, I trained a mixture density network. In this architecture, the output layer produces the parameters of a mixture of distributions (gamma in this case), and the negative log-likelihood of that parametric mixture is used as the training loss. This makes it possible to capture multi-modal conditional distributions as well as a broad range of right-skewed distributions. As the data came from various stores across different geographic regions and exhibited strong trend shifts, it was first de-trended and then standardized before being modeled by the mixture density network.
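A compact TensorFlow Probability sketch of that idea follows; the component count, layer sizes, and preprocessing are assumptions for illustration, not the production setup:

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions
K, n_features = 3, 20   # placeholder mixture size and input width

def gamma_mixture(params):
    """Turn raw network outputs into a K-component gamma mixture."""
    logits, conc, rate = tf.split(params, 3, axis=-1)
    return tfd.MixtureSameFamily(
        mixture_distribution=tfd.Categorical(logits=logits),
        components_distribution=tfd.Gamma(
            concentration=tf.nn.softplus(conc) + 1e-6,
            rate=tf.nn.softplus(rate) + 1e-6,
        ),
    )

inputs = tf.keras.Input(shape=(n_features,))
hidden = tf.keras.layers.Dense(64, activation="relu")(inputs)
params = tf.keras.layers.Dense(3 * K)(hidden)   # logits, concentrations, rates
outputs = tfp.layers.DistributionLambda(gamma_mixture)(params)
model = tf.keras.Model(inputs, outputs)

# Negative log-likelihood of the predicted mixture as the training loss.
nll = lambda y, dist: -dist.log_prob(tf.squeeze(y, axis=-1))
model.compile(optimizer="adam", loss=nll)
```

From a fitted mixture, a planner can read off a range of plausible sales (for example, by sampling quantiles) instead of relying on a single point forecast.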

Satellite-based Ground Truth for Parking Availability

https://medium.com/ubiq/satellite-based-ground-truth-for-parking-availability-e477c7e1b412
Predicting on-street parking space occupancy is an extremely challenging problem, mainly because there are no reliable sources of ground truth.

Our solution was to use satellite imagery as a scalable way to assess the parking situation in multiple cities across the globe at once. The main challenge is not detecting cars in the satellite images, which is just an object-detection problem (a very nasty one, admittedly). It is putting together a multi-stage pipeline that combines machine learning, heuristic rules, and legal restrictions to output how many free parking spots there are on a street.
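The concrete stages are described in the linked article; purely as an illustration of how the pieces compose (all names and numbers below are made up, not the production logic), the final step can be thought of as something like:

```python
from dataclasses import dataclass

AVG_SPOT_LENGTH_M = 5.5   # heuristic assumption for one parked car

@dataclass
class StreetSegment:
    segment_id: str
    curb_length_m: float   # from map data
    parking_legal: bool    # from legal/restriction data
    detected_cars: int     # from the satellite object-detection stage

def free_spots(segment: StreetSegment) -> int:
    """Combine detection output, heuristics, and legal rules into an estimate."""
    if not segment.parking_legal:
        return 0
    capacity = int(segment.curb_length_m // AVG_SPOT_LENGTH_M)  # heuristic capacity
    return max(capacity - segment.detected_cars, 0)

print(free_spots(StreetSegment("seg-42", 60.0, True, 7)))  # -> 3
```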

I wrote the blog article linked above; it explains our approach in great detail.

Car Sharing Fleet Efficiency

https://medium.com/ubiq/the-art-of-fleet-rebalancing-our-ai-tool-to-increase-the-utilization-of-every-single-vehicle-c86731f98c39
A well-known car-sharing company approached us with a problem: some of their cars were taken within minutes of being parked, while others stood unused for days. They already knew it had to do with the geography of the city, population density, major transport hubs, time of day, and so on.

So, we offered to build a machine learning model that would take all of these factors into account and predict where and when cars would be in high demand, so that relocations from low-demand areas could be initiated.

Together with a junior data scientist on their first project, whom I supervised and mentored, I delivered a successful MVP. In cooperation with the project team, we found a suitable deployment strategy and launched the first version of the product four months after receiving the initial dataset.

Consequently, I took over scaling the solution to multiple cities, adjusting its real-time efficiency, and adding multiple features based on the client's requests and model performance.

Eventually, this became the startup's most successful product; the company has since rebranded and now offers it as its only service.

Languages

Python, R, SQL, Bash, Regex, Snowflake

Libraries/APIs

Pandas, Scikit-learn, Matplotlib, NumPy, NetworkX, TensorFlow, Keras, XGBoost, LSTM, PyTorch

Tools

PyCharm, Git, GitHub, GIS, LaTeX, Jupyter, Seaborn, Slack, PredictionIO, MATLAB

Paradigms

Data Science, Agile Workflow, Rapid Prototyping, Agile, Agile Project Management, Automation, ETL, Building Information Modeling (BIM), Business Intelligence (BI), Continuous Integration (CI), Dynamic Programming

Platforms

Jupyter Notebook, Docker, Linux, Amazon Web Services (AWS), Open Cascade Technology (OCCT), Kubernetes, Google Cloud Platform (GCP), Databricks

Storage

PostgreSQL, Data Pipelines

Other

Statistics, Modeling, Scientific Data Analysis, Technical Writing, Optimization, Code Review, Geospatial Data, Geospatial Analytics, Spatial Analysis, Machine Learning, Data Engineering, Statistical Analysis, Supervised Machine Learning, Data Visualization, Time Series, Time Series Analysis, Agile Sprints, Agile Data Science, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Forecasting, Models, Communication, Version Control Systems, Data Modeling, Data Aggregation, Data Analytics, Exploratory Data Analysis, Regression, Linear Regression, Mathematics, Satellite Images, Natural Language Processing (NLP), Mobility, Deep Learning, Dynamic Systems Modeling, Spatial Statistics, Dashboards, Classification, Google Colaboratory (Colab), Neural Networks, Artificial Neural Networks (ANN), Model Development, GPT, Generative Pre-trained Transformers (GPT), Discrete Optimization, IFC, API Integration, QGIS, Fleet Management, Graphs, Algorithms, Risk Modeling, Numerical Methods, Research, Cloud Computing, Explainable Artificial Intelligence (XAI), ARIMA, ARIMA Models

2012 - 2015

Bachelor's Degree in Economics and Mathematics

University of Mannheim - Mannheim, Germany
