Verified Expert in Engineering
Data Scientist and Developer
Demetrio is an experienced data scientist who's comfortable with the entire data science stack. He excels at developing sophisticated machine learning models and applying tractable, robust statistical methods. Beyond his technical and statistical expertise, Demetrio's presentation style and powerful visualizations convey the key takeaways of highly technical results to any audience.
Linux, PyCharm, Slack, Git, GitHub, Jupyter Notebook, Agile Data Science, Agile Workflow, Python, Rapid Prototyping
The most amazing experience was leading a small team in tackling an unsolved problem in construction planning: detecting the reusability of preassembled formwork elements.
Senior Data Scientist
Fortune 100 Foods & Beverages company
- Scoped the project from a greenfield start: interviewed stakeholders and validated their problems with data.
- Identified multiple opportunities for ML solutions to ease out-of-stock problems.
- Built an end-to-end ML solution to forecast out-of-stock events for products sold through a major eCommerce retailer.
- Led a project initiative to forecast drop-ship volumes.
Data Scientist | Project Tech Lead
- Supervised the other data scientists on the project.
- Created technical requirements—Jira stories and tasks—for the upcoming features.
- Conducted extensive code reviews and established code standards.
- Managed the relationship with the development team on the integration and deployment of the solution.
- Worked closely with the project manager on the direction of the project, timelines, team performance, and so on.
- Built a data pipeline that processed 3D models of buildings into graphs.
- Translated product requirements into a formal graph optimization problem.
- Researched and implemented optimization techniques.
- Visualized intermediate and final model results in a customer-friendly way.
- Built an uncertainty prediction model for wholesale sales using a mixture density network.
- Developed an NLP solution based on a bidirectional LSTM with attention that simplified traffic-sign text into a machine-readable format.
- Developed multiple NLP prototypes that supported the GIS department's workflow.
- Extended a prototype solution for extracting parking availability from satellite imagery and deployed it in its first production scenario.
- Supervised a junior data scientist in developing a prototype for car-sharing fleet efficiency.
- Took over scaling and deploying that prototype in new markets.
- Presented model improvements and results in new markets to the customers.
- Participated in the hiring process and supervised interns.
- Worked on a 30 TB analytics-oriented data warehouse project.
- Took over responsibility for the analytics layer of the solution.
- Adapted all existing analyses to a new data segment, including heavy performance optimization, new requirements, interpretation, and visualization.
- Cooperated closely with the client on enhancements to the analytics layer.
- Launched systematic quality assurance of the aggregated data, which uncovered crucial inconsistencies that had gone unnoticed for years.
- Performed various adjustments in the ETL process and services.
- Developed an IoT prototype that collected sensor data from Raspberry Pis and uploaded it to the cloud.
- Refactored an economic model's existing numerical solution.
- Extended the model analytically and substantially enhanced the numerical solution, focusing on algorithmic efficiency.
- Contributed substantially to a working paper by finding and correcting mathematical errors.
Deutsche Bundesbank, Research Centre
- Constructed a unique multi-country dataset on inflation targeting by central banks.
- Developed an analytical and numerical solution for a DSGE economic model addressing agents' expectations and inflation dynamics.
- Automated model-mining and generation of structured reports.
- Visualized, documented, interpreted, and presented the outcomes of our research.
Research Assistant (Part-time)
Center for European Economic Research
- Prepared a large scientific dataset (approximately 39 million entries) with only remote and restricted access.
- Performed statistical data analysis and presented the findings to the research team.
- Developed a standardized results-generating pipeline using a combination of Stata and Python.
- Assisted the research team with implementing the model in Python, including visualizing simulation outputs, writing unit tests, and optimizing numerical procedures.
Staying Ahead of an eCommerce Platform as a Manufacturer
However, the transition was somewhat unsteady: some products were declared out of stock and taken off the platform, causing massive revenue losses. Yet only a small portion of those products were actually experiencing supply-chain shortages. For most, the cause was a combination of missed metrics such as "delivery window," "weeks of cover," and "past orders fill rate." The eCommerce platform did not disclose how its algorithms worked.
To facilitate weekly planning, I developed a machine learning model that forecasts this out-of-stock behavior two weeks in advance. It combined the metrics reported by the eCommerce platform, internal supply-chain data, the marketing planning calendar, and more. I formulated the problem as time-series classification and solved it with gradient-boosted trees, using weekly aggregates over the last ten weeks, combined with known future static factors (e.g., holidays and promotions), as inputs. I automated the output into a dashboard delivered to stakeholders every Monday.
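A minimal sketch of how training rows for such a time-series classifier could be assembled, assuming one product's weekly history is available as plain Python lists. The metric names (`fill_rate`, `weeks_of_cover`), the choice of mean/min/max/last as aggregates, and the `build_rows` helper are hypothetical illustrations, not the production feature set. Each row pairs aggregates over the last ten weeks with the known future flags for the target week and is labelled with the out-of-stock status two weeks ahead.

```python
from statistics import mean
from typing import Dict, List, Tuple

WINDOW = 10   # weeks of history used for the aggregates
HORIZON = 2   # forecast target: out-of-stock status two weeks ahead


def build_rows(
    metrics: Dict[str, List[float]],       # hypothetical, e.g. {"fill_rate": [...], "weeks_of_cover": [...]}
    known_future: Dict[str, List[int]],    # calendar flags known in advance, e.g. {"holiday": [...], "promo": [...]}
    out_of_stock: List[int],               # 1 if the product was declared out of stock that week
) -> List[Tuple[Dict[str, float], int]]:
    """Turn one product's weekly history into (features, label) rows."""
    n_weeks = len(out_of_stock)
    rows = []
    # t is the last observed week; we need WINDOW weeks before it and HORIZON weeks after it.
    for t in range(WINDOW - 1, n_weeks - HORIZON):
        features: Dict[str, float] = {}
        for name, series in metrics.items():
            window = series[t - WINDOW + 1 : t + 1]
            features[f"{name}_mean"] = mean(window)
            features[f"{name}_min"] = min(window)
            features[f"{name}_max"] = max(window)
            features[f"{name}_last"] = window[-1]
        # Static factors are known for the target week, so they may look ahead.
        for name, flags in known_future.items():
            features[f"{name}_t+{HORIZON}"] = flags[t + HORIZON]
        rows.append((features, out_of_stock[t + HORIZON]))
    return rows
```

The resulting feature dictionaries could then be vectorized and passed to a gradient-boosting classifier such as XGBoost or scikit-learn's `GradientBoostingClassifier`. Apart from the calendar flags, every input comes from weeks at or before `t`, which keeps future information out of the features.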
NLP: Text Simplification | Information Retrieval
Because our team already had a very comprehensive regex-based parser, I suggested training a text simplifier rather than an end-to-end system. The task is close to machine translation: "MON," "MND," and "Mondays" would all become "Monday"; "Noon-3 PM" would become "12 PM-3 PM"; and "No Littering!" would be ignored.
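To make the input/output contract concrete, here is a toy, rule-based stand-in that reproduces the examples above. It is not the bidirectional LSTM model: the lookup tables `DAY_ALIASES` and `IGNORED` are hypothetical and cover only these cases, whereas the trained simplifier had to generalize to unseen spellings. What it illustrates is the target format, i.e., the normalized text that the existing regex-based parser would then consume.

```python
import re

# Hypothetical lookup tables covering only the examples from the text.
DAY_ALIASES = {"MON": "Monday", "MND": "Monday", "MONDAYS": "Monday"}
IGNORED = {"NO LITTERING!"}


def simplify(sign_text: str) -> str:
    """Map raw sign text to the simplified form expected by the downstream regex parser."""
    text = sign_text.strip()
    # Texts that carry no parking rule are dropped entirely.
    if text.upper() in IGNORED:
        return ""
    # Normalize named times: "Noon" becomes "12 PM".
    text = re.sub(r"\bnoon\b", "12 PM", text, flags=re.IGNORECASE)
    # Normalize day abbreviations and plurals word by word; other words pass through.
    return re.sub(
        r"[A-Za-z]+",
        lambda m: DAY_ALIASES.get(m.group(0).upper(), m.group(0)),
        text,
    )
```

For example, `simplify("MND 8 AM-Noon")` yields `"Monday 8 AM-12 PM"`. A learned model replaces these brittle tables with a sequence-to-sequence mapping, which is what makes it worth training.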
For this problem, I trained what was then a state-of-the-art NLP model: a bidirectional LSTM with attention. After just two months of development, it achieved reasonable accuracy (92%) and was suitable for human-in-the-loop deployment. We also applied for a research grant to scale the solution further.
Wholesales Forecast with Uncertainty
To address this challenge, I trained a mixture density network. In this architecture, the output layer produces the parameters of a mixture of distributions (gamma distributions in this case), and the negative log-likelihood of the observed values under that mixture serves as the training loss. This makes it possible to capture multimodal conditional distributions or a broad range of right-skewed distributions. Because the data came from various stores across different geographic regions and exhibited strong trend shifts, it was first detrended and then standardized before being modeled by the mixture density network.
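A minimal, dependency-free sketch of the loss described above, assuming the network's final layer emits 3K raw values for K gamma components (K mixture logits, K shapes, K rates) and that the modeled targets are strictly positive, as the gamma distribution requires. The helper names are hypothetical; in practice this would be written with a deep learning framework's tensor operations so that gradients can flow. Log-softmax keeps the weights on the simplex, softplus keeps shapes and rates positive, and log-sum-exp keeps the mixture likelihood numerically stable.

```python
import math
from typing import List, Sequence, Tuple


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x)), used to keep parameters positive."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_sum_exp(values: Sequence[float]) -> float:
    """Stable log(sum(exp(v))) for combining mixture components in log space."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mdn_params(raw: Sequence[float], k: int) -> Tuple[List[float], List[float], List[float]]:
    """Split 3*k raw network outputs into log mixture weights, gamma shapes, and gamma rates."""
    logits = list(raw[:k])
    norm = log_sum_exp(logits)
    log_weights = [x - norm for x in logits]  # log-softmax
    shapes = [softplus(r) + 1e-6 for r in raw[k : 2 * k]]
    rates = [softplus(r) + 1e-6 for r in raw[2 * k : 3 * k]]
    return log_weights, shapes, rates


def gamma_log_pdf(y: float, shape: float, rate: float) -> float:
    """Log density of Gamma(shape, rate) at y > 0."""
    return shape * math.log(rate) + (shape - 1.0) * math.log(y) - rate * y - math.lgamma(shape)


def gamma_mixture_nll(y: float, log_weights, shapes, rates) -> float:
    """Negative log-likelihood of a positive target y under the gamma mixture."""
    terms = [lw + gamma_log_pdf(y, a, b) for lw, a, b in zip(log_weights, shapes, rates)]
    return -log_sum_exp(terms)


def batch_loss(targets: Sequence[float], raw_outputs: Sequence[Sequence[float]], k: int) -> float:
    """Mean NLL over a batch: the quantity minimized during training."""
    losses = [gamma_mixture_nll(y, *mdn_params(raw, k)) for y, raw in zip(targets, raw_outputs)]
    return sum(losses) / len(losses)
```

Because the network outputs a full distribution rather than a point estimate, prediction intervals can be read off the fitted mixture for each store and week.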
Satellite-based Ground Truth for Parking Availability
https://medium.com/ubiq/satellite-based-ground-truth-for-parking-availability-e477c7e1b412
Our solution was to use satellite imagery as a scalable way to assess the parking situation in multiple cities across the globe at once. The main challenge is not detecting cars in the satellite images, which is just an object-detection problem (albeit a very nasty one). It is building a multi-stage pipeline that combines machine learning, heuristic rules, and legal restrictions to output how many free parking spots there are on a street.
I wrote the linked blog article, which explains our approach in great detail.
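A simplified sketch of the final stages of such a pipeline for a single curb segment, assuming the object detector has already produced confidence scores for the cars on that segment and the legal no-parking intervals are known. The spot length, the confidence threshold, and the `StreetSegment`/`free_spots` names are hypothetical placeholders, not the values or code used in the project. A heuristic filter counts parked cars, legal restrictions shrink the usable curb length, and the difference between capacity and parked cars gives the free spots.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

SPOT_LENGTH_M = 5.5    # hypothetical length of one parallel-parking spot
MIN_CONFIDENCE = 0.5   # hypothetical heuristic threshold for keeping a detection


@dataclass
class StreetSegment:
    length_m: float
    # Intervals (start, end) in metres where parking is legally forbidden,
    # e.g. near intersections, driveways, or bus stops.
    restricted: List[Tuple[float, float]] = field(default_factory=list)
    # Confidence scores of cars the detector found along this curb.
    detection_scores: List[float] = field(default_factory=list)


def restricted_length(seg: StreetSegment) -> float:
    """Total curb length covered by restricted intervals, merging overlaps."""
    clipped = sorted((max(0.0, s), min(seg.length_m, e)) for s, e in seg.restricted)
    total, cur_start, cur_end = 0.0, None, None
    for s, e in clipped:
        if e <= s:
            continue  # interval lies outside the segment after clipping
        if cur_end is None or s > cur_end:
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = s, e
        else:
            cur_end = max(cur_end, e)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def free_spots(seg: StreetSegment) -> int:
    """Estimate free spots: legal capacity minus detections that pass the heuristic filter."""
    parked = sum(1 for score in seg.detection_scores if score >= MIN_CONFIDENCE)
    capacity = int((seg.length_m - restricted_length(seg)) // SPOT_LENGTH_M)
    return max(capacity - parked, 0)
```

For a 100 m segment with 25 m of merged no-parking intervals and ten confident detections, this gives a capacity of 13 spots and therefore 3 free spots.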
Car Sharing Fleet Efficiency
https://medium.com/ubiq/the-art-of-fleet-rebalancing-our-ai-tool-to-increase-the-utilization-of-every-single-vehicle-c86731f98c39
So, we offered to build a machine learning model that would take all of these influencing factors into account and predict where and when cars would be in high demand, so that cars could be relocated there from low-demand areas.
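Once such a model has produced per-zone demand forecasts, turning them into relocation suggestions can be as simple as matching surplus zones to deficit zones. The sketch below is a hypothetical greedy heuristic, not the product's actual logic: zone names and forecast values are illustrative, and demand is rounded up so that high-demand zones are not under-served.

```python
import math
from typing import Dict, List, Tuple


def plan_relocations(
    supply: Dict[str, int], predicted_demand: Dict[str, float]
) -> List[Tuple[str, str, int]]:
    """Greedily move cars from zones with surplus to zones with a forecast deficit.

    Returns (from_zone, to_zone, n_cars) tuples.
    """
    zones = sorted(set(supply) | set(predicted_demand))
    # Positive balance = more cars than forecast demand (demand rounded up).
    balance = {z: supply.get(z, 0) - math.ceil(predicted_demand.get(z, 0.0)) for z in zones}
    # Largest surpluses and largest deficits first; ties broken by zone name.
    donors = sorted(([b, z] for z, b in balance.items() if b > 0), key=lambda d: (-d[0], d[1]))
    receivers = sorted(((-b, z) for z, b in balance.items() if b < 0), key=lambda r: (-r[0], r[1]))

    moves: List[Tuple[str, str, int]] = []
    i = 0
    for need, dest in receivers:
        while need > 0 and i < len(donors):
            give = min(need, donors[i][0])
            moves.append((donors[i][1], dest, give))
            donors[i][0] -= give
            need -= give
            if donors[i][0] == 0:
                i += 1  # this donor zone is exhausted
    return moves
```

Greedy matching ignores the travel distance between zones; it only shows how demand forecasts translate into concrete relocation moves.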
Under my supervision and mentoring, a junior data scientist on their first project and I delivered a successful MVP. Working closely with the project team, I found a suitable deployment strategy, and we launched the first version of the product four months after receiving the initial dataset.
Subsequently, I took over scaling the solution to multiple cities, improving its real-time efficiency, and adding features based on the client's requests and model performance.
Eventually, this became the startup's most successful product; the company later rebranded and now offers it as its only service.
Python, R, SQL, Bash, Regex, Snowflake
Pandas, Scikit-learn, Matplotlib, NumPy, NetworkX, TensorFlow, Keras, XGBoost, LSTM, PyTorch
PyCharm, Git, GitHub, GIS, LaTeX, Jupyter, Seaborn, Slack, PredictionIO, MATLAB
Data Science, Agile Workflow, Rapid Prototyping, Agile, Agile Project Management, Automation, ETL, Building Information Modeling (BIM), Business Intelligence (BI), Continuous Integration (CI), Dynamic Programming
Jupyter Notebook, Docker, Linux, Amazon Web Services (AWS), Open Cascade Technology (OCCT), Kubernetes, Google Cloud Platform (GCP), Databricks
PostgreSQL, Data Pipelines
Statistics, Modeling, Scientific Data Analysis, Technical Writing, Optimization, Code Review, Geospatial Data, Geospatial Analytics, Spatial Analysis, Machine Learning, Data Engineering, Statistical Analysis, Supervised Machine Learning, Data Visualization, Time Series, Time Series Analysis, Agile Sprints, Agile Data Science, Statistical Data Analysis, Predictive Modeling, Artificial Intelligence (AI), Data Analysis, Forecasting, Models, Communication, Version Control Systems, Data Modeling, Data Aggregation, Data Analytics, Exploratory Data Analysis, Regression, Linear Regression, Mathematics, Satellite Images, Natural Language Processing (NLP), Mobility, Deep Learning, Dynamic Systems Modeling, Spatial Statistics, Dashboards, Classification, Google Colaboratory (Colab), Neural Networks, Artificial Neural Networks (ANN), Model Development, GPT, Generative Pre-trained Transformers (GPT), Discrete Optimization, IFC, API Integration, QGIS, Fleet Management, Graphs, Algorithms, Risk Modeling, Numerical Methods, Research, Cloud Computing, Explainable Artificial Intelligence (XAI), ARIMA, ARIMA Models
Bachelor's Degree in Economics and Mathematics
University of Mannheim - Mannheim, Germany