Hugo De Oliveira, Developer in Hamburg, Germany

Verified Expert in Engineering

Data Scientist and Developer

Location
Hamburg, Germany
Toptal Member Since
June 15, 2021

Hugo is a full-stack data scientist. Besides his strong scientific education, his business experience gives him hands-on skills in data engineering, analytics, and predictive modeling. Hugo's research background provides him with autonomy, scientific curiosity, and creativity in the development of theoretical and practical solutions to complex problems.

Portfolio

Synthesis School, Inc
Data Analysis, SQL, Data Visualization, Data Science, Education, Redshift...
HEVA
Visual Studio Code (VS Code), GitLab, Health, TensorFlow, Python...
Polytechnique Montréal
RStudio, Data Analysis, Data Analytics, Analytics

Availability

Full-time

Preferred Environment

Visual Studio Code (VS Code), Jupyter Notebook, Git, Python, Redshift, SQL, Data Build Tool (dbt), Google Sheets

The most amazing...

...opportunity I've had was working on a French national health database and developing innovative predictive modeling methods for patient pathways.

Work Experience

Senior Data Scientist

2021 - 2023
Synthesis School, Inc
  • Provided metrics to the different departments within the company (Product, Operations, Marketing, Finance).
  • Built and maintained the company's analytics pipeline, from data engineering to reporting.
  • Used Python to create ETL scripts for different data sources, dbt for data modeling, Apache Airflow for orchestration, Redshift for data warehousing, and Google Sheets and Mode for dashboards and reporting.
  • Built a Slack notification system that sends daily and weekly reports on acquisition and product metrics.
  • Created a heuristic that automatically proposes a monthly plan of new classes to open, based on waitlisted students' time preferences and teacher availability.
  • Proposed a Python script to optimize game infrastructure scaling based on scheduled sessions to reduce the number of allocated servers not in operation while ensuring capacity for all sessions.
  • Developed a proof of concept (POC) for student progress metrics aimed at parents, including data on interactions with teammates from different locations, game results, and session participation.
  • Created a financial dashboard for company executives, including company data and financial reports extracted from the QuickBooks API via a Python script (revenue, expenses, gross margins, cash available, burn, and runway).
  • Assisted in the transition to a flat rate system for teacher payment, automating the process of hour tracking, thus saving time for teachers and HR while controlling company costs.
  • Created and maintained a budget for a company product as a bi-weekly P&L sheet, reviewed by the team every month to control expenses.
Technologies: Data Analysis, SQL, Data Visualization, Data Science, Education, Redshift, Amazon S3 (AWS S3), Apache Airflow, Segment, Google Sheets, Python, Data Build Tool (dbt), ETL, Dashboards, MySQL, Data Modeling, Reporting
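
The session-based scaling idea in the list above can be illustrated with a minimal sketch. It is not the script used at the company: it assumes each scheduled session is a hypothetical (start, end) pair in minutes and that one server hosts a fixed number of concurrent sessions. The function names and the `sessions_per_server` parameter are invented for illustration.

```python
import math
from typing import List, Tuple

# Assumed format: (start_minute, end_minute), with the end exclusive.
Session = Tuple[int, int]


def peak_concurrent_sessions(sessions: List[Session]) -> int:
    """Return the maximum number of sessions running at the same time."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session begins
        events.append((end, -1))    # a session ends
    # Tuples sort by time first; at equal times, ends (-1) come before
    # starts (+1), so back-to-back sessions can reuse the same capacity.
    events.sort()
    running = 0
    peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


def servers_needed(sessions: List[Session], sessions_per_server: int) -> int:
    """Smallest number of servers that covers the peak load (assumed model)."""
    return math.ceil(peak_concurrent_sessions(sessions) / sessions_per_server)


schedule = [(0, 60), (30, 90), (60, 120), (45, 50)]
print(peak_concurrent_sessions(schedule))  # 3
print(servers_needed(schedule, 2))         # 2
```

Sizing the fleet to the scheduled peak, rather than to a fixed allocation, is what would reduce idle servers while still guaranteeing capacity for every session.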

Data Scientist

2017 - 2020
HEVA
  • Conducted health data analysis studies for public institutions and for pharmaceutical and medical device companies.
  • Collaborated with data scientists, data engineers, developers, UI/UX designers, and medical experts.
  • Participated in a range of research and development projects, from theoretical ideas to implementations of case studies, leading to scientific and technical contributions presented at international conferences or published in peer-reviewed journals.
Technologies: Visual Studio Code (VS Code), GitLab, Health, TensorFlow, Python, Data Visualization, Predictive Modeling, Jupyter Notebook, Git, Scikit-learn, NumPy, Pandas, SQL, Plotly, Machine Learning, Data Science, Deep Learning, Data Analysis, Data Analytics, Analytics, Dashboards, Artificial Intelligence (AI), Data Engineering, Predictive Analytics, Data Modeling

Research Intern

2016 - 2016
Polytechnique Montréal
  • Analyzed data and extracted knowledge to improve the workload distribution for the Home Care Regional Services of Montreal Island.
  • Created a SQL database to structure caregiver and visit data.
  • Designed and adapted a dashboard to facilitate future data collection.
Technologies: RStudio, Data Analysis, Data Analytics, Analytics

Automatic and Explainable Labeling of Medical Event Logs with Auto-encoding

Process mining is a suitable method for knowledge extraction from patient pathways. Structured in event logs, medical events are complex, often described using various medical codes. Finding an efficient method of labeling these events before applying process mining analysis was challenging.

This project focused on developing an innovative methodology to handle the complexity of events in medical event logs. Based on auto-encoding, accurate labels are created by clustering similar events in latent space. Moreover, the explanation of created labels is provided by the decoding of the corresponding events.

Meta-TAK: A Scalable Double-clustering Method for Treatment Sequence Visualization

This project focuses on the study of treatment sequences, particularly the extraction of patterns from nonclinical claim databases through clustering. The TAK algorithm was proposed for this purpose and demonstrated its usefulness. However, it did not scale with the number of patients, which made it impossible to use in practice for thousands of patients. To address this, we developed an extension of the TAK algorithm. Referred to as Meta-TAK, this method appears to be robust and computationally efficient.

Optimal Process Mining of Timed Event Logs

This project focuses on determining the optimal process model for an event log whose traces carry temporal information. We introduced a new formalism, along with a Tabu search algorithm that determines the process model maximizing the representation of the traces, subject to constraints on the maximum number of nodes and arcs. We then conducted a healthcare case study to demonstrate the applicability of the approach for clinical pathway modeling. Special attention was paid to readability so that end users could interpret the process mining results.
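
As a rough illustration of the Tabu search idea, here is a minimal sketch on a deliberately simplified problem. It is not the formalism or algorithm from the project: it assumes traces are plain sequences of activity names, ignores arcs and timing, and scores a node selection by the number of traces it fully covers. The function names, the swap neighborhood, and the `tenure` parameter are all assumptions made for the example.

```python
from typing import Dict, List, Sequence, Set, Tuple

Trace = Sequence[str]


def coverage(nodes: Set[str], traces: List[Trace]) -> int:
    """Number of traces whose events all belong to the selected nodes."""
    return sum(1 for trace in traces if set(trace) <= nodes)


def tabu_search(
    traces: List[Trace], max_nodes: int, iterations: int = 50, tenure: int = 2
) -> Tuple[Set[str], int]:
    """Pick at most max_nodes activities that cover as many traces as possible."""
    activities = sorted({a for trace in traces for a in trace})
    current = set(activities[:max_nodes])  # arbitrary feasible starting point
    best, best_score = set(current), coverage(current, traces)
    tabu: Dict[str, int] = {}  # activity -> last iteration at which it is tabu

    for it in range(iterations):
        candidates = []
        # Neighborhood: swap one selected activity for one unselected activity.
        for out in sorted(current):
            for inn in activities:
                if inn in current:
                    continue
                neighbor = (current - {out}) | {inn}
                score = coverage(neighbor, traces)
                is_tabu = tabu.get(out, -1) >= it or tabu.get(inn, -1) >= it
                # Aspiration: a tabu move is allowed only if it beats the best.
                if is_tabu and score <= best_score:
                    continue
                candidates.append((score, out, inn, neighbor))
        if not candidates:
            break
        # Take the best admissible move, even if it worsens the current score.
        score, out, inn, neighbor = max(candidates, key=lambda c: c[0])
        current = neighbor
        tabu[out] = tabu[inn] = it + tenure
        if score > best_score:
            best, best_score = set(neighbor), score
    return best, best_score


log = [("c", "d"), ("c", "d"), ("c", "d"), ("a",), ("b",)]
print(tabu_search(log, max_nodes=2))
```

In this toy log, every single swap away from the starting selection lowers the score, so a plain hill climber would stop there. Accepting a worsening move while forbidding its immediate reversal is what lets the search reach the better selection.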

Binary Classification from French Hospital Data

In this project, we benchmarked seven machine learning algorithms on binary classification tasks, using three data sets extracted from the French national hospital database. We used an efficient global optimization algorithm to solve the hyperparameter tuning problem.

Optimal Pathway Discovery Analysis of Sepsis Hospital Admissions Using the HES Database in England

https://academic.oup.com/jamiaopen/article/3/3/439/5979570
The “Bow-tie” optimal pathway discovery analysis uses large clinical event datasets to map clinical pathways and to visualize risks (improvement opportunities) before, and outcomes after, a specific clinical event. This proof-of-concept study assesses the use of NHS Hospital Episode Statistics (HES) in England as a potential clinical event dataset for this pathway discovery analysis approach.

Explaining Predictive Factors in Patient Pathways Using Autoencoders

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277135
This project focused on developing an end-to-end methodology to predict a pathway-related outcome and identify predictive factors using autoencoders. The method was tested in a case study predicting short-term mortality after the implantation of an implantable cardioverter-defibrillator.

Paradigms

Data Science, ETL

Other

Machine Learning, Deep Learning, Process Mining, Health, Data Visualization, Predictive Modeling, Data Analysis, Data Analytics, Analytics, Dashboards, Data Build Tool (dbt), Metrics, Predictive Analytics, Data Modeling, Reporting, Optimization, Operations Research, Machine Learning Operations (MLOps), Explainable Artificial Intelligence (XAI), Clustering, Hyperparameters, Data Analytics (Marketing), Segment, Artificial Intelligence (AI), Healthcare & Insurance, Data Engineering, Education

Languages

SQL, Python, R

Libraries/APIs

NumPy, Pandas, Scikit-learn, TensorFlow

Tools

Plotly, Google Sheets, Git, GitLab, Apache Airflow

Platforms

Visual Studio Code (VS Code), Jupyter Notebook, RStudio

Storage

Redshift, Amazon S3 (AWS S3), MySQL

Education

2017 - 2020

Ph.D. in Engineering

Mines Saint-Etienne - Saint-Etienne, France

2014 - 2017

Master's Degree in Engineering

Mines Saint-Etienne - Saint-Etienne, France