Hugo De Oliveira

Verified Expert in Engineering

Data Scientist and Developer

Location

Hamburg, Germany

Toptal Member Since

June 15, 2021

Hugo is a full-stack data scientist. Besides his strong scientific education, his business experience gives him hands-on skills in data engineering, analytics, and predictive modeling. Hugo's research background provides him with autonomy, scientific curiosity, and creativity in the development of theoretical and practical solutions to complex problems.

Machine Learning, SQL, Data Visualization, Data Analysis, Data Analytics, Analytics, Deep Learning, Metrics, NumPy, Pandas, Python, Redshift, Google Sheets, Visual Studio Code (VS Code), Clustering

Portfolio

Synthesis School, Inc

Data Analysis, SQL, Data Visualization, Data Science, Education, Redshift...

HEVA

Visual Studio Code (VS Code), GitLab, Health, TensorFlow, Python...

Polytechnique Montréal

RStudio, Data Analysis, Data Analytics, Analytics

Experience

Dashboards - 6 years, Data Visualization - 6 years, Data Science - 6 years, Python - 6 years, SQL - 6 years, Machine Learning - 4 years, Predictive Modeling - 4 years, Google Sheets - 2 years

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), Jupyter Notebook, Git, Python, Redshift, SQL, Data Build Tool (dbt), Google Sheets

The most amazing...

...opportunity I've had was working on a French national health database and developing innovative predictive modeling methods for patient pathways.

Work Experience

Senior Data Scientist

2021 - 2023

Synthesis School, Inc

Provided metrics to the different departments within the company (Product, Operations, Marketing, Finance).
Built and maintained company analytics pipeline, from data engineering to reporting.
Used Python to create ETL scripts for different data sources, dbt for data modeling, Apache Airflow for orchestration, Redshift for data warehousing, and Google Sheets and Mode for dashboards and reporting.
Built a Slack notification system to send daily and weekly notifications, informing about acquisition and product metrics.
Created a heuristic that automatically proposes a monthly schedule of new classes to open, based on waitlisted students' time preferences and teacher availability.
Proposed a Python script to optimize game infrastructure scaling based on scheduled sessions to reduce the number of allocated servers not in operation while ensuring capacity for all sessions.
Developed a proof of concept (POC) for student progress metrics targeted for parents, including data on interactions with teammates from different locations, game results, and session participation.
Created a financial dashboard for company executives, including company data and financial reports extracted from the QuickBooks API via a Python script (revenue, expenses, gross margins, cash available, burn, and runway).
Assisted in the transition to a flat rate system for teacher payment, automating the process of hour tracking, thus saving time for teachers and HR while controlling company costs.
Created and maintained a budget for a company product as a bi-weekly P&L sheet, reviewed by the team every month to control expenses.
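
The server-scaling idea mentioned above (allocating capacity from scheduled sessions) can be illustrated with a minimal sketch. This is not the script used at the company; the function names, the `(start, end)` session format in minutes, and the `sessions_per_server` parameter are all assumptions made for illustration. It computes the peak number of overlapping sessions with a sweep over start and end events, then converts that peak into a server count.

```python
from typing import List, Tuple


def peak_concurrent_sessions(sessions: List[Tuple[int, int]]) -> int:
    """Return the maximum number of sessions running at the same time.

    Each session is a hypothetical (start, end) pair in minutes,
    with the end treated as exclusive.
    """
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session begins
        events.append((end, -1))    # a session finishes
    # Tuples sort by time first; at equal times -1 sorts before +1,
    # so a session ending at t frees its slot before one starting at t.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def servers_needed(sessions: List[Tuple[int, int]], sessions_per_server: int) -> int:
    """Smallest server count that covers the peak (assumed uniform capacity)."""
    peak = peak_concurrent_sessions(sessions)
    return -(-peak // sessions_per_server)  # ceiling division
```

Under these assumptions, `servers_needed(schedule, sessions_per_server=2)` gives the number of servers to keep allocated for a given schedule; a real system would also need to account for startup time and safety margins.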

Technologies: Data Analysis, SQL, Data Visualization, Data Science, Education, Redshift, Amazon S3 (AWS S3), Apache Airflow, Segment, Google Sheets, Python, Data Build Tool (dbt), ETL, Dashboards, MySQL, Data Modeling, Reporting, Data Manipulation, Amazon Web Services (AWS)

Data Scientist

2017 - 2020

HEVA

Conducted health data analysis studies for public institutions and for pharmaceutical and medical device companies.
Collaborated with data scientists, data engineers, developers, UI/UX designers, and medical experts.
Participated in a range of research and development projects, from theoretical ideas to implementations of case studies, leading to scientific and technical contributions presented at international conferences or published in peer-reviewed journals.

Technologies: Visual Studio Code (VS Code), GitLab, Health, TensorFlow, Python, Data Visualization, Predictive Modeling, Jupyter Notebook, Git, Scikit-learn, NumPy, Pandas, SQL, Plotly, Machine Learning, Data Science, Deep Learning, Data Analysis, Data Analytics, Analytics, Dashboards, Artificial Intelligence (AI), Data Engineering, Predictive Analytics, Data Modeling, Data Manipulation, Dash, Clustering Algorithms

Research Intern

2016

Polytechnique Montréal

Analyzed data and extracted knowledge to improve the workload distribution for the Home Care Regional Services of Montreal Island.
Created a SQL database to structure caregiver and visit data.
Designed and adapted a dashboard to facilitate future data collection.

Technologies: RStudio, Data Analysis, Data Analytics, Analytics

Experience

Debate Simulation with ChatGPT

https://github.com/hugros-93/debate-ai

A Python framework to simulate a debate between two ChatGPT agents. Given a subject for the debate, plus an opinion and a tone for each agent's argumentation, the script generates the discussion.
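
The turn-taking described above can be sketched as follows. This is a minimal illustration, not the code from the repository: the `Agent` tuple, `build_prompt`, `simulate_debate`, and the `reply` callback are hypothetical names, and `echo_reply` is a stand-in for a real chat model call so that the sketch runs without network access.

```python
from typing import Callable, List, Tuple

# Hypothetical agent description: (name, opinion, tone).
Agent = Tuple[str, str, str]
Transcript = List[Tuple[str, str]]


def build_prompt(subject: str, agent: Agent, transcript: Transcript) -> str:
    """Assemble the text an agent would receive before its next turn."""
    name, opinion, tone = agent
    history = "\n".join(f"{speaker}: {text}" for speaker, text in transcript)
    return (
        f"Debate subject: {subject}\n"
        f"You are {name}. Your opinion: {opinion}. Your tone: {tone}.\n"
        f"Discussion so far:\n{history}\n"
        "Give your next argument."
    )


def simulate_debate(
    subject: str,
    agent_a: Agent,
    agent_b: Agent,
    reply: Callable[[str], str],
    rounds: int = 2,
) -> Transcript:
    """Alternate turns between two agents and collect the transcript."""
    transcript: Transcript = []
    for turn in range(2 * rounds):
        agent = agent_a if turn % 2 == 0 else agent_b
        prompt = build_prompt(subject, agent, transcript)
        transcript.append((agent[0], reply(prompt)))
    return transcript


def echo_reply(prompt: str) -> str:
    """Stand-in for a chat model call; returns a placeholder answer."""
    return f"(reply to a prompt of {len(prompt)} characters)"
```

Passing the model call in as `reply` keeps the debate loop independent of any particular API client, which also makes it easy to test with a stub.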

Automate Analytics with ChatGPT

https://github.com/hugros-93/chatgpt-analytics

A Python project to create a dashboard that automates analytics with ChatGPT. After loading CSV data, the user asks ChatGPT in natural language to plot a chart. The chart is displayed in the dashboard and can be exported. The module supports context, making it possible to iterate and refine the visualization.

Explaining Predictive Factors in Patient Pathways Using Autoencoders

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277135

This project focused on developing an end-to-end methodology to predict a pathway-related outcome and identify predictive factors using autoencoders. The method was tested in a case study predicting short-term mortality after the implantation of an implantable cardioverter-defibrillator.

Optimal Pathway Discovery Analysis of Sepsis Hospital Admissions Using the HES Database in England

https://academic.oup.com/jamiaopen/article/3/3/439/5979570

The “Bow-tie” optimal pathway discovery analysis uses large clinical event datasets to map clinical pathways and to visualize risks (improvement opportunities) before, and outcomes after, a specific clinical event. This proof-of-concept study assesses the use of NHS Hospital Episode Statistics (HES) in England as a potential clinical event dataset for this pathway discovery analysis approach.

Automatic and Explainable Labeling of Medical Event Logs with Auto-encoding

Process mining is a suitable method for knowledge extraction from patient pathways. Structured in event logs, medical events are complex, often described using various medical codes. Finding an efficient method of labeling these events before applying process mining analysis was challenging.

This project focused on developing an innovative methodology to handle the complexity of events in medical event logs. Based on auto-encoding, accurate labels are created by clustering similar events in latent space. Moreover, the explanation of created labels is provided by the decoding of the corresponding events.

Optimal Process Mining of Timed Event Logs

This project focuses on determining the optimal process model for an event log whose traces carry temporal information. We introduced a new formalism, along with a Tabu search algorithm, to determine the process model that maximizes the representation of the traces subject to constraints on the maximal number of nodes and arcs. We then conducted a healthcare case study to demonstrate the applicability of the approach to clinical pathway modeling. Special attention was paid to readability, so that end users could interpret the process mining results.

Binary Classification from French Hospital Data

In this project, we benchmarked seven machine learning algorithms on binary classification tasks, using three data sets extracted from the French national hospital database. We used an efficient global optimization algorithm to solve the hyperparameter tuning problem.

Meta-TAK: A Scalable Double-clustering Method for Treatment Sequence Visualization

This project focuses on the study of treatment sequences, particularly the extraction of patterns from nonclinical claim databases through clustering. The TAK algorithm was proposed for this purpose and demonstrated its usefulness. However, it did not scale with the number of patients, and the method was impossible to use in practice for thousands of patients. To address this, we developed an extension of the TAK algorithm. Referred to as Meta-TAK, this method appears to be robust and computationally efficient.

Skills

Languages

SQL, Python, R

Paradigms

Data Science, ETL

Other

Machine Learning, Deep Learning, Process Mining, Health, Data Visualization, Predictive Modeling, Data Analysis, Data Analytics, Analytics, Dashboards, Data Build Tool (dbt), Metrics, Predictive Analytics, Data Modeling, Reporting, Dash, Clustering Algorithms, Optimization, Operations Research, Explainable Artificial Intelligence (XAI), Clustering, Hyperparameters, Data Analytics (Marketing), Segment, Artificial Intelligence (AI), Healthcare & Insurance, Data Engineering, Education, Data Manipulation, ChatGPT, OpenAI GPT-3 API, Large Language Models (LLMs), Machine Learning Operations (MLOps)

Libraries/APIs

NumPy, Pandas, Scikit-learn, TensorFlow

Tools

Plotly, Google Sheets, Git, GitLab, Apache Airflow

Platforms

Visual Studio Code (VS Code), Jupyter Notebook, RStudio, Amazon Web Services (AWS), Docker, Kubernetes, Linux

Storage

Redshift, Amazon S3 (AWS S3), MySQL, Google Cloud

Education

2017 - 2020

Ph.D. in Engineering

Mines Saint-Etienne - Saint-Etienne, France

2014 - 2017

Master's Degree in Engineering

Mines Saint-Etienne - Saint-Etienne, France

Certifications

NOVEMBER 2023 - PRESENT

Machine Learning Engineering for Production (MLOps) Specialization

Coursera
