Hugo De Oliveira, Developer in Hamburg, Germany

Hugo De Oliveira

Verified Expert in Engineering

Data Scientist and Developer

Location
Hamburg, Germany
Toptal Member Since
June 15, 2021

Hugo is a full-stack data scientist. His strong scientific education and business experience give him hands-on skills in data engineering, analytics, and predictive modeling. Hugo's research background provides him with autonomy, scientific curiosity, and creativity in developing theoretical and practical solutions to complex problems.

Portfolio

Komatsu
Dash, Data Visualization, Python, Plotly, Data Science, SQL
Overlord
Python, SQL, Flask, Supabase
Synthesis School, Inc
Data Analysis, SQL, Data Visualization, Data Science, Education, Redshift...

Experience

Availability

Part-time

Preferred Environment

Visual Studio Code (VS Code), Jupyter Notebook, Git, Python, Redshift, SQL, Data Build Tool (dbt), Google Sheets, Dash, Apache Airflow

The most amazing...

...opportunity I've had was working on a French national health database and developing innovative predictive modeling methods for patient pathways.

Work Experience

Analytics Engineer (via Toptal)

2023 - PRESENT
Komatsu
  • Assisted the Surface Factory Analytics team in improving reporting practices for internal and external customers.
  • Worked with product engineers, infrastructure engineers, data scientists, and business intelligence developers to create dashboards to display indicators for surface mining assets (such as shovels, loaders, and trucks).
  • Developed Dash templates, reusable functions, and components to accelerate dashboard development.
  • Helped set up dbt as a framework for data model definition and documentation.
Technologies: Dash, Data Visualization, Python, Plotly, Data Science, SQL

Consulting Data Scientist

2024 - 2024
Overlord
  • Assisted the company in formalizing and automating investment performance evaluation for their vehicle management platform.
  • Created a Python script calculating vehicle and investor performance indicators (including capital gain, internal rate of return, cash-on-cash, and net asset value).
  • Deployed the Python script via a Flask app, creating endpoints to interact with the platform.
Technologies: Python, SQL, Flask, Supabase
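The indicators named above can be illustrated with a minimal sketch. This is not the script built for the client; the function names, the bisection bounds, and the sample cash flows are all hypothetical, and the definitions used are the standard textbook ones (internal rate of return as the rate at which net present value is zero, cash-on-cash as annual cash flow divided by cash invested).

```python
from typing import Sequence


def capital_gain(sale_price: float, purchase_price: float) -> float:
    """Capital gain as sale price minus purchase price."""
    return sale_price - purchase_price


def cash_on_cash(annual_cash_flow: float, cash_invested: float) -> float:
    """Cash-on-cash return: annual cash flow divided by total cash invested."""
    return annual_cash_flow / cash_invested


def npv(rate: float, cashflows: Sequence[float]) -> float:
    """Net present value of periodic cash flows; cashflows[0] occurs at time 0."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cashflows))


def irr(cashflows: Sequence[float], lo: float = -0.99, hi: float = 10.0,
        tol: float = 1e-9, max_iter: int = 200) -> float:
    """Internal rate of return, found by bisection on npv(rate) = 0.

    Assumes NPV changes sign exactly once on [lo, hi] (hypothetical bounds).
    """
    f_lo, f_hi = npv(lo, cashflows), npv(hi, cashflows)
    if f_lo * f_hi > 0:
        raise ValueError("NPV does not change sign on [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2.0
        f_mid = npv(mid, cashflows)
        if abs(f_mid) < tol or (hi - lo) / 2.0 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid           # root lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # root lies in [mid, hi]
    return (lo + hi) / 2.0
```

For example, investing 1000 and receiving 1100 one period later, `irr([-1000.0, 1100.0])`, gives a rate of approximately 0.10.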

Senior Data Scientist (via Toptal)

2021 - 2023
Synthesis School, Inc
  • Provided metrics to different departments within the company (Product, Operations, Marketing, Finance).
  • Built and maintained the company analytics pipeline, from data engineering to reporting.
  • Used Python to create ETL scripts for different data sources, dbt for data modeling, Apache Airflow for orchestration, Redshift for data warehousing, and Google Sheets and Mode for dashboards and reporting.
  • Built a Slack notification system to send daily and weekly notifications reporting acquisition and product metrics.
  • Created a heuristic to automatically propose a schedule of new classes to open every month based on waitlisted students' time preferences and teacher availability.
  • Proposed a Python script to optimize game infrastructure scaling based on scheduled sessions to reduce the number of allocated servers not in operation while ensuring capacity for all sessions.
  • Developed a proof of concept for student progress metrics targeted for parents, including data on interactions with teammates from different locations, game results, and session participation.
  • Created a financial dashboard for company executives, including company data and financial reports extracted from the QuickBooks API via a Python script (revenue, expenses, gross margins, cash available, burn, and runway).
  • Assisted in the transition to a flat rate system for teacher payment, automating the process of hour tracking, thus saving time for teachers and HR while controlling company costs.
  • Created and maintained a budget for a company product as a bi-weekly P&L sheet, reviewed by the team every month to control expenses.
Technologies: Data Analysis, SQL, Data Visualization, Data Science, Education, Redshift, Amazon S3 (AWS S3), Apache Airflow, Segment, Google Sheets, Python, Data Build Tool (dbt), ETL, Dashboards, MySQL, Data Modeling, Reporting, Data Manipulation, Amazon Web Services (AWS)
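The infrastructure scaling idea mentioned above (sizing servers from scheduled sessions) can be sketched as a peak-concurrency computation. This is only an illustration under stated assumptions, not the script proposed to the company: sessions are represented as hypothetical `(start, end)` pairs in minutes, a session ending at time t frees its slot before one starting at t takes it, and `sessions_per_server` is an invented capacity parameter.

```python
import math
from typing import Iterable, Tuple


def peak_concurrency(sessions: Iterable[Tuple[int, int]]) -> int:
    """Maximum number of sessions running at the same time (sweep line)."""
    events = []
    for start, end in sessions:
        events.append((start, 1))   # a session starts
        events.append((end, -1))    # a session ends
    # Tuples sort by time, then by delta, so ends (-1) come before starts (+1)
    # when both happen at the same time.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


def servers_needed(sessions: Iterable[Tuple[int, int]],
                   sessions_per_server: int = 1) -> int:
    """Smallest number of servers that covers the peak load."""
    return math.ceil(peak_concurrency(sessions) / sessions_per_server)
```

With sessions `[(0, 60), (30, 90), (60, 120)]`, at most two sessions overlap, so two single-session servers are enough; allocating to the peak rather than to the total number of sessions is what avoids idle servers.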

Data Scientist

2017 - 2020
HEVA
  • Conducted health data analysis studies for public institutions and for pharmaceutical and medical device companies.
  • Collaborated with data scientists, data engineers, developers, UI/UX designers, and medical experts.
  • Participated in a range of research and development projects, from theoretical ideas to implementations of case studies, leading to scientific and technical contributions presented at international conferences or published in peer-reviewed journals.
Technologies: Visual Studio Code (VS Code), GitLab, Health, TensorFlow, Python, Data Visualization, Predictive Modeling, Jupyter Notebook, Git, Scikit-learn, NumPy, Pandas, SQL, Plotly, Machine Learning, Data Science, Deep Learning, Data Analysis, Data Analytics, Analytics, Dashboards, Artificial Intelligence (AI), Data Engineering, Predictive Analytics, Data Modeling, Data Manipulation, Dash, Clustering Algorithms

Research Intern

2016 - 2016
Polytechnique Montréal
  • Analyzed data and extracted knowledge to improve the workload distribution for the Home Care Regional Services of Montreal Island.
  • Created a SQL database to structure caregiver and visit data.
  • Designed and adapted a dashboard to facilitate future data collection.
Technologies: RStudio, Data Analysis, Data Analytics, Analytics

Dash App

https://github.com/hugros-93/chat-with-data
A Dash app to ask questions about your data. The app allows users to upload PDFs and use natural language to ask questions about the uploaded documents. The app will produce an answer in natural language and citations from the uploaded documents supporting the answer.

Debate Simulation with ChatGPT

https://github.com/hugros-93/debate-ai
A Python framework to simulate a debate between two ChatGPT agents. The script will generate the discussion by providing a subject for the debate and an opinion as well as a tone for the argumentation of the two agents.

Automate Analytics with ChatGPT

https://github.com/hugros-93/chatgpt-analytics
A Python project to create a dashboard automating analytics with ChatGPT. After loading CSV data, use natural language to ask ChatGPT to plot a chart. The chart will be displayed in the dashboard, allowing for export. The module allows for context, making possible iterations to refine the visualization.

Explaining Predictive Factors in Patient Pathways Using Autoencoders

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0277135
This project focused on developing an end-to-end methodology to predict a pathway-related outcome and identify predictive factors using autoencoders. The method was tested in a case study predicting short-term mortality after the implantation of an implantable cardioverter-defibrillator.

Optimal Pathway Discovery Analysis of Sepsis Hospital Admissions Using the HES Database in England

https://academic.oup.com/jamiaopen/article/3/3/439/5979570
The “Bow-tie” optimal pathway discovery analysis uses large clinical event datasets to map clinical pathways and to visualize risks (improvement opportunities) before, and outcomes after, a specific clinical event. This proof-of-concept study assesses the use of NHS Hospital Episode Statistics (HES) in England as a potential clinical event dataset for this pathway discovery analysis approach.

Automatic and Explainable Labeling of Medical Event Logs with Auto-encoding

Process mining is a suitable method for knowledge extraction from patient pathways. Structured in event logs, medical events are complex, often described using various medical codes. Finding an efficient method of labeling these events before applying process mining analysis was challenging.

This project focused on developing an innovative methodology to handle the complexity of events in medical event logs. Based on auto-encoding, accurate labels are created by clustering similar events in latent space. Moreover, the explanation of created labels is provided by the decoding of the corresponding events.

Optimal Process Mining of Timed Event Logs

This project focused on determining the optimal process model for an event log whose traces carry temporal information. We introduced a new formalism, along with a Tabu search algorithm, to determine the process model that maximizes the representation of the traces subject to constraints on the maximal number of nodes and arcs. We then conducted a healthcare case study to demonstrate the applicability of the approach to clinical pathway modeling. Special attention was paid to readability so that end users could interpret the process mining results.

Binary Classification from French Hospital Data

In this project, we benchmarked seven machine learning algorithms on binary classification tasks using three data sets extracted from the French national hospital database. We used an efficient global optimization algorithm to solve the hyperparameter tuning problem.

Meta-TAK: A Scalable Double-clustering Method for Treatment Sequence Visualization

This project focused on the study of treatment sequences, particularly the extraction of patterns from nonclinical claim databases through clustering. The TAK algorithm had been proposed for this purpose and had demonstrated its usefulness. However, its scalability with respect to the number of patients was an issue: the method was impossible to use in practice for thousands of patients. To address this, we developed an extension of the TAK algorithm, referred to as Meta-TAK, which appears to be robust and computationally efficient.

2017 - 2020

PhD in Engineering

Mines Saint-Etienne - Saint-Etienne, France

2014 - 2017

Master's Degree in Engineering

Mines Saint-Etienne - Saint-Etienne, France

MAY 2024 - PRESENT

Mining and Materials for Sustainable Development Transformations

edX

NOVEMBER 2023 - PRESENT

Machine Learning Engineering for Production (MLOps) Specialization

Coursera

Libraries/APIs

NumPy, Pandas, Scikit-learn, TensorFlow

Tools

Plotly, Google Sheets, Git, GitLab, Apache Airflow, ChatGPT

Paradigms

Data Science, ETL

Languages

SQL, Python, R, Snowflake

Frameworks

Flask

Platforms

Visual Studio Code (VS Code), Jupyter Notebook, RStudio, Amazon Web Services (AWS), Docker, Kubernetes, Linux

Storage

Redshift, Amazon S3 (AWS S3), MySQL, Google Cloud

Other

Machine Learning, Data Visualization, Deep Learning, Process Mining, Health, Predictive Modeling, Data Analysis, Data Analytics, Analytics, Dashboards, Data Build Tool (dbt), Metrics, Predictive Analytics, Data Modeling, Reporting, Dash, Clustering Algorithms, Optimization, Operations Research, Explainable Artificial Intelligence (XAI), Clustering, Hyperparameters, Data Analytics (Marketing), Segment, Artificial Intelligence (AI), Healthcare & Insurance, Data Engineering, Education, Data Manipulation, OpenAI GPT-3 API, Large Language Models (LLMs), Machine Learning Operations (MLOps), LangChain, Mining, Sustainability, Supabase
