Dubreu Benjamin, Developer in Paris, France

Dubreu Benjamin

Verified Expert in Engineering

Data Scientist and Developer

Paris, France

Toptal member since November 22, 2022

Bio

Dubreu is a Kaggle competition expert and senior data scientist with extensive experience in his field and a proven track record of adding business value to the projects he's involved in. He also teaches data science and Python at various schools and universities. Dubreu enjoys deriving insights from all kinds of data.

Portfolio

BNP Paribas
Python 3, Spark, Azure
Mytraffic
Python, SQL, Amazon Web Services (AWS), Git, Streamlit, Data Science...
Saint-Gobain Group
PyTorch, OpenCV, Python, Data Science, Artificial Intelligence (AI)...

Experience

  • SQL - 6 years
  • Scikit-learn - 5 years
  • Pandas - 5 years
  • Python 3 - 5 years
  • Computer Vision - 4 years
  • Google BigQuery - 3 years
  • Google Cloud Platform (GCP) - 3 years
  • Spark - 2 years

Availability

Part-time

Preferred Environment

PyCharm, Python 3, SQL

The most amazing...

...work I've led includes several successful, high-ROI data science projects that I built from the ground up.

Work Experience

Senior Data Engineer

2022 - PRESENT
BNP Paribas
  • Created an investment-funds data wrangling pipeline.
  • Implemented a pipeline to compute, for each fund, a synthetic risk indicator that is then used on key investor documents to help potential customers assess the risk of investing in that given fund.
  • Set up the entire continuous integration and deployment pipeline from scratch using Azure Pipelines.
Technologies: Python 3, Spark, Azure

Senior Data Scientist

2022 - 2022
Mytraffic
  • Defined and implemented a data-quality monitoring procedure to ensure raw-data quality before ingestion by trend algorithms.
  • Set up daily and weekly KPIs based on key attention points, including the data provider, region of interest, and neighborhoods of interest.
  • Introduced an alerting system that sends Slack messages when the daily or weekly variation in the data crosses certain thresholds.
Technologies: Python, SQL, Amazon Web Services (AWS), Git, Streamlit, Data Science, Jupyter Notebook
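
The threshold-based alerting described above can be sketched as follows. This is only an illustration under stated assumptions, not the system built at Mytraffic: the function names, the KPI names, and the 20% threshold are all hypothetical, and the step that would post each message to Slack is deliberately left out.

```python
def relative_variation(current, previous):
    """Relative change between two values of the same KPI."""
    if previous == 0:
        # Avoid dividing by zero: any change from zero counts as unbounded.
        return float("inf") if current != 0 else 0.0
    return (current - previous) / abs(previous)


def check_thresholds(kpis_now, kpis_before, max_variation=0.2):
    """Return one alert message per KPI whose variation exceeds the threshold.

    kpis_now and kpis_before map a (hypothetical) KPI name to its value for
    the current and the previous period (day or week). max_variation is an
    assumed threshold, expressed as a fraction (0.2 means 20%).
    """
    alerts = []
    for name, value in kpis_now.items():
        if name not in kpis_before:
            continue  # no baseline to compare against
        variation = relative_variation(value, kpis_before[name])
        if abs(variation) > max_variation:
            alerts.append(f"{name}: {variation:+.0%} vs previous period")
    return alerts


# Hypothetical KPI values for two consecutive days.
messages = check_thresholds(
    {"provider_a": 70, "provider_b": 101},
    {"provider_a": 100, "provider_b": 100},
)
print(messages)
```

Each returned string could then be handed to whatever messaging client the team uses.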

Senior Data Scientist

2022 - 2022
Saint-Gobain Group
  • Designed and implemented a "glass-edge" detection algorithm.
  • Helped deploy this detection algorithm live on the production line.
  • Helped maintain a consistent quality level by enabling field experts to determine glass density using this algorithm.
Technologies: PyTorch, OpenCV, Python, Data Science, Artificial Intelligence (AI), Jupyter Notebook

Lead Data Scientist

2021 - 2022
Auchan Retail
  • Maintained, updated, and enhanced the forecast models for hundreds of European hypermarkets to predict turnover and number of clients.
  • Created a "trend" algorithm that uses past errors to adjust and correct predictions. This algorithm helped us maintain a 90% trustworthiness score on our predictions despite the COVID-19 pandemic.
  • Led the development efforts to split our predictions at the department and section levels.
Technologies: Python 3, Google Cloud Platform (GCP), BigQuery, SQL, Git, Data Science, Artificial Intelligence (AI), Jupyter Notebook

Data Engineer

2020 - 2021
TotalEnergies
  • Contributed to setting up real-time data ingestion using Spark Streaming for data coming from drilling platforms worldwide.
  • Set up Azure Data Factory to trigger automatically when new data is collected through the pipeline. The data is then processed and sent to several third-party APIs through Azure Functions.
  • Established the system for those API calls to be stored in Azure Cosmos DB, a NoSQL database, for real-time consumption by the DrillX platform.
Technologies: Python, Azure, NoSQL, MongoDB, Kafka Streams, Jupyter Notebook

Data Scientist

2020 - 2020
Bpifrance
  • Contributed to processing data transfer objects from the front end to the back end of the PGE platform.
  • Conducted A/B testing to enhance the user experience and support quality.
  • Contributed to the success of the PGE platform, which generated more than 100 billion euros worth of loans to French companies during the COVID-19 pandemic. The platform has a net promoter score of 71.
Technologies: SQL, Python, Java, Data Science

Data Engineer

2019 - 2020
Kiabi
  • Created Python and Spark pipelines for data ingestion.
  • Integrated feedback from marketing campaigns into the company's data lake.
  • Participated in various code improvement sessions that updated the company's practices.
Technologies: Python, Spark, SQL, Hadoop

Data Scientist

2018 - 2019
ADEO
  • Modeled order receptions in stores to predict the number of broken or missing items.
  • Analyzed data that helped us realize that, contrary to previously assumed business knowledge, the main feature to focus on to find faulty deliveries was not the supplier but the kind of item supplied.
  • Developed a model that identifies more than 75% of failed orders at only 40% of the cost of the former procedure.
Technologies: Pandas, PyCharm, Python 3, Scikit-learn, Git, Google BigQuery, Google Cloud Platform (GCP), Deep Learning, Data Science, Artificial Intelligence (AI), Jupyter Notebook

Kaggle Competition Projects

https://www.kaggle.com/bdubreu
I participated in five Kaggle competitions, earning medals in three of them:

• Natural language processing: Jigsaw unintended bias in toxicity classification.
Rank obtained: 127/3165, top 5%, silver medal.
Main project challenge: using BERT, a then-new, state-of-the-art deep learning architecture.

• Computer vision: intracranial hemorrhage detection challenge by the Radiological Society of North America (RSNA).
Rank obtained: 84/1345, top 7%, bronze medal.
Main project challenge: pre-processing the scans into data consumable by a CNN architecture.

• Computer vision: prostate cancer grade assessment (PANDA) challenge.
Rank obtained: 58/1030, top 6%, bronze medal.
Main project challenge: pre-processing biopsies stored in .tiff format, with single images of up to 35,000 x 25,000 pixels. This project required a pipeline that identified the relevant tissue parts, as most of each biopsy is just white background. Those parts then had to be split into square tiles and passed to the models as batches of packs of tiles instead of sets of images.
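
The tiling step described above can be illustrated with a minimal sketch. It is not the competition code: it assumes a small grayscale image held as plain Python lists (a real pipeline would read the .tiff with an imaging library and work on arrays), and the function names, the tile size, and the "darker means more tissue" heuristic are assumptions made for illustration.

```python
def split_into_tiles(image, tile_size, background=255):
    """Cut a 2D grayscale image into square tiles, padding edges with background."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [
                [
                    image[y][x] if y < height and x < width else background
                    for x in range(left, left + tile_size)
                ]
                for y in range(top, top + tile_size)
            ]
            tiles.append(tile)
    return tiles


def select_tissue_tiles(tiles, n_tiles):
    """Keep the n_tiles darkest tiles, assuming tissue is darker than background."""
    # A lower pixel sum means less white background in the tile.
    return sorted(tiles, key=lambda tile: sum(map(sum, tile)))[:n_tiles]


# Toy 5 x 3 "biopsy": white everywhere except two darker pixels.
image = [[255] * 3 for _ in range(5)]
image[0][0] = 0
image[4][2] = 100

tiles = split_into_tiles(image, tile_size=2)
pack = select_tissue_tiles(tiles, n_tiles=2)  # one "pack of tiles" for this image
```

Under these assumptions, each image yields one fixed-size pack of tiles, so a model batch becomes a list of packs rather than a list of full-resolution images.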

Education

2017 - 2019

Master's Degree in Data Science

CentraleSupélec - Paris, France

Libraries/APIs

Scikit-learn, Pandas, PyTorch, OpenCV

Tools

PyCharm, Git, Kafka Streams, BigQuery

Languages

Python 3, Python, SQL, Java

Paradigms

ETL

Platforms

Jupyter Notebook, Google Cloud Platform (GCP), Azure, Amazon Web Services (AWS)

Frameworks

Spark, Hadoop, Streamlit

Storage

NoSQL, MongoDB

Other

Computer Vision, Machine Learning, Data Science, Google BigQuery, Deep Learning, Artificial Intelligence (AI), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT)
