Dubreu Benjamin
Verified Expert in Engineering
Data Scientist and Developer
Paris, France
Toptal member since November 22, 2022
Dubreu is a Kaggle competition expert and senior data scientist with extensive experience and a proven track record of adding business value to every project he's involved in. He also teaches data science and Python at several schools and universities. Dubreu enjoys extracting insights from all kinds of data.
Experience
- SQL - 6 years
- Scikit-learn - 5 years
- Pandas - 5 years
- Python 3 - 5 years
- Computer Vision - 4 years
- Google BigQuery - 3 years
- Google Cloud Platform (GCP) - 3 years
- Spark - 2 years
Preferred Environment
PyCharm, Python 3, SQL
The most amazing jobs I've led are several successful, high-ROI data science projects I built from the ground up.
Work Experience
Senior Data Engineer
BNP Paribas
- Created an investment-funds data wrangling pipeline.
- Implemented a pipeline that computes a synthetic risk indicator for each fund; the indicator is published in key investor documents to help prospective customers assess the risk of investing in that fund.
- Set up the entire continuous integration and deployment pipeline from scratch using Azure Pipelines.
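The profile does not say how the indicator was computed. As a minimal sketch, assuming the UCITS KIID synthetic risk and reward indicator (SRRI) methodology from the CESR guidelines, the risk class is derived from the annualized volatility of weekly returns and bucketed into seven classes. The function names and sample inputs are hypothetical.

```python
import math
from statistics import stdev

# Assumed SRRI volatility bounds (annualized, in %): class k covers
# [previous bound, SRRI_BOUNDS[k-1]); anything at or above 25% is class 7.
SRRI_BOUNDS = [0.5, 2.0, 5.0, 10.0, 15.0, 25.0]


def annualized_volatility(weekly_returns):
    """Annualize the sample standard deviation of weekly returns, in percent.

    weekly_returns are decimal returns (0.01 means +1%), typically 5 years of data.
    """
    return stdev(weekly_returns) * math.sqrt(52) * 100


def srri_class(weekly_returns):
    """Map a fund's weekly return history to a risk class from 1 to 7 (hypothetical helper)."""
    vol = annualized_volatility(weekly_returns)
    for risk_class, upper in enumerate(SRRI_BOUNDS, start=1):
        if vol < upper:
            return risk_class
    return 7
```

For example, a fund alternating between +1% and -1% weekly returns has roughly 7% annualized volatility and lands in class 4.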
Senior Data Scientist
Mytraffic
- Defined and implemented a data-quality monitoring procedure to ensure raw-data quality before ingestion by trend algorithms.
- Set up daily and weekly KPIs broken down by key dimensions, including the data provider, region of interest, and neighborhoods of interest.
- Introduced an alerting system that sends Slack messages when the daily or weekly variation in the data exceeds set thresholds.
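A minimal sketch of a threshold-based variation alert, assuming each monitored series (for example, a data provider and region pair) is summarized as a previous and current count, and that messages are posted through a Slack incoming webhook. The function names, the 20% threshold, and the webhook URL are hypothetical placeholders, not the actual Mytraffic setup.

```python
import json
import urllib.request


def find_alerts(counts, threshold=0.2):
    """Return (key, previous, current, change) for each series whose relative change exceeds threshold.

    counts maps a key such as (provider, region) to a (previous, current) pair.
    """
    alerts = []
    for key, (previous, current) in counts.items():
        if previous == 0:
            continue  # relative change is undefined; skipping is a hypothetical policy
        change = (current - previous) / previous
        if abs(change) > threshold:
            alerts.append((key, previous, current, change))
    return alerts


def format_message(alerts, period="daily"):
    """Build a plain-text Slack message listing every flagged series."""
    lines = [f"{period} data-quality alert:"]
    for key, previous, current, change in alerts:
        lines.append(f"- {key}: {previous} -> {current} ({change:+.0%})")
    return "\n".join(lines)


def post_to_slack(webhook_url, text):
    """POST the message as JSON to a Slack incoming webhook (webhook_url is a placeholder)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A daily job would call `find_alerts` on yesterday's and today's counts and only call `post_to_slack` when the list is non-empty; a weekly job would do the same with `period="weekly"`.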
Senior Data Scientist
Saint-Gobain Group
- Designed and implemented a "glass-edge" detection algorithm.
- Helped deploy this detection algorithm live on the production line.
- Helped ensure high quality standards by enabling field experts to determine glass density using this algorithm.
Lead Data Scientist
Auchan Retail
- Maintained, updated, and enhanced the forecast models for hundreds of European hypermarkets to predict turnover and number of clients.
- Created a "trend" algorithm that uses past errors to correct predictions. This algorithm helped us maintain a 90% trustworthiness score on our predictions despite the COVID-19 pandemic.
- Led the development efforts to split our predictions at the department and section levels.
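The trend algorithm itself isn't described here. One simple way to use past errors to correct predictions, shown as an illustrative assumption rather than the method used at Auchan, is to rescale each new forecast by the average actual-to-forecast ratio over a recent window. The function name and the 7-period window are hypothetical.

```python
from statistics import mean


def trend_corrected(forecast, past_forecasts, past_actuals, window=7):
    """Rescale a forecast by the mean actual/forecast ratio over the last `window` periods.

    If recent forecasts were systematically too high (ratio < 1), the new
    forecast is scaled down by the same average factor, and vice versa.
    """
    recent = list(zip(past_forecasts, past_actuals))[-window:]
    # Ignore periods with a zero forecast to avoid dividing by zero.
    ratios = [actual / predicted for predicted, actual in recent if predicted]
    if not ratios:
        return forecast  # no usable history: leave the forecast unchanged
    return forecast * mean(ratios)
```

A ratio-based correction keeps the adjustment proportional to each store's scale, which matters when the same logic is applied across hypermarkets of very different sizes.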
Data Engineer
TotalEnergies
- Contributed to setting up real-time data ingestion using Spark Streaming for data coming from drilling platforms worldwide.
- Set up Azure Data Factory to trigger automatically when new data is collected through the pipeline. The data is then processed and sent to several third-party APIs through Azure Functions.
- Built the system that stores those API calls in Azure Cosmos DB, a NoSQL database, for real-time consumption by the DrillX platform.
Data Scientist
Bpifrance
- Contributed to processing data transfer objects from the front end to the back end of the PGE platform.
- Conducted A/B testing to enhance the user experience and support quality.
- Contributed to the success of the PGE platform, which generated more than 100 billion euros worth of loans to French companies during the COVID-19 pandemic and has a net promoter score of 71.
Data Engineer
Kiabi
- Created Python and Spark pipelines for data ingestion.
- Integrated feedback from marketing campaigns into the company's data lake.
- Participated in code improvement sessions that updated the company's development practices.
Data Scientist
ADEO
- Modeled in-store order receptions to predict the number of broken or missing items.
- Showed through data analysis that, contrary to established business assumptions, the main feature for finding faulty deliveries was not the supplier but the kind of item supplied.
- Developed a model that identifies more than 75% of failed orders at only 40% of the cost of the former procedure.
Experience
Kaggle Competition Projects
https://www.kaggle.com/bdubreu
• Natural language processing: Jigsaw unintended bias in toxicity classification.
Rank obtained: 127/3165, top 5%, silver medal.
Main project challenge: using BERT, a then state-of-the-art deep learning architecture.
• Computer vision: intracranial hemorrhage detection challenge by the Radiological Society of North America (RSNA).
Rank obtained: 84/1345, top 7%, bronze medal.
Main project challenge: pre-processing the scans into data consumable by a CNN architecture.
• Computer vision: prostate cancer grade assessment (PANDA) challenge.
Rank obtained: 58/1030, top 6%, bronze medal.
Main project challenge: preprocessing biopsies stored in .tiff format, with single images up to 35,000 × 25,000 pixels. This project required a pipeline that identified the relevant tissue regions, since most of each biopsy image is white background. Those regions then had to be split into square tiles and passed to the models as batches of tile packs rather than as sets of whole images.
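A minimal NumPy sketch of the tiling step, assuming the slide has already been loaded as an RGB array at a manageable resolution (for example, a lower-resolution level of the TIFF). It pads the image with white, cuts it into square tiles, ranks tiles by tissue content, and keeps a fixed-size pack of the most tissue-rich ones. The function name, tile size, pack size, and white-level cutoff are hypothetical choices, not the competition solution itself.

```python
import numpy as np


def tissue_tiles(image, tile_size=256, n_tiles=16, max_background=0.8, white_level=220):
    """Split an RGB image of shape (H, W, 3) into square tiles and return a pack of tissue tiles.

    A pixel counts as background when all three channels are >= white_level.
    Returns an array of shape (k, tile_size, tile_size, 3) with k <= n_tiles.
    """
    height, width = image.shape[:2]
    # Pad with white so both dimensions become multiples of tile_size.
    pad_h = (-height) % tile_size
    pad_w = (-width) % tile_size
    padded = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)), constant_values=255)
    rows = padded.shape[0] // tile_size
    cols = padded.shape[1] // tile_size
    # Reshape into a (rows * cols, tile_size, tile_size, 3) stack of tiles.
    tiles = (
        padded.reshape(rows, tile_size, cols, tile_size, 3)
        .swapaxes(1, 2)
        .reshape(-1, tile_size, tile_size, 3)
    )
    # Fraction of white pixels in each tile.
    background = (tiles >= white_level).all(axis=-1).mean(axis=(1, 2))
    # Keep the tiles with the most tissue, then drop any that are mostly background.
    keep = np.argsort(background, kind="stable")[:n_tiles]
    keep = keep[background[keep] <= max_background]
    return tiles[keep]
```

Each slide then yields one pack of at most `n_tiles` tiles, and packs from several slides can be stacked into a batch for the model.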
Education
Master's Degree in Data Science
CentraleSupélec - Paris, France
Skills
Libraries/APIs
Scikit-learn, Pandas, PyTorch, OpenCV
Tools
PyCharm, Git, Kafka Streams, BigQuery
Languages
Python 3, Python, SQL, Java
Paradigms
ETL
Platforms
Jupyter Notebook, Google Cloud Platform (GCP), Azure, Amazon Web Services (AWS)
Frameworks
Spark, Hadoop, Streamlit
Storage
NoSQL, MongoDB
Other
Computer Vision, Machine Learning, Data Science, Google BigQuery, Deep Learning, Artificial Intelligence (AI), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT)