Benjamin Breton, Developer in South Bend, IN, United States
Benjamin is available for hire
Hire Benjamin

Benjamin Breton

Verified Expert  in Engineering

Data Scientist and Developer

Location
South Bend, IN, United States
Toptal Member Since
November 7, 2022

Benjamin is passionate about data science and enjoys operating in different sectors. He aims to identify business needs, design an adapting solution, and create value from data. Benjamin has prolific professional experience and has collaborated with 30 startups and large companies during 45 missions.

Portfolio

Sogexia
Python 3, Pandas, Data Visualization, Scikit-learn, OpenAI GPT-4 API, Plotly...
Evalmee
You Only Look Once (YOLO), PyTorch, Artificial Intelligence (AI), Deep Learning...
La Touche Musicale
Python 3, TensorFlow, Open Neural Network Exchange (ONNX), NumPy...

Experience

Availability

Full-time

Preferred Environment

Python 3, TensorFlow, Scikit-learn, Pandas, Flask, PyTorch, Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Open-source LLMs

The most amazing...

...thing I've achieved is the state-of-the-art result on an OCR document, reducing response time by 75%.

Work Experience

Lead Data Scientist

2023 - 2023
Sogexia
  • Analyzed customer data to identify growth potential. Cleaned datasets for accurate analysis.
  • Filled missing data using the GPT-4 API to fill missing data using few-shots learning.
  • Conducted statistical analysis for insights. Performed customer clustering to segment user base.
  • Shared a detailed report of the analysis to the CEO with business recommendations.
Technologies: Python 3, Pandas, Data Visualization, Scikit-learn, OpenAI GPT-4 API, Plotly, APIs, OpenAI GPT-3 API, API Integration, Large Language Models (LLMs), OpenAI, Generative Pre-trained Transformer 3 (GPT-3)

Senior Machine Learning Engineer

2022 - 2023
Evalmee
  • Optimized three open-source computer vision algorithms using ONNX.
  • Deployed optimized models on AWS Lambda using Docker.
  • Developed a custom detection algorithm for cheating detection during exams.
  • Trained a YOLOv5 detection model with SparseML toolbox and deployed on AWS Lambda.
Technologies: You Only Look Once (YOLO), PyTorch, Artificial Intelligence (AI), Deep Learning, Python 3, AWS Lambda, Docker, Object Detection, Generative Pre-trained Transformers (GPT), Neural Networks, Deep Neural Networks

Machine Learning Engineer

2021 - 2023
La Touche Musicale
  • Optimized a music2midi TensorFlow model to reduce the response time for a mobile app. O.
  • Retrained the model according to the research paper.
  • Reduced the model size for mobile app integration using TensorFlow Lite.
Technologies: Python 3, TensorFlow, Open Neural Network Exchange (ONNX), NumPy, Generative Pre-trained Transformers (GPT), Neural Networks, Deep Neural Networks

Lead Data Scientist

2020 - 2023
L'Oreal
  • Led a team of five data scientists and two data engineers to create a tool to model and optimize their budget for marketing campaigns. It is called MMM or Mix Marketing Modeling.
  • Coordinated between multiple providers to model customer responses to several marketing campaigns and create "response curves." Integrated those response curves into an optimization tool.
  • Deployed the solution to business analytics through a REST API. Collected feedback from the business team to improve the tool's response.
  • Scaled the platform to optimize a marketing budget of more than $1 billion.
Technologies: API Integration, PyTorch, ChatGPT, Generative Pre-trained Transformers (GPT), Pandas, Vertex, Google Cloud Platform (GCP), Google BigQuery, Cloud Run, Jupyter Notebook, Scikit-learn, NumPy, Time Series, Data Science, Artificial Intelligence (AI), Deep Learning, Analytics, SQL, Machine Learning, Data Analysis, Mathematics, Linear Algebra, Natural Language Processing (NLP), Python, Language Models, Engineering, FastAPI, Serverless Architecture, OCR, OpenAI GPT-4 API, APIs, OpenAI GPT-3 API, Large Language Models (LLMs), OpenAI, Neural Networks, Deep Neural Networks

Senior Data Scientist

2018 - 2023
Liz.cx
  • Developed an initial classification model using TFIDF and SVC.
  • Automated the retraining pipeline for new data and categories. Deployed models via a Django REST API on AWS.
  • Incorporated a "human in the loop" pipeline for label verification using multiple deep-learning algorithms.
Technologies: Scikit-learn, Machine Learning, Natural Language Processing (NLP), Support Vector Machines (SVM), Python 3, Django, Deep Neural Networks

Lead Data Scientist

2017 - 2023
Inmind (Recurrent Client)
  • Developed a parser for data extraction from OCR. Automated data cleaning processes. Deployed the solution via FastAPI as a REST API.
  • Trained an algorithm to extract entities from PDF resumes using text and meta-data.
  • Developed a classification algorithm for job category prediction. Used transfer learning for the description algorithm. Deployed the Keras model behind a Flask API.
Technologies: Keras, Python, Flask, TensorBoard, Generative Pre-trained Transformers (GPT), APIs, Large Language Models (LLMs), Neural Networks, Deep Neural Networks

Machine Learning Engineer

2021 - 2022
PianoConvert
  • Benchmarked multiple machine learning algorithms to perform an audio to midi transcription for piano. The most efficient way I found was to split the task into two models.
  • Fine-tuned an open-source model for hand-splitting piano scores. Enhanced response time and post-processing for accuracy. Created and deployed a serverless Lambda container pipeline.
  • Benchmarked several serverless GPU platforms. Optimized cold start time and memory usage. Deployed a deep learning algorithm in ONNX on Banana.dev.
Technologies: PyTorch, Open Neural Network Exchange (ONNX), AWS Lambda, Docker, APIs, Neural Networks, Deep Neural Networks

Senior Data Scientist

2019 - 2022
Mindee
  • Directed the continuous improvement of a receipt-processing API that extracts essential information from images. Reduced response time and memory footprint by 75% and improved accuracy.
  • Developed deep-learning computer vision algorithms for document processing in TensorFlow, such as OCR, segmentation, and classification.
  • Designed synthetic data generators to train these models without manually labeled data.
  • Created a cleaning tool to improve data quality automatically.
Technologies: Python 3, TensorFlow, Scikit-learn, NumPy, Computer Vision, Data Science, PyTorch, Artificial Intelligence (AI), Deep Learning, Machine Learning, Data Analysis, Mathematics, Linear Algebra, Python, API Integration, Engineering, FastAPI, OCR, Object Detection, APIs, Neural Networks, Deep Neural Networks

Senior Data Scientist

2021 - 2021
Whyse
  • Cleaned and vectorized text and metadata using Word2Vec, Doc2Vec, and SBERT.
  • Fine-tuned multiple models for sentence similarity.
  • Benchmarked clustering and dimension reduction techniques.
Technologies: Python, Generative Pre-trained Transformers (GPT), Neural Networks, Deep Neural Networks

Senior Data Scientist

2021 - 2021
FizzUp
  • Analyzed a MySQL database containing customer and transaction data.
  • Segmented customers and projected future revenues, computed Life Time Value (LTV).
  • Developed a web-app with Plotly Dash for insights visualization.
  • Presented findings and recommendations to the COO and team.
Technologies: Python 3, Pandas, Plotly, Dash, Scikit-learn, APIs

Senior Data Scientist

2020 - 2020
Beamy
  • Consolidated and cleaned company classification datasets.
  • Developed a data cleaning and labeling process to extract entities from logs.
  • Trained a machine learning model using text and metadata.
Technologies: Python, Scikit-learn, Pandas, Machine Learning, Natural Language Processing (NLP), Regex

Senior Data Scientist

2019 - 2019
Diplomeo
  • Analyzed LinkedIn data to categorize student profiles and academic institutions.
  • Vectorized LinkedIn profiles using Word2Vec. Clusterized the institutions using DBSCAN.
  • Developed a recommendation system to match students with institutions.
  • Presented algorithm results and vectors via Tensorboard on Heroku.
Technologies: Python 3, TensorBoard, Deep Learning, Keras, Regex, Scikit-learn

Data Scientist

2017 - 2019
Orange Bank
  • Managed a team of two data scientists for a fraud detection task. Tested, supervised (XGBoost), and unsupervised (auto-encoders) algorithms with financial analysts and achieved a recall of 85%.
  • Developed NLP algorithms to improve conversational frameworks like Rasa and Watson, including sentiment analysis, entity extraction, and intent classification.
  • Aggregated and cleaned online posts from various sources, such as Twitter, Facebook, app stores, and blogs, to prepare training corpora adapted to the mobile banking industry.
  • Designed a social media post analysis tool for the marketing team.
Technologies: Python 3, Scikit-learn, Rasa.ai, Rasa NLU, Pandas, SQL, Data Analysis, Data Science, Flask, NumPy, Jupyter Notebook, Artificial Intelligence (AI), Deep Learning, Analytics, Machine Learning, Mathematics, Linear Algebra, Natural Language Processing (NLP), Python, API Integration, Engineering, FastAPI, OCR, Neural Networks, Deep Neural Networks

Data Scientist

2015 - 2017
Clustaar
  • Developed an NLP platform in French and English using Python and Scala.
  • Built an entity and intents extractor to populate chatbot conversations automatically and reduce the bot design time.
  • Installed and optimized a parallel calculus framework, Spark, to achieve the NLP tools' scalability.
Technologies: Python 3, Scikit-learn, SciPy, NumPy, Scala, Spark, Python, Analytics, API Integration, Data Science, Flask, Jupyter Notebook, Artificial Intelligence (AI), Deep Learning, SQL, Machine Learning, Data Analysis, Mathematics, Linear Algebra, Natural Language Processing (NLP), Engineering

IT Consultant

2014 - 2015
Mazars USA
  • Developed a fraud-detection system using machine learning.
  • Completed IT general-control audits, including security review, risk assessment, and automation of these processes.
  • Performed consulting technology missions, such as data mining and penetration testing in the energy and financial sectors.
Technologies: Fraud Audits, Anomaly Detection, Know Your Customer (KYC), Python 3, Excel VBA, Engineering

Twitter Dashboard | French 2017 Elections

https://bbreton3.github.io/big-bang-data/
Developed a microservice-based app to track the evolution of topics and sentiments mentioned on Twitter during the 2017 French presidential elections. The topic modeling was updated daily based on the latest trends.

Discrete Simulation Monte Carlo

Developed a new method for the US Air Force to model fluid flow in a very low-pressure environment. I wrote a model to simulate a Couette flow between two plates in Fortran 90. Established an explicit coupling between the direct simulation Monte Carlo and the Navier-Stokes equation.

Object Recognition API

I made an object-recognition API using GoogleLens and OpenAI APIs.
I used an LLM and some prompt engineering to classify objects into an eCommerce taxonomy.
I improved the optimizer response time and benchmarked multiple LLMs and several prompt engineering techniques.
2012 - 2014

Master's Degree in Mechanical Engineering

The Georgia Institute of Technology - Atlanta, USA

2007 - 2012

Master's Degree in Mechanical Engineering

Arts et Metiers ParisTech - Paris, France

Libraries/APIs

TensorFlow, Scikit-learn, Pandas, SciPy, NumPy, PyTorch, Rasa NLU, Keras

Tools

ChatGPT, Rasa.ai, Plotly, TensorBoard, Open Neural Network Exchange (ONNX), You Only Look Once (YOLO), Google Lens

Languages

Python 3, Python, Scala, Excel VBA, Fortran, SQL, Regex

Frameworks

Flask, Spark, Django

Paradigms

Anomaly Detection, Serverless Architecture, Data Science

Platforms

Google Cloud Platform (GCP), Cloud Run, Jupyter Notebook, AWS Lambda, Docker

Other

Machine Learning, Data Analysis, Natural Language Processing (NLP), Computer Vision, API Integration, Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Deep Learning, Language Models, FastAPI, OCR, Object Detection, APIs, OpenAI GPT-3 API, Large Language Models (LLMs), OpenAI, Generative Pre-trained Transformer 3 (GPT-3), Neural Networks, Deep Neural Networks, Statistics, Time Series, Vertex, Google BigQuery, Fraud Audits, Know Your Customer (KYC), Numerical Methods, Simulations, Stochastic Modeling, Mechanical Engineering, Fluid Mechanics, Vibration Analysis, Physics, Mathematics, Calculus, Engineering, Linear Algebra, Advanced Physics, Analytics, Data Visualization, OpenAI GPT-4 API, Support Vector Machines (SVM), Dash, Mistral AI, LangChain, Open-source LLMs

Collaboration That Works

How to Work with Toptal

Toptal matches you directly with global industry experts from our network in hours—not weeks or months.

1

Share your needs

Discuss your requirements and refine your scope in a call with a Toptal domain expert.
2

Choose your talent

Get a short list of expertly matched talent within 24 hours to review, interview, and choose from.
3

Start your risk-free talent trial

Work with your chosen talent on a trial basis for up to two weeks. Pay only if you decide to hire them.

Top talent is in high demand.

Start hiring