Benjamin Breton
Verified Expert in Engineering
Data Scientist and Developer
South Bend, IN, United States
Toptal member since November 7, 2022
Benjamin is passionate about data science and enjoys working across different sectors. He aims to identify business needs, design a tailored solution, and create value from data. Benjamin has extensive professional experience and has collaborated with 30 startups and large companies across 45 missions.
Preferred Environment
Python 3, TensorFlow, Scikit-learn, Pandas, Flask, PyTorch, Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Open-source LLMs
The most amazing...
...thing I've achieved is a state-of-the-art result on document OCR, reducing response time by 75%.
Work Experience
Lead Data Scientist
Sogexia
- Analyzed customer data to identify growth potential. Cleaned datasets for accurate analysis.
- Filled in missing data using the GPT-4 API with few-shot learning.
- Conducted statistical analysis for insights. Performed customer clustering to segment user base.
- Shared a detailed report of the analysis with the CEO, including business recommendations.
Senior Machine Learning Engineer
Evalmee
- Optimized three open-source computer vision algorithms using ONNX.
- Deployed optimized models on AWS Lambda using Docker.
- Developed a custom algorithm to detect cheating during exams.
- Trained a YOLOv5 detection model with the SparseML toolbox and deployed it on AWS Lambda.
Machine Learning Engineer
La Touche Musicale
- Optimized a music2midi TensorFlow model to reduce the response time for a mobile app.
- Retrained the model according to the research paper.
- Reduced the model size for mobile app integration using TensorFlow Lite.
Lead Data Scientist
L'Oreal
- Led a team of five data scientists and two data engineers to create a tool that models and optimizes the budget for marketing campaigns, an approach known as marketing mix modeling (MMM).
- Coordinated between multiple providers to model customer responses to several marketing campaigns and create "response curves." Integrated those response curves into an optimization tool.
- Deployed the solution to business analytics through a REST API. Collected feedback from the business team to improve the tool's response.
- Scaled the platform to optimize a marketing budget of more than $1 billion.
Senior Data Scientist
Liz.cx
- Developed an initial classification model using TF-IDF and SVC.
- Automated the retraining pipeline for new data and categories. Deployed models via a Django REST API on AWS.
- Incorporated a "human in the loop" pipeline for label verification using multiple deep-learning algorithms.
Lead Data Scientist
Inmind (Recurrent Client)
- Developed a parser for data extraction from OCR. Automated data cleaning processes. Deployed the solution via FastAPI as a REST API.
- Trained an algorithm to extract entities from PDF resumes using text and meta-data.
- Developed a classification algorithm for job category prediction. Used transfer learning for the description algorithm. Deployed the Keras model behind a Flask API.
Machine Learning Engineer
PianoConvert
- Benchmarked multiple machine learning algorithms for audio-to-MIDI transcription of piano music. The most efficient approach I found was to split the task into two models.
- Fine-tuned an open-source model for hand-splitting piano scores. Enhanced response time and post-processing for accuracy. Created and deployed a serverless Lambda container pipeline.
- Benchmarked several serverless GPU platforms. Optimized cold start time and memory usage. Deployed a deep learning algorithm in ONNX on Banana.dev.
Senior Data Scientist
Mindee
- Directed the continuous improvement of a receipt-processing API that extracts essential information from images. Reduced response time and memory footprint by 75% and improved accuracy.
- Developed deep-learning computer vision algorithms for document processing in TensorFlow, such as OCR, segmentation, and classification.
- Designed synthetic data generators to train these models without manually labeled data.
- Created a cleaning tool to improve data quality automatically.
Senior Data Scientist
Whyse
- Cleaned and vectorized text and metadata using Word2Vec, Doc2Vec, and SBERT.
- Fine-tuned multiple models for sentence similarity.
- Benchmarked clustering and dimension reduction techniques.
Senior Data Scientist
FizzUp
- Analyzed a MySQL database containing customer and transaction data.
- Segmented customers, projected future revenues, and computed lifetime value (LTV).
- Developed a web app with Plotly Dash for insights visualization.
- Presented findings and recommendations to the COO and team.
Senior Data Scientist
Beamy
- Consolidated and cleaned company classification datasets.
- Developed a data cleaning and labeling process to extract entities from logs.
- Trained a machine learning model using text and metadata.
Senior Data Scientist
Diplomeo
- Analyzed LinkedIn data to categorize student profiles and academic institutions.
- Vectorized LinkedIn profiles using Word2Vec. Clustered the institutions using DBSCAN.
- Developed a recommendation system to match students with institutions.
- Presented algorithm results and vectors via TensorBoard on Heroku.
Data Scientist
Orange Bank
- Managed a team of two data scientists for a fraud detection task. Tested supervised (XGBoost) and unsupervised (autoencoder) algorithms with financial analysts and achieved a recall of 85%.
- Developed NLP algorithms to improve conversational frameworks like Rasa and Watson, including sentiment analysis, entity extraction, and intent classification.
- Aggregated and cleaned online posts from various sources, such as Twitter, Facebook, app stores, and blogs, to prepare training corpora adapted to the mobile banking industry.
- Designed a social media post analysis tool for the marketing team.
Data Scientist
Clustaar
- Developed an NLP platform in French and English using Python and Scala.
- Built an entity and intent extractor to populate chatbot conversations automatically and reduce the bot design time.
- Installed and optimized Spark, a parallel computing framework, to make the NLP tools scalable.
IT Consultant
Mazars USA
- Developed a fraud-detection system using machine learning.
- Completed IT general-control audits, including security review, risk assessment, and automation of these processes.
- Performed technology consulting missions, such as data mining and penetration testing, in the energy and financial sectors.
Experience
Twitter Dashboard | French 2017 Elections
https://bbreton3.github.io/big-bang-data/
Discrete Simulation Monte Carlo
Object Recognition API
I used an LLM and some prompt engineering to classify objects into an eCommerce taxonomy.
I improved the optimizer response time and benchmarked multiple LLMs and several prompt engineering techniques.
Education
Master's Degree in Mechanical Engineering
The Georgia Institute of Technology - Atlanta, USA
Master's Degree in Mechanical Engineering
Arts et Metiers ParisTech - Paris, France
Skills
Libraries/APIs
TensorFlow, Scikit-learn, Pandas, SciPy, NumPy, PyTorch, Rasa NLU, Keras
Tools
ChatGPT, Rasa.ai, Plotly, TensorBoard, Open Neural Network Exchange (ONNX), You Only Look Once (YOLO), Google Lens
Languages
Python 3, Python, Scala, Excel VBA, Fortran, SQL, Regex
Frameworks
Flask, Spark, Django
Paradigms
Anomaly Detection, Serverless Architecture
Platforms
Google Cloud Platform (GCP), Cloud Run, Jupyter Notebook, AWS Lambda, Docker
Other
Machine Learning, Data Analysis, Natural Language Processing (NLP), Computer Vision, API Integration, Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Deep Learning, Language Models, FastAPI, OCR, Object Detection, APIs, OpenAI GPT-3 API, Large Language Models (LLMs), OpenAI, Generative Pre-trained Transformer 3 (GPT-3), Neural Networks, Deep Neural Networks, Statistics, Time Series, Vertex, Google BigQuery, Fraud Audits, Know Your Customer (KYC), Numerical Methods, Simulations, Stochastic Modeling, Mechanical Engineering, Fluid Mechanics, Vibration Analysis, Physics, Mathematics, Calculus, Engineering, Linear Algebra, Advanced Physics, Data Science, Analytics, Data Visualization, OpenAI GPT-4 API, Support Vector Machines (SVM), Dash, Mistral AI, LangChain, Open-source LLMs