Bruno Godefroy
Verified Expert in Engineering
Machine Learning Engineer and Developer
Paris, France
Toptal member since April 12, 2022
Bruno is a passionate and driven machine learning engineer with five years of industry experience building data pipelines and models in Python. He is well-versed in analyzing datasets, training predictive and other statistical machine learning models, and building and deploying software. Bruno has a master's degree in data science and another in computer science and engineering.
Preferred Environment
Linux, macOS, Git, Visual Studio Code (VS Code)
The most amazing project I've worked on was a Scala compiler school project that could almost compile itself.
Work Experience
Senior Data Engineer
Europ Assistance
- Developed an in-house GPT-powered chatbot designed to provide accurate responses regarding the content of PDF documents.
- Designed the project's architecture within the Azure framework.
- Developed using Python, Docker, Azure, and Azure DevOps in collaboration with IT teams in India.
Data Engineer | Data Scientist
Stellantis
- Released major upgrades to an internal library used by many engineering teams. Developed and improved ETL pipelines for cleaning and aggregating connected vehicle data (around 6 TB/day).
- Collaborated with domain experts on a €40 million/year quality issue using automatic anomaly detection on sensor recordings.
- Mentored junior data engineers. The contract was extended from 3 to 12 months.
Software Engineer | Data Engineer
Implicity
- Developed a Python module for monitoring the daily weight of about one thousand patients at risk of heart failure, using data from connected weight scales. Developed internal tools for anonymizing and parsing PDF documents.
- Helped significantly improve the speed and reliability of ETL pipelines that import data on around 40,000 patients from French hospitals and cardiac device manufacturers. Participated in refactoring the patient database.
- Contributed to recruiting efforts by interviewing college-level and experienced data scientists.
Software Engineer | Data Scientist
Roam Analytics (acquired by Parexel)
- Integrated GloVe, fastText, and ELMo into an internal natural language processing (NLP) library. Fine-tuned GloVe vectors on the PubMed dataset. Analyzed customer text corpora and trained NER models.
- Contributed to two scientific papers on NLP for clinical text with Christopher Potts (model implementation, experiments, and writing); see https://arxiv.org/search/cs?searchtype=author&query=Godefroy%2C+B.
- Managed crowdsourcing for the company (budget: $80,000/year) with the partner Appen. Implemented an expectation–maximization algorithm for inferring gold labels from noisy labels.
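For readers unfamiliar with the label-inference step in the last bullet, here is a minimal sketch in Python. It is an illustration only, not the implementation built at Roam Analytics. The profile does not say which model was used, so the sketch assumes the classic Dawid-Skene formulation, with a confusion matrix per annotator and a prior per class. The function name `infer_gold_labels`, the `(item, worker, label)` input format, and the `smoothing` parameter are all hypothetical.

```python
import math
from collections import defaultdict


def infer_gold_labels(votes, classes, n_iter=20, smoothing=0.01):
    """EM for noisy crowd labels (assumed Dawid-Skene model, illustrative only).

    votes: list of (item, worker, label) triples, with label in `classes`.
    Returns {item: {class: posterior probability}}.
    """
    items = sorted({item for item, _, _ in votes})
    workers = sorted({worker for _, worker, _ in votes})
    by_item = defaultdict(list)
    for item, worker, label in votes:
        by_item[item].append((worker, label))

    # Initialise posteriors from per-item vote shares (soft majority vote).
    post = {}
    for item in items:
        counts = {c: 0.0 for c in classes}
        for _, label in by_item[item]:
            counts[label] += 1.0
        total = sum(counts.values())
        post[item] = {c: counts[c] / total for c in classes}

    for _ in range(n_iter):
        # M-step: re-estimate class priors and per-worker confusion matrices.
        prior = {c: sum(post[i][c] for i in items) / len(items) for c in classes}
        conf = {
            w: {t: {l: smoothing for l in classes} for t in classes}
            for w in workers
        }
        for item, worker, label in votes:
            for t in classes:
                conf[worker][t][label] += post[item][t]
        for w in workers:
            for t in classes:
                row_total = sum(conf[w][t].values())
                for l in classes:
                    conf[w][t][l] /= row_total

        # E-step: posterior over each item's true class given its votes.
        for item in items:
            log_p = {}
            for t in classes:
                lp = math.log(prior[t] + 1e-12)
                for worker, label in by_item[item]:
                    lp += math.log(conf[worker][t][label])
                log_p[t] = lp
            top = max(log_p.values())
            unnorm = {t: math.exp(log_p[t] - top) for t in classes}
            z = sum(unnorm.values())
            post[item] = {t: unnorm[t] / z for t in classes}

    return post
```

The gold label for an item is then the class with the highest posterior, for example `max(post[item], key=post[item].get)`. Starting from vote shares rather than random values keeps the class identities anchored to the majority opinion, which is a common choice but still an assumption here.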
Experience
A Specialized Search Engine
Deliverables:
• Developed modules for scraping Slack, Twitter, and Roam Research using Python and Selenium.
• Developed a simple search engine web app using Python, Elasticsearch, and JavaScript.
• Deployed the project with Docker, GitHub Actions, and AWS.
Syllabics.io
https://syllabics.io
• Deep Learning: Studied the state of the art in automatic speech recognition. Trained a denoising autoencoder to make speech features more robust to variations in the speaker's voice, background noise, and microphone. Trained and evaluated many deep-learning models for phoneme recognition and voice activity detection.
• User Experience: Developed multiple web app prototypes and collected feedback from potential users.
• Serverless Architecture: Optimized cost, response time, and scalability by switching to a serverless architecture. Analyzing one thousand recordings on the website currently costs around €0.05, with a response time of around one second, and the system can scale almost infinitely.
Education
Master's Degree in Data Science
KTH Royal Institute of Technology - Stockholm, Sweden
Master's Degree in Computer Science and Engineering
Institut National des Sciences Appliquées (INSA) - Lyon, France
Skills
Libraries/APIs
Keras, Scikit-learn, spaCy, Pandas, P5.js, TensorFlow, Natural Language Toolkit (NLTK), Pillow, PyTorch, NumPy, PySpark
Tools
Jupyter, GitHub, GitLab CI/CD, Git, Sublime Text 3, MATLAB, Named-entity Recognition (NER), LaTeX, Jenkins, GitLab, Apache Airflow, TeamCity, Wav2Vec 2.0
Languages
Python, SQL, Java, C++, Scala, JavaScript, Bash
Frameworks
Spark, Flask, Selenium
Paradigms
Agile Software Development, Azure DevOps
Platforms
Amazon Web Services (AWS), Docker, Linux, macOS, Oracle, Kubernetes, AWS Elastic Beanstalk, Visual Studio Code (VS Code), Databricks, Azure, Azure Functions
Storage
Neo4j, Amazon S3 (AWS S3), Databases, Elasticsearch
Other
Machine Learning, Data Science, API Integration, Deep Learning, Natural Language Processing (NLP), Data Visualization, Data Analysis, Amazon Mechanical Turk (MTurk), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Crowdsourcing, Appen, CI/CD Pipelines, Serverless, OpenAI GPT-4 API, Datasets