Bruno Godefroy, Developer in Paris, France

Bruno Godefroy

Verified Expert in Engineering

Bio

Bruno is a passionate and driven machine learning engineer with five years of industry experience building data pipelines and models in Python. He is well-versed in analyzing datasets, training predictive and other statistical machine learning models, and building and deploying software. Bruno has a master's degree in data science and another in computer science and engineering.

Portfolio

Europ Assistance
Azure, Azure DevOps, OpenAI GPT-4 API, Azure Functions, Python...
Stellantis
Python, PySpark, Apache Airflow, SQL, Pandas, Databricks, Git, TeamCity
Implicity
Python, Pandas, GitLab, CI/CD Pipelines, Serverless, Docker, PySpark, SQL...

Experience

Availability

Part-time

Preferred Environment

Linux, MacOS, Git, Visual Studio Code (VS Code)

The most amazing...

...project I've worked on was a Scala compiler school project that could almost compile itself.

Work Experience

Senior Data Engineer

2023 - PRESENT
Europ Assistance
  • Developed an in-house GPT-powered chatbot designed to provide accurate responses regarding the content of PDF documents.
  • Designed the project's architecture within the Azure framework.
  • Built the chatbot with Python, Docker, Azure, and Azure DevOps in collaboration with IT teams in India.
Technologies: Azure, Azure DevOps, OpenAI GPT-4 API, Azure Functions, Python, Artificial Intelligence (AI), API Integration

Data Engineer | Data Scientist

2022 - 2023
Stellantis
  • Released major upgrades to an internal library used by many engineering teams. Developed and improved ETL pipelines for cleaning and aggregating connected vehicle data (around 6 TB/day).
  • Collaborated with domain experts on a €40 million/year quality issue using automatic anomaly detection on sensor recordings.
  • Mentored junior data engineers. The contract was extended from 3 to 12 months.
Technologies: Python, PySpark, Apache Airflow, SQL, Pandas, Databricks, Git, TeamCity

Software Engineer | Data Engineer

2019 - 2022
Implicity
  • Developed a Python module for monitoring the daily weight of about one thousand patients at risk of heart failure, using data from connected weight scales. Developed internal tools for anonymizing and parsing PDF documents.
  • Helped significantly improve the speed and reliability of ETL pipelines importing data for around 40,000 patients from French hospitals and cardiac device manufacturers. Participated in refactoring the patient database.
  • Contributed to recruiting efforts by interviewing college-level and experienced data scientists.
Technologies: Python, Pandas, GitLab, CI/CD Pipelines, Serverless, Docker, PySpark, SQL, Pillow

Software Engineer | Data Scientist

2018 - 2019
Roam Analytics (acquired by Parexel)
  • Integrated GloVe, FastText, and ELMo into an internal natural language processing (NLP) library. Fine-tuned GloVe vectors on the PubMed dataset. Analyzed customer text corpora and trained NER models.
  • Contributed to two scientific papers on NLP for clinical text with Christopher Potts (model implementation, experiments, and writing); cf. https://arxiv.org/search/cs?searchtype=author&query=Godefroy%2C+B.
  • Managed crowdsourcing for the company (budget: $80,000/year) with the partner Appen. Implemented an expectation–maximization algorithm for inferring gold labels from noisy labels.
Technologies: Machine Learning, Data Science, Deep Learning, Python, TensorFlow, Docker, GitHub, Jenkins, NumPy, Pandas, Kubernetes, Neo4j, spaCy, LaTeX, Artificial Intelligence (AI)
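
For readers unfamiliar with the crowdsourcing work mentioned above, here is a minimal sketch of how expectation–maximization can infer gold labels from noisy labels. The profile does not say which model was used; this sketch assumes a Dawid-Skene-style model with one confusion matrix per worker. The function name, the `(item, worker, label)` input format, and the `smoothing` parameter are all hypothetical.

```python
def infer_gold_labels(annotations, classes, n_iter=20, smoothing=0.01):
    """Estimate a posterior over the true label of each item with EM.

    annotations: list of (item, worker, label) tuples (hypothetical format).
    Returns {item: {class: probability}}.
    """
    items = sorted({item for item, _, _ in annotations})
    workers = sorted({worker for _, worker, _ in annotations})

    # Initialization: soft majority vote per item.
    posterior = {}
    for item in items:
        votes = [label for i, _, label in annotations if i == item]
        posterior[item] = {c: votes.count(c) / len(votes) for c in classes}

    for _ in range(n_iter):
        # M-step: re-estimate class priors and each worker's confusion matrix,
        # where confusion[w][c][l] approximates P(worker w says l | true class c).
        prior = {c: sum(posterior[i][c] for i in items) / len(items) for c in classes}
        confusion = {
            w: {c: {l: smoothing for l in classes} for c in classes} for w in workers
        }
        for item, worker, label in annotations:
            for c in classes:
                confusion[worker][c][label] += posterior[item][c]
        for w in workers:
            for c in classes:
                total = sum(confusion[w][c].values())
                for l in classes:
                    confusion[w][c][l] /= total

        # E-step: recompute the posterior of each item's true label.
        scores = {i: dict(prior) for i in items}
        for item, worker, label in annotations:
            for c in classes:
                scores[item][c] *= confusion[worker][c][label]
        for i in items:
            total = sum(scores[i].values())
            posterior[i] = {c: scores[i][c] / total for c in classes}

    return posterior


# Toy data: workers "a" and "b" always agree, worker "c" disagrees on d1 and d4.
example = [
    ("d1", "a", "pos"), ("d1", "b", "pos"), ("d1", "c", "neg"),
    ("d2", "a", "neg"), ("d2", "b", "neg"), ("d2", "c", "neg"),
    ("d3", "a", "pos"), ("d3", "b", "pos"), ("d3", "c", "pos"),
    ("d4", "a", "neg"), ("d4", "b", "neg"), ("d4", "c", "pos"),
]
gold = infer_gold_labels(example, classes=["pos", "neg"])
best = {item: max(probs, key=probs.get) for item, probs in gold.items()}
```

Unlike plain majority voting, this approach weights each worker by an estimated reliability, so a consistently noisy annotator has less influence on the inferred label.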

A Specialized Search Engine

A simple search engine for everything related to the Roam Research web app (roamresearch.com). The search engine won a $7,000 cash prize at an international contest, and around 50 people use the website daily.

Deliverables:
• Developed modules for scraping Slack, Twitter, and Roam Research using Python and Selenium.
• Developed a simple search engine web app using Python, Elasticsearch, and JavaScript.
• Deployed the project with Docker, GitHub Actions, and AWS.

Syllabics.io

https://syllabics.io
I have created a tool to improve my English pronunciation.

• Deep Learning: Studied the state of the art in automatic speech recognition. Trained a denoising autoencoder to make speech features more robust to variations in the speaker's voice, background noise, and microphone. Trained and evaluated many deep-learning models for phoneme recognition and voice activity detection.

• User Experience: Developed multiple web app prototypes and collected feedback from potential users.

• Serverless Architecture: Optimized cost, response time, and scalability by switching to a serverless architecture. The analysis of one thousand recordings by the website currently costs around €0.05, with a response time of around one second, and the system can scale almost infinitely.

Education

2017 - 2018

Master's Degree in Data Science

KTH Royal Institute of Technology - Stockholm, Sweden

2013 - 2018

Master's Degree in Computer Science and Engineering

Institut National des Sciences Appliquées (INSA) - Lyon, France

Libraries/APIs

Keras, Scikit-learn, spaCy, Pandas, P5.js, TensorFlow, Natural Language Toolkit (NLTK), Pillow, PyTorch, NumPy, PySpark

Tools

Jupyter, GitHub, GitLab CI/CD, Git, Sublime Text 3, MATLAB, Named-entity Recognition (NER), LaTeX, Jenkins, GitLab, Apache Airflow, TeamCity, Wav2Vec 2.0

Languages

Python, SQL, Java, C++, Scala, JavaScript, Bash

Frameworks

Spark, Flask, Selenium

Paradigms

Agile Software Development, Azure DevOps

Platforms

Amazon Web Services (AWS), Docker, Linux, MacOS, Oracle, Kubernetes, AWS Elastic Beanstalk, Visual Studio Code (VS Code), Databricks, Azure, Azure Functions

Storage

Neo4j, Amazon S3 (AWS S3), Databases, Elasticsearch

Other

Machine Learning, Data Science, API Integration, Deep Learning, Natural Language Processing (NLP), Data Visualization, Data Analysis, Amazon Mechanical Turk (MTurk), Generative Pre-trained Transformers (GPT), Artificial Intelligence (AI), Crowdsourcing, Appen, CI/CD Pipelines, Serverless, OpenAI GPT-4 API, Datasets
