
Olivier Manns

Verified Expert in Engineering

Data Engineer and Software Developer

Location
Toulouse, France
Toptal Member Since
June 12, 2020

Olivier is a big data engineer skilled in distributed processing, cloud architecture, data visualization, and machine learning. He has processed terabytes of data to help hundreds of automotive engineers build your next smart vehicle. With experience in R&D centers and in retail and banking environments, Olivier knows how to make the most of your data.

Portfolio

Clas Ohlson AB
Python, SQL, Data Pipelines, Data Engineering, Azure, Azure Synapse...
Continental
Distributed Computing, Data Pipelines, Metabase, NumPy, Spark ML, Pandas...
Continental
Distributed Computing, Data Pipelines, Metabase, NumPy, Spark ML, Pandas...

Experience

Availability

Part-time

Preferred Environment

Amazon Web Services (AWS), Data Pipelines, Jupyter Notebook, Visual Studio Code (VS Code), Unix, Azure, Python, Data Build Tool (dbt)

The most amazing...

...cloud data platform I've built has made it possible to track the state of health of thousands of vehicles and components across the globe in real time.

Work Experience

Data Engineer

2020 - PRESENT
Clas Ohlson AB
  • Implemented a production-ready data platform as infrastructure as code, built on a big data stack that scales to support company-wide data science, machine learning, and data services projects.
  • Created a fully automated ELT system ingesting data from more than six sources; it runs daily, fully automated with tests and alerts, using a mix of Azure services and dbt to keep costs low.
  • Deployed and configured Prefect, an Airflow-like orchestration tool, to reduce manual work and simplify data and ML pipeline management (a minimal flow sketch follows this entry).
  • Created and configured tools and data-related services for data scientists, data analysts, and business users.
  • Provided training and advice to teams on data engineering concepts, cloud infrastructure, network security, and best practices.
Technologies: Python, SQL, Data Pipelines, Data Engineering, Azure, Azure Synapse, Data Science, Azure DevOps, Azure Data Factory, Azure Databricks, Terraform, Azure Machine Learning, Azure Virtual Machines, Docker, Spark, Serverless, ELT, Data Build Tool (dbt), Databases, PySpark, GitHub, Data, Microsoft SQL Server, Data Warehousing, Data Modeling, Data Warehouse Design, Azure Logic Apps, ETL Tools, Query Optimization
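
The orchestration setup above might look roughly like the sketch below: a Prefect 2 flow that fans out extraction tasks and then triggers a dbt run once every load has finished. Task names, the source list, and the dbt project path are illustrative assumptions, not the actual production pipeline.

```python
# Minimal sketch of a Prefect flow orchestrating a daily ELT run.
# All task names, the dbt project path, and the source list are
# illustrative assumptions -- not the actual production pipeline.
import subprocess

from prefect import flow, task


@task(retries=2, retry_delay_seconds=60)
def extract_source(source: str) -> str:
    """Pull raw data from one source into the landing zone (stubbed)."""
    print(f"extracting {source}")
    return source


@task
def run_dbt_models() -> None:
    """Transform the loaded data with dbt; assumes a local dbt project."""
    subprocess.run(["dbt", "run", "--project-dir", "elt_project"], check=True)


@flow(name="daily-elt")
def daily_elt(sources: list[str]) -> None:
    # Extract every source concurrently, then transform once all loads finish.
    futures = [extract_source.submit(s) for s in sources]
    for future in futures:
        future.result()  # block until each extract completes
    run_dbt_models()


if __name__ == "__main__":
    daily_elt(["erp", "pos", "web_analytics"])
```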

Big Data Engineer

2017 - 2020
Continental
  • Designed and deployed an entire scalable data platform for real-time vehicle component monitoring, from data collection to interactive visualization.
  • Made it possible for 100+ automotive engineers to query terabytes of structured data daily within seconds.
  • Implemented every piece of the architecture as infrastructure as code (IaC) using Terraform for modularity and flexibility.
  • Built streaming and batch data pipelines for real-time and ad hoc data ingestion and analysis (a minimal ingestion sketch follows this entry).
  • Anticipated growth in data volume by taking advantage of serverless code, distributed processing with Spark (Scala), and SQL queries with Athena.
  • Plugged in an interchangeable data visualization tool to stay flexible to changing business needs.
  • Implemented an automated data pipeline creation process for new projects and customers, requiring an engineer only for custom needs.
Technologies: Distributed Computing, Data Pipelines, Metabase, NumPy, Spark ML, Pandas, Jupyter Notebook, Visual Studio Code (VS Code), Parallel Computing, Terraform, Shell, Data Architecture, Unix, Big Data, Amazon Athena, Data Visualization, Data Engineering, Go, Scala, Machine Learning, Amazon Kinesis, Data Science, Amazon Web Services (AWS), Apache Spark, Amazon S3 (AWS S3), SQL, AWS Lambda, Serverless, Python, Spark, ELT, Databases, GitHub, Data, Data Warehousing, Data Modeling, Data Warehouse Design, ETL Tools, Query Optimization
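
A minimal sketch of the serverless ingestion pattern referenced above: an AWS Lambda handler that decodes Kinesis records and lands them in S3, partitioned so Athena can prune by vehicle. The bucket name, key layout, and payload schema are assumptions for illustration.

```python
# Minimal sketch of streaming ingestion: a Lambda handler that decodes
# Kinesis records and writes them to S3 for later querying with Athena.
# Bucket name, key layout, and payload fields are illustrative assumptions.
import base64
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "vehicle-telemetry-raw"  # hypothetical bucket


def handler(event, context):
    # Decode each Kinesis record and group by vehicle for partitioning.
    by_vehicle = {}
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        by_vehicle.setdefault(payload["vehicle_id"], []).append(payload)
    # One JSON-lines object per vehicle per batch, under a Hive-style prefix.
    for vehicle_id, rows in by_vehicle.items():
        key = f"ingest/vehicle={vehicle_id}/{uuid.uuid4()}.json"
        body = "\n".join(json.dumps(r) for r in rows)
        s3.put_object(Bucket=BUCKET, Key=key, Body=body.encode("utf-8"))
    return {"records": len(event["Records"])}
```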

Machine Learning Engineer

2017 - 2020
Continental
  • Developed many methods and machine learning models to improve Continental's component lifespan by exploiting vehicle data.
  • Filed two patents for predictive diagnosis and failure prevention of vehicle components, based on data acquisition and machine learning models.
  • Created machine learning models and tuned feature engineering to improve physical models of engine behavior and pollutant emissions (a minimal pipeline sketch follows this entry).
  • Analyzed large quantities of data for exploration and feasibility studies using Amazon EMR/EC2 with Spark ML and scikit-learn.
Technologies: Distributed Computing, Data Pipelines, Metabase, NumPy, Spark ML, Pandas, Jupyter Notebook, Parallel Computing, Big Data, Amazon Athena, Data Analysis, Data Visualization, Machine Learning, Data Science, Amazon Web Services (AWS), Apache Spark, Amazon S3 (AWS S3), Scikit-learn, Keras, Python, GitHub, Data
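
A minimal sketch of the kind of feature engineering plus model pipeline this work involves, using scikit-learn. The sensor aggregates, the engineered feature, and the synthetic lifespan target are all hypothetical, for illustration only.

```python
# Minimal sketch of a feature-engineering + model pipeline of the kind
# used for component-lifespan prediction. Column names, the target, and
# the data itself are hypothetical -- for illustration only.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "mean_engine_temp": rng.normal(90, 10, n),      # hypothetical sensor aggregates
    "load_cycles": rng.integers(1_000, 50_000, n),
    "ambient_temp": rng.normal(15, 8, n),
})
# Hypothetical engineered feature: a thermal-stress proxy.
df["temp_delta"] = df["mean_engine_temp"] - df["ambient_temp"]
# Synthetic target standing in for observed component lifespan (hours).
y = 10_000 - 0.1 * df["load_cycles"] - 20 * df["temp_delta"] + rng.normal(0, 100, n)

model = Pipeline([
    ("scale", StandardScaler()),
    ("gbr", GradientBoostingRegressor(random_state=0)),
])
scores = cross_val_score(model, df, y, cv=5, scoring="r2")
print(f"cross-validated R^2: {scores.mean():.3f}")
```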

Data Engineer

2016 - 2017
Société Générale
  • Wrote technical specifications and gave expert advice to a team of ETL developers.
  • Developed ETL jobs with complex transformations, loading terabytes of data into Teradata using both IBM DataStage and TPT scripts.
  • Optimized many SQL queries (insert, update, and select) for better performance.
  • Managed the Teradata database to ensure availability, scalability, and performance.
  • Resolved critical and complex issues and bugs in ETL pipelines, database management, and Unix systems.
Technologies: IBM InfoSphere (DataStage), Shell, Data Architecture, Unix, Big Data, Data Engineering, Python, Teradata, Datastage, SQL, ETL, Databases, Data, Data Warehousing, Data Modeling, Data Warehouse Design, ETL Tools, Query Optimization

Data Engineer

2016 - 2016
Thales
  • Developed and improved ETL and BI processes using Oracle tools: ODI, OBI, and Oracle Database 11g.
  • Investigated data integrity issues and custom calculation bugs.
  • Optimized SQL queries for business stakeholders and general performance.
  • Managed Oracle database to ensure availability, scalability, and performance.
  • Resolved critical and complex issues and bugs in ETL pipelines, database management, and Unix systems.
Technologies: Shell, Unix, Data Visualization, Data Engineering, Business Intelligence (BI), Oracle Business Intelligence Applications (OBIA), Oracle Database, Oracle Data Integrator (ODI), SQL, Databases, Data, Data Warehousing, Data Modeling, ETL Tools, Query Optimization

Data Engineer

2016 - 2016
La Banque Postale
  • Implemented a model for bank check fraud detection in the ETL step.
  • Monitored metrics and traceability to ensure performance and data veracity.
  • Optimized SQL queries for automatic insert, update, and select statements.
Technologies: Data Engineering, ETL, Python, SAS, SQL, Databases, Data, Data Warehousing, Data Modeling, ETL Tools

Data Miner

2015 - 2015
CEA
  • Connected and supported scientific researchers and industrial companies to commercialize CEA's research.
  • Analyzed competitors through data mining and analysis of patents and scientific publications.
  • Assessed and confirmed the patentability of CEA's scientific inventions against the state of the art.
Technologies: Data Analysis, Data Visualization, Machine Learning, Business Intelligence (BI), Orbit Intellixir, SQL, Data

Distributing the Rainflow-counting Algorithm with Spark

To design an automotive component, engineers must run multiple stress and fatigue tests to ensure reliability. Thanks to IoT and cloud services, I designed a data pipeline that collects sensor data directly from on-road vehicles, giving a daily overview of component fatigue across a fleet. Fatigue is evaluated with the rainflow-counting algorithm, applied to a sensor representative of the component's state of health.
The algorithm is time-series oriented and cannot easily be distributed over a cluster of workers, which means long processing times and very limited scalability. The input data consists of vehicle IDs, component IDs, and multiple sensor readings sampled every 20 milliseconds, over thousands of hours of driving.
By exploiting the structure of the data, understanding the automotive engineers' real needs, partitioning the data effectively, and re-implementing the algorithm, I parallelized the processing with Apache Spark (sketched below). By accepting a 0.03% mean error on the results, I cut the processing time from 28 hours to 5 minutes on the same cluster size.
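
A minimal sketch of that partition-and-apply pattern, assuming a Spark DataFrame with vehicle_id, component_id, timestamp, and sensor_value columns. The rainflow function here is a simplified four-point variant that ignores residual half-cycles; it illustrates the parallelization strategy, not the production re-implementation.

```python
# Minimal sketch: group readings by (vehicle, component), then run a
# simplified rainflow count on each group with applyInPandas. Column
# names and the tiny sample data are illustrative assumptions.
import pandas as pd
from pyspark.sql import SparkSession


def rainflow_ranges(signal):
    """Return full-cycle ranges via a simplified four-point rainflow."""
    # Reduce the signal to its turning points (local extrema).
    tp = [signal[0]]
    for x in signal[1:]:
        if len(tp) >= 2 and (tp[-1] - tp[-2]) * (x - tp[-1]) > 0:
            tp[-1] = x          # same direction: extend the excursion
        elif x != tp[-1]:
            tp.append(x)        # direction change: new turning point
    ranges, stack = [], []
    for p in tp:
        stack.append(p)
        while len(stack) >= 4:
            s1, s2, s3, s4 = stack[-4:]
            inner = abs(s3 - s2)
            if inner <= abs(s2 - s1) and inner <= abs(s4 - s3):
                ranges.append(inner)   # (s2, s3) closes a full cycle
                del stack[-3:-1]       # drop s2 and s3, keep s1 and s4
            else:
                break
    return ranges


def count_cycles(pdf: pd.DataFrame) -> pd.DataFrame:
    """Run the rainflow count over one (vehicle, component) time series."""
    values = pdf.sort_values("timestamp")["sensor_value"].tolist()
    ranges = rainflow_ranges(values)
    return pd.DataFrame({
        "vehicle_id": [pdf["vehicle_id"].iat[0]],
        "component_id": [pdf["component_id"].iat[0]],
        "full_cycles": [len(ranges)],
        "max_range": [max(ranges) if ranges else 0.0],
    })


spark = SparkSession.builder.appName("rainflow-demo").getOrCreate()
df = spark.createDataFrame(
    [("v1", "c1", i, float(v)) for i, v in enumerate([0, 3, 1, 4, 1, 5, 0])],
    "vehicle_id string, component_id string, timestamp long, sensor_value double",
)
# groupBy shuffles each (vehicle, component) series to a single worker,
# so every group's rainflow count runs independently and in parallel.
result = df.groupBy("vehicle_id", "component_id").applyInPandas(
    count_cycles,
    schema="vehicle_id string, component_id string, full_cycles long, max_range double",
)
result.show()
```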

Languages

SQL, Python, Scala, Go, SAS

Tools

Amazon Athena, Terraform, IBM InfoSphere (DataStage), Shell, GitHub, Oracle Business Intelligence Applications (OBIA), Azure Machine Learning, Azure Logic Apps

Paradigms

ETL, Parallel Computing, Business Intelligence (BI), Data Science, Distributed Computing, Azure DevOps

Platforms

Amazon Web Services (AWS), Oracle Data Integrator (ODI), Oracle Database, Unix, Jupyter Notebook, AWS Lambda, Azure, Azure Synapse, Docker, Visual Studio Code (VS Code)

Storage

Data Pipelines, Amazon S3 (AWS S3), Databases, Teradata, Microsoft SQL Server, Datastage

Other

Data Engineering, Data Architecture, ELT, Data, Data Warehousing, Data Modeling, Data Warehouse Design, ETL Tools, Query Optimization, Data Visualization, Big Data, Data Analysis, Amazon Kinesis, Azure Data Factory, Data Build Tool (dbt), Machine Learning, Serverless, Metabase, Orbit Intellixir, Azure Databricks, Azure Virtual Machines

Frameworks

Apache Spark, Spark

Libraries/APIs

PySpark, Keras, Scikit-learn, Pandas, Spark ML, NumPy

2012 - 2015

Master's Degree in Industrial Engineering

ENSIACET, part of Grandes Écoles of Engineering - Toulouse, France

2010 - 2012

Bachelor of Science Degree in Mathematics and Physics

Lycée Bellevue - Toulouse, France

AUGUST 2016 - PRESENT

Certified Oracle BI 12 Administrator

Oracle
