Valentin Lehuger, Software Developer in Paris, France

Member since June 18, 2020
Valentin has seven years of experience at both startups and large French tech companies. He has worked mainly as a back-end data engineer with Scala and Python. He is also experienced with Hadoop and Spark, developing data pipelines, and architecting data warehouses that extract value from terabytes of data. Most recently, Valentin was the CTO of a Y Combinator-backed startup, where he led a team of 10 in building complex front-end software as well as a full back-end data processing engine.


  • Actiondesk
    Scala, Vue, Vuex, TypeScript, Apache Kafka, PostgreSQL, Google BigQuery...
  • Deezer
    Apache Hive, HDFS, Hadoop, Apache Kafka, Spark, Scala, SQL, ETL...
  • Artefact
    Apache Kafka, Storm, Hadoop, PostgreSQL, Spark, Scala, Django, Python, SQL...






Preferred Environment

Git, MacOS, Visual Studio Code, Scala, Python, TypeScript, JavaScript, Vue, Google Cloud Platform (GCP)

The most amazing project I've worked on is the refactoring of Deezer's most critical ETL, which computes the data used for recommendations, royalties, and analytics.


  • CTO

    2019 - 2022
    Actiondesk
    • Developed a spreadsheet application connected to dozens of integrations to make data engineering accessible to non-technical business users and automate their dashboards and reporting.
    • Designed the entire back end and led its implementation, serving as the first committer on every service for a long time.
    • Managed a tech team of up to 10 engineers, including front-end, back-end, DevOps, and QA engineers.
    • Built dozens of connectors to databases and various APIs, such as Stripe, HubSpot, Google Analytics, and QuickBooks.
    • Created a formula engine to compute Excel-like formulas, written in Scala.js so that the same code ran in both the back end and the front end and produced identical results.
    • Maintained the Kubernetes cluster for two years, until hiring and then managing a dedicated DevOps engineer.
    • Implemented a feature for sharing reports and charts across multiple channels, such as Slack and email.
    Technologies: Scala, Vue, Vuex, TypeScript, Apache Kafka, PostgreSQL, Google BigQuery, WebSockets, Canvas, Redis, Kubernetes, SQL, Data Pipelines, Data Build Tool (dbt), Data Engineering, APIs, Microsoft Excel
  • Data Engineer

    2017 - 2019
    Deezer
    • Developed and maintained the core ETLs in Spark with Scala, as well as streaming pipelines with Kafka and Spark.
    • Processed 2.5 TB of streams data per day, serving 50+ engineers, analysts, data scientists, and product managers.
    • Managed data warehousing on HDFS in the ORC, Parquet, and Avro formats.
    • Developed an in-house job scheduler in Python that runs 2,000 jobs per day.
    Technologies: Apache Hive, HDFS, Hadoop, Apache Kafka, Spark, Scala, SQL, ETL, Data Pipelines, Data Engineering
  • Data Engineer

    2016 - 2017
    Artefact
    • Developed ETLs in PySpark in collaboration with data scientists.
    • Led, as the main contributor, the internal data collection software, which processes 500 GB per day.
    • Performed R&D for a stream processing project using Storm and Kafka.
    Technologies: Apache Kafka, Storm, Hadoop, PostgreSQL, Spark, Scala, Django, Python, SQL, ETL, Data Pipelines, Data Engineering, APIs
  • Data Scientist

    2015 - 2016
    • Optimized the ordering of product listings on major clothing retailers' websites.
    • Developed user segmentation and purchase prediction algorithms.
    • Optimized recommender systems by parallelizing their algorithms (ALS-WR) with CUDA.
    Technologies: CUDA, C, BigQuery, Python, R, SQL
  • Back-end Engineer

    2014 - 2015
    Pricing Assistant
    • Developed an e-commerce page parser.
    • Developed product matchers in Python.
    Technologies: Pandas, Flask, Python


  • Full Spreadsheet Application

    A spreadsheet SaaS application connected to 80+ databases and external tools.
    As the CTO of Actiondesk, I designed the architecture and was the lead developer on both the front-end application, built with Vue and canvas-based rendering, and a complex back-end data engine integrated with dozens of databases and external tools. I managed a team of up to 10 engineers spanning front-end, back-end, DevOps, and QA roles.

  • Facial Recognition

    Wrote a complete facial recognition system as a school project, reimplementing from scratch both a math library (using only the standard library) and the computer vision algorithms (Eigenfaces and a neural network).

  • Migrated Critical Data Pipelines from Pig to Spark

    Migrated the most critical data pipeline of a leading music streaming company: the daily batch job that makes the streams data (2.5+ TB per day) available to 60+ analysts and data scientists.
    The migration cut computing time by more than 33%, making the data available before the analysts started their working day.
    I migrated the pipeline from Hive and Pig scripts to Spark with Scala, optimizing and simplifying the transformations along the way.


  • Languages

    SQL, Scala, Python, TypeScript, JavaScript, R, C, C++
  • Platforms

    Visual Studio Code, MacOS, Docker, Amazon Web Services (AWS), Google Cloud Platform (GCP), CUDA, Apache Pig, Apache Kafka, Kubernetes
  • Storage

    PostgreSQL, Data Pipelines, Redis, Apache Hive, HDFS, MySQL, MongoDB
  • Other

    APIs, Google BigQuery, Data Engineering, Distributed Systems, WebSockets, Data Build Tool (dbt)
  • Frameworks

    Hadoop, Spark, Storm, Akka, Flask, Django
  • Libraries/APIs

    Vue, Pandas, Vuex
  • Tools

    Git, BigQuery, Microsoft Excel, IntelliJ IDEA, PyCharm, Ansible, Canvas
  • Paradigms

    Functional Programming, ETL, Agile Software Development, Actor Model


  • Master's Degree in Computer Engineering
    2013 - 2016
    42 University - Paris, France
