François Le Lay, Developer in Setauket-East Setauket, NY, United States

François Le Lay

Verified Expert in Engineering

Bio

François is a seasoned leader with experience building data platforms and machine learning solutions at major technology companies and startups in B2C and B2B settings. François has spent seven years at Spotify building data infrastructure teams as a manager and leveraging machine learning techniques to improve the music catalog as a staff engineer.

Availability

Part-time

Preferred Environment

Python 3, TensorFlow, PyTorch, Hugging Face Transformers, Jupyter Notebook, Computer Vision, Pandas, Amazon Web Services (AWS), Generative Artificial Intelligence (GenAI), Natural Language Processing (NLP)

The most amazing AI project I've worked on has allowed Spotify to increase the data quality of artist entities and their related work in the company's music knowledge graph.

Work Experience

AI Engineer Advisor

2024 - 2024
Psykhe AI
  • Audited and provided guidance related to a machine learning stack (infrastructure and MLOps).
  • Researched multimodal machine learning models.
  • Handled R&D related to the integration of reinforcement learning into the recommender system.
Technologies: MLflow, PyTorch, Kubernetes, Red Panda, Google Cloud Platform (GCP), Generative Artificial Intelligence (GenAI), Recommendation Systems, Machine Learning Operations (MLOps), Docker

AI Engineer Advisor

2024 - 2024
Spark Space Inc
  • Developed a portfolio of GenAI models (Stable Diffusion) to produce Roblox avatar clothing based on a user prompt.
  • Learned Lua to contribute to the codebase of our Roblox games and contributed features both on the Python back end and Lua front end.
  • Integrated Mixpanel into the Lua codebase (a rare combination, given the lack of an integration SDK).
  • Pushed our product discovery process towards multi-modal AI (text, images, 3D, and sound). Implemented multiple games focused on music creation in the Roblox metaverse.
Technologies: AI Programming, Modal Labs, Generative Artificial Intelligence (GenAI), Luau, Roblox, Machine Learning Operations (MLOps)

Head of Solution Engineering and Integration

2022 - 2023
Kensu
  • Led the implementation and deployment of the Kensu data observability solution in prospects' environments (proofs of concept).
  • Carried out improvements to the product documentation and built a custom demo generator in Python.
  • Supported the sales team with their understanding of our ideal customer profile and the related technical discovery process.
Technologies: Amazon Web Services (AWS), Azure, Databricks, Spark, MLflow, Python 3, Data, Hugging Face, Project Management, Regex, Data Pipelines, Google Cloud Platform (GCP), Llama 2, AI Programming, Data Scientist, Statistical Analysis, Databases, Data Analysis, Statistical Modeling, Machine Learning Operations (MLOps), Docker

Director of Data Engineering and Data Science

2021 - 2022
The Farmer's Dog
  • Supervised time series forecasting efforts in relation to customer service dynamic staffing and food logistics.
  • Supported a data product strategy aiming to leverage large language models to gain insights into the voice of the customer in real time.
  • Acted as a key stakeholder of engineering teams in their quest to transition toward a more decoupled architecture of microservices by identifying and prioritizing the work required to transform any related ETL ingestion logic.
  • Contributed to a failover plan to build resilience in the company's analytics stack, focusing on ETL redundancy, vendor, and contractor management.
  • Doubled the size of the data engineering team to better support stakeholders' needs across marketing, finance, operations, and engineering departments.
  • Acted as a key stakeholder for the customer experience and engineering teams during the migration from Kustomer to Gladly, with a particular emphasis on downstream data processing and API integrations.
Technologies: SQL, Python 3, ETL Tools, Google BigQuery, R, Python, Pandas, Team Leadership, Data Science, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Statistics, Data Engineering, ETL, PostgreSQL, REST APIs, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Tableau, Neural Networks, Data Analytics, Web Scraping, Artificial Intelligence (AI), Hugging Face, Project Management, Regex, Data Pipelines, Financial Forecasting, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Google Cloud Platform (GCP), AI Programming, Data Scientist, Statistical Analysis, Databases, Data Analysis, Statistical Modeling, Machine Learning Operations (MLOps), Docker

Engineering Manager

2021 - 2021
Hugging Face
  • Acted as the first engineering manager ever hired at the company, supporting the solution engineering team and go-to-market efforts. Hugging Face has been leading the charge of open source generative AI for years now.
  • Promoted Hugging Face's proprietary acceleration solution (called Optimum) for inference workloads of open source large language models available on the Hugging Face Hub.
  • Hired key talent across multiple functions, including head of talent acquisition, sales development rep, research scientists, and full-stack and machine learning engineers in collaboration with the cofounding team.
  • Leveraged my people management skills to establish myself as a helpful servant leader, with a dotted line towards various individual contributors in all four teams, including science, open-source, hub, and growth.
  • Contributed to key initiatives around diversity, equity, and inclusion as an extension to work done on the company charter focused on democratizing ethical machine learning.
  • Performed code reviews in the context of our hiring process, which involved a take-home assignment.
Technologies: Python 3, Transformers, GPU Computing, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), DataViz, Sales, Hiring, TensorFlow, PyTorch, Open Neural Network Exchange (ONNX), Python, Pandas, Scikit-learn, Team Leadership, Kubernetes, Deep Learning, Diffusion Models, Statistics, REST APIs, JSON, CSV, BERT, JavaScript, Neural Networks, Artificial Intelligence (AI), Language Models, Hugging Face, Project Management, Regex, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), AI Programming, Data Scientist, Statistical Analysis, Databases, Data Analysis, Statistical Modeling, Machine Learning Operations (MLOps), Docker

Staff Machine Learning Engineer

2019 - 2021
Spotify
  • Spearheaded a research initiative that fine-tuned a series of large language models (LLMs) to explore the power of transformer models as knowledge retrievers in collaboration with the Content Intelligence ML research team.
  • Laid out and pitched the conceptual foundation for a feature that would later become the Spotify AI DJ (built and released after I left the company). The idea was to use generative AI to produce personalized stories about music.
  • Surveyed the state of the art in knowledge graph identification, entity resolution, and graph neural networks to prototype a working data enrichment solution tapping into third-party datasets.
  • Acted as a key resource for machine learning tasks in the content intelligence team, focused on improving Spotify's music catalog through better data reconciliation capabilities and the proper integration of human expertise in the learning loop.
  • Deployed an end-to-end pipeline leveraging large language models on Spotify's "blessed" ML infrastructure, using TensorFlow Extended and Kubeflow pipelines (MLOps).
  • Trained a scoring model using audio features and standard music metadata, including track titles and artists' names. Demoed its use and deployment to my cross-functional team.
  • Authored a tutorial as a series of Jupyter notebooks to teach the use of convolutional neural networks (CNN) for speaker segmentation in audio files (TensorFlow).
Technologies: Python 3, TensorFlow, Jupyter Notebook, Google AI Platform, Elastic, Python, Pandas, Scikit-learn, Kubernetes, Data Science, Deep Learning, Generative Adversarial Networks (GANs), Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Statistics, Data Engineering, REST APIs, JSON, CSV, BERT, Word2Vec, Data Mining, Data Modeling, Data Reporting, JavaScript, Neural Networks, Data Analytics, Music, Artificial Intelligence (AI), Audio, Mastering, Hugging Face, Regex, Data Pipelines, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), Google Cloud Platform (GCP), AI Programming, Data Scientist, Statistical Analysis, Databases, Data Analysis, Statistical Modeling, Machine Learning Operations (MLOps), Docker, Ray

Data Engineering Manager

2014 - 2019
Spotify
  • Hired and managed over 30 individual contributors (not all at once) across multiple squads in the data infrastructure tribe in NYC.
  • Advocated for the adoption of Scio, our home-grown Scala API for Apache Beam, which now powers almost every data pipeline at Spotify.
  • Contributed multiple machine learning hacks leveraging the latest advances in deep learning applied to audio, knowledge graphs, and recommender systems.
  • Supported technical and scientific delivery, as well as the people processes, for one of the squads building the experimentation (A/B testing) framework used across Spotify.
  • Supported the technical direction and people processes of one of the squads building the machine learning infrastructure on the Google stack (GPU computing, TensorFlow, TFX, and GCP in general).
Technologies: Scala, Apache Beam, ClickHouse, Google BigQuery, Experimental Design, Distributed Systems, Business Intelligence (BI), Machine Learning Operations (MLOps), Data Quality, Management, Hiring, Python 3, CI/CD Pipelines, Python, Pandas, Scikit-learn, Team Leadership, Kubernetes, Data Science, Deep Learning, Statistics, Data Engineering, ETL, REST APIs, JSON, CSV, Time Series, Data Modeling, Data Reporting, JavaScript, Tableau, Neural Networks, Data Analytics, Amazon Web Services (AWS), Music, Artificial Intelligence (AI), Audio, Mastering, Hugging Face, Project Management, Regex, Data Pipelines, Google Cloud Platform (GCP), AI Programming, Data Scientist, Statistical Analysis, Databases, Data Analysis, Statistical Modeling, Docker

Director of Data

2012 - 2013
JDNviadeo
  • Laid out the vision for a fully integrated in-house CRM solution, built from scratch and able to handle content personalization and real-time communications towards the professional social network user base.
  • Hired research scientists with PhD degrees to pilot machine learning initiatives related to data quality, including people skills clustering and improving the UX.
  • Identified and contracted a Paris-based consulting company where the Play framework had been invented to implement future CRM system components.
  • Added a layer of managerial leadership to the analytics group, focused on Web Analytics (GA) and BI dashboarding.
  • Bootstrapped Agile practices in software engineering under the helm of expert consultants assisting the company in its transition towards building a healthy product and high-performing teams.
Technologies: R, Scala, Elastic, MongoDB, Play 2, Linear Discriminant Analysis (LDA), Business Intelligence (BI), Email Marketing, Spark, Scikit-learn, Team Leadership, Data Science, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Statistics, Data Engineering, ETL, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Neural Networks, Data Analytics, Amazon Web Services (AWS), Artificial Intelligence (AI), Project Management, Regex, Data Pipelines, AI Programming, Data Scientist, Statistical Analysis, Databases, Data Analysis, Statistical Modeling

Manager of Business Intelligence

2007 - 2012
Photobox
  • Implemented the first business intelligence solution of the company, based on Oracle BIEE and an Oracle 11gR2 database feeding from a MySQL transactional system using Talend and OWB ETL.
  • Implemented and administered, jointly with my team, Neolane, a strategic investment used for cross-channel email marketing. Neolane was later acquired by Adobe and rebranded as Adobe Campaign, part of Adobe Marketing Cloud.
  • Prototyped the design of a Hadoop-based data warehouse using the Cascalog DSL (a Clojure library) to run distributed data processing jobs on top of the Cascading library.
  • Researched customer survey solutions and integrated Vovici into the analytics system so that the user research manager could quickly gain insights into the voice of the customer.
  • Mentored dozens of country managers in the marketing team so that they could become autonomous with their email campaigns.
  • Guaranteed robust system availability to meet the service level objectives required by our business stakeholders.
  • Collaborated on the customer segmentation with our data mining manager using SPSS on top of the Oracle stack in a fully productionized manner.
Technologies: Perl, Oracle, SQL, PL/SQL, Adobe Marketing Cloud, Customer Segmentation, Business Intelligence (BI), Amazon Web Services (AWS), Scikit-learn, Team Leadership, Data Science, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Statistics, Data Engineering, ETL, PostgreSQL, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Data Analytics, Clojure, Artificial Intelligence (AI), Project Management, Regex, Data Pipelines, Financial Forecasting, Statistical Analysis, Databases, Data Analysis, Statistical Modeling

Business Intelligence Engineer

2005 - 2007
PriceMinister
  • Created daily ETL processes against operational data from a two-sided marketplace, an eBay competitor in France.
  • Developed business intelligence reports in Business Objects, serving finance and marketing needs.
  • Gained exposure to the statistical analysis performed by a third-party agency to further understand marketplace dynamics (sellers vs. buyers) and contributed my own findings using R.
Technologies: Oracle, R, PL/SQL, Perl, Oracle Warehouse Builder (OWB), Data Science, Statistics, Data Engineering, ETL, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, Data Analytics, Regex, Data Pipelines, Statistical Analysis, Databases, Data Analysis, Statistical Modeling

Web Developer

2003 - 2004
Lycos Inc.
  • Developed multiple components of an affiliation portal, allowing Lycos to sell its web hosting services as a white label.
  • Created parts of a back office interface to surface various usage statistics.
  • Performed comprehensive statistical analysis of customer lifetime value on the free web hosting user base in the context of a revamp of the offering.
Technologies: PHP, Apache2, Linux, MySQL, SAS, CSS2, HTML, Clustering, Customer Segmentation, Customer Lifetime Value (CLV), Data Science, Statistics, Time Series, Data Mining, Data Reporting, JavaScript, Data Analytics, Regex, Statistical Analysis, Databases, Data Analysis, Statistical Modeling

Proprietary Investing | Algorithmic Trading

I genuinely enjoy identifying weak signals in noisy data and have been designing algorithmic trading strategies for the last 15 years. I initially focused on Forex, then on cryptocurrencies, and finally on equity index futures such as the S&P 500, the Nasdaq, and the Russell.

Nowadays, I implement my strategies in C++ and Sierra Chart, but I have also used C#, Java, TypeScript, and Pine Script in other environments. I work with large amounts of data managed with BigQuery and research strategies using Python and MLflow to track my experiments.

I have a lot of respect and admiration for the work done by Marcos Lopez de Prado to democratize best practices in financial machine learning. I am also a fan of the work done by Jean-Philippe Bouchaud and Julien Guyon.

As a student of market liquidity, I have developed a better understanding of institutional order flow under the mentorship of ICT (@InnerCircleTrader).

National Data Science Bowl: Plankton Recognition

I participated in the 2015 National Data Science Bowl hosted on the Kaggle platform. This was a computer vision competition with data provided by the Hatfield Marine Science Center at Oregon State University: a large collection of labeled images, approximately 30k of which were provided as a training set. Each raw image was run through an automatic process to extract regions of interest, resulting in smaller images that each contain a single organism or entity. I created an algorithm that assigns class probabilities to a given image.

Large-scale QA-SRL Parsing | Minor Contribution

https://github.com/lelayf/nrl-qasrl
Question answering is an important machine learning task in the field of NLP.

Pairs of questions and their answers can be used to identify the semantic role of specific parts of speech in a sentence, a task known as semantic role labeling. I contributed a minor PyTorch tweak to this academic work.

Adobe XD | Animated Digital Clock Timer

https://github.com/lelayf/AdobeXD-animated-digital-clock-timer
I prototyped a mobile app to handle youth soccer game durations and facilitate substitutions during the game.

This was when I realized Adobe XD is an excellent piece of software for designing UI interactions, and I regularly go back to it whenever I want to bring an idea to life.

Gimp-LOMO

https://github.com/lelayf/gimp-lomo
A Scheme Script-Fu plugin for the GNU Image Manipulation Program (GIMP) that applies a Lomo LC-A effect to users' photos.

Scheme is a Lisp dialect used by GIMP, a leading open source application for image processing and photo editing. GIMP is a distant equivalent of Adobe Photoshop.

Education

2000 - 2003

Master's Degree in Statistics

National School of Statistics and Information Analysis (ENSAI) - Rennes, France

1997 - 1999

Bachelor's Degree in Informatics and Applied Mathematics

Pierre and Marie Curie University - Paris, France

Certifications

March 2021 - Present

Fundamentals of Reinforcement Learning

University of Alberta

Libraries/APIs

Scikit-learn, TensorFlow, Hugging Face Transformers, Pandas, REST APIs, PyTorch

Tools

Git, DataViz, Google AI Platform, Apache Beam, SPSS, Open Neural Network Exchange (ONNX), Elastic, Oracle Warehouse Builder (OWB), Adobe Experience Design (XD), Tableau

Languages

Python 3, R, SQL, Python, JavaScript, Regex, PHP, CSS2, GraphQL, C++, SAS, Fortran, Scala, Lisp, Perl, HTML, Java, TypeScript, Pine Script, Clojure, Scheme, Luau

Frameworks

Ray, Play 2, Spark, MXNet, Caffe

Paradigms

Business Intelligence (BI), Management, ETL

Platforms

Jupyter Notebook, Adobe Marketing Cloud, Linux, Google Cloud Platform (GCP), Oracle, Amazon Web Services (AWS), Docker, Apache2, Kubernetes, Azure, Databricks, Roblox

Storage

MySQL, JSON, Data Pipelines, Databases, PostgreSQL, ClickHouse, MongoDB, PL/SQL

Industry Expertise

Project Management

Other

Artificial Intelligence (AI), ETL Tools, Google BigQuery, Natural Language Processing (NLP), Hiring, Machine Learning Operations (MLOps), Data Quality, Machine Learning, Clustering, Customer Lifetime Value (CLV), Computer Vision, Team Leadership, Data Science, Deep Learning, Statistics, Data Engineering, CSV, BERT, Word2Vec, Time Series, Data Mining, Data Modeling, Data Reporting, Neural Networks, Data Analytics, Web Scraping, Generative Pre-trained Transformers (GPT), Music, Audio, Causal Inference, Data, Trading, Hugging Face, OpenAI, OpenAI GPT-3 API, OpenAI GPT-4 API, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), AI Programming, Data Scientist, Statistical Analysis, Data Analysis, Statistical Modeling, Time Series Analysis, Statistical Methods, Transformers, Experimental Design, Distributed Systems, Email Marketing, Customer Segmentation, Futures & Options, Financial Modeling, Generative Adversarial Networks (GANs), Financial Forecasting, Llama 2, Numerical Analysis, Algebra, GPU Computing, Sales, CI/CD Pipelines, Scheme Script-Fu, Reinforcement Learning, Linear Discriminant Analysis (LDA), Diffusion Models, Mastering, Language Models, MLflow, Genomics, Modal Labs, Red Panda, Recommendation Systems
