François Le Lay
Verified Expert in Engineering
Artificial Intelligence Engineer and Developer
François is a seasoned leader with experience building data platforms and machine learning solutions at major technology companies and startups in B2C and B2B settings. François has spent seven years at Spotify building data infrastructure teams as a manager and leveraging machine learning techniques to improve the music catalog as a staff engineer.
Python 3, R, Git, TensorFlow, PyTorch, Hugging Face Transformers, Jupyter Notebook, GraphQL, Computer Vision, Python, Pandas, Amazon Web Services (AWS)
The most amazing AI project I've worked on has allowed Spotify to increase the data quality of artist entities and their related work in the company's music knowledge graph.
Head of Solution Engineering and Integration
- Led the implementation and deployment of the Kensu data observability solution in prospects' environments (proofs of concept).
- Carried out improvements to the product documentation and built a custom demo generator in Python.
- Supported the sales team with their understanding of our ideal customer profile and the related technical discovery process.
Director of Data Engineering and Data Science
The Farmer's Dog
- Supervised time series forecasting efforts in relation to customer service dynamic staffing and food logistics.
- Supported a data product strategy aiming to leverage large language models to gain insights into the voice of the customer in real time.
- Acted as a key stakeholder of engineering teams in their quest to transition toward a more decoupled architecture of microservices by identifying and prioritizing the work required to transform any related ETL ingestion logic.
- Contributed to a failover plan to build resilience in the company's analytics stack, focusing on ETL redundancy and on vendor and contractor management.
- Doubled the size of the data engineering team to better support stakeholders' needs across marketing, finance, operations, and engineering departments.
- Acted as a key stakeholder for the customer experience and engineering teams during the migration from Kustomer to Gladly, with a particular emphasis on downstream data processing and API integrations.
- Acted as the first engineering manager ever hired at the company, supporting the solution engineering team and go-to-market efforts. Hugging Face has been leading the charge of open source generative AI for years now.
- Promoted Hugging Face's proprietary acceleration solution (called Optimum) for inference workloads of open source large language models available on the Hugging Face Hub.
- Hired key talent across multiple functions, including head of talent acquisition, sales development rep, research scientists, and full-stack and machine learning engineers in collaboration with the cofounding team.
- Leveraged my people management skills to establish myself as a helpful servant leader, with a dotted line towards various individual contributors in all four teams, including science, open-source, hub, and growth.
- Contributed to key initiatives around diversity, equity, and inclusion as an extension to work done on the company charter focused on democratizing ethical machine learning.
- Performed code reviews in the context of our hiring process, which involved a take-home assignment.
Staff Machine Learning Engineer
- Spearheaded a research initiative that fine-tuned a series of large language models (LLMs) to explore the power of transformer models as knowledge retrievers in collaboration with the Content Intelligence ML research team.
- Laid out and pitched the conceptual foundation for a feature that would later become the Spotify AI DJ (built and released after I left the company). The idea was to use generative AI to produce personalized stories about music.
- Surveyed the state of the art in knowledge graph identification, entity resolution, and graph neural networks to prototype a working data enrichment solution tapping into third-party datasets.
- Acted as a key resource for machine learning tasks in the content intelligence team, focused on improving Spotify's music catalog through better data reconciliation capabilities and the proper integration of human expertise in the learning loop.
- Deployed an end-to-end pipeline leveraging large language models on Spotify's "blessed" ML infrastructure, using TensorFlow Extended and Kubeflow pipelines (MLOps).
- Trained a scoring model using audio features and standard music metadata, including track titles and artists' names. Demoed its use and deployment to my cross-functional team.
- Authored a tutorial as a series of Jupyter notebooks to teach the use of convolutional neural networks (CNN) for speaker segmentation in audio files (TensorFlow).
Data Engineering Manager
- Hired and managed over 30 individual contributors (not simultaneously) across multiple squads in the data infrastructure tribe in NYC.
- Advocated for the use of our home-grown library Scio, a Scala API for Apache Beam that now powers almost every data pipeline at Spotify.
- Contributed multiple machine learning hacks leveraging the latest advances in deep learning applied to audio, knowledge graphs, and recommender systems.
- Supported technical and scientific delivery, as well as the people processes, for one of the squads in charge of building the A/B testing experimentation framework used across Spotify.
- Supported the technical direction and people processes of one of the squads building the machine learning infrastructure, based on the Google stack (GPU computing, TensorFlow, TFX, and GCP in general).
Director of Data
- Laid out the vision for a fully integrated in-house CRM solution, built from scratch and able to handle content personalization and real-time communications towards the professional social network user base.
- Hired research scientists with PhD degrees to pilot machine learning initiatives related to data quality, including people skills clustering and improving the UX.
- To implement future CRM system components, identified and contracted a Paris-based consulting company where the Play framework had been invented.
- Added a layer of managerial leadership to the analytics group, focused on Web Analytics (GA) and BI dashboarding.
- Bootstrapped Agile practices in software engineering under the guidance of expert consultants assisting the company in its transition towards building a healthy product and high-performing teams.
Manager of Business Intelligence
- Implemented the first business intelligence solution of the company, based on Oracle BIEE and an Oracle 11gR2 database feeding from a MySQL transactional system using Talend and OWB ETL.
- Implemented and administered, jointly with my team, the company's strategic investment in Neolane, used for cross-channel email marketing. Neolane was later acquired by Adobe and rebranded as Adobe Campaign.
- Prototyped the design of a Hadoop-based data warehouse using the Cascalog DSL (a Clojure library) to run distributed data processing jobs on top of the Cascading library.
- Researched customer survey solutions and integrated Vovici into the analytics system so that the user research manager could quickly gain insights into the voice of the customer.
- Mentored dozens of country managers in the marketing team so that they could become autonomous with their email campaigns.
- Guaranteed robust system availability to meet the service level objectives required by our business stakeholders.
- Collaborated on customer segmentation with our data mining manager, using SPSS on top of the Oracle stack in a fully productionized manner.
Business Intelligence Engineer
- Created daily ETL processes against operational data from a two-sided marketplace, an eBay competitor in France.
- Developed business intelligence reports in Business Objects, serving finance and marketing needs.
- Gained exposure to the statistical analysis performed by a third-party agency to further understand marketplace dynamics (sellers vs. buyers), and contributed my own findings using R.
- Developed multiple components of an affiliate portal, allowing Lycos to sell its web hosting services as a white label.
- Created parts of a back office interface to surface various usage statistics.
- Performed comprehensive statistical analysis of customer lifetime value on the free web hosting user base in the context of a revamp of the offering.
Proprietary Investing | Algorithmic Trading
Nowadays, I implement my strategies in C++ and Sierra Chart, but I have also used C#, Java, TypeScript, and Pine Script in other environments. I work with large amounts of data managed with BigQuery and research strategies using Python and MLflow to track my experiments.
I have a lot of respect and admiration for the work done by Marcos Lopez de Prado to democratize best practices in financial machine learning. I am also a fan of the work done by Jean-Philippe Bouchaud and Julien Guyon.
As a student of market liquidity, I have developed a better understanding of institutional order flow under the mentorship of ICT (@InnerCircleTrader).
National Data Science Bowl: Plankton Recognition
Large-scale QA-SRL Parsing | Minor Contribution
https://github.com/lelayf/nrl-qasrl
Pairs of questions and their answers can be used to identify the semantic role of specific parts of a sentence, a task known as semantic role labeling (SRL). I contributed a minor PyTorch tweak to this academic work.
Adobe XD | Animated Digital Clock Timer
https://github.com/lelayf/AdobeXD-animated-digital-clock-timer
This was when I realized Adobe XD is an excellent piece of software for designing UI interactions, and I regularly go back to it whenever I want to bring an idea to life.
Scheme is a Lisp dialect used for scripting in GIMP, a leading open source application for image processing and photo editing. GIMP is a distant equivalent of Adobe Photoshop.
Scikit-learn, TensorFlow, Pandas, REST APIs, PyTorch
Business Intelligence (BI), Management, Data Science, ETL
Jupyter Notebook, Adobe Marketing Cloud, Linux, Google Cloud Platform (GCP), Oracle, Amazon Web Services (AWS), Apache2, Kubernetes, Azure, Databricks
MySQL, JSON, Data Pipelines, Databases, PostgreSQL, ClickHouse, MongoDB, PL/SQL
Artificial Intelligence (AI), ETL Tools, Google BigQuery, Natural Language Processing (NLP), Hiring, Data Quality, Machine Learning, Clustering, Customer Lifetime Value, Computer Vision, Team Leadership, Deep Learning, Statistics, Data Engineering, CSV, BERT, Word2Vec, Time Series, Data Mining, Data Modeling, Data Reporting, Neural Networks, Data Analytics, Web Scraping, Generative Pre-trained Transformers (GPT), Music, Audio, Causal Inference, Data, Trading, Hugging Face, OpenAI, OpenAI GPT-3 API, OpenAI GPT-4 API, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), AI Programming, Data Scientist, Statistical Analysis, Data Analysis, Statistical Modeling, Hugging Face Transformers, Time Series Analysis, Statistical Methods, Transformers, Experimental Design, Distributed Systems, Machine Learning Operations (MLOps), Email Marketing, Customer Segmentation, Futures & Options, Financial Modeling, Generative Adversarial Networks (GANs), Financial Forecasting, Llama 2, Numerical Analysis, Algebra, GPU Computing, Sales, Open Neural Network Exchange (ONNX), CI/CD Pipelines, Scheme Script-Fu, Reinforcement Learning, Linear Discriminant Analysis (LDA), Diffusion Models, Mastering, Language Models, MLflow, Genomics
Git, DataViz, Google Cloud AI, Apache Beam, SPSS, Elastic, Oracle Warehouse Builder (OWB), Adobe Experience Design (XD), Tableau
Play 2, Spark, MXNet, Caffe
Master's Degree in Statistics
National School of Statistics and Information Analysis (ENSAI) - Rennes, France
Bachelor's Degree in Informatics and Applied Mathematics
Pierre and Marie Curie University - Paris, France
Fundamentals of Reinforcement Learning
University of Alberta