François Le Lay
Verified Expert in Engineering
Artificial Intelligence Engineer and Developer
Setauket-East Setauket, NY, United States
Toptal member since July 27, 2022
François is a seasoned leader with experience building data platforms and machine learning solutions at major technology companies and startups in B2C and B2B settings. He spent seven years at Spotify, first building data infrastructure teams as a manager and then, as a staff engineer, applying machine learning techniques to improve the music catalog.
Preferred Environment
Python 3, TensorFlow, PyTorch, Hugging Face Transformers, Jupyter Notebook, Computer Vision, Pandas, Amazon Web Services (AWS), Generative Artificial Intelligence (GenAI), Natural Language Processing (NLP)
The most amazing...
...AI project I've worked on has allowed Spotify to increase the data quality of artist entities and their related work in the company's music knowledge graph.
Work Experience
AI Engineer Advisor
Psykhe AI
- Audited and provided guidance related to a machine learning stack (infrastructure and MLOps).
- Conducted machine learning models research (multi-modal AI).
- Handled R&D related to the integration of reinforcement learning into the recommender system.
AI Engineer Advisor
Spark Space Inc
- Developed a portfolio of GenAI models (Stable Diffusion) to produce Roblox avatar clothing based on a user prompt.
- Learned Lua to contribute to the codebase of our Roblox games and contributed features both on the Python back end and Lua front end.
- Deployed a Mixpanel solution to the Lua codebase (a very rare combination due to the lack of an integration SDK).
- Pushed our product discovery process towards multi-modal AI (text, images, 3D, and sound). Implemented multiple games focused on music creation in the Roblox metaverse.
Head of Solution Engineering and Integration
Kensu
- Led the implementation and deployment of the Kensu data observability solution in prospects' environments (proofs of concept).
- Carried out improvements to the product documentation and built a custom demo generator in Python.
- Supported the sales team with their understanding of our ideal customer profile and the related technical discovery process.
Director of Data Engineering and Data Science
The Farmer's Dog
- Supervised time series forecasting efforts in relation to customer service dynamic staffing and food logistics.
- Supported a data product strategy aiming to leverage large language models to gain insights into the voice of the customer in real time.
- Acted as a key stakeholder of engineering teams in their quest to transition toward a more decoupled architecture of microservices by identifying and prioritizing the work required to transform any related ETL ingestion logic.
- Contributed to a failover plan to build resilience in the company's analytics stack, focusing on ETL redundancy and on vendor and contractor management.
- Doubled the size of the data engineering team to better support stakeholders' needs across marketing, finance, operations, and engineering departments.
- Served as a key stakeholder for the customer experience and engineering teams during the migration from Kustomer to Gladly, with a particular emphasis on downstream data processing and API integrations.
Engineering Manager
Hugging Face
- Acted as the first engineering manager ever hired at the company, supporting the solution engineering team and go-to-market efforts. Hugging Face has been leading the charge of open source generative AI for years now.
- Promoted Hugging Face's in-house acceleration solution (called Optimum) for inference workloads of open source large language models available on the Hugging Face Hub.
- Hired key talent across multiple functions, including head of talent acquisition, sales development rep, research scientists, and full-stack and machine learning engineers in collaboration with the cofounding team.
- Leveraged my people management skills to establish myself as a helpful servant leader, with a dotted line towards various individual contributors in all four teams: science, open source, hub, and growth.
- Contributed to key initiatives around diversity, equity, and inclusion as an extension to work done on the company charter focused on democratizing ethical machine learning.
- Performed code reviews in the context of our hiring process, which involved a take-home assignment.
Staff Machine Learning Engineer
Spotify
- Spearheaded a research initiative that fine-tuned a series of large language models (LLMs) to explore the power of transformer models as knowledge retrievers in collaboration with the Content Intelligence ML research team.
- Laid out and pitched the conceptual foundation for a feature that would later become the Spotify AI DJ (built and released after I left the company). The idea was to use generative AI to produce personalized stories about music.
- Surveyed the state of the art in knowledge graph identification, entity resolution, and graph neural networks to prototype a working data enrichment solution tapping into third-party datasets.
- Acted as a key resource for machine learning tasks in the content intelligence team, focused on improving Spotify's music catalog through better data reconciliation capabilities and the proper integration of human expertise in the learning loop.
- Deployed an end-to-end pipeline leveraging large language models on Spotify's "blessed" ML infrastructure, using TensorFlow Extended and Kubeflow pipelines (MLOps).
- Trained a scoring model using audio features and standard music metadata, including track titles and artists' names. Demoed its use and deployment to my cross-functional team.
- Authored a tutorial as a series of Jupyter notebooks to teach the use of convolutional neural networks (CNN) for speaker segmentation in audio files (TensorFlow).
Data Engineering Manager
Spotify
- Hired and managed over 30 individual contributors, not simultaneously, across multiple squads in the data infrastructure tribe in NYC.
- Advocated for and encouraged the use of Scio, our home-grown Scala API for Apache Beam, which now powers almost every data pipeline at Spotify.
- Contributed multiple machine learning hacks leveraging the latest advances in deep learning applied to audio, knowledge graphs, and recommender systems.
- Supported technical and scientific delivery, as well as the people processes, of one of the squads in charge of building the experimentation (A/B testing) framework used by Spotify at large.
- Contributed to the technical direction and people processes of one of the squads building the machine learning infrastructure, based on the Google stack (GPU computing, TensorFlow, TFX, and GCP in general).
Director of Data
JDNviadeo
- Laid out the vision for a fully integrated in-house CRM solution, built from scratch and able to handle content personalization and real-time communications towards the professional social network user base.
- Hired research scientists with PhD degrees to pilot machine learning initiatives related to data quality, including people skills clustering and improving the UX.
- Identified and contracted a Paris-based consulting company, the one where the Play framework had been invented, to implement future CRM system components.
- Added a layer of managerial leadership to the analytics group, focused on Web Analytics (GA) and BI dashboarding.
- Bootstrapped Agile practices in software engineering under the guidance of expert consultants assisting the company in its transition towards building a healthy product and high-performing teams.
Manager of Business Intelligence
Photobox
- Implemented the first business intelligence solution of the company, based on Oracle BIEE and an Oracle 11gR2 database feeding from a MySQL transactional system using Talend and OWB ETL.
- Implemented and administered, jointly with my team, a strategic investment made in Neolane, used for cross-channel email marketing. Neolane was later acquired by Adobe and rebranded as Adobe Marketing Suite.
- Prototyped the design of a Hadoop-based data warehouse using the Cascalog DSL (a Clojure library) to run distributed data processing jobs on top of the Cascading library.
- Researched customer survey solutions and integrated Vovici into the analytics system so that the user research manager could quickly gain insights into the voice of the customer.
- Mentored dozens of country managers in the marketing team so that they could become autonomous with their email campaigns.
- Guaranteed robust system availability to meet the service level objectives required by our business stakeholders.
- Collaborated on the customer segmentation with our data mining manager using SPSS on top of the Oracle stack in a fully productionized manner.
Business Intelligence Engineer
PriceMinister
- Created daily ETL processes against operational data from a two-sided marketplace, an eBay competitor in France.
- Developed business intelligence reports in Business Objects, serving finance and marketing needs.
- Gained exposure to the statistical analysis performed by a third-party agency to further understand marketplace dynamics (sellers vs. buyers) and contributed my own findings using the R language.
Web Developer
Lycos Inc.
- Developed multiple components of an affiliation portal, allowing Lycos to sell its web hosting services as a white label.
- Created parts of a back office interface to surface various usage statistics.
- Performed comprehensive statistical analysis of customer lifetime value on the free web hosting user base in the context of a revamp of the offering.
Experience
Proprietary Investing | Algorithmic Trading
Nowadays, I implement my strategies in C++ and Sierra Chart, but I have also used C#, Java, TypeScript, and Pine Script in other environments. I work with large amounts of data managed with BigQuery and research strategies using Python and MLflow to track my experiments.
I have a lot of respect and admiration for the work done by Marcos Lopez de Prado to democratize best practices in financial machine learning. I am also a fan of the work done by Jean-Philippe Bouchaud and Julien Guyon.
As a student of market liquidity, I have developed a better understanding of institutional order flow under the mentorship of ICT (@InnerCircleTrader).
National Data Science Bowl: Plankton Recognition
Large-scale QA-SRL Parsing | Minor Contribution
https://github.com/lelayf/nrl-qasrl
Pairs of questions and their answers can be used to identify the semantic role of specific parts of speech in a sentence, a task known as semantic role labeling. I contributed a minor PyTorch tweak to this academic work.
Adobe XD | Animated Digital Clock Timer
https://github.com/lelayf/AdobeXD-animated-digital-clock-timer
This was when I realized Adobe XD is an excellent piece of software for designing UI interactions, and I regularly go back to it whenever I want to bring an idea to life.
Gimp-LOMO
https://github.com/lelayf/gimp-lomo
Scheme is a Lisp dialect used by GIMP, a leading open-source application for image processing and photo editing and a distant equivalent of Adobe Photoshop.
Education
Master's Degree in Statistics
National School of Statistics and Information Analysis (ENSAI) - Rennes, France
Bachelor's Degree in Informatics and Applied Mathematics
Pierre and Marie Curie University - Paris, France
Certifications
Fundamentals of Reinforcement Learning
University of Alberta
Skills
Libraries/APIs
Scikit-learn, TensorFlow, Hugging Face Transformers, Pandas, REST APIs, PyTorch
Tools
Git, DataViz, Google AI Platform, Apache Beam, SPSS, Open Neural Network Exchange (ONNX), Elastic, Oracle Warehouse Builder (OWB), Adobe Experience Design (XD), Tableau
Languages
Python 3, R, SQL, Python, JavaScript, Regex, PHP, CSS2, GraphQL, C++, SAS, Fortran, Scala, Lisp, Perl, HTML, Java, TypeScript, Pine Script, Clojure, Scheme, Luau
Frameworks
Ray, Play 2, Spark, MXNet, Caffe
Paradigms
Business Intelligence (BI), Management, ETL
Platforms
Jupyter Notebook, Adobe Marketing Cloud, Linux, Google Cloud Platform (GCP), Oracle, Amazon Web Services (AWS), Docker, Apache2, Kubernetes, Azure, Databricks, Roblox
Storage
MySQL, JSON, Data Pipelines, Databases, PostgreSQL, ClickHouse, MongoDB, PL/SQL
Industry Expertise
Project Management
Other
Artificial Intelligence, ETL Tools, Google BigQuery, Natural Language Processing (NLP), Hiring, Machine Learning Operations (MLOps), Data Quality, Machine Learning, Clustering, Customer Lifetime Value (CLV), Computer Vision, Team Leadership, Data Science, Deep Learning, Statistics, Data Engineering, CSV, BERT, Word2Vec, Time Series, Data Mining, Data Modeling, Data Reporting, Neural Networks, Data Analytics, Web Scraping, Generative Pre-trained Transformers (GPT), Music, Audio, Causal Inference, Data, Trading, Hugging Face, OpenAI, OpenAI GPT-3 API, OpenAI GPT-4 API, Generative Artificial Intelligence (GenAI), Large Language Models (LLMs), AI Programming, Data Scientist, Statistical Analysis, Data Analysis, Statistical Modeling, Time Series Analysis, Statistical Methods, Transformers, Experimental Design, Distributed Systems, Email Marketing, Customer Segmentation, Futures & Options, Financial Modeling, Generative Adversarial Networks (GANs), Financial Forecasting, Llama 2, Numerical Analysis, Algebra, GPU Computing, Sales, CI/CD Pipelines, Scheme Script-Fu, Reinforcement Learning, Linear Discriminant Analysis (LDA), Diffusion Models, Mastering, Language Models, MLflow, Genomics, Modal Labs, Red Panda, Recommendation Systems