François Le Lay, Artificial Intelligence Engineer and Developer in Setauket-East Setauket, United States
François Le Lay

Artificial Intelligence Engineer and Developer in Setauket-East Setauket, United States

Member since July 27, 2022
François is a seasoned leader with experience building data platforms and machine learning solutions at major technology companies and startups in B2C and B2B settings. He has spent seven years at Spotify building data infrastructure teams as a manager and leveraging machine learning techniques to improve the music catalog as a staff engineer.

Portfolio

  • The Farmer's Dog
    SQL, Python 3, ETL Tools, Google BigQuery, R, Python, Pandas, Team Leadership...
  • Hugging Face
    Python 3, Transformers, GPU Computing, Natural Language Processing (NLP)...
  • Spotify
    Python 3, TensorFlow, Jupyter Notebook, Google Cloud AI, Elastic, Python...

Location

Setauket-East Setauket, United States

Availability

Part-time

Preferred Environment

Python 3, R, Git, TensorFlow, PyTorch, Hugging Face Transformers, Jupyter Notebook, GraphQL, Computer Vision, Python, Pandas, Amazon Web Services (AWS)

The most amazing AI project I've worked on has allowed Spotify to improve the data quality of artist entities and their related works in the company's music knowledge graph.

Employment

  • Director of Data Engineering and Data Science

    2021 - 2022
    The Farmer's Dog
    • Contributed a failover plan to build resilience into the company's analytics stack, focusing on ETL redundancy and on vendor and contractor management.
    • Supported a data product strategy leveraging natural language processing (NLP) to gain real-time insights into the voice of the customer.
    • Doubled the size of the data engineering team to better support stakeholders' needs across the marketing, finance, operations, and engineering departments.
    • Acted as a key stakeholder for engineering teams transitioning toward a more decoupled microservices architecture, identifying and prioritizing the work required to adapt the related ETL ingestion logic.
    • Served as a key stakeholder for the customer experience and engineering teams during the migration from Kustomer to Gladly, with particular emphasis on downstream data processing and API integrations.
    Technologies: SQL, Python 3, ETL Tools, Google BigQuery, R, Python, Pandas, Team Leadership, Data Science, Natural Language Processing (NLP), Statistics, Data Engineering, ETL, PostgreSQL, REST APIs, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Tableau, Neural Networks, Data Analytics, Web Scraping
  • Engineering Manager

    2021
    Hugging Face
    • In collaboration with the cofounding team, hired key talent across multiple functions, including a head of talent acquisition, a sales development representative, research scientists, and full-stack and machine learning engineers.
    • Drew on my people management skills to act as a servant leader, with a dotted-line relationship to individual contributors across all four teams: science, open source, hub, and growth.
    • Contributed to key initiatives around diversity, equity, and inclusion, extending the work done on the company charter focused on democratizing ethical machine learning.
    • Reviewed candidates' code for the take-home assignment that was part of our hiring process.
    • Supported the growth team by sourcing and joining pre-sales calls as part of our go-to-market strategy for a proprietary solution that accelerates NLP inference workloads.
    Technologies: Python 3, Transformers, GPU Computing, Natural Language Processing (NLP), DataViz, Sales, Hiring, TensorFlow, PyTorch, Open Neural Network Exchange (ONNX), Python, Pandas, Scikit-learn, Team Leadership, Kubernetes, Deep Learning, Diffusion Models, Statistics, REST APIs, JSON, CSV, BERT, JavaScript, Neural Networks
  • Staff Machine Learning Engineer

    2019 - 2021
    Spotify
    • Acted as a key resource for machine learning tasks in the content intelligence team, focused on improving Spotify's music catalog through better data reconciliation capabilities and the proper integration of human expertise in the learning loop.
    • Deployed an end-to-end pipeline leveraging a RoBERTa transformer model on the company's "blessed" infrastructure, using TensorFlow Extended (TFX) and Kubeflow (MLOps).
    • Surveyed the state of the art in knowledge graph identification and entity resolution to prototype a working data enrichment solution that taps into third-party datasets.
    • Delivered another series of machine learning models after proposing novel ways to leverage transformer models, then collaborated with our research science team to iterate on that premise.
    • Built another entity resolution model using audio features and standard music metadata, including track titles and artist names, and, as a peer mentor, demoed the model's use and deployment to the rest of the team.
    Technologies: Python 3, TensorFlow, Jupyter Notebook, Google Cloud AI, Elastic, Python, Pandas, Scikit-learn, Kubernetes, Data Science, Deep Learning, Generative Adversarial Networks (GANs), Natural Language Processing (NLP), Statistics, Data Engineering, REST APIs, JSON, CSV, BERT, Word2Vec, Data Mining, Data Modeling, Data Reporting, JavaScript, Neural Networks, Data Analytics
  • Data Engineering Manager

    2014 - 2019
    Spotify
    • Hired and managed more than 30 individual contributors over time across multiple squads in the data infrastructure tribe in NYC.
    • Advocated for the adoption of Scio, our home-grown Scala API for Apache Beam, which now powers almost every data pipeline at Spotify.
    • Contributed multiple machine learning hacks leveraging the latest advances in deep learning applied to audio, knowledge graphs, and recommender systems.
    • Supported technical and scientific delivery, as well as people processes, for one of the squads in charge of building the A/B testing experimentation framework used across Spotify.
    • Contributed to the technical direction and people processes of one of the squads building the machine learning infrastructure, based on the Google stack (GCP, TensorFlow, and TFX) and GPU computing.
    Technologies: Scala, Apache Beam, ClickHouse, Google BigQuery, Experimental Design, Distributed Systems, Business Intelligence (BI), Machine Learning Operations (MLOps), Data Quality, Management, Hiring, Python 3, CI/CD Pipelines, Python, Pandas, Scikit-learn, Team Leadership, Kubernetes, Data Science, Deep Learning, Statistics, Data Engineering, ETL, REST APIs, JSON, CSV, Time Series, Data Modeling, Data Reporting, JavaScript, Tableau, Neural Networks, Data Analytics, Amazon Web Services (AWS)
  • Director of Data

    2012 - 2013
    JDNviadeo
    • Laid out the vision for a fully integrated in-house CRM solution, built from scratch and able to handle content personalization and real-time communications with the professional social network's user base.
    • Hired PhD research scientists to pilot machine learning initiatives related to data quality and UX improvements, including the clustering of users' skills.
    • Identified and contracted a Paris-based consulting company, where the Play framework had been created, to implement future CRM system components.
    • Added a layer of managerial leadership to the analytics group, which focused on web analytics (Google Analytics) and BI dashboarding.
    • Bootstrapped Agile software engineering practices with the help of expert consultants who assisted the company in its transition toward building a healthy product and high-performing teams.
    Technologies: R, Scala, Elastic, MongoDB, Play 2, Discriminant Analysis (LDA), Business Intelligence (BI), Email Marketing, Spark, Scikit-learn, Team Leadership, Data Science, Natural Language Processing (NLP), Statistics, Data Engineering, ETL, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Neural Networks, Data Analytics, Amazon Web Services (AWS)
  • Manager of Business Intelligence

    2007 - 2012
    Photobox
    • Implemented the company's first business intelligence solution, based on Oracle BIEE and an Oracle 11gR2 database fed from a MySQL transactional system using Talend and OWB for ETL.
    • Implemented and administered, together with my team, Neolane, a strategic investment used for cross-channel email marketing. Neolane was later acquired by Adobe and rebranded as Adobe Campaign, part of Adobe Marketing Cloud.
    • Prototyped the design of a Hadoop-based data warehouse using the Cascalog DSL (a Clojure library) to run distributed data processing jobs on top of the Cascading library.
    • Researched customer survey solutions and integrated Vovici into the analytics system so that the user research manager could quickly gain insights into the voice of the customer.
    • Mentored dozens of country managers in the marketing team so that they could run their email campaigns autonomously.
    • Ensured robust system availability to meet the service level objectives required by our business stakeholders.
    • Collaborated with our data mining manager on fully productionized customer segmentation using SPSS on top of the Oracle stack.
    Technologies: Perl, Oracle, SQL, PL/SQL, Adobe Marketing Cloud, Customer Segmentation, Business Intelligence (BI), Amazon Web Services (AWS), Scikit-learn, Team Leadership, Data Science, Natural Language Processing (NLP), Statistics, Data Engineering, ETL, PostgreSQL, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Data Analytics, Clojure
  • Business Intelligence Engineer

    2005 - 2007
    PriceMinister
    • Created daily ETL processes on operational data from a two-sided marketplace, an eBay competitor in France.
    • Developed business intelligence reports in BusinessObjects, serving finance and marketing needs.
    • Gained exposure to statistical analysis performed by a third-party agency to better understand marketplace dynamics between sellers and buyers, and contributed my own findings using R.
    Technologies: Oracle, R, PL/SQL, Perl, Oracle Warehouse Builder (OWB), Data Science, Statistics, Data Engineering, ETL, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, Data Analytics
  • Web Developer

    2003 - 2004
    Lycos Inc.
    • Developed multiple components of an affiliate portal that allowed Lycos to sell its web hosting services as a white-label offering.
    • Created parts of a back-office interface to surface various usage statistics.
    • Performed a comprehensive statistical analysis of customer lifetime value across the free web hosting user base as part of a revamp of the offering.
    Technologies: PHP, Apache2, Linux, MySQL, SAS, CSS2, HTML, Clustering, Customer Segmentation, Customer Lifetime Value, Data Science, Statistics, Time Series, Data Mining, Data Reporting, JavaScript, Data Analytics

Experience

  • Proprietary Investing | Algorithmic Trading

    I truly enjoy identifying weak signals in noisy data and have been designing algorithmic trading strategies for the last 15 years, initially focusing on forex, then cryptocurrencies, and finally equity index futures such as the S&P 500, Nasdaq, and Russell.

    I have implemented strategies in C++, C#, Java, TypeScript, and Pine Script, depending on the broker environment. I work with large amounts of data managed in BigQuery and research strategies by analyzing that data with Python.

    I have a lot of respect and admiration for the work done by Marcos López de Prado to democratize best practices in financial machine learning.

  • National Data Science Bowl: Plankton Recognition

    I participated in the 2015 National Data Science Bowl, a computer vision competition hosted on Kaggle. The data, provided by the Hatfield Marine Science Center at Oregon State University, consisted of a large collection of labeled plankton images, approximately 30,000 of which were provided as the training set. Each raw image had been run through an automatic process that extracted regions of interest, resulting in smaller images that each contain a single organism or entity. I created an algorithm that assigns class probabilities to a given image.

  • Large-scale QA-SRL Parsing | Minor Contribution
    https://github.com/lelayf/nrl-qasrl

    Question answering is an important machine learning task in the field of NLP.

    Question-answer pairs can be used to identify the semantic roles played by specific parts of a sentence, a task known as semantic role labeling (SRL). I contributed a minor PyTorch tweak to this academic work.

  • Adobe XD | Animated Digital Clock Timer
    https://github.com/lelayf/AdobeXD-animated-digital-clock-timer

    I prototyped a mobile app to handle youth soccer game durations and facilitate substitutions during the game.

    This project made me realize that Adobe XD is an excellent tool for designing UI interactions, and I regularly go back to it whenever I want to bring an idea to life.

  • Gimp-LOMO
    https://github.com/lelayf/gimp-lomo

    A Script-Fu plugin, written in Scheme, for the GNU Image Manipulation Program (GIMP) that applies a Lomo LC-A effect to users' photos.

    Scheme is a Lisp dialect used for scripting GIMP, a leading open-source program for image processing and photo editing and a distant equivalent of Adobe Photoshop.

Skills

  • Languages

    Python 3, R, SQL, Python, JavaScript, PHP, CSS2, GraphQL, C++, SAS, Fortran, Scala, Lisp, Perl, HTML, Java, TypeScript, Pine Script, Clojure, Scheme
  • Paradigms

    Business Intelligence (BI), Management, Data Science, ETL
  • Platforms

    Jupyter Notebook, Adobe Marketing Cloud, Linux, Oracle, Amazon Web Services (AWS), Apache2, Kubernetes
  • Storage

    MySQL, JSON, Data Pipelines, PostgreSQL, ClickHouse, MongoDB, PL/SQL
  • Other

    ETL Tools, Google BigQuery, Natural Language Processing (NLP), Hiring, Data Quality, Machine Learning, Clustering, Customer Lifetime Value, Computer Vision, Team Leadership, Deep Learning, Statistics, Data Engineering, CSV, BERT, Word2Vec, Time Series, Data Mining, Data Modeling, Data Reporting, Neural Networks, Data Analytics, Web Scraping, Hugging Face Transformers, Artificial Intelligence (AI), Time Series Analysis, Statistical Methods, Transformers, Experimental Design, Distributed Systems, Machine Learning Operations (MLOps), Email Marketing, Customer Segmentation, Financial Modeling, Generative Adversarial Networks (GANs), Financial Forecasting, Numerical Analysis, Algebra, GPU Computing, Sales, Open Neural Network Exchange (ONNX), CI/CD Pipelines, Scheme Script-Fu, Reinforcement Learning, Discriminant Analysis (LDA), Futures & Options, Diffusion Models
  • Libraries/APIs

    TensorFlow, Pandas, Scikit-learn, REST APIs, PyTorch
  • Tools

    Git, DataViz, Google Cloud AI, Apache Beam, SPSS, Elastic, Oracle Warehouse Builder (OWB), Adobe Experience Design (XD), Tableau
  • Frameworks

    Play 2, Spark, MXNet, Caffe

Education

  • Master's Degree in Statistics
    2000 - 2003
    National School of Statistics and Information Analysis (ENSAI) - Rennes, France
  • Bachelor's Degree in Informatics and Applied Mathematics
    1997 - 1999
    Pierre and Marie Curie University - Paris, France

Certifications

  • Fundamentals of Reinforcement Learning
    March 2021 - Present
    University of Alberta
