Director of Data Engineering and Data Science
2021 - 2022
The Farmer's Dog
- Contributed a failover plan to build resilience in the company's analytics stack, focusing on ETL redundancy and on vendor and contractor management.
- Supported a data product strategy leveraging Natural Language Processing (NLP) technology to gain insights into the voice of the customer in real-time.
- Doubled the size of the data engineering team to better support stakeholders' needs across marketing, finance, operations, and engineering departments.
- Acted as a key stakeholder of engineering teams in their quest to transition towards a more decoupled architecture of microservices by identifying and prioritizing the work required to transform any related ETL ingestion logic.
- Acted as a key stakeholder for the customer experience and engineering teams during the migration from Kustomer to Gladly, with a particular emphasis on downstream data processing and API integrations.
Technologies: SQL, Python 3, ETL Tools, Google BigQuery, R, Python, Pandas, Team Leadership, Data Science, Natural Language Processing (NLP), Statistics, Data Engineering, ETL, PostgreSQL, REST APIs, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Tableau, Neural Networks, Data Analytics, Web Scraping

Engineering Manager
2021 - 2021
Hugging Face
- Hired key talent across multiple functions, including head of talent acquisition, sales development rep, research scientists, and full-stack and machine learning engineers in collaboration with the cofounding team.
- Leveraged my people management skills to establish myself as a helpful servant leader, with a dotted line towards various individual contributors in all four teams, including science, open-source, hub, and growth.
- Contributed to key initiatives around diversity, equity, and inclusion, as an extension to work done on the company charter focused on democratizing ethical machine learning.
- Performed code reviews in the context of our hiring process, which involved a take-home assignment.
- Supported the growth team by sourcing and participating in various pre-sales calls in the context of our go-to-market strategy related to a proprietary acceleration of inference workloads for NLP.
Technologies: Python 3, Transformers, GPU Computing, Natural Language Processing (NLP), DataViz, Sales, Hiring, TensorFlow, PyTorch, Open Neural Network Exchange (ONNX), Python, Pandas, Scikit-learn, Team Leadership, Kubernetes, Deep Learning, Diffusion Models, Statistics, REST APIs, JSON, CSV, BERT, JavaScript, Neural Networks

Staff Machine Learning Engineer
2019 - 2021
Spotify
- Acted as a key resource for machine learning tasks in the content intelligence team, focused on improving Spotify's music catalog through better data reconciliation capabilities and the proper integration of human expertise in the learning loop.
- Deployed an end-to-end pipeline leveraging the RoBERTa transformer model on "blessed" company infrastructure, using TensorFlow Extended and Kubeflow (MLOps).
- Surveyed the state of the art in knowledge graph identification and entity resolution to prototype a working data enrichment solution tapping into third-party datasets.
- Delivered another series of machine learning models, following a proposal to leverage transformer models in novel ways and a collaboration with our research science team to further iterate on that premise.
- Built another entity resolution model fed with audio features and standard music metadata, including track titles and artists' names. Demoed the use and deployment of the model as a peer mentor for the rest of the team.
Technologies: Python 3, TensorFlow, Jupyter Notebook, Google Cloud AI, Elastic, Python, Pandas, Scikit-learn, Kubernetes, Data Science, Deep Learning, Generative Adversarial Networks (GANs), Natural Language Processing (NLP), Statistics, Data Engineering, REST APIs, JSON, CSV, BERT, Word2Vec, Data Mining, Data Modeling, Data Reporting, JavaScript, Neural Networks, Data Analytics

Data Engineering Manager
2014 - 2019
Spotify
- Hired and managed over 30 individual contributors, not simultaneously, across multiple squads in the data infrastructure tribe in NYC.
- Advocated and encouraged using our home-grown library called Scio, a Scala API for Apache Beam that is powering almost every data pipeline at Spotify nowadays.
- Contributed multiple machine learning hacks leveraging the latest advances in deep learning applied to audio, knowledge graphs, and recommender systems.
- Supported technical and scientific delivery, as well as the people processes, for one of the squads building the A/B testing experimentation framework used by Spotify at large.
- Supported the technical direction and people processes of one of the squads building the machine learning infrastructure, based on the Google stack: GPU computing, TensorFlow, TFX, and GCP in general.
Technologies: Scala, Apache Beam, ClickHouse, Google BigQuery, Experimental Design, Distributed Systems, Business Intelligence (BI), Machine Learning Operations (MLOps), Data Quality, Management, Hiring, Python 3, CI/CD Pipelines, Python, Pandas, Scikit-learn, Team Leadership, Kubernetes, Data Science, Deep Learning, Statistics, Data Engineering, ETL, REST APIs, JSON, CSV, Time Series, Data Modeling, Data Reporting, JavaScript, Tableau, Neural Networks, Data Analytics, Amazon Web Services (AWS)

Director of Data
2012 - 2013
JDNviadeo
- Laid out the vision for a fully integrated in-house CRM solution, built from scratch and able to handle content personalization and real-time communications towards the professional social network user base.
- Hired research scientists with PhD degrees to pilot machine learning initiatives related to data quality, including people skills clustering and improving the UX.
- Identified and contracted a Paris-based consulting company where the Play framework had been invented to implement future CRM system components.
- Added a layer of managerial leadership to the analytics group, focused on Web Analytics (GA) and BI dashboarding.
- Bootstrapped Agile practices in software engineering under the helm of expert consultants assisting the company in its transition towards building a healthy product and high-performing teams.
Technologies: R, Scala, Elastic, MongoDB, Play 2, Discriminant Analysis (LDA), Business Intelligence (BI), Email Marketing, Spark, Scikit-learn, Team Leadership, Data Science, Natural Language Processing (NLP), Statistics, Data Engineering, ETL, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Neural Networks, Data Analytics, Amazon Web Services (AWS)

Manager of Business Intelligence
2007 - 2012
Photobox
- Implemented the first business intelligence solution of the company, based on Oracle BIEE and an Oracle 11gR2 database feeding from a MySQL transactional system using Talend and OWB ETL.
- Implemented and administered, jointly with my team, Neolane, a strategic investment used for cross-channel email marketing. Neolane was later acquired by Adobe and rebranded as Adobe Marketing Suite.
- Prototyped the design of a Hadoop-based data warehouse using the Cascalog DSL (a Clojure library) to run distributed data processing jobs on top of the Cascading library.
- Researched customer survey solutions and integrated Vovici into the analytics system so that the user research manager could quickly gain insights into the voice of the customer.
- Mentored dozens of country managers in the marketing team so that they could become autonomous with their email campaigns.
- Guaranteed robust system availability to meet the service level objectives required by our business stakeholders.
- Collaborated on the customer segmentation with our data mining manager using SPSS on top of the Oracle stack in a fully productionized manner.
Technologies: Perl, Oracle, SQL, PL/SQL, Adobe Marketing Cloud, Customer Segmentation, Business Intelligence (BI), Amazon Web Services (AWS), Scikit-learn, Team Leadership, Data Science, Natural Language Processing (NLP), Statistics, Data Engineering, ETL, PostgreSQL, JSON, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, JavaScript, Data Analytics, Clojure

Business Intelligence Engineer
2005 - 2007
PriceMinister
- Created daily ETL processes against operational data from a 2-sided marketplace, an eBay competitor in France.
- Developed business intelligence reports in Business Objects, serving finance and marketing needs.
- Gained exposure to the statistical analysis performed by a third-party agency to further understand marketplace dynamics (sellers vs. buyers), and contributed my own findings using R.
Technologies: Oracle, R, PL/SQL, Perl, Oracle Warehouse Builder (OWB), Data Science, Statistics, Data Engineering, ETL, CSV, Time Series, Data Mining, Data Modeling, Data Reporting, Data Analytics

Web Developer
2003 - 2004
Lycos Inc.
- Developed multiple components of an affiliate portal, allowing Lycos to sell its web hosting services as a white label.
- Created parts of a back office interface to surface various usage statistics.
- Performed comprehensive statistical analysis of customer lifetime value on the free web hosting user base in the context of a revamp of the offering.
Technologies: PHP, Apache2, Linux, MySQL, SAS, CSS2, HTML, Clustering, Customer Segmentation, Customer Lifetime Value, Data Science, Statistics, Time Series, Data Mining, Data Reporting, JavaScript, Data Analytics