Freelance Full-stack Data Scientist | 2020 - Present | Sprout.ai
Technologies: Jupyter Notebook, Python, Docker, PyTorch, Hugging Face, Scikit-learn, Amazon Web Services (AWS), Kubeflow, Kubernetes, Seaborn, Matplotlib, Plotly, Discriminant Analysis (LDA), Topic Modeling, Machine Learning, Deep Learning, Active Learning, Natural Language Processing (NLP), Machine Learning Operations (MLOps)
- Ran experiments and built a framework for few-shot text classification on very small labeled datasets, using natural language inference (NLI).
- Provided a set of tools to help the company better understand its data and models, including topic modeling, model explainability, bias analysis, and outlier detection.
- Set up an end-to-end active learning workflow using Label Studio and dedicated ML back ends.
- Packaged products such as the NLI framework and the active learning workflow as Docker containers.
- Audited the company's ML platform against MLOps practices and identified the next improvements to prioritize.
- Set up a Kubeflow development environment on Kubernetes and migrated the NLI framework to Kubeflow Pipelines.
- Researched federated learning and recommended a range of bespoke and open-source solutions.
Full-stack Data Scientist | 2020 - 2021 | UK-based Startups
Technologies: Python, PyCharm, Scikit-learn, Amazon SageMaker, Machine Learning Operations (MLOps), Natural Language Processing (NLP), Profiling
- Evaluated and compared MLOps platforms such as Amazon SageMaker, Databricks, Kubeflow, and cnvrg.io.
- Designed and proposed several service architectures for implementing MLOps.
- Set up an MLOps stack using DVC, MLflow, and SageMaker to track experiments and to train, store, and deploy models.
- As part of a data science bootcamp, performed topic modeling and built a series of text classification ensembles, including a multimodal deep learning classifier combining text and metadata, to identify stress and stressors in social media.
- Generated synthetic data to train and evaluate models.
- Profiled a classifier, leading to an 8x improvement in its inference time.
- Freelanced for Updraft and one stealth startup.
Data Scientist | 2020 | Department for Digital, Culture, Media and Sport (DCMS)
Technologies: Python, pandas, NumPy, Matplotlib, spaCy, Google BigQuery
- Performed a literature review of the state of the art in job offer classification.
- Built models using the spaCy similarity API, comparing job offer descriptions and titles to UK Standard Industrial Classification (UK SIC) descriptions.
- Built scripts to run the model on the whole collection (more than one million job offers) and daily on new job offers.
AI Researcher and Senior Developer | 2019 - 2020 | The National Archives
Technologies: Java, Python, Amazon SageMaker
- Worked on a third-party technology evaluation project, interviewing and assessing five suppliers on their solutions and reports, which ranged from off-the-shelf records management products to bespoke solutions using fully customized models or AI APIs.
- Wrote a 50+ page report on NLP techniques and tools for selecting records held by government departments for permanent preservation, aimed at multiple audiences, including government decision-makers, archivists, and data scientists.
- Delivered a new release of DROID, an open-source file format identification tool, implementing or reviewing 60+ pull requests.
- Managed DROID's GitHub community, responding to user queries and reviewing and merging pull requests.
- Increased project transparency and improved prioritization.
- Advocated for better remote working practices in my department and offered guidance and support during the COVID-19 lockdown.
CTO and Co-founder | 2015 - 2020 | Trackener
- Brought to market an IoT product that lets horse owners look after horses with chronic or acute health issues, keep their horses healthy and happy, and gain peace of mind.
- Led the software side of our product, including mobile, web app, and back-end development, server management, and data science.
- Ran project management for our technical team, including two hardware and software engineers and part-time contractors.
Senior Java Developer | 2015 - 2016 | The National Archives
Technologies: Java, Spring Boot, Groovy, MongoDB, Neo4j, JProfiler
- Collaborated closely with a researcher and a data expert, took over a document-linking prototype, and designed and implemented a set of applications to link, evaluate, and publish collections.
- Designed, implemented, deployed to production, and maintained a set of back-end applications, built first on Lucene and then on Solr, to categorize 20+ million records of The National Archives for its public-facing website, Discovery.
- Managed the servers running my applications.
- Installed continuous integration tooling, including Jenkins, Nexus, and SonarQube.
- Created an ML prototype that classified documents based on their similarity, using Lucene.
- Gave a series of technical presentations to my department.
Software Engineer | 2011 - 2014 | Worldline by Atos
Technologies: Java, Maven, Spring, Apache, Apache Tomcat, NGINX, Linux, Hibernate, SQL, MySQL, Object-oriented Design (OOD), Software Architecture
- Took part in the development, project management, and production support of a mediation platform for a high-visibility Orange project.
- Contributed to multiple back-end projects of varying size for Orange, the leading French telecom company, covering project management, customer relations, functional and technical design, implementation, and production support.
- Helped develop a banking application for BPCE dedicated to mandate management under the SEPA standard.
- Supervised an offshore development team in India, providing support, validation, and monitoring.