Marc Von Wyl
Verified Expert in Engineering
Machine Learning Engineer and Developer
Marc is a data scientist specializing in natural language processing, with experience in academia and industry, from applied research to productionizing and monitoring ML models. He has worked on various problems, including extracting insight from large quantities of data, language-specific issues, and using customers' implicit feedback to train ML models. Marc also teaches an introduction to natural language processing as a master's-level course at EPITA, an engineering school in Paris, France.
Preferred Environment
Linux, Ubuntu, Google Cloud, Python 3, PyTorch, Scikit-learn, SpaCy, Jupyter, Hugging Face Transformers
The most amazing...
...project I've helped develop is a narrative monitoring system capable of summarizing online content and detecting sentiments towards a given topic.
Work Experience
External Teacher
EPITA
- Taught Introduction to Natural Language Processing 1 and 2, two modules of 14 hours each.
- Covered an introduction to linguistics, traditional computational linguistics methods, modern machine learning, and deep learning methods.
- Evaluated students through projects covering coding skills, data analysis, results analysis, and theoretical understanding.
Senior Machine Learning Engineer
Algolia
- Improved our decompounding system significantly by implementing a weakly supervised lexicon generator. Wrote a blog post on the topic that can be read at https://bit.ly/3ylq9kG.
- Enhanced the AI synonyms suggestion system using deep learning technologies.
- Increased the reach of our dynamic re-ranking product by propagating user signals across similar queries.
- Added support for several writing systems, mainly alphasyllabaries, and helped with language-specific issues across several of Algolia's features.
- Took over and improved Algolia's AI query categorization feature, currently in beta.
- Organized a natural language processing reading group for junior colleagues and an AI guild for ML and NLP practitioners within the company.
NLP Engineer | Lead NLP Engineer
Factmata
- Started as an NLP engineer and was promoted to lead NLP engineer, managing a team of five data scientists and supervising internships and a master's thesis.
- Contributed to the first version of our narrative monitoring system.
- Redesigned our narrative building system by evaluating and comparing clustering algorithms combined with state-of-the-art sentence embeddings.
- Used text summarization and keyphrase extraction to extract key topics within given narratives.
- Developed and iterated on several versions of our stance detection model, which determined the sentiment of a given piece of text or narrative toward a given topic.
Speech Research Engineer
Autonomy HPE
- Took over the language modeling part of our speech-to-text system.
- Maintained and automated the language modeling training pipeline, which covered up to 50 languages.
- Started updating our n-gram language models to RNNs.
Research and Teaching Assistant
University of Geneva
- Developed a multimodal search engine using text and image-based features. The feature weights changed iteratively in response to users' feedback as they browsed a collection.
- Wrote and published a research article, "A Parallel Cross-Modal Search Engine over Large-Scale Multimedia Collections with Interactive Relevance Feedback," in ICMR (ACM) in April 2011 with Hisham Mohamed, Eric Bruno, and Stephane Marchand-Maillet.
- Taught classes about introduction to data structures, data analysis, pattern recognition, and applied cybernetics.
Experience
Multilingual Search: Decompounding with Language-specific Lexicons
https://www.algolia.com/developers-tech-blog/code-and-deep-dives/increase-decompounding-accuracy-by-generating-a-language-specific-lexicon
In search, time is critical, so we can't use complex machine learning-based methods at query time. I investigated the problem and discovered that simple longest-match decompounding with a lexicon can give tremendous results, as long as the lexicon is of high quality.
To build language-specific lexicons, I used a mix of machine learning, part-of-speech tagging, word probabilities, and linguistic knowledge. The result is a system that can create these lexicons with little supervision, using:
• a part-of-speech tagger in the given language (several libraries cover up to a hundred languages),
• a large quantity of unlabelled data, and
• knowledge of the linking morphemes used in the given language, usually a small number.
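To illustrate the query-time side, here is a minimal Python sketch of greedy longest-match decompounding against a lexicon. It is not Algolia's implementation: the function names, the `min_len` threshold, the tiny German lexicon, and the single linking morpheme "s" are all assumptions chosen for the example, and the sketch does no backtracking.

```python
def longest_match(word, start, lexicon, min_len):
    """Return the end index of the longest lexicon entry starting at `start`, or None."""
    for end in range(len(word), start + min_len - 1, -1):
        if word[start:end] in lexicon:
            return end
    return None


def decompound(word, lexicon, linking_morphemes=("s",), min_len=3):
    """Greedy longest-match split of `word` into lexicon entries.

    Between two parts, one linking morpheme may be skipped.
    Returns [word] unchanged when no complete split is found.
    """
    word = word.lower()
    parts, pos = [], 0
    while pos < len(word):
        # Try the current position first, then just past a linking morpheme
        # (only allowed after at least one part has been found).
        starts = [pos]
        if parts:
            starts += [pos + len(m) for m in linking_morphemes
                       if word.startswith(m, pos)]
        for start in starts:
            end = longest_match(word, start, lexicon, min_len)
            if end is not None:
                break
        else:
            return [word]  # no lexicon entry fits here: give up on splitting
        parts.append(word[start:end])
        pos = end
    return parts


if __name__ == "__main__":
    # Hypothetical toy lexicon; a real one would come from the generator described above.
    lexicon = {"arbeit", "zimmer", "haus", "tür"}
    print(decompound("Arbeitszimmer", lexicon))  # ['arbeit', 'zimmer']
    print(decompound("Haustür", lexicon))        # ['haus', 'tür']
```

Because each lookup is a set membership test, the cost per query word stays small, which is why the quality of the lexicon matters more than the sophistication of the splitting logic.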
Narrative Monitoring
I worked first on the cluster summarization part, finding key phrases to summarize our clusters. I also improved the clustering by using the latest research in sentence embeddings and comparing several clustering algorithms. Finally, I worked extensively on our stance detection model, which went through several iterations, starting with a pure linguistics approach before using the latest deep learning models.
Education
Master's Degree in Computer Science
University of Geneva - Geneva, Switzerland
Bachelor's Degree in Computer Science
University of Geneva - Geneva, Switzerland
Bachelor's Degree in Computer Science
University of Applied Sciences and Arts of Western Switzerland - Geneva, Switzerland
Certifications
Natural Language Processing
Coursera
Deep Learning Specialization
Coursera
Skills
Libraries/APIs
Scikit-learn, NumPy, Pandas, Natural Language Toolkit (NLTK), PyTorch, TensorFlow, SpaCy
Tools
Jupyter, PyCharm, MATLAB
Languages
Python 3, Python, C++, R, C++17, Go
Paradigms
Data Science, Continuous Integration (CI), Continuous Delivery (CD)
Platforms
Linux, Ubuntu, Visual Studio Code (VS Code)
Storage
Google Cloud
Other
Natural Language Processing (NLP), Machine Learning, Computational Linguistics, Text Processing, Data Mining, BERT, Generative Pre-trained Transformers (GPT), Linguistics, Software Engineering, IT Project Management, Artificial Intelligence (AI), Pattern Recognition, Information Retrieval, Deep Learning, Languages, Search, Language Models, Neural Networks, Sentiment Analysis, Transformers, fastText, Image Processing, Multimedia Processing, Probability Theory, Calculus, Linear Algebra, Robotics, Speech to Text, Cloud Infrastructure, Computer Vision, Sequence Models, Recurrent Neural Networks (RNNs), Clustering, Hugging Face Transformers, University Teaching