
Michael Völske
Verified Expert in Engineering
Machine Learning Developer
Michael has a PhD in computer science, a decade of experience solving complex data problems, and a dozen publications at top-tier international venues like SIGIR, CIKM, and ACL, many of them based on web-scale datasets. He excels in planning, procuring, and installing on-premise data processing infrastructure spanning hundreds of servers, petabytes of disk, and petaflops of compute. Michael has broad knowledge of modern machine learning and teaches the fundamentals to hundreds of students.
Preferred Environment
Linux, Emacs, Visual Studio Code (VS Code), Python 3, Spark, Pandas, Kubernetes, Scikit-learn, Git
The most amazing...
...dataset I've analyzed is a billion-item query log from a major search engine that taught me the power of Apache Spark and resulted in a top-ranked publication.
Work Experience
Assistant Lecturer Online Teaching
Berliner Hochschule für Technik
- Taught an online class on the fundamentals of computer operating systems as a freelance assistant lecturer to classes of 20+ students per semester.
- Designed and administered the practical exercise labs on digital circuits, assembly language, process scheduling, and file systems.
- Created and administered the mid-term and final exams.
Postdoctoral Research Assistant
Bauhaus-Universität Weimar
- Published 7+ research papers on information retrieval and data mining over three years at A-level international venues and several minor ones, often leveraging state-of-the-art machine learning and data processing techniques.
- Designed teaching materials on the fundamentals of machine learning as part of an annual lecture and led an associated programming lab for more than 100 students each year.
- Led a team to plan, procure, and install a state-of-the-art GPU computing cluster; designed and implemented the systems for authenticating dozens of researchers using this infrastructure across four institutions.
- Supported and mentored 11 successful students in writing bachelor's and master's theses.
Research Assistant
Bauhaus-Universität Weimar
- Contributed to more than a dozen scientific publications on information retrieval, natural language processing, and data mining, half of which were published at A-rated international venues. Implemented the experimental systems in Python and Java.
- Taught lab classes on the fundamentals of machine learning and held a recurring seminar class on big data processing architectures. Led small student project groups on a variety of hot topics in machine learning and data mining.
- Planned and implemented the procurement, installation, maintenance, and monitoring for computing infrastructure, which involved more than 200 individual servers.
- Supported and mentored 11 successful students in the preparation of bachelor's and master's theses.
Experience
Lecture and Lab Class "Introduction to Machine Learning"
My responsibilities included teaching in-person and online classes, designing lecture materials and lab exercises, and supervising teaching assistants. To support the associated programming labs, I rolled out a Kubernetes-based JupyterHub deployment serving up to a hundred students at a time. I also significantly expanded the teaching materials on neural networks and deep learning.
Query Classification and Log Analysis on a Billion-item Query Log
Research and Implementation for Axiomatic Information Retrieval
https://webis.de/publications.html?q=author:volske+axiomatic
This research project investigated strategies to make retrieval axioms directly usable to benefit real-world search engines. A pilot study published at CIKM 2016 showed how retrieval axioms could directly modify result rankings and thus improve search result quality. Follow-up work in ICTIR 2021 showed how retrieval axioms could generate explanations for arbitrary rankings, making complex relevance scoring functions such as those based on deep neural networks more interpretable. The axiomatic re-ranking pipeline I implemented has contributed to several further publications.
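To illustrate the general idea of axiom-based re-ranking, here is a minimal Python sketch. It is not the published pipeline: the single term-frequency axiom, the `length_tolerance` parameter, and the simple win-count aggregation are all assumptions chosen for brevity. Each axiom states a pairwise preference between two documents, and the initial ranking is reordered by how many pairwise comparisons each document wins.

```python
from itertools import combinations


def tf_axiom(query_terms, doc_a, doc_b, length_tolerance=0.1):
    """Hypothetical axiom: among documents of similar length, prefer the one
    with more query term occurrences. Returns 1 (prefer doc_a), -1 (prefer
    doc_b), or 0 (no preference)."""
    words_a, words_b = doc_a.lower().split(), doc_b.lower().split()
    longer = max(len(words_a), len(words_b))
    # Precondition: the axiom only applies to documents of similar length.
    if longer == 0 or abs(len(words_a) - len(words_b)) > length_tolerance * longer:
        return 0
    tf_a = sum(words_a.count(term) for term in query_terms)
    tf_b = sum(words_b.count(term) for term in query_terms)
    return (tf_a > tf_b) - (tf_a < tf_b)


def rerank(query_terms, ranking, axioms):
    """Reorder an initial ranking by the number of pairwise wins per document.
    Ties keep their original relative order because sorted() is stable."""
    wins = [0] * len(ranking)
    for i, j in combinations(range(len(ranking)), 2):
        vote = sum(axiom(query_terms, ranking[i], ranking[j]) for axiom in axioms)
        if vote > 0:
            wins[i] += 1
        elif vote < 0:
            wins[j] += 1
    order = sorted(range(len(ranking)), key=lambda k: -wins[k])
    return [ranking[k] for k in order]


initial = [
    "hadoop cluster setup guide",
    "spark spark tuning guide",
    "spark cluster setup guide",
]
reranked = rerank(["spark"], initial, [tf_axiom])
```

In this toy setup, documents that the axiom never distinguishes stay where the initial ranker put them, so the axioms only adjust the ranking where they have something to say.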
Procurement, Installation, and Maintenance of Computing Infrastructure
Mining Reddit for Abstractive Summarization Ground Truth Data
https://webis.de/data/webis-tldr-17.html
I led an effort to mine more than four million human-written source-summary pairs from posts on the Reddit platform, where users frequently summarize long messages as a courtesy to their readers, prefixing the summary with "TL;DR" ("too long; didn't read") or similar. To handle the scale of the input data (all Reddit posts ever made up to the year 2017), I leveraged technologies like Hadoop and Spark. The resulting Webis-TLDR-17 dataset formed the basis for a shared-task competition on abstractive summarization organized by a mixed team of researchers from industry and academia at INLG 2019. Our dataset was subsequently included in the Hugging Face and TensorFlow datasets libraries and has been cited in more than 40 publications so far.
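The core extraction step can be sketched in a few lines of Python. This is an illustrative simplification rather than the code used to build the dataset: the marker pattern `TLDR_MARKER`, the function `split_post`, and the `min_words` threshold are all hypothetical. The sketch splits a post at its last "TL;DR"-style marker into a source text and a summary.

```python
import re

# Assumed marker pattern: matches "TL;DR", "tldr", "tl dr", and similar,
# optionally followed by a colon or dash.
TLDR_MARKER = re.compile(r"\btl\s*;?\s*dr\b\s*[:\-]?\s*", re.IGNORECASE)


def split_post(text, min_words=3):
    """Split a post into (source, summary) at the last TL;DR marker.
    Returns None if there is no marker, no summary, or too little source."""
    matches = list(TLDR_MARKER.finditer(text))
    if not matches:
        return None
    last = matches[-1]
    source = text[:last.start()].strip()
    summary = text[last.end():].strip()
    if len(source.split()) < min_words or not summary:
        return None
    return source, summary


pair = split_post(
    "I spent all weekend debugging a cluster outage. TL;DR: check your DNS first."
)
```

A function like this is stateless, so it could be applied per record in a distributed job, which fits processing at the scale described above.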
Skills
Languages
Python 3, Java, C, Ada, JavaScript, Scala, SQL
Paradigms
Data Science, DevOps
Other
Machine Learning, Information Retrieval, Data Engineering, University Teaching, Technical Writing, IT Infrastructure, GPU Computing, Programming, Data Mining, Web Technologies, Statistics, Linear Algebra, Optimization, Data Analysis, Cloud Architecture, Natural Language Processing (NLP), Data Visualization, Text Mining, Big Data, Regular Expressions, Generative Pre-trained Transformers (GPT)
Frameworks
Spark, Hadoop
Libraries/APIs
Pandas, Scikit-learn, PyTorch
Tools
Git, LaTeX, SaltStack, Emacs, Jupyter, GitLab
Platforms
Linux, Kubernetes, Docker, Visual Studio Code (VS Code), Jupyter Notebook
Storage
JSON, Ceph, PostgreSQL, On-premise
Education
PhD in Computer Science
Bauhaus-Universität Weimar - Weimar, Germany
Master's Degree in Computer Science
Bauhaus-Universität Weimar - Weimar, Germany