Ales Franek, Developer in Prague, Czech Republic

Ales Franek

Verified Expert in Engineering

Data Scientist and AI Developer

Location
Prague, Czech Republic
Toptal Member Since
March 22, 2021

Ales is a data scientist with eight years of experience in machine learning, natural language processing, information retrieval, data analysis, and data visualization. He is also a strong Python programmer, writing readable, modular, and efficient code. Ales has often owned all parts of the development cycle, from product discovery to production deployment, focusing on impact, pragmatism, and finding creative solutions to business needs that deliver exceptional value.

Portfolio

Signal Media Limited, doing business as Signal AI
Python, PostgreSQL, Git, Extreme Programming...
LexisNexis UK
Python, fastText, Pandas, Elasticsearch, Kibana, Scikit-learn, SpaCy...
Seznam.cz
Python, Machine Learning, NumPy, Scikit-learn, XGBoost, Gensim, fastText, Keras...

Experience

Availability

Part-time

Preferred Environment

PyCharm, DBeaver, Slack, iTerm2, Git, Zoom, Python

The most amazing...

...and impactful project I've worked on was a news recommender system for the most popular Czech web portal that's visited by millions of people every day.

Work Experience

Senior Data Scientist

2019 - 2021
Signal Media Limited, doing business as Signal AI
  • Managed the end-to-end lifecycle and development of all NLP concepts, including discovery, research, data annotation, training, evaluation, serving, monitoring, reporting, and UX.
  • Deployed new components to the production pipeline, which processed millions of documents per day.
  • Optimized existing production pipeline services, thereby reducing their processing costs.
  • Improved model evaluation in a model management platform containing thousands of live classifiers.
  • Introduced a new way to classify documents, doubling the number of available categories.
  • Played a key role in setting direction and OKRs as a member of a highly autonomous cross-functional team.
  • Participated actively in hiring and onboarding new employees and chaired weekly research guild meetings.
Technologies: Python, PostgreSQL, Git, Extreme Programming, Natural Language Processing (NLP), Generative Pre-trained Transformers (GPT), Scikit-learn, Pandas, NumPy, Elasticsearch, Matplotlib, fastText, SpaCy, Metaflow, Docker, Amazon Web Services (AWS), Support Vector Machines (SVM), Data Science, Machine Learning, SQL, Data Visualization, Artificial Intelligence (AI), Tf-idf, Objectives & Key Results (OKRs), Data Analytics, Linux, Data Analysis

Data Scientist

2017 - 2019
LexisNexis UK
  • Significantly improved search relevance of the core legal research: established online and offline metrics; incorporated user engagement into the ranking algorithm; and enhanced query classification, recognition of legal phrases, and autocomplete.
  • Educated the business on basic data science principles and promoted a data-driven decision-making culture.
  • Oversaw data science initiatives across the UK division.
Technologies: Python, fastText, Pandas, Elasticsearch, Kibana, Scikit-learn, SpaCy, Amazon Web Services (AWS), Git, Matplotlib, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Data Visualization, Data Analysis, SQL, Search, Information Retrieval, Data Science, Data Analytics, NumPy, Gensim, Linux

Applied Machine Learning Researcher

2015 - 2017
Seznam.cz
  • Significantly improved performance and added new functionality to Seznam.cz, the most visited web portal and search engine in the Czech Republic.
  • Improved components of the full-text web search engine and related services by using ML, NLP, neural networks, recommender systems, and anomaly detection.
  • Built a framework for automated versioning, caching, parallelization, visualization, and reproducibility of data science experiments.
  • Halved the error rate of body text extraction for crawled web pages, an essential part of web search.
  • Increased the accuracy of learning-to-rank models by determining optimal discretization of continuous features for decision trees.
  • Increased efficiency of the web crawler by predicting the times of likely future web page updates.
  • Increased the click-through rate on news articles by developing a recommender for Seznam’s homepage, one of the most visited websites in the Czech Republic.
  • Further improved the recommender by using article embeddings to mitigate the cold-start problem.
  • Increased relevance of the autocomplete feature and designed an algorithm for semantic deduplication of the suggested queries.
  • Developed a method for summarization of user product reviews for a comparison shopping service.
Technologies: Python, Machine Learning, NumPy, Scikit-learn, XGBoost, Gensim, fastText, Keras, Matplotlib, Linux, Git, Data Visualization, Artificial Intelligence (AI), Clustering, Decision Trees, Neural Networks, Recommendation Systems, Non-negative Matrix Factorization (NMF), SQL, t-SNE, Genetic Algorithms, Tf-idf, Web Search, Website Ranking, Decision Tree Regression, Data Science, Pattern Recognition, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Data Analytics, Pandas, Search, Information Retrieval, Predictive Analytics

Computer Vision Researcher

2013 - 2015
Wikidi
  • Designed and developed an end-to-end image retrieval system to detect brand logos in photos from social networks, capable of processing millions of images per day.
  • Explored the potential for an ML-based video encoding algorithm.
  • Contributed to a system that was able to find missing tech specs of products via internet searches.
Technologies: OpenCV, Python, NumPy, Matplotlib, Git, Linux, Computer Vision, Machine Learning, Generative Pre-trained Transformers (GPT), Natural Language Processing (NLP), Neural Networks, Approximate Nearest Neighbors, Data Visualization, Artificial Intelligence (AI), Data Science, Image Processing

News Recommender System for Seznam.cz

https://www.seznam.cz/
Designed the first news recommender system for the homepage of Seznam.cz, the most visited Czech website. I had to find creative solutions to overcome some of the obstacles; for example, disregarding articles that were not originally logged. The model was based on collaborative filtering, and the system also used neural-based document embeddings to mitigate the cold-start problem. The new approach increased the click-through rate by 20% over the original editor-curated feed.
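
As a rough illustration of the idea (a toy sketch, not the production system), the snippet below ranks candidate articles by item-based co-click similarity and falls back to embedding similarity when an article has no click history yet. All function names, the toy data, and the fallback rule are hypothetical and chosen for illustration only.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def item_similarity(clicks, embeddings, a, b):
    """Similarity between articles a and b.

    Collaborative signal: cosine over the sets of users who clicked each
    article. Cold-start fallback (an assumption of this sketch): if either
    article has no clicks yet, compare their content embeddings instead.
    """
    users_a = clicks.get(a, set())
    users_b = clicks.get(b, set())
    if users_a and users_b:
        return len(users_a & users_b) / math.sqrt(len(users_a) * len(users_b))
    return cosine(embeddings[a], embeddings[b])


def recommend(user_history, candidates, clicks, embeddings, k=2):
    """Return the k candidates most similar to the articles the user has read."""
    scores = {}
    for candidate in candidates:
        if candidate in user_history:
            continue  # do not recommend something already read
        scores[candidate] = sum(
            item_similarity(clicks, embeddings, candidate, read)
            for read in user_history
        )
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: article -> set of users who clicked it.
clicks = {
    "sports_1": {"u1", "u2"},
    "sports_2": {"u1", "u2", "u3"},
    "politics_1": {"u3", "u4"},
}
# Hypothetical 2-d content embeddings; "sports_new" has no clicks yet.
embeddings = {
    "sports_1": [1.0, 0.0],
    "sports_2": [0.9, 0.1],
    "politics_1": [0.0, 1.0],
    "sports_new": [0.95, 0.05],
}

top = recommend(["sports_1"], ["sports_2", "politics_1", "sports_new"],
                clicks, embeddings, k=2)
print(top)
```

In this sketch, the freshly published "sports_new" article can still be ranked for a sports reader even though nobody has clicked it. Mixing co-click and embedding scores on the same scale is a simplification; a real system would likely calibrate or blend the two signals.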

Logo Recognition for Images From Social Networks

Carried out research and designed a cascade of algorithms to detect brand logos in photos from social networks; for example, when someone is holding a Starbucks cup or wearing an adidas sweatshirt. I also implemented the end-to-end system from the ground up, using OpenCV and C++ to achieve extreme efficiency. The app, formerly known as Brandiozo, was a plugin for social media analytics companies.

Machine Learning Meetups Prague

Machine Learning Meetups (MLMU) is an independent platform for people interested in machine learning and artificial intelligence. I orchestrate these regular community meetings that offer inspiring talks about cutting-edge methods, experiences, tools, and applications.

Languages

Python, SQL

Libraries/APIs

Matplotlib, Scikit-learn, Pandas, NumPy, SpaCy, XGBoost, Keras, MLlib, OpenCV

Paradigms

Data Science, Objectives & Key Results (OKRs), Extreme Programming

Other

Artificial Intelligence (AI), Machine Learning, Natural Language Processing (NLP), fastText, Tf-idf, Web Search, Website Ranking, Search, Information Retrieval, Generative Pre-trained Transformers (GPT), Software Development, Pattern Recognition, Data Analytics, Computer Science, Data Visualization, Decision Tree Regression, Decision Trees, Decision Tree Classification, Logistic Regression, Linear Regression, Neural Networks, Support Vector Machines (SVM), Clustering, Recommendation Systems, Data Analysis, Image Processing, Predictive Analytics, Computer Vision, Game Theory, Combinatorial Optimization, Multi-agent Systems, Cryptography, Metaflow, Non-negative Matrix Factorization (NMF), t-SNE, Genetic Algorithms, Approximate Nearest Neighbors

Storage

Amazon S3 (AWS S3), DBeaver, PostgreSQL, Elasticsearch

Tools

PyCharm, Git, Kibana, Gensim, AWS Batch, MATLAB

Platforms

Docker, Amazon Web Services (AWS), Linux, Kubernetes

2014 - 2014

Exchange Semester in Computer Science

National Taiwan University of Science and Technology - Taipei, Taiwan

2011 - 2014

Master's Degree in Artificial Intelligence

Czech Technical University in Prague - Prague, Czechia

2012 - 2012

Exchange Semester in Computer Science

University of Wisconsin–Madison - Madison, WI, USA

2008 - 2011

Bachelor's Degree in Cybernetics and Measurements

Czech Technical University in Prague - Prague, Czechia
