Algorithm Posts

The Toptal Engineering Blog is a hub for in-depth development tutorials and new technology announcements created by professional software engineers in the Toptal network.
Vladyslav Millier
Exploring Supervised Machine Learning Algorithms

While machine learning sounds highly technical, an introduction to the statistical methods involved quickly brings it within reach. In this article, Toptal Freelance Software Engineer Vladyslav Millier explores basic supervised machine learning algorithms and scikit-learn, using them to predict survival rates for Titanic passengers.

Roman Vashchegin
Conquer String Search with the Aho-Corasick Algorithm

The Aho-Corasick algorithm efficiently searches a large body of text for many patterns at once, which makes it useful in data science and many other areas.

In this article, Toptal Freelance Software Engineer Roman Vashchegin shows how the Aho-Corasick algorithm uses a trie data structure to efficiently match a dictionary of words against any text.
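To make the idea concrete, here is a minimal Python sketch, not the article's code. The patterns are inserted into a trie, failure links are added breadth-first so the matcher never has to back up in the text, and each match is reported as a (start index, pattern) pair. The function names `build_automaton` and `search` are illustrative.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie edges, failure links and output sets for Aho-Corasick."""
    goto = [{}]       # goto[state][char] -> child state (the trie edges)
    fail = [0]        # fail[state] -> state for the longest proper suffix that is also in the trie
    output = [set()]  # output[state] -> patterns that end at this state

    # Phase 1: insert every pattern into the trie.
    for pattern in patterns:
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                output.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        output[state].add(pattern)

    # Phase 2: compute failure links breadth-first (depth-1 states fail to the root).
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, child in goto[state].items():
            queue.append(child)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[child] = goto[f].get(ch, 0)
            # A state also reports every pattern its failure state reports.
            output[child] |= output[fail[child]]
    return goto, fail, output


def search(text, patterns):
    """Scan `text` once and return sorted (start_index, pattern) matches."""
    goto, fail, output = build_automaton(patterns)
    state = 0
    matches = []
    for i, ch in enumerate(text):
        # Follow failure links until the current character can be consumed.
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pattern in output[state]:
            matches.append((i - len(pattern) + 1, pattern))
    return sorted(matches)
```

For example, `search("ushers", ["he", "she", "his", "hers"])` returns `[(1, 'she'), (2, 'he'), (2, 'hers')]` after a single left-to-right pass over the text.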

Yuri da Silva Villas Boas
Getting Started with the SRVB Cryptosystem

This article introduces the principles behind public-key cryptosystems and presents the Santana Rocha-Villas Boas (SRVB) cryptosystem, developed by the article's author and Prof. Daniel Santana Rocha. The algorithm's authors are running a campaign that offers a financial reward to anyone who manages to crack the code.

Juan Pablo Carzolio
A Guide to Consistent Hashing

Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table. It powers many high-traffic dynamic websites and web applications.

In this tutorial, Toptal Freelance Software Engineer Juan Pablo Carzolio walks us through what it is and how hashing, distributed hashing, and consistent hashing work.
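A minimal ring-based sketch in Python, assuming MD5 as the hash function and 100 virtual nodes per server (both arbitrary choices here; `HashRing` is a hypothetical class name). Each server is placed on the ring many times, and a key belongs to the first server point found clockwise from the key's own hash.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a position on the ring (MD5 is used only for its even spread)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers=(), replicas=100):
        self.replicas = replicas  # virtual nodes per server; more replicas -> smoother load
        self._points = []         # sorted hash positions on the ring
        self._owner = {}          # hash position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove(self, server):
        for i in range(self.replicas):
            point = _hash(f"{server}#{i}")
            self._points.remove(point)  # linear time, acceptable for a sketch
            del self._owner[point]

    def lookup(self, key):
        """Return the server owning the first ring point clockwise from the key."""
        if not self._points:
            raise LookupError("ring is empty")
        # bisect returns len(points) past the last point; the modulo wraps around the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]
```

The benefit shows up when the server set changes: adding a server only reassigns keys that now land just before one of its points, and every other key keeps its previous owner. Removing that server restores the original mapping.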

Shanglun Wang
How to Build a Natural Language Processing App

Natural language is increasingly becoming a viable way of interacting with smart software. Google Search, Apple’s Siri, and Microsoft’s Cortana can all understand queries in natural language.

In this article, Toptal Freelance Software Engineer Shanglun (Sean) Wang walks us through some useful concepts and techniques in natural language processing and shows how they can be used to build a simple NLP app.

Eugene Ossipov
Genetic Algorithms: Search and Optimization by Natural Selection

Many problems have optimal algorithms developed for them, while many others require us to randomly guess until we get a good answer. Even an optimal solution becomes slow and complex at a certain scale, at which point we can turn to natural processes to see how they reach acceptable results.

In this article, Toptal Freelance Software Engineer Eugene Ossipov walks us through the basics of creating a genetic algorithm and gives us the grounding to apply this approach to other problems.
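As an illustration of the basic loop (selection, crossover, mutation), here is a toy Python sketch on the OneMax problem, where fitness is simply the number of 1 bits. Tournament selection, single-point crossover, per-bit mutation, elitism and all parameter values are assumptions made for this example, not necessarily the article's choices.

```python
import random


def fitness(bits):
    """OneMax: count the 1 bits; the optimum is a string of all 1s."""
    return sum(bits)


def tournament(population, rng, k=3):
    """Return the fittest of k randomly chosen individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover: a prefix of one parent joined to the suffix of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def evolve(length=30, pop_size=50, generations=200, rate=0.02, seed=0):
    rng = random.Random(seed)  # seeded so runs are reproducible
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == length:
            break  # found the optimum
        children = [best]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng, rate))
        population = children
    return max(population, key=fitness)
```

Because the best individual is always carried over, the best fitness never decreases from one generation to the next; swapping in a different `fitness` function is all it takes to point the same loop at another problem.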

Lovro Iliassich
Clustering Algorithms: From Start To State Of The Art

Clustering algorithms are very important to unsupervised learning and are key elements of machine learning in general. These algorithms give meaning to data that are not labelled and help find structure in chaos. But not all clustering algorithms are created equal; each has its own pros and cons.

In this article, Toptal Freelance Software Engineer Lovro Iliassich explores a range of clustering algorithms, from the well-known K-Means algorithm to the elegant, state-of-the-art Affinity Propagation technique.
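For reference, here is a minimal pure-Python sketch of K-Means using Lloyd's iteration. The farthest-first seeding is an assumption chosen to keep the result deterministic; random or k-means++ seeding is more common in practice.

```python
import math


def farthest_first(points, k):
    """Seed deterministically: start at points[0], then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's iteration: alternate an assignment step and an update step."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: another assignment step would change nothing
        centroids = new_centroids
    return centroids, clusters
```

On two well-separated groups of four points each, around (0.5, 0.5) and (10.5, 10.5), the centroids settle exactly on those two means. K-Means still needs `k` up front and can stop in a local optimum, which is part of why the other algorithms in the article exist.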

Dino Causevic
Tree Kernels: Quantifying Similarity Among Tree-Structured Data

Today, a massive amount of data is available in the form of networks or graphs. Examples include the World Wide Web, with its web pages and hyperlinks; social networks; semantic networks; biological networks; citation networks for scientific literature; and so on.

A tree is a special type of graph and is naturally suited to representing many types of data. The analysis of trees is an important field in computer and data science. In this article, we will look at the analysis of the link structure in trees. In particular, we will focus on tree kernels, a method for comparing trees to each other that yields quantifiable measures of their similarities or differences. This is an important process for many modern applications such as classification and data analysis.
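One of the simplest tree kernels, the subtree kernel, counts the pairs of identical complete subtrees that two trees share. The Python sketch below assumes ordered trees written as hypothetical `(label, children)` tuples and uses unit weights for every subtree; the article may focus on other kernels.

```python
from collections import Counter


def collect_subtrees(tree, counts):
    """Add every complete subtree of `tree` to `counts`, keyed by a canonical string.

    Returns the canonical string of `tree` itself, e.g. "NP(D(),N())".
    Labels are assumed not to contain parentheses or commas.
    """
    label, children = tree
    encoded = label + "(" + ",".join(collect_subtrees(c, counts) for c in children) + ")"
    counts[encoded] += 1
    return encoded


def subtree_kernel(t1, t2):
    """Number of pairs (s1, s2) of identical complete subtrees, s1 from t1 and s2 from t2."""
    c1, c2 = Counter(), Counter()
    collect_subtrees(t1, c1)
    collect_subtrees(t2, c2)
    # Counter returns 0 for missing keys, so unshared subtrees contribute nothing.
    return sum(n * c2[s] for s, n in c1.items())
```

Two trees with no subtree in common score 0, and a tree compared with itself scores at least its number of nodes, so the value behaves like a similarity that grows with shared structure.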

Daniel Angel Muñoz Trejo
Optimized Successive Mean Quantization Transform

Image processing algorithms are often very resource-intensive because they process an image’s pixels one at a time and often require multiple passes. Successive Mean Quantization Transform (SMQT) is one such resource-intensive algorithm; it can process images taken in low-light conditions and reveal details in the dark regions of the image.

In this article, Toptal engineer Daniel Angel Muñoz Trejo gives us some insight into how the SMQT algorithm works and walks us through a clever optimization technique that makes the algorithm a viable option for handheld devices.
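The plain transform can be stated recursively: split a set of pixel values at its mean, emit one bit (1 above the mean, 0 otherwise), and repeat on each half for L levels, so every pixel ends up with an L-bit output value. Below is a straightforward, unoptimized Python sketch on a flat list of grayscale values; it is not the optimization described in the article, and sending values equal to the mean to the lower half is an assumption.

```python
def smqt(values, levels=8):
    """Successive Mean Quantization Transform of a flat list of pixel values.

    Returns integers in [0, 2**levels - 1], one per input value.
    """
    out = [0] * len(values)

    def split(indices, level):
        if level == levels or not indices:
            return
        mean = sum(values[i] for i in indices) / len(indices)
        upper = [i for i in indices if values[i] > mean]   # bit 1 at this level
        lower = [i for i in indices if values[i] <= mean]  # bit 0 at this level
        weight = 1 << (levels - level - 1)                 # first split sets the most significant bit
        for i in upper:
            out[i] += weight
        split(lower, level + 1)
        split(upper, level + 1)

    split(list(range(len(values))), 0)
    return out
```

Each level touches every pixel once, so this direct version costs roughly one pass per output bit. Because every split depends only on comparisons against a mean, the output does not change if all inputs are multiplied by a positive gain and shifted by a constant, which is part of what makes SMQT attractive for dark, low-contrast images.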

Mahmud Ridwan
Predicting Likes: Inside A Simple Recommendation Engine's Algorithms

The Internet is becoming “smarter” every day. The video-sharing website that you frequently visit seems to know exactly what you will like, even before you have seen it. The online shopping cart holding your items almost magically figures out the one thing that you may have missed or intended to add before checking out. It’s as if these web services are reading your mind. Or are they?

It turns out that predicting a user’s likes involves more math than magic. In this article, we will explore one of the many ways of building a recommendation engine, one that is simple both to implement and to understand.
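One simple approach, sketched here in Python as an assumption rather than the article's exact formula, is user-based collaborative filtering: measure how similar two users are with the Jaccard index of their liked items, then score each item a user has not liked yet by the summed similarity of the users who did like it. The `likes` dictionary layout is hypothetical.

```python
def jaccard(a, b):
    """Size of the intersection over size of the union (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(likes, user, top_n=3):
    """Rank items `user` has not liked by the similarity of the users who liked them.

    `likes` maps each user name to the set of items that user liked.
    """
    mine = likes[user]
    scores = {}
    for other, theirs in likes.items():
        if other == user:
            continue
        sim = jaccard(mine, theirs)
        if sim == 0:
            continue  # users with no overlap contribute nothing
        for item in theirs - mine:
            scores[item] = scores.get(item, 0.0) + sim
    # Highest score first; ties are broken alphabetically for a stable result.
    return sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
```

With `likes = {"ann": {"a", "b", "c"}, "bob": {"a", "b", "d"}, "cat": {"c", "e"}}`, Ann shares two of four items with Bob (similarity 0.5) and one of four with Cat (0.25), so `recommend(likes, "ann")` ranks Bob's "d" ahead of Cat's "e".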

Jovan Jovanovic
How does Shazam work? Music Recognition Algorithms, Fingerprinting, and Processing

You hear a familiar song at a club or restaurant. You listened to this song a thousand times long ago, and its sentimentality really touches your heart. You desperately want to hear it tomorrow, but you can’t remember its name! Fortunately, in our amazing futuristic world, you have a phone with music recognition software installed, and you are saved.

But how does this actually work? Shazam’s algorithm was revealed to the world in 2003. In this article, we’ll go over the fundamentals of that algorithm.

Ahmed Al-Amir
Needle in a Haystack: A Nifty Large-Scale Text Search Algorithm Tutorial

When coming across the term “text search”, one usually thinks of a large body of text, which is indexed in a way that makes it possible to quickly look up one or more search terms when they are entered by a user. This is a classic problem in computer science, to which many solutions exist.

But how about a reverse scenario? What if what’s available for indexing beforehand is a group of search phrases, and only at runtime is a large body of text presented for searching?
