Big Data

Engineering > Back-end

Big Data Architecture for the Masses: A ksqlDB and Kubernetes Tutorial

By Dmitrii Bolotov

Today’s cloud building blocks empower any size team—even a lone engineer—to build big data solutions. Learn how to use open-source tools to create scalable architecture for your next project.

14 minute read

Engineering > Data Science and Databases

Building a Data Warehouse Data Quality Process

By Alexander Hauskrecht

Data quality is a crucial element of any successful data warehouse solution. As the complexity of data warehouses increases, so does the need for data quality processes. In this article, Toptal Data Quality Developer Alexander Hauskrecht outlines how you can ensure a high degree of data quality and why this process is so important.

16 minute read

Engineering > Data Science and Databases

Turn Chaos Into Profit: Understanding the ETL Process

By Alexandre Wanderer

ETL can consolidate data from various sources into an organized, reliable, and usable database. This allows businesses to employ previously unused or underused data to improve their performance. In this article, Toptal Data Modeling Developer Alexandre Wanderer demonstrates all stages of the ETL process in building a data warehouse.

8 minute read

Engineering > Back-end

Introduction to Deep Learning Trading in Hedge Funds

By Neven Pičuljan

In this article, Toptal Freelance Software Engineer Neven Pičuljan introduces you to the intricacies of deep learning in hedge funds and finance in general.

21 minute read

Engineering > Data Science and Databases

Conquer String Search with the Aho-Corasick Algorithm

By Roman Vashchegin

The Aho-Corasick algorithm efficiently searches for multiple patterns in a large blob of text, making it useful in data science and many other areas. In this article, Toptal Freelance Software Engineer Roman Vashchegin shows how the Aho-Corasick algorithm uses a trie data structure to match a dictionary of words against any text.

16 minute read

Engineering > Data Science and Databases

Twitter Data Mining: A Guide to Big Data Analytics Using Python

By Anthony Sistilli

Twitter is a goldmine of data. Unlike other social platforms, almost every user’s tweets are completely public and pullable. In this tutorial, Toptal Freelance Software Engineer Anthony Sistilli will be exploring how you can use Python, the Twitter API, and data mining techniques to gather useful data.

8 minute read

Engineering > Data Science and Databases

A Guide to Consistent Hashing

By Juan Pablo Carzolio

Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table. It powers many high-traffic dynamic websites and web applications. In this tutorial, Toptal Freelance Software Engineer Juan Pablo Carzolio will walk us through what it is and how hashing, distributed hashing and consistent hashing work.

25+ minute read

Engineering > Data Science and Databases

Tree Kernels: Quantifying Similarity Among Tree-structured Data

By Dino Causevic

Today, a massive amount of data is available in the form of networks or graphs: for example, the World Wide Web with its web pages and hyperlinks, social networks, semantic networks, biological networks, and citation networks for scientific literature. A tree is a special type of graph and is naturally suited to representing many types of data. The analysis of trees is an important field in computer and data science. In this article, we will look at the analysis of the link structure in trees. In particular, we will focus on tree kernels, a method for comparing trees to each other that yields quantifiable measurements of their similarities or differences. This is an important process for many modern applications such as classification and data analysis.

12 minute read

Engineering > Back-end

Developing for the Cloud in the Cloud: BigData Development with Docker in AWS

By Michele Sciabarra

More and more people are moving their work from desktop applications to equivalent web applications in the cloud. Unfortunately, this has not been true for software development IDEs. Although there have been some attempts to provide an online IDE, they have not come anywhere close to traditional IDEs. In this article, Toptal Freelance Software Engineer Michele Sciabarra shows how to build a cloud-based development environment for Scala and big data applications with the help of Docker on AWS.

14 minute read
