Data Science

Engineering > Data Science and Databases

Clustering Algorithms: From Start to State of the Art

By Lovro Iliassich

Clustering algorithms are central to unsupervised learning and are key elements of machine learning in general. These algorithms give meaning to unlabelled data and help find structure in chaos. But not all clustering algorithms are created equal; each has its own pros and cons. In this article, Toptal Freelance Software Engineer Lovro Iliassich explores a range of clustering algorithms, from the well-known K-Means algorithm to the elegant, state-of-the-art Affinity Propagation technique.

11 minute read

Engineering > Data Science and Databases

Tree Kernels: Quantifying Similarity Among Tree-structured Data

By Dino Causevic

Today, a massive amount of data is available in the form of networks or graphs. Examples include the World Wide Web, with its web pages and hyperlinks, as well as social networks, semantic networks, biological networks, and citation networks for scientific literature. A tree is a special type of graph and is naturally suited to representing many types of data. The analysis of trees is an important field in computer and data science. In this article, we will look at the analysis of the link structure in trees. In particular, we will focus on tree kernels, a method for comparing trees to each other that gives us quantifiable measurements of their similarities or differences. This is an important process for many modern applications, such as classification and data analysis.

12 minute read

Engineering > Data Science and Databases

Ensemble Methods: Elegant Techniques to Produce Improved Machine Learning Results

By Necati Demir, PhD

Machine learning, in computing, is where art meets science. Perfecting a machine learning tool is largely about understanding the data and choosing the right algorithm. But why choose one algorithm when you can choose many and make them all work toward one goal: improved results? In this article, Toptal Engineer Necati Demir walks us through some elegant ensemble techniques, in which a combination of data splits and multiple algorithms is used to produce machine learning results with higher accuracy.

6 minute read

Engineering > Back-end

Guide To Budget-friendly Data Mining

By Jeffrey Shumaker

Although database programming does not evolve at nearly the same pace as traditional application programming, recent advancements in several fields are bringing new techniques and technologies within the reach of small and independent developers. In this guide, Toptal Freelance Software Engineer Jeffrey Shumaker explains how developers can quickly and easily tap these methods to identify database issues they may not even be aware of, and how they can build excellent data mining tools without spending a lot on expensive software licenses.

9 minute read

Engineering > Data Science and Databases

Data Mining for Predictive Social Network Analysis

By Elder Santos

Analysts have come to recognize social network data as a virtual treasure trove of information for sensing public opinion trends and groundswells of support. In this article, Toptal Engineer Elder Santos describes the techniques he employed for a proof-of-concept that effectively analyzed Twitter Trend Topics to predict, as a sample test case, regional voting patterns in the 2014 Brazilian presidential election.

7 minute read

Engineering > Data Science and Databases

Azure Tutorial: Predicting Gas Prices Using Azure Machine Learning Studio

By Ivan Matec

Machine learning has changed the way we deal with data. Data-driven problems that are difficult to solve using standard methods can often be tackled much more easily with machine learning algorithms. In this article, we will explore the features and capabilities of Azure Machine Learning by solving one of the problems we face in our everyday lives.

5 minute read

Engineering > Data Science and Databases

Business Intelligence Platform: Tutorial Using MongoDB Aggregation Pipeline

By Avinash Kaza

In today’s data-driven world, researchers are busy answering interesting questions by churning through huge volumes of data. Some of the obvious challenges they face stem from the sheer size of the datasets they have to deal with. In this article, we take a peek at a simple business intelligence platform implemented on top of the MongoDB Aggregation Pipeline.

7 minute read

Engineering > Data Science and Databases

Blockchain Technology Explained: Powering Bitcoin

By Nermin Hajdarbegovic

The Bitcoin blockchain is the backbone of the network: a tamper-proof data structure that serves as a shared public ledger open to all. This article offers insight into blockchain technology, its current status, and its potential.

6 minute read

Engineering > Data Science and Databases

3D Data Visualization With Open Source Tools: A Tutorial Using VTK

By Benjamin Hopfer

How do we understand and interpret the huge amounts of data coming out of simulations? How do we visualize what may be gigabytes of data points in a large dataset? In this article, I will give a quick introduction to VTK and its pipeline architecture, and go on to discuss a real-life visualization example.

12 minute read
