Stars Realigned: Improving the IMDb Rating System
IMDb ratings have genre bias: For example, dramas tend to score higher. Removing common feature bias and keeping unique characteristics, it’s possible to create a new, refined score based on IMDb information.
Juan Manuel Ortiz de Zarate
Developing a Bioinformatics Database for Disulfide Bonds Research
The Protein Data Bank (PDB) bioinformatics database is the world’s largest repository of experimentally-determined structures of proteins, nucleic acids, and complex assemblies. All data is gathered using experimental methods such as X-ray, spectroscopy, crystallography, NMR, etc. This article explains how to extract, filter, and clean data from the PDB to make it suitable for further analysis.
Apache Spark Streaming Tutorial: Identifying Trending Twitter Hashtags
Social networks are among the biggest sources of data today, and this means they are an extremely valuable asset for marketers, big data specialists, and even individual users like journalists and other professionals. Harnessing the potential of real-time Twitter data is also useful in many time-sensitive business processes.
In this article, Toptal Freelance Software Engineer Hanee’ Medhat explains how you can build a simple Python application to leverage the power of Apache Spark, and then use it to read and process tweets to identify trending hashtags.
Hanee' Medhat Shousha
A Guide to Consistent Hashing
Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table. It powers many high-traffic dynamic websites and web applications.
In this tutorial, Toptal Freelance Software Engineer Juan Pablo Carzolio will walk us through what it is and how hashing, distributed hashing and consistent hashing work.
Juan Pablo Carzolio
Tree Kernels: Quantifying Similarity Among Tree-structured Data
Today, a massive amount of data is available in the form of networks or graphs. For example, the World Wide Web, with its web pages and hyperlinks, social networks, semantic networks, biological networks, citation networks for scientific literature, and so on.
A tree is a special type of graph, and is naturally suited to represent many types of data. The analysis of trees is an important field in computer and data science. In this article, we will look at the analysis of the link structure in trees. In particular, we will focus on tree kernels, a method for comparing tree graphs to each other, allowing us to get quantifiable measurements of their similarities or differences. This an important process for many modern applications such as classification and data analysis.
Developing for the Cloud in the Cloud: BigData Development with Docker in AWS
More and more people are moving their work from desktop applications to the cloud using an equivalent online web application. However, this has unfortunately not been true for software development IDEs. Although there have been some attempts to provide an online IDE, they have not come anywhere close to traditional IDEs.
In this article, Toptal Freelance Software Engineer Michele Sciabarra guides us on how to build a cloud-based development environment for Scala and big data applications, with the help of Docker in Amazon AWS.
Introduction to Apache Spark with Examples and Use Cases
In this post, Toptal engineer Radek Ostrowski introduces Apache Spark – fast, easy-to-use, and flexible big data processing. Billed as offering “lightning fast cluster computing”, the Spark technology stack incorporates a comprehensive set of capabilities, including SparkSQL, Spark Streaming, MLlib (for machine learning), and GraphX. Spark may very well be the “child prodigy of big data”, rapidly gaining a dominant position in the complex world of big data processing.
World-class articles, delivered weekly.
- Algorithm Developers
- Angular Developers
- AWS Developers
- Azure Developers
- Big Data Architects
- Blockchain Developers
- Business Intelligence Developers
- C Developers
- Computer Vision Developers
- Django Developers
- Docker Developers
- Elixir Developers
- Go Engineers
- GraphQL Developers
- Jenkins Developers
- Kotlin Developers
- Kubernetes Experts
- Machine Learning Engineers
- Magento Developers
- .NET Developers
- R Developers
- React Native Developers
- Ruby on Rails Developers
- Salesforce Developers
- SQL Developers
- Sys Admins
- Tableau Developers
- Unreal Engine Developers
- Xamarin Developers
- View More Freelance Developers
Join the Toptal® community.