Data Science

Stars Realigned: Improving the IMDb Rating System

by Juan Manuel Ortiz de Zarate

IMDb ratings have a genre bias; dramas, for example, tend to score higher. By removing the bias shared by common features while keeping each title's unique characteristics, it's possible to create a new, refined score from IMDb information.

10-minute read

Do the Math: Scaling Microservices Applications with Orchestrators

by Antoine Hamon

Automatically scaling container deployments in a microservices-based app architecture is downright luxurious... once it's set up. But what's the best way to tune an app's orchestration parameters?

9-minute read

Three Principles of Data Warehouse Development

by Chamitha Wanaguru

Data warehouses aren’t exactly a new concept, but industry demand for data science services, coupled with the rise of AI and machine learning, is making them more relevant than ever. In this post, Toptal Data Warehouse Developer Chamitha Wanaguru outlines three basic principles you need to keep in mind when developing a new data warehouse.

9-minute read

Developing a Bioinformatics Database for Disulfide Bonds Research

by Viktor Bojović

The Protein Data Bank (PDB) bioinformatics database is the world's largest repository of experimentally determined structures of proteins, nucleic acids, and complex assemblies. All data is gathered using experimental methods such as X-ray crystallography and NMR spectroscopy. This article explains how to extract, filter, and clean data from the PDB to make it suitable for further analysis.

25-minute read

Introduction to Deep Learning Trading in Hedge Funds

by Neven Pičuljan

In this article, Toptal Freelance Software Engineer Neven Pičuljan introduces you to the intricacies of deep learning in hedge funds and in finance generally.

21-minute read

The Definitive Guide to NoSQL Databases

by Mohammad Altarade

The limited scalability of SQL databases has prompted the industry to develop and deploy a number of NoSQL database management systems, with a focus on performance, reliability, and consistency. The trend was driven by proprietary NoSQL databases developed by Google and Amazon. Eventually, open-source systems like MongoDB, Cassandra, and Hypertable brought NoSQL within reach of everyone. In this post, Senior Software Engineer Mohamad Altarade dives into some of these systems and explains why NoSQL will probably be with us for years to come.

15-minute read

A Data Engineer's Guide To Non-Traditional Data Storages

by Ken Hu

With the rise of big data and data science, storage and retrieval have become critical pipeline components for data use and analysis. New data storage technologies have emerged recently, which raises the question: Which one should you choose, and which is best suited for data engineering? In this article, Toptal Data Scientist Ken Hu compares three prominent storage technologies within the context of data engineering.

7-minute read

An HDFS Tutorial for Data Analysts Stuck with Relational Databases

by Dallas H. Snider

The Hadoop Distributed File System (HDFS) is a scalable, open-source solution for storing and processing large volumes of data. With its built-in replication and resilience to disk failures, HDFS is well suited to storing and processing data for analytics. In this step-by-step tutorial, Toptal Database Developer Dallas H. Snider details how to migrate existing data from a PostgreSQL database into HDFS.

10-minute read

A Comprehensive Introduction To Your Genome With the SciPy Stack

by Zhuyi Xue

Genome data is one of the most widely analyzed datasets in bioinformatics. The SciPy stack offers a suite of popular Python packages for numerical computing, data transformation, analysis, and visualization, which makes it well suited to many bioinformatic analysis needs. In this tutorial, Toptal Software Engineer Zhuyi Xue walks us through some of the capabilities of the SciPy stack. He also answers some interesting questions about the human genome, including: How much of the genome is incomplete? How long is a typical gene?

23-minute read
