Data Science

Engineering > Back-end

Do the Math: Scaling Microservices Applications with Orchestrators

by Antoine Hamon

Automatically scaling container deployments in a microservices-based app architecture is downright luxurious...once it's set up. But what's the best way to tune an app's orchestration parameters?

9 minute read
Engineering > Data Science and Databases

Three Principles of Data Warehouse Development

by Chamitha Wanaguru

Data warehouses aren’t exactly a new concept, but industry demand for data science services, coupled with the rise of AI and machine learning, is making them more relevant than ever. In this post, Toptal Data Warehouse Developer Chamitha Wanaguru outlines three basic principles you need to keep in mind when developing a new data warehouse.

9 minute read
Engineering > Data Science and Databases

Developing a Bioinformatics Database for Disulfide Bonds Research

by Viktor Bojović

The Protein Data Bank (PDB) bioinformatics database is the world's largest repository of experimentally determined structures of proteins, nucleic acids, and complex assemblies. All data is gathered using experimental methods such as X-ray crystallography and NMR spectroscopy. This article explains how to extract, filter, and clean data from the PDB to make it suitable for further analysis.

25 minute read
Engineering > Back-end

Introduction to Deep Learning Trading in Hedge Funds

by Neven Pičuljan

In this article, Toptal Freelance Software Engineer Neven Pičuljan introduces you to the intricacies of deep learning in hedge funds and finance in general.

21 minute read
Engineering > Back-end

The Definitive Guide to NoSQL Databases

by Mohammad Altarade

Limited SQL scalability has prompted the industry to develop and deploy a number of NoSQL database management systems, with a focus on performance, reliability, and consistency. The trend was driven by proprietary NoSQL databases developed by Google and Amazon. Eventually, open-source systems like MongoDB, Cassandra, and Hypertable brought NoSQL within reach of everyone. In this post, Senior Software Engineer Mohamad Altarade dives into some of them and explains why NoSQL will probably be with us for years to come.

15 minute read
Engineering > Data Science and Databases

A Data Engineer's Guide To Non-Traditional Data Storages

by Ken Hu

With the rise of big data and data science, storage and retrieval have become a critical pipeline component for data use and analysis. Recently, new data storage technologies have emerged. But the question is: Which one should you choose? Which one is best suited for data engineering? In this article, Toptal Data Scientist Ken Hu compares three prominent storage technologies within the context of data engineering.

7 minute read
Engineering > Data Science and Databases

An HDFS Tutorial for Data Analysts Stuck With Relational Databases

by Dallas H. Snider

The Hadoop Distributed File System (HDFS) is a scalable, open-source solution for storing and processing large volumes of data. With its built-in replication and resilience to disk failures, HDFS is well suited to storing and processing data for analytics. In this step-by-step tutorial, Toptal Database Developer Dallas H. Snider details how to migrate existing data from a PostgreSQL database into HDFS.

10 minute read
Engineering > Data Science and Databases

A Comprehensive Introduction To Your Genome With the SciPy Stack

by Zhuyi Xue

Genome data is one of the most widely analyzed datasets in bioinformatics. The SciPy stack offers a suite of popular Python packages designed for numerical computing, data transformation, analysis, and visualization, which makes it well suited to many bioinformatics analysis needs. In this tutorial, Toptal Software Engineer Zhuyi Xue walks us through some of the capabilities of the SciPy stack. He also answers some interesting questions about the human genome, including: How much of the genome is incomplete? How long is a typical gene?

23 minute read
Engineering > Back-end

Boost Your Data Munging with R

by Jan Gorecki

As a language, R is strongly tied to data and is thus used mostly by statisticians and data scientists. Many who already use R for machine learning, though, are not aware that data munging can be done faster in R, meaning another tool is not required for that task. In this article, Freelance Software Engineer Jan Gorecki explores tabular data transformations and introduces us to one of the fastest open-source data wrangling tools available.

17 minute read
