Automatically scaling container deployments in a microservices-based app architecture is downright luxurious...once it's set up. But what's the best way to tune an app's orchestration parameters?
Data warehouses aren’t exactly a new concept, but industry demand for data science services, coupled with the rise of AI and machine learning, is making them more relevant than ever. In this post, Toptal Data Warehouse Developer Chamitha Wanaguru outlines three basic principles you need to keep in mind when developing a new data warehouse.
The Protein Data Bank (PDB) bioinformatics database is the world's largest repository of experimentally-determined structures of proteins, nucleic acids, and complex assemblies. All data is gathered using experimental methods such as X-ray, spectroscopy, crystallography, NMR, etc. This article explains how to extract, filter, and clean data from the PDB to make it suitable for further analysis.
In this article, Toptal Freelance Software Engineer Neven Pičuljan introduces you to the intricacies of deep learning in hedge funds and finance in general.
Limited SQL scalability has prompted the industry to develop and deploy a number of NoSQL database management systems, with a focus on performance, reliability, and consistency. The trend was driven by proprietary NoSQL databases developed by Google and Amazon. Eventually, open-source systems like MongoDB, Cassandra, and Hypertable brought NoSQL within reach of everyone. In this post, Senior Software Engineer Mohamad Altarade dives into some of them and explains why NoSQL will probably be with us for years to come.
With the rise of big data and data science, storage and retrieval have become a critical pipeline component for data use and analysis. Recently, new data storage technologies have emerged. But the question is: Which one should you choose? Which one is best suited for data engineering? In this article, Toptal Data Scientist Ken Hu compares three prominent storage technologies within the context of data engineering.
The Hadoop Distributed File System (HDFS) is a scalable, open-source solution for storing and processing large volumes of data. With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics. In this step-by-step tutorial, Toptal Database Developer Dallas H. Snider details how to migrate existing data from a PostgreSQL database into the more efficient HDFS.
Genome data is one of the most widely analyzed datasets in the realm of Bioinformatics. The SciPy stack offers a suite of popular Python packages designed for numerical computing, data transformation, analysis and visualization, which is ideal for many bioinformatic analysis needs. In this tutorial, Toptal Software Engineer Zhuyi Xue walks us through some of the capabilities of the SciPy stack. He also answers some interesting questions about the human genome, including: How much of the genome is incomplete? How long is a typical gene?
As a language, R is strongly tied to data and is thus used mostly by statisticians and data scientists. Many who already use R for machine learning, though, are not aware that data munging can be done faster in R, meaning another tool is not required for that task. In this article, Freelance Software Engineer Jan Gorecki explores tabular data transformations and introduces us to one of the fastest open-source data wrangling tools available.
World-class articles, delivered weekly.
Subscription implies consent to our privacy policy
Thank you!
Check out your inbox to confirm your invite.
Join the Toptal® community.