Working with non-numerical data can be challenging, even for seasoned data scientists. To be put to good use, such data needs to be transformed. But how? In this article, Toptal Data Scientist Yaroslav Kopotilov introduces embeddings and demonstrates how they can be used to visualize complex data and make it usable.
IMDb ratings have genre bias: for example, dramas tend to score higher. By removing common feature bias while keeping unique characteristics, it's possible to create a new, refined score based on IMDb information.
Automatically scaling container deployments in a microservices-based app architecture is downright luxurious...once it's set up. But what's the best way to tune an app's orchestration parameters?
Data warehouses aren’t exactly a new concept, but industry demand for data science services, coupled with the rise of AI and machine learning, is making them more relevant than ever. In this post, Toptal Data Warehouse Developer Chamitha Wanaguru outlines three basic principles you need to keep in mind when developing a new data warehouse.
The Protein Data Bank (PDB) bioinformatics database is the world's largest repository of experimentally determined structures of proteins, nucleic acids, and complex assemblies. All data is gathered using experimental methods such as X-ray crystallography and NMR spectroscopy. This article explains how to extract, filter, and clean data from the PDB to make it suitable for further analysis.
In this article, Toptal Freelance Software Engineer Neven Pičuljan introduces you to the intricacies of deep learning in hedge funds and finance in general.
Limited SQL scalability has prompted the industry to develop and deploy a number of NoSQL database management systems, with a focus on performance, reliability, and consistency. The trend was driven by proprietary NoSQL databases developed by Google and Amazon. Eventually, open-source systems like MongoDB, Cassandra, and Hypertable brought NoSQL within reach of everyone. In this post, Senior Software Engineer Mohamad Altarade dives into some of them and explains why NoSQL will probably be with us for years to come.
With the rise of big data and data science, storage and retrieval have become a critical pipeline component for data use and analysis. Recently, new data storage technologies have emerged. But the question is: Which one should you choose? Which one is best suited for data engineering? In this article, Toptal Data Scientist Ken Hu compares three prominent storage technologies within the context of data engineering.
The Hadoop Distributed File System (HDFS) is a scalable, open-source solution for storing and processing large volumes of data. With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics. In this step-by-step tutorial, Toptal Database Developer Dallas H. Snider details how to migrate existing data from a PostgreSQL database into the more efficient HDFS.