Data Science and Databases

An HDFS Tutorial for Data Analysts Stuck with Relational Databases

The Hadoop Distributed File System (HDFS) is a scalable, open-source solution for storing and processing large volumes of data. With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics.

In this step-by-step tutorial, Toptal Database Developer Dallas H. Snider details how to migrate existing data from a PostgreSQL database into the more efficient HDFS.

10 minute read

Dallas H. Snider

Dallas has 22 years of database application development experience. He has worked with SQL Server and Oracle on both Windows and Linux.

A Comprehensive Introduction To Your Genome With the SciPy Stack

Genome data is among the most widely analyzed datasets in bioinformatics. The SciPy stack offers a suite of popular Python packages for numerical computing, data transformation, analysis, and visualization, which makes it a good fit for many bioinformatic analysis needs.

In this tutorial, Toptal Software Engineer Zhuyi Xue walks us through some of the capabilities of the SciPy stack. He also answers some interesting questions about the human genome, including: How much of the genome is incomplete? How long is a typical gene?

23 minute read

Zhuyi Xue

Zhuyi is a skilled Python developer with over seven years of experience. He is also proficient in JavaScript and Scala.

Boost Your Data Munging With R

As a language, R is strongly tied to data and is thus used mostly by statisticians and data scientists. Many who already use R for machine learning, though, are not aware that data munging can be done faster in R, meaning another tool is not required for that task.

In this article, Freelance Software Engineer Jan Gorecki explores tabular data transformations and introduces us to one of the fastest open-source data wrangling tools available.

17 minute read

Jan Gorecki

Jan is a business intelligence and data warehousing expert with advanced R skills and some infrastructure experience.

Bidirectional Relationship Support in JSON

Ever tried to create a JSON data structure that includes entities with bidirectional relationships? If you have, you know that this often results in errors or exceptions being thrown.

In this article, Toptal Freelance Software Engineer Nirmel Murtic provides a robust working approach to avoiding these errors when creating JSON structures that include entities with bidirectional (i.e., circular) relationships.

10 minute read

Nirmel Murtic

Nirmel is a software engineer with more than eight years of professional experience. He excels as a solo developer but has lead experience, too.

The Rise Of Automated Trading: Machines Trading the S&P 500

More than 60 percent of trading activities with different assets rely on automated trading and machine learning instead of human traders. Today, specialized programs based on particular algorithms and learned patterns automatically buy and sell assets in various markets, with the goal of achieving a positive return in the long run.

In this article, Toptal Freelance Data Scientist Andrea Nalon explains how to predict, using machine learning and Python, which trade should be made next on the S&P 500 to get a positive gain.

24 minute read

Andrea Nalon

With an MCE and extensive ML and quantitative analysis training, Andrea’s data science experience covers R, Python, VBA, Excel, and SQL.

Clustering Algorithms: From Start to State of the Art

Clustering algorithms are very important to unsupervised learning and are key elements of machine learning in general. These algorithms give meaning to data that are not labelled and help find structure in chaos. But not all clustering algorithms are created equal; each has its own pros and cons.

In this article, Toptal Freelance Software Engineer Lovro Iliassich explores a heap of clustering algorithms, from the well-known K-Means algorithm to the elegant, state-of-the-art Affinity Propagation technique.

11 minute read

Lovro Iliassich

Lovro is a Machine Learning Engineer and Data Scientist. He has worked at Amazon and as a researcher at multiple academic institutions.

A Tutorial on Drill-down FusionCharts in jQuery

When it comes to data analysis, most companies rely on MS Excel or Google Sheets, but data presented this way isn’t very eye-catching or intuitive. Once you add visualizations, the data becomes a little easier to manage. That’s the topic of today’s tutorial by our guest author from Adobe, Rohit Boggarapu. Join us as he guides us through the process of making interactive drill-down charts using jQuery and FusionCharts.

9 minute read

Rohit Boggarapu

Rohit (BTech) is a software engineer at Adobe and an exceptional front-end programmer. He specializes in React/RN, Ionic, and Angular.

Toptal Engineering Expert

Gabriel Courtemanche

Gabriel is a highly efficient and reliable professional with a broad skill set for web application development. He has worked on a range of products for a range of clients, from tackling scalability problems on production engineering teams at Shopify and Autodesk to launching new applications for startups. Most of his work consists of leading technical teams by creating an easy development environment, paying down technical debt, providing best-practice code examples, and mentoring developers.
