Data Science

Engineering > Data Science and Databases

Apache Spark Optimization Techniques for High-performance Data Processing

By Necati Demir, PhD

Apache Spark is an analytics engine that can handle very large data sets. This guide reveals strategies to optimize its performance using PySpark.

11 minute read

Mining for Twitter Clusters: Social Network Analysis With R and Gephi

By Juan Manuel Ortiz de Zarate

Explore Twitter data clusters to discover keywords that dominate identified groups. Focusing on a politically slanted data set provides an easy target for analysis.

8 minute read

Python vs. R: Syntactic Sugar Magic

By Leandro Roser

Python and R empower data scientists to solve problems using elegant syntactic sugar, simplifying coding and solution exploration. Each language brings its unique capabilities and approach to bear.

7 minute read

Understanding Twitter Dynamics With R and Gephi: Text Analysis and Centrality

By Juan Manuel Ortiz de Zarate

Centrality and text analysis allow users to get more out of their social network data. Here’s how you can leverage them using R and Gephi.

12 minute read

A Deeper Meaning: Topic Modeling in Python

By Federico Albanese

Colloquial language doesn’t lend itself to computation. That’s where natural language processing steps in. Learn how topic modeling helps computers understand human speech.

8 minute read

Social Network Analysis in R and Gephi: Digging Into Twitter

By Juan Manuel Ortiz de Zarate

Thanks to rapid advances in technology, large amounts of data generated on social networks can be analyzed with relative ease, especially by those who use the R programming language and Gephi.

9 minute read

Graph Data Science With Python/NetworkX

By Federico Albanese

Data inundates us like never before—how can we hope to analyze it? Graphs (networks, not bar graphs) provide an elegant approach. Find out how to start with the Python NetworkX library to describe, visualize, and analyze "graph theory" datasets.

9 minute read

Machine Learning Number Recognition: From Zero to Application

By Teimur Gasanov

Harnessing the potential of machine learning for computer vision is not a new concept, but recent advances and the availability of new tools and datasets have made it more accessible to developers. In this article, Toptal Software Developer Teimur Gasanov demonstrates how to create, in under 30 minutes, an app that identifies handwritten digits, including the API and UI.

10 minute read

Embeddings in Machine Learning: Making Complex Data Simple

By Yaroslav Kopotilov

Working with non-numerical data can be challenging, even for seasoned data scientists. To be useful, such data must first be transformed. But how? In this article, Toptal Data Scientist Yaroslav Kopotilov introduces embeddings and demonstrates how they can be used to visualize complex data and make it usable.

11 minute read