Sharing related information among isolated systems has become increasingly important. There are many methods to choose from to perform that task for SQL Server, but it’s important to know which is better for each use case.Continue reading →
Dropwizard allows developers to quickly bootstrap their projects and package applications as easily deployable standalone services. It also happens to be relatively simple to use and implement.
In this tutorial, Toptal Freelance Software Engineer Dusan Simonovic will introduce you to Dropwizard and demonstrate how you can use this powerful framework to create RESTful web services with ease.Continue reading →
Build a bot that analyzes the sentiment of incoming email messages using Recursive Neural Tensor Networks from the Stanford NLP library.Continue reading →
Twitter is a goldmine of data. Unlike other social platforms, almost every user’s tweets are completely public and pullable.
In this tutorial, Toptal Freelance Software Engineer Anthony Sistilli will be exploring how you can use Python, the Twitter API, and data mining techniques to gather useful data.Continue reading →
Social networks are among the biggest sources of data today, and this means they are an extremely valuable asset for marketers, big data specialists, and even individual users like journalists and other professionals. Harnessing the potential of real-time Twitter data is also useful in many time-sensitive business processes.
In this article, Toptal Freelance Software Engineer Hanee’ Medhat explains how you can build a simple Python application to leverage the power of Apache Spark, and then use it to read and process tweets to identify trending hashtags.Continue reading →
Security has always been a primary concern for database experts, and with the advent of new, decentralized services, it’s become even more crucial. Microsoft addressed the need for an added level of security in SQL with the introduction of Always Encrypted functionality in SQL Server 2016.
In this blog post, Toptal Freelance Software Engineer Josip Saban explains how Microsoft’s Always Encrypted concept works, how it’s implemented, and why developers can’t afford to ignore it.Continue reading →
Consistent Hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table. It powers many high-traffic dynamic websites and web applications.
In this tutorial, Toptal Freelance Software Engineer Juan Pablo Carzolio will walk us through what it is and how hashing, distributed hashing and consistent hashing work.Continue reading →
Limited SQL scalability has prompted the industry to develop and deploy a number of NoSQL database management systems, with a focus on performance, reliability, and consistency. The trend was driven by proprietary NoSQL databases developed by Google and Amazon. Eventually, open-source systems like MongoDB, Cassandra, and Hypertable brought NoSQL within reach of everyone.
In this post, Toptal Software Engineer Mohamad Altarade dives into some of them and explains why NoSQL will probably be with us for years to come.Continue reading →
With the rise of big data and data science, storage and retrieval have become a critical pipeline component for data use and analysis. Recently, new data storage technologies have emerged. But the question is: Which one should you choose? Which one is best suited for data engineering?
In this article, Toptal Data Scientist Ken Hu compares three prominent storage technologies within the context of data engineering.Continue reading →
Nobody wants to leave valuable customer data behind. Unfortunately, though, the hardest part of data migration to a complex CRM system, such as Salesforce, is the handling of legacy data.
In this article, Toptal Software Engineer Marian Paul provides 10 tips for successful legacy data migration to Salesforce.Continue reading →
The Hadoop Distributed File System (HDFS) is a scalable, open source solution for storing and processing large volumes of data. With its built-in replication and resilience to disk failures, HDFS is an ideal system for storing and processing data for analytics.
In this step-by-step tutorial, Toptal Database Developer Dallas H. Snider details how to migrate existing data from a PostgreSQL database into the more efficient HDFS.Continue reading →
Genome data is one of the most widely analyzed datasets in the realm of Bioinformatics. The SciPy stack offers a suite of popular Python packages designed for numerical computing, data transformation, analysis and visualization, which is ideal for many bioinformatic analysis needs.
In this tutorial, Toptal Software Engineer Zhuyi Xue walks us through some of the capabilities of the SciPy stack. He also answers some interesting questions about the human genome, including: How much of the genome is incomplete? How long is a typical gene?Continue reading →
As a language, R is strongly tied to data and is thus used mostly by statisticians and data scientists. Many who already use R for machine learning, though, are not aware that data munging can be done faster in R, meaning another tool is not required for that task.
In this article, Freelance Software Engineer Jan Gorecki explores tabular data transformations and introduces us to one of the fastest open-source data wrangling tools available.Continue reading →
Ever tried to create a JSON data structure that includes entities with bidirectional relationships? If you have, you know that this often results in errors or exceptions being thrown.
In this article, Toptal Freelance Software Engineer Nirmel Murtic provides a robust working approach to avoiding these errors when creating JSON structures that included entities with bidirectional (i.e. circular) relationships.Continue reading →
More than 60 percent of trading activities with different assets rely on automated trading and machine learning instead of human traders. Today, specialized programs based on particular algorithms and learned patterns automatically buy and sell assets in various markets, with a goal to achieve a positive return in the long run.
In this article, Toptal Freelance Data Scientist Andrea Nalon explains how to predict, using machine learning and Python, which trade should be made next on the S&P 500 to get a positive gain.Continue reading →
Clustering algorithms are very important to unsupervised learning and are key elements of machine learning in general. These algorithms give meaning to data that are not labelled and help find structure in chaos. But not all clustering algorithms are created equal; each has its own pros and cons.
In this article, Toptal Freelance Software Engineer Lovro Iliassich explores a heap of clustering algorithms, from the well known K-Means algorithm to the elegant, state-of-the-art Affinity Propagation technique.Continue reading →
When dealing with data analysis, most companies rely on MS Excel or Google Sheets, but dealing with data presented this way isn’t very eye-catching or intuitive. It’s once you add visualizations to this data that things become a little easier to manage. That’s the topic of today’s tutorial by our guest author from Adobe, Rohit Boggarapu. Join us as he guides us though the process of making interactive drill-down charts using jQuery and FusionCharts.Continue reading →
Today, a massive amount of data is available in the form of networks or graphs. For example, the World Wide Web, with its web pages and hyperlinks, social networks, semantic networks, biological networks, citation networks for scientific literature, and so on.
A tree is a special type of graph, and is naturally suited to represent many types of data. The analysis of trees is an important field in computer and data science. In this article, we will look at the analysis of the link structure in trees. In particular, we will focus on tree kernels, a method for comparing tree graphs to each other, allowing us to get quantifiable measurements of their similarities or differences. This an important process for many modern applications such as classification and data analysis.Continue reading →
Hackathons often inspire engineers to create amazing software. By blending various technologies together, really useful and often fun projects can be realized in a short period of time.
In this article, Toptal engineer Radek Ostrowski shares his experience participating in the IBM Sparkathon, and walks us through how he elegantly combined the power of Apache Spark and Docker in IBM Bluemix to build a weather app.Continue reading →
Machine Learning, in computing, is where art meets science. Perfecting a machine learning tool is a lot about understanding data and choosing the right algorithm. But why choose one algorithm when you can choose many and make them all work to achieve one thing: improved results.
In this article, Toptal Engineer Necati Demir walks us through some elegant techniques of ensemble methods where a combination of data splits and multiple algorithms is used to produce machine learning results with higher accuracy.Continue reading →
In this article, Toptal engineer Ivan Voras provides a useful overview and comparison of multi-processing network server models, with the goal being to take some of the mystery out of writing high performance networking code. The article is intended for “system programmers”, i.e., back-end developers who will work with the low-level details of their applications, implementing network server code.Continue reading →
Developers often work on only one machine, and have their whole development environment on that machine. Testing database replication before deploying changes in this kind of a development environment can be a challenging task.
In this article, Toptal engineer Ivan Bojovic guides us through a step-by-step tutorial on how to implement MySQL master-slave replication on one machine.Continue reading →
Developers sometimes run into tasks that require access to email mailboxes. In most cases, this is done using the Internet Message Access Protocol, or IMAP. As a PHP developer, I first turned to PHP’s built in IMAP library, but this library is buggy and impossible to debug or modify. So today we will create a working IMAP email client from the ground up using PHP. We will also see how to use Gmail’s special commands.Continue reading →
Image processing algorithms are often very resource intensive due to fact that they process pixels on an image one at a time and often requires multiple passes. Successive Mean Quantization Transform (SMQT) is one such resource intensive algorithm that can process images taken in low-light conditions and reveal details from dark regions of the image.
In this article, Toptal engineer Daniel Angel Munoz Trejo gives us some insight into how the SMQT algorithm works and walks us through a clever optimization technique to make the algorithm a viable option for handheld devices.Continue reading →
When Mike Bostock created D3.js, he introduced a tried and true reusable charts pattern for implementing the same chart in any number of selections. However, the limitations of this pattern are realized once the chart is initialized. In this article, Toptal engineer Rob Moore presents a revised reusable charts pattern that leverages the full power of D3.js.Continue reading →
Bitcoin created a lot of buzz on the Internet. It was ridiculed, it was attacked, and eventually it was accepted and became a part of our lives. However, Bitcoin is not alone. At this moment, there are over 700 AltCoin implementations, which use similar principles of CryptoCurrency.Continue reading →
Google’s new cloud code platform does not appear to be taking on GitHub head on. Instead, Cloud Source Repositories (CSR) will allow users to connect to repositories hosted on GitHub or Bitbucket. However, everything is automatically synced to the Google Cloud Source Repository.
The good news is that a Google CSR can be connected to another Git repository hosted on GitHub or Bitbucket. All changes will be synchronised on both platforms, as you can set Google CSR to automatically mirror from GitHub and Bitbucket.Continue reading →
In-memory data collection manipulation is something that we often need to do in data-centric reporting and visualization applications. When needed, we often tend to resort to complex loops, list comprehensions, and other suboptimal means, which can easily end up being a huge mess of hard-to-maintain spaghetti code. Supergroup.js is an in-memory data manipulation library that can be used to solve some common data manipulation challenges on limited datasets.Continue reading →
To retain its users, any application or website must run fast. For mission critical environments, a couple of milliseconds delay in getting information might create big problems. As database sizes grow day by day, we need to fetch data as fast as possible, and write the data back into the database as fast as possible. To make sure all operations are executing smoothly, we have to tune Microsoft SQL Server for performance.Continue reading →
Microsoft Bond is a modern data serialization framework. It provides powerful DSL and flexible protocols, code generators for C++ and C#, efficient protocol implementations for Windows, Linux, and Mac OS X. This article is a quick guide of the features and use of this framework.Continue reading →
Analysts have come to recognize social network data as a virtual treasure trove of information for sensing public opinion trends and groundswells of support. In this article, Toptal Engineer Elder Santos describes the techniques he employed for a proof-of-concept that effectively analyzed Twitter Trend Topics to predict, as a sample test case, regional voting patterns in the 2014 Brazilian presidential election.Continue reading →
Machine learning has changed the way we deal with data. Data driven problems, that are difficult to solve using standard methods, can often be tackled with much more ease using machine learning algorithms. In this article, we will explore Azure Machine Learning features and capabilities through solving one of the problems that we face in our everyday lives.Continue reading →
In today’s data driven world, researches are busy answering interesting questions by churning through huge volumes of data. Some obvious challenges they face are due the sheer size of dataset that they have to deal with. In this article, we take a peek at a simple business intelligence platform implemented on top of the MongoDB Aggregation Pipeline.Continue reading →
In this post, Toptal engineer Radek Ostrowski introduces Apache Spark – fast, easy-to-use, and flexible big data processing. Billed as offering “lightning fast cluster computing”, the Spark technology stack incorporates a comprehensive set of capabilities, including SparkSQL, Spark Streaming, MLlib (for machine learning), and GraphX. Spark may very well be the “child prodigy of big data”, rapidly gaining a dominant position in the complex world of big data processing.Continue reading →
Although the most wide-spread and supported way of running Django is on a Linux system (e.g., with uwsgi and nginx), it actually doesn’t take much work to get it to run on IIS. In this article, Toptal Engineer Ivan Voras walks you through a step-by-step tutorial, clearly explaining how to install Django on IIS.Continue reading →
When coming across the term “text search”, one usually thinks of a large body of text, which is indexed in a way that makes it possible to quickly look up one or more search terms when they are entered by a user. This is a classic problem in computer science, to which many solutions exist.
But how about a reverse scenario? What if what’s available for indexing beforehand is a group of search phrases, and only at runtime is a large body of text presented for searching?Continue reading →
Bitcoin blockchain is the backbone of the network and provides a tamper-proof data structure, providing a shared public ledger open to all. This article provides insight in blockchain technology, current status and its potential.Continue reading →
Since almost all smartphones today are equipped with location sensors, motion sensors, bluetooth, and wifi, today’s mobile apps can use context awareness to dramatically increase their capabilities and value. This article walks you through building a context aware app that employs complex event processing.Continue reading →
ESA’s Rosetta mission soft-landed its Philae probe on a comet, the first time in history that such an extraordinary feat has been achieved. Closely after that, Microsoft Open Sourced .NET.
The first event is a great step for mankind, and the latter is even greater for Microsoft!Continue reading →
There are thousands of different Android powered devices, with different screen sizes, chip architectures, hardware configurations, and software versions. Unfortunately, segmentation is the price to pay for openness, and there are thousands ways your app can fail on different devices.
Regardless of such huge segmentation, the majority of bugs are actually introduced because of logic errors. These bugs are easily prevented, as long as we get the basics right!
Here’s a quick rundown of the 10 most common mistakes Android developers make.Continue reading →
How do we understand and interpret the huge amounts of data coming out of simulations? How do we visualize potential gigabytes of datapoints in a large dataset? In this article I will give a quick introduction to VTK and its pipeline architecture, and go on to discuss a real-life visualization example.Continue reading →
React.js is a fantastic library. It is only one part of a front-end application stack, however. It doesn’t have much to offer when it comes to managing data and state. Facebook, the makers of React, have offered some guidance there in the form of Flux. I’ll introduce basic Flux control flow, discuss what’s missing for Stores, and how to use Backbone Models and Collections to fill the gap in a “Flux-compliant” way.Continue reading →
Data conversion, translation, and mapping is by no means rocket science, but it is by all means tedious. This article introduces MetaDapper, a .NET library that strives to simplify, streamline, and automate the data conversion process to the greatest extent possible.Continue reading →
Scientific computing is hard. But thanks to an ever-growing landscape of open source tools, really tough problems are becoming easier to solve. Toptal engineer Charles Cook provides an in-depth example, leveraging open source tools to solve a problem in computational fluid dynamics.Continue reading →
This article introduces the basics of Machine Learning theory, laying down the common themes and concepts, making it easy to follow the logic and get comfortable with the topic.Continue reading →
Once you step beyond the comfortable confines of English-only character sets, you quickly find yourself entangled in the wonderfully wacky world of UTF-8.
Indeed, navigating through UTF-8 related issues can be a frustrating and hair-pulling experience. This post provides a concise cookbook for addressing these issues when working with PHP and MySQL in particular, based on practical experience and lessons learned.Continue reading →
A potentially critical problem, nicknamed “Heartbleed”, has surfaced in the widely-used OpenSSL cryptographic library. The vulnerability is particularly dangerous in that potentially critical data can be leaked and the attack leaves no trace.
As a user, chances are that sites you frequent regularly are affected and your data may have been compromised. As a developer or sys admin, sites or servers you’re responsible for are likely to have been affected.
Here are the key facts you need to know about this dangerous bug and how to mitigate your vulnerability.Continue reading →
The recent resurgence in Artificial Intelligence has been powered in no small part by a new trend in machine learning, known as “Deep Learning”. In this article, I’ll introduce you to the key concepts and algorithms behind Deep Learning, beginning with the simplest building block.Continue reading →
The need to adapt legacy code and systems to meet modern day performance and processing demands is widespread. This post provides a case study of the use of Erlang and CloudI to adapt legacy code, consisting of a decades-old collection of multi-user game software written in C, to the 21st century.Continue reading →
Testing. It always seems to get left to the last minute, then cut because you’re out of time, budget, or whatever else. Management wonders why developers can’t just “get it right the first time”, and developers (especially on large systems) can be taken off-guard when different stakeholders describe different parts of the system.
With behavior-driven development, you can turn testing into a shared process that focuses on the behaviors of the system, why they matter, and who cares.Continue reading →
As a veteran telecommuter through multiple jobs in my career, I have witnessed and experienced the many joys of being a remote worker. As for the horror stories, I have more than a few I could tell. With a bit of artistic inclination and a talent for mathematics, I also have a fascination with patterns: design patterns, architectural patterns, behavioral patterns, social patterns, weather patterns—all sorts of patterns!
When I first encountered anti-patterns, I discovered a trove of wisdom I wish I had known before I had learned the hard way. Anti-patterns are recognizable repeated patterns that contribute significantly to failure. For example, the manager that keeps interrupting the employee in order to see if the employee is getting any work done is engaging in an anti-pattern that serves to prevent the employee from getting any work done!
Based on my own experiences and experiences of friends and co-workers, I am assembling descriptions of anti-patterns related to telecommuting.Continue reading →
In 2007, Bennett Haselton revealed a minor hack with major implications: querying ranges of numbers on Google would return pages of sensitive information, including Credit Card numbers, Social Security numbers, and more. While Haselton’s hack was addressed and patched, I was able to tweak his original technique to bypass Google’s filter and return the same old dangerous results.Continue reading →
Database tuning can be an incredibly difficult task, particularly when working with large-scale data where even the most minor change can have a dramatic (positive or negative) impact on performance.
In mid-sized and large companies, most database tuning will be handled by a Database Administrator (DBA). But believe me, there are plenty of developers out there who have to perform DBA-like tasks. Further, in many of the companies I’ve seen that do have DBAs, they often struggle to work well with developers—the positions simply require different modes of problem solving, which can lead to disagreement among coworkers.
In this article, I’d like to both provide developers with some developer-side database tuning tips and explain how developers and DBAs can work together effectively.Continue reading →
From the very first days in our lives as programmers, we’ve all dealt with data structures: Arrays, linked lists, trees, sets, stacks and queues are our everyday companions, and the experienced programmer knows when and why to use them.
In this article we’ll see how an oft-neglected data structure, the trie, really shines in application domains with specific features, like word games.Continue reading →
Web Developers often fail to consider the consequences of thousands of users accessing our applications at the same time. Perhaps it’s because we love to rapidly prototype; perhaps it’s because testing such scenarios is simply hard.
Regardless, I’m going to argue that ignoring scalability is not as bad as it sounds—if you use the proper set of tools and follow good development practices. In this case: the Play! framework and the Scala language.Continue reading →
But this isn’t just another article about cohort analysis. If you already know the importance of the topic and want to skip the introduction, you can jump to the simulator, where you can either simulate startup growth based on retention, churn, and a number of other factors, or analyze your own PayPal logs with the code I’ve open sourced.
If, however, you don’t realize that these are some of the most important metrics around–continue reading.Continue reading →
Porn is a big industry. There aren’t many sites on the Internet that can rival the traffic of its biggest players.
And juggling this immense traffic is tough. To make things even harder, much of the content served from porn sites is made up of low latency live streams rather than simple static video content. But for all of the challenges involved, rarely have I read about the developers who take them on. So I decided to write about my own experience on the job.Continue reading →