Oliver is a versatile data scientist and software engineer combining several years of experience and a postgraduate mathematics degree from Oxford. Career assignments have ranged from building machine learning solutions for startups to leading project teams and handling vast amounts of data at Goldman Sachs. With this background, he is adept at picking up new skills quickly to deliver robust solutions to the most demanding of businesses.
Eva is a skilled back-end developer and machine learning engineer with experience in scalability issues, system administration, and more. She has a flair for well-structured, readable, and maintainable applications and excellent knowledge of Python, Ruby, and Go. She is a quick learner and has worked in teams of all sizes.
Necati holds a PhD in machine learning and has 13 years of experience in private industry, including team management. He has worked on various projects, including voice, network security, and embedded Linux, which has enabled him to look at problems from a broad perspective. He specializes in using AI and machine learning to enhance the human experience.
Renee is a data scientist with over 12 years of experience, plus five years as a full-stack software engineer. Throughout his career, he has worked in international environments with English or German as the working language, including four years working remotely for German and Austrian client companies and nine months as a remote member of the Deutsche Telekom international analytics team.
Aljosa is a data scientist and developer who has more than eight years of experience building statistical/predictive machine learning models, analyzing noisy data sets, and designing and developing decision support tools and services. He joined Toptal because freelancing intrigues him, and the best projects and people are to be found here.
Dr. Karvetski has ten years of experience as a data and decision scientist. He has worked across academia and industry in a variety of team and client settings, and has been recognized as an excellent communicator. He loves working with teams to conceive and deploy novel data science solutions. He has expertise with R, SQL, MATLAB, SAS, and other platforms for data science.
What makes a great data scientist? The answer depends on the branch of data science in question and your specific needs, but all data scientists inevitably share a core set of skills. This comprehensive hiring guide outlines critical skills and explains how to pick the right data scientist for the job.
... allows corporations to quickly assemble teams that have the right skills for specific projects.
Despite accelerating demand for coders, Toptal prides itself on almost Ivy League-level vetting.
Building a cross-platform app to be used worldwide
Tripcents wouldn't exist without Toptal. Toptal Projects enabled us to rapidly develop our foundation with a product manager, lead developer, and senior designer. In just over 60 days we went from concept to Alpha. The speed, knowledge, expertise, and flexibility are second to none. The Toptal team was as much a part of tripcents as any in-house team member. They contributed and took ownership of the development just like everyone else. We will continue to use Toptal. As a startup, they are our secret weapon.
Brantley Pace, CEO & Co-Founder
I am more than pleased with our experience with Toptal. The professional I got to work with was on the phone with me within a couple of hours. I knew after discussing my project with him that he was the candidate I wanted. I hired him immediately and he wasted no time in getting to my project, even going the extra mile by adding some great design elements that enhanced our overall look.
Paul Fenley, Director
K Dunn & Associates
The developers I was paired with were incredible -- smart, driven, and responsive. It used to be hard to find quality engineers and consultants. Now it isn't.
Ryan Rockefeller, CEO
Toptal understood our project needs immediately. We were matched with an exceptional freelancer from Argentina who, from Day 1, immersed himself in our industry, blended seamlessly with our team, understood our vision, and produced top-notch results. Toptal makes connecting with superior developers and programmers very easy.
Jason Kulik, Co-Founder
As a small company with limited resources we can't afford to make expensive mistakes. Toptal provided us with an experienced programmer who was able to hit the ground running and begin contributing immediately. It has been a great experience and one we'd repeat again in a heartbeat.
Stuart Pocknee, Principal
Site Specific Software Solutions
We used Toptal to hire a developer with extensive Amazon Web Services experience. We interviewed four candidates, one of which turned out to be a great fit for our requirements. The process was quick and effective.
Abner Guzmán Rivera, CTO and Chief Scientist
Sergio was an awesome developer to work with. Top notch, responsive, and got the work done efficiently.
Dennis Baldwin, Chief Technologist and Co-Founder
Working with Marcin is a joy. He is competent, professional, flexible, and extremely quick to understand what is required and how to implement it.
André Fischer, CTO
We needed an expert engineer who could start on our project immediately. Simanas exceeded our expectations with his work. Not having to interview and chase down an expert developer was an excellent time-saver and made everyone feel more comfortable with our choice to switch platforms to utilize a more robust language. Toptal made the process easy and convenient. Toptal is now the first place we look for expert-level help.
Derek Minor, Senior VP of Web Development
Networld Media Group
Toptal's developers and architects have been both very professional and easy to work with. The solution they produced was fairly priced and top quality, reducing our time to launch. Thanks again, Toptal.
Jeremy Wessels, CEO
We had a great experience with Toptal. They paired us with the perfect developer for our application and made the process very easy. It was also easy to extend beyond the initial time frame, and we were able to keep the same contractor throughout our project. We definitely recommend Toptal for finding high quality talent quickly and seamlessly.
Ryan Morrissey, CTO
Applied Business Technologies, LLC
I'm incredibly impressed with Toptal. Our developer communicates with me every day, and is a very powerful coder. He's a true professional and his work is just excellent. 5 stars for Toptal.
Pietro Casoar, CEO
Ronin Play Pty Ltd
Working with Toptal has been a great experience. Prior to using them, I had spent quite some time interviewing other freelancers and wasn't finding what I needed. After engaging with Toptal, they matched me up with the perfect developer in a matter of days. The developer I'm working with not only delivers quality code, but he also makes suggestions on things that I hadn't thought of. It's clear to me that Amaury knows what he is doing. Highly recommended!
George Cheng, CEO
As a Toptal qualified front-end developer, I also run my own consulting practice. When clients come to me for help filling key roles on their team, Toptal is the only place I feel comfortable recommending. Toptal's entire candidate pool is the best of the best. Toptal is the best value for money I've found in nearly half a decade of professional online work.
Ethan Brooks, CTO
Langlotz Patent & Trademark Works, Inc.
In Higgle's early days, we needed the best-in-class developers, at affordable rates, in a timely fashion. Toptal delivered!
Lara Aldag, CEO
Toptal makes finding a candidate extremely easy and gives you peace-of-mind that they have the skills to deliver. I would definitely recommend their services to anyone looking for highly-skilled developers.
Michael Gluckman, Data Manager
Toptal’s ability to rapidly match our project with the best developers was just superb. The developers have become part of our team, and I’m amazed at the level of professional commitment each of them has demonstrated. For those looking to work remotely with the best engineers, look no further than Toptal.
Laurent Alis, Founder
Toptal makes finding qualified engineers a breeze. We needed an experienced ASP.NET MVC architect to guide the development of our start-up app, and Toptal had three great candidates for us in less than a week. After making our selection, the engineer was online immediately and hit the ground running. It was so much faster and easier than having to discover and vet candidates ourselves.
Jeff Kelly, Co-Founder
We needed some short-term work in Scala, and Toptal found us a great developer within 24 hours. This simply would not have been possible via any other platform.
Franco Arda, Co-Founder
Toptal offers a no-compromise solution to businesses undergoing rapid development and scale. Every engineer we've contracted through Toptal has quickly integrated into our team and held their work to the highest standard of quality while maintaining blazing development speed.
Greg Kimball, Co-Founder
How to Hire Data Scientists through Toptal
Talk to One of Our Industry Experts
A Toptal director of engineering will work with you to understand your goals, technical needs, and team dynamics.
Work With Hand-Selected Talent
Within days, we'll introduce you to the right data scientist for your project. Average time to match is under 24 hours.
The Right Fit, Guaranteed
Work with your new data scientist for a trial period (pay only if satisfied), ensuring they're the right fit before starting the engagement.
How are Toptal data scientists different?
At Toptal, we thoroughly screen our data scientists to ensure we only match you with talent of the highest caliber. Of the more than 100,000 people who apply to join the Toptal network each year, fewer than 3% make the cut. You'll work with engineering experts (never generalized recruiters or HR reps) to understand your goals, technical needs, and team dynamics. The end result: expert vetted talent from our network, custom matched to fit your business needs. Start now.
Can I hire data scientists in less than 48 hours through Toptal?
Depending on availability and how fast you can progress, you could start working with a data scientist within 48 hours of signing up. Start now.
What is the no-risk trial period for Toptal data scientists?
We make sure that each engagement between you and your data scientist begins with a trial period of up to two weeks. This means that you have time to confirm the engagement will be successful. If you're completely satisfied with the results, we'll bill you for the time and continue the engagement for as long as you'd like. If you're not completely satisfied, you won't be billed. From there, we can either part ways, or we can provide you with another expert who may be a better fit and with whom we will begin a second, no-risk trial. Start now.
How to Hire a Great Data Scientist
Data science is an exploding field. According to the LinkedIn Workforce Report, demand for data scientists in 2018 was off the charts.
Data science is a multidisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data. (Wikipedia)
There is a huge movement in the industry toward the democratization of artificial intelligence and data science. This means that it’s now easier than ever to pick up relevant skills, and more and more people are trying to break into this field. This is a positive trend, yet at the same time, it is a double-edged sword. It makes hiring data scientists more difficult due to high demand and the constant influx of new people into the field.
This data science hiring guide should help you select the right person for the job.
Is a Data Scientist What You Really Need?
Before we dive into the how, you really need to answer the why. Is a data scientist really what you need? With the democratization of data science came a flood of tools and solutions that can be used as a black box. Tools like Google Vision API and Cloud AutoML help you achieve a lot without having to worry about what’s happening behind the scenes.
Before you start a full-fledged search for a data scientist, know your needs and check whether existing APIs and tools can deliver the same (and in some cases a better) result than an in-house custom solution. The age-old adage "Don't reinvent the wheel" is very relevant here.
Challenges of Hiring Data Scientists
One of the challenges of hiring a data scientist is judging whether the candidate has in-depth knowledge or just relies on black-box libraries that do all the work. Those libraries work, but if the candidate doesn't know how they work internally, debugging any problem will be an issue.
The other challenge is knowing what you need the data scientist for, as there are many problems under the umbrella of data science (e.g., big data, Natural Language Processing (NLP), Computer Vision (CV), etc.). The first step before hiring data science talent is to define the scope of the problem you are trying to solve.
Obviously, the traditional ways of finding candidates are still there—such as word of mouth or posting on various websites like LinkedIn or Stack Overflow—but one new place to find good experienced data scientists is Kaggle.
Kaggle is an online community for data scientists and ML practitioners. However, this isn’t a one-size-fits-all type of recommendation. There are candidates who are great at Kaggle but might not be right for your organization. The flip side is also true, as there are candidates who are great data scientists but are not on Kaggle.
Selecting the Right Candidate
The first thing you need to do is define the problem a data scientist will solve for you, and then find the branch of data science it falls into. If you have a language task, such as keyword classification or building a text summarizer, you need an NLP specialist. If the task deals more with pictures and photos, you need a computer vision expert. Similarly, if the job involves dealing with labeled data (facial recognition, spam classification, etc.), you need someone with expertise in supervised learning. If the job is more along the lines of statistics, such as A/B testing or analyzing the results of controlled trials, you need someone with a strong skillset in stats (ideally, all data science candidates should have statistical knowledge).
This figure offers a good generalization of different branches of data science:
To be fair, this is a generalization; in reality, the lines are fuzzy and there is some overlap, so the skill chart looks like this:
Note: A good data scientist should know about all these skills and branches and have expertise in one or more of them.
Data Scientist Hard Skills and Experience
Any data scientist should have statistical analysis knowledge, data analytics experience, and a real-world understanding of the math behind the algorithms they use. Beyond that, software development and coding experience in Python, R, MATLAB, or Octave is a must. Experience with Hadoop and Spark is also valuable.
If you want to hire an NLP expert, the candidate should have ideally used NLP packages like NLTK, Stanford NLP, etc.
For computer vision, they should have experience working with CNN (Convolutional Neural Networks), OpenCV, etc.
Experience with algorithms like random forest, decision tree, naive Bayes, linear regression, or SVM (Support Vector Machine) is also necessary.
If the job is more on the side of using deep learning and neural networks, they should have experience building neural networks with TensorFlow, PyTorch, Theano, Torch, Sonnet, Keras, and MXNet.
Asking the Right Questions
1) What is the difference between supervised and unsupervised machine learning?
In supervised machine learning, the algorithm is trained on a labeled dataset, and its task is to predict the labels of new, unseen data.
In unsupervised learning, the data is unlabeled, and the algorithm must discover structure in the data (such as clusters) on its own.
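To make the distinction concrete, here is a minimal scikit-learn sketch with hypothetical toy data: the classifier is given labels, while k-means must discover the two groups on its own.

```python
# Supervised: labels y are provided, and the model learns a mapping from X to y.
# Unsupervised: no labels, so the model must find structure in X alone.
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical toy data: two well-separated groups of points.
X = [[0, 0], [0, 1], [5, 5], [5, 6]]
y = [0, 0, 1, 1]  # labels available, so this is a supervised setup

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[5, 5.5]]))  # predicts class 1

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)  # no y: unsupervised
print(km.labels_)  # the two groups are discovered without labels
```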
2) Can you provide some examples of supervised and unsupervised ML algorithms?
Supervised: random forest, decision tree, naive Bayes, linear regression, SVM, KNN (K-nearest neighbors).
Unsupervised: k-means clustering, hierarchical clustering, principal component analysis (PCA).
3) What is an exploding gradient?
A gradient is the magnitude and direction of the update that is calculated while training a neural network.
An exploding gradient is a problem that happens when the gradient gets accumulated and the value becomes so large that it causes overflow, and the gradient becomes NaN, causing the training to stop.
4) What is a confusion matrix?
A confusion matrix is a 2x2 table that contains the metrics to judge the performance of a binary classifier.
True Positives: values that belonged to the positive class, and the classifier labeled correctly.
False Negatives: values that belonged to the positive class but the classifier labeled them as negative.
False Positives: values that belonged to the negative class but the classifier labeled them as positive.
True Negatives: values that belonged to the negative class, and the classifier labeled correctly.
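As an illustration, scikit-learn's `confusion_matrix` returns these four counts directly (the labels and predictions below are made up for the example):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical ground truth and predictions for a binary classifier.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]

# For binary labels, ravel() yields the four cells in this order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, fn, fp, tn)  # 2 1 1 2
```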
5) What is the ROC curve and how does it work?
The ROC curve is a graph that plots the true positive rate (TPR) against the false positive rate (FPR) at various classification thresholds. It shows the tradeoff between TPR (sensitivity) and FPR (1 - specificity).
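A quick sketch of computing the curve and the area under it with scikit-learn (the scores here are hypothetical classifier outputs):

```python
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # hypothetical classifier scores

# Each threshold yields one (FPR, TPR) point on the curve.
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(roc_auc_score(y_true, scores))  # area under the curve: 0.75
```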
6) What is a decision tree and how does it work?
A decision tree is a classification/regression algorithm that uses a tree-like model of tests to learn the structure in data.
Each node in the tree represents a test. For example, one node might test whether an applicant's income is low; depending on the answer (yes or no), the decision tree outputs bad risk or good risk.
7) How does the random forest algorithm work?
The random forest algorithm is an ensemble learning technique that creates multiple decision trees and trains each of them on the given data. The final prediction is the mode of the classes predicted by the individual trees for a classification problem, and the average of their predicted values for a regression problem.
8) What are overfitting and underfitting?
Overfitting is a phenomenon where a machine learning algorithm fits too closely to a training dataset, which causes high accuracy/low loss on the training set, but low accuracy/high loss on the test/validation set.
Underfitting is a phenomenon where a machine learning algorithm is not able to learn the structure of the dataset, and hence this causes low accuracy/high loss.
9) How do you prevent overfitting in a decision tree?
There are two ways to prevent overfitting in a decision tree: pre-pruning and post-pruning.
In pre-pruning, we stop the decision tree before it becomes a full-grown tree.
Here are the typical stopping conditions for a node:
Stop if all instances belong to the same class.
Stop if all attribute values are the same.
In post-pruning, we let the decision tree grow to full length, then trim the nodes of the tree in a bottom-up fashion.
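In scikit-learn, for instance, pre-pruning corresponds to growth limits such as `max_depth`, while cost-complexity pruning via `ccp_alpha` trims a fully grown tree bottom-up. A sketch on the Iris dataset (the specific `ccp_alpha` value is an arbitrary example):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full = DecisionTreeClassifier(random_state=0).fit(X, y)                  # no pruning
pre = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)      # pre-pruning
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)  # post-pruning

# The pruned trees are no deeper than the fully grown one.
print(full.get_depth(), pre.get_depth(), post.get_depth())
```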
10) What is a normal distribution?
Data is often distributed with a bias toward the left or right of the mean. When the data is symmetric around the mean, i.e., more frequent near the mean and less so the further away you go from the mean, then the data follows a normal distribution.
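Formally, the normal distribution with mean mu and standard deviation sigma has the density:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}
```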
11) What problems can arise from a too high learning rate in a deep neural network, and why?
If the learning rate is too high, the loss can jump around the minima. Since the learning rate decides how big the updates to the weights of the network are, a high learning rate can cause large updates, which in turn cause divergent behavior.
12) While working on a dataset, you find that two variables have negligible Pearson correlation value. Does this mean that the two variables are independent of each other? Why/why not?
Correlation describes the relationship between two variables. Positive correlation means that when one variable increases, the other tends to increase as well; negative correlation means that when one variable increases, the other tends to decrease.
Pearson correlation is a measure of linear relationship between two variables. Just because two variables have close to zero Pearson correlation doesn’t mean that they are independent; it just means that there is no linear relationship between them, but a relationship of higher order can still exist.
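A small numpy example: y is completely determined by x, yet the Pearson correlation is zero because the relationship is quadratic, not linear.

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = x ** 2  # y is fully determined by x, but not linearly

r = np.corrcoef(x, y)[0, 1]
print(r)  # 0 up to floating-point noise, despite total dependence
```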
13) There is a deep neural network with 15 hidden layers, all of which use the same activation function, i.e., tanh. While training, you notice that the training loss stays the same. What seems to be the problem, and what is the simplest change you can make to tackle it?
This is a vanishing gradient problem. Neural networks are usually trained with gradient-based methods and backpropagation. Essentially, each weight receives an update proportional to the partial derivative of the error with respect to that weight. By the time the error has been back-propagated through many layers, this update can become so small that the weights barely change, and hence, training loss stays the same.
To fix this problem, we can reduce the number of layers or replace tanh with the ReLU activation function, whose derivative does not shrink the gradient for positive inputs.
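A back-of-the-envelope numpy sketch of why many stacked tanh layers starve the early layers of gradient (the pre-activation value z = 2 is a hypothetical example):

```python
import numpy as np

def tanh_grad(z):
    return 1.0 - np.tanh(z) ** 2  # derivative of tanh, at most 1

# Backpropagation multiplies one such factor per layer. With 15 tanh layers
# and a typical pre-activation of z = 2, the gradient signal reaching the
# first layers is vanishingly small:
per_layer = tanh_grad(2.0)  # about 0.07
print(per_layer ** 15)      # effectively zero after 15 layers

# ReLU's derivative is exactly 1 for positive inputs, so this product does
# not shrink, which is why swapping tanh for ReLU helps.
```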
14) You are training a simple RNN network to predict the next word in a sentence, but while training, the accuracy is not good. After debugging, you find that the RNN is unable to follow the context beyond a couple of words. Which deep learning model would you use to fix this?
A Recurrent Neural Network (RNN) is a deep neural network structured so that it can use its internal state to remember previous inputs. Unlike feedforward neural networks, in an RNN, the output from the previous step is fed as input to the current step. This forms a loop that allows information to persist.
RNNs suffer from a problem where they are unable to “remember” a lot of data due to their simple structure. Using something like LSTM (Long Short-Term Memory) can help solve this problem.
LSTMs are a special kind of RNNs capable of learning long-term dependencies.
In LSTM, there is an explicit memory state which can be updated by the LSTM cell using carefully controlled gates. This memory state allows the LSTM to retain the memory much longer, and hence, it can follow the context for a much longer sequence.
15) During training on a binary dataset, you find that the accuracy of your algorithm is high, but while analyzing the dataset, you find that there is a very high imbalance between the classes. For example, the positive samples make up almost 99% of the dataset. Can you still trust the accuracy metric? If not, what metric should you use?
When dealing with an imbalanced dataset, accuracy is not the best measure of performance: if the algorithm simply marks all data as positive, it will still get 99% of the answers right and will thus have 99% accuracy. In such cases, metrics like balanced accuracy and the F1 score are used.
Precision: the value of true positives/(true positives + false positives)
Recall: the value of true positives/(true positives + false negatives)
F1 Score: the harmonic mean of precision and recall
Balanced Accuracy: the average recall of each of the classes in the dataset
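The following scikit-learn sketch shows the failure mode on a hypothetical 99:1 dataset: plain accuracy looks excellent while balanced accuracy exposes the useless classifier.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Hypothetical 99:1 dataset; a degenerate model predicts "positive" for everything.
y_true = [1] * 99 + [0]
y_pred = [1] * 100

print(accuracy_score(y_true, y_pred))           # 0.99, which looks great
print(balanced_accuracy_score(y_true, y_pred))  # 0.5, which reveals the problem
```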
16) You are working on a clustering problem and you have a high-dimensional dataset. While training a basic k-means clustering algorithm on the data, you notice that no matter how much you tweak the hyperparameters, the clusters keep changing between runs. Why do you think this is the case? What can you do to overcome this problem?
K-means clustering works on distance-based metrics. On a high-dimensional dataset, distance-based metrics are rendered almost useless by the curse of dimensionality, since all the data points in the dataset appear to be nearly equidistant from one another. Hence, across multiple runs, the algorithm is unable to get consistent results. To fix this, you should apply PCA, t-SNE, or some other dimensionality reduction algorithm before clustering.
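A sketch of the suggested fix, on synthetic data where the cluster signal lives in only 2 of 500 dimensions (all data here is made up):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 200 points in 500 dimensions; the cluster signal lives
# only in the first 2 dimensions, and the rest is noise.
X = rng.normal(size=(200, 500))
X[:100, :2] += 8.0  # shift half the points to create a second cluster

# Reduce to 2 dimensions first, then cluster in the reduced space.
X_reduced = PCA(n_components=2).fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_reduced)
print(len(set(labels)))  # the two planted clusters are recovered
```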
17) While using Stochastic Gradient Descent on an optimization problem with a large number of parameters, you find that the algorithm is oscillating and is unable to reach the global optimum. Why is this? How would you fix it?
One of the biggest drawbacks of Stochastic Gradient Descent (SGD) is that the learning rate is the same for all parameters. If one parameter needs a smaller learning rate while another needs a larger one, SGD cannot perform well.
This problem is magnified in optimization problems with many parameters, as the probability of such a situation arising grows with their number. In such cases, you should use an algorithm with an adaptive learning rate, such as the Adam optimizer.
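A small pure-Python sketch of the adaptivity: with a constant gradient g, Adam's bias-corrected update m_hat / sqrt(v_hat) approaches sign(g), so the effective step size no longer depends on the gradient's raw scale, which is exactly the per-parameter behavior plain SGD lacks.

```python
# Adam keeps running averages of the gradient (m) and its square (v).
# With a constant gradient g, the bias-corrected ratio m_hat / sqrt(v_hat)
# approaches sign(g), so huge and tiny gradients yield the same step size.
def adam_step_size(g, steps=100, b1=0.9, b2=0.999):
    m = v = 0.0
    for t in range(1, steps + 1):
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** steps)
    v_hat = v / (1 - b2 ** steps)
    return m_hat / (v_hat ** 0.5)

print(adam_step_size(1000.0))  # ~1.0 despite a huge gradient
print(adam_step_size(0.001))   # ~1.0 despite a tiny gradient
```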
Here are some common mistakes made when hiring data scientists; avoid them as much as possible.
Don’t ask “gotcha questions.”
Gotcha questions are questions that are very hard to solve unless you know a particular technique. These questions may seem tempting, but avoid them since knowing or not knowing the solution to such a question is not a good indicator of the candidate’s knowledge.
Don’t take projects at face value.
Projects are great, but it is now easier than ever to do machine learning in just a few lines of code. While this is great news, it also means that some candidates may simply use prebuilt Python libraries without really understanding how they work. Ask questions and have the candidate explain the work they did.
Don’t insist that the candidate should know your exact technology stack.
You should probe the candidate for their willingness and ability to learn new techniques and methods, rather than insisting that they know what you have been using. You might find a great candidate who doesn’t know your tech stack but is able to learn it quickly.
The growing amount of data in organizations creates an ever-growing demand for data scientists, data analysts, and data engineers.
With the democratization of machine learning, it’s now easier than ever to gain relevant skills in this field. But there are also challenges in finding the right candidate for your team—as if looking for your very own needle in a haystack.
Our selection of tips and questions can aid in hiring the right data scientist for your team; however, it is meant to be an addition to your overall hiring strategy, as only you know who the right candidate for your team is. This guide is designed to help you with the decision-making process, and we hope that the above questions and tips will help you identify the right candidate for your organization and project.