The Vital Guide to Big Data Interviewing

Big data is an extremely broad domain, typically addressed by a hybrid team of data scientists, software engineers, and statisticians. Real expertise in big data therefore requires far more than learning the ins and outs of a particular technology. This guide offers a sampling of effective questions to help evaluate the breadth and depth of a candidate's mastery of this complex domain.

A Data Scientist is someone who extracts value from data. Such a person proactively gathers information from various sources and analyzes it to better understand how the business performs, and builds AI tools that automate certain processes within the company.

There are many definitions of this job, and it is sometimes conflated with the Big Data Engineer role. A Data Scientist or Engineer may be X% scientist, Y% software engineer, and Z% hacker, which is why the job is hard to define precisely. The actual ratios vary depending on the skills required and the type of job. It is common to bring people with different skill sets into the same data science team.

Data Scientist duties typically include creating Machine Learning-based tools or processes within the company, such as recommendation engines or automated lead scoring systems. People in this role should also be able to perform statistical analysis.

In this article, we present a sample Data Scientist job description that you can adjust to your actual needs to create an effective job advertisement and find the person who will help you get the answers you are looking for.

Data Scientist - Job Description and Ad Template

Company Introduction

{{Write a short and catchy paragraph about your company. Make sure to provide information about the company culture, perks, and benefits. Mention office hours, remote working possibilities, and everything else you think makes your company interesting. Data Scientists like to take on challenges, so anything that shows how the role could make an impact might help attract top talent.}}

Job Description

We are looking for a Data Scientist who will help us discover the information hidden in vast amounts of data and help us make smarter decisions to deliver even better products. Your primary focus will be on applying data mining techniques, doing statistical analysis, and building high-quality prediction systems integrated with our products. {{Depending on your needs, you can write very specific requirements here, like: “automate scoring using machine learning techniques”, “build recommendation systems”, “improve and extend the features used by our existing classifier”, “develop internal A/B testing procedures”, “build a system for automated fraud detection”, etc.}}

Responsibilities

  • Selecting features, building and optimizing classifiers using machine learning techniques
  • Data mining using state-of-the-art methods
  • Extending the company’s data with third-party sources of information when needed
  • Enhancing data collection procedures to include information that is relevant for building analytic systems
  • Processing, cleansing, and verifying the integrity of data used for analysis
  • Doing ad-hoc analysis and presenting results in a clear manner
  • Creating automated anomaly detection systems and continuously tracking their performance
  • {{Select from the above and add other responsibilities that are relevant}}

Skills and Qualifications

  • Excellent understanding of machine learning techniques and algorithms, such as k-NN, Naive Bayes, SVM, Decision Forests, etc.
  • Experience with common data science toolkits, such as R, Weka, NumPy, MATLAB, etc. {{depending on specific project requirements}} Excellence in at least one of these is highly desirable
  • Great communication skills
  • Experience with data visualisation tools, such as D3.js, ggplot2, etc.
  • Proficiency in using query languages such as SQL, Hive, Pig {{actual list depends on what you are currently using in your company}}
  • Experience with NoSQL databases, such as MongoDB, Cassandra, HBase {{depending on project needs}}
  • Good applied statistics skills, such as distributions, statistical testing, regression, etc.
  • Good scripting and programming skills {{if you expect that the person in this role will integrate the solution within the base application, list any programming languages and core frameworks currently being used}}
  • Data-oriented personality
  • {{Mention any other technology that such person is going to commonly work with within the organization}}
  • {{List education level or certification you require}}