The Vital Guide to Big Data Interviewing

Big data is an extremely broad domain, typically addressed by a hybrid team of data scientists, software engineers, and statisticians. Real expertise in big data therefore requires far more than learning the ins and outs of a particular technology. This guide offers a sampling of effective questions to help evaluate the breadth and depth of a candidate's mastery of this complex domain.


A Big Data Engineer creates and manages a company’s Big Data infrastructure and tools, and knows how to get results from vast amounts of data quickly.

The exact definition of this role varies, and it often overlaps with the Data Scientist role. Here, we assume a role focused on engineering, one that does not require statistics or strong machine learning skills.

The world of Big Data has grown significantly over the last decade, and the required skills have become more specialized as a result. While most Big Data infrastructure is built around Hadoop, many tools have become significant in their own right. The following sample description covers some common cases.

Big Data Engineer - Job Description and Ad Template

Company Introduction

{{Write a short and catchy paragraph about your company. Make sure to provide information about the company culture, perks, and benefits. Mention office hours, remote working possibilities, and everything else you think makes your company interesting. Big Data Engineers like to work on huge problems - mentioning the scale (or the potential) can help gain the attention of top talent.}}

Job Description

We are looking for a Big Data Engineer who will work on collecting, storing, processing, and analyzing huge sets of data. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them. You will also be responsible for integrating them with the architecture used across the company.

Responsibilities

  • Selecting and integrating any Big Data tools and frameworks required to provide requested capabilities
  • Implementing ETL processes {{if importing data from existing data sources is relevant}}
  • Monitoring performance and advising on any necessary infrastructure changes
  • Defining data retention policies
  • {{Add any other responsibility that is relevant}}

Skills and Qualifications

  • Proficient understanding of distributed computing principles
  • Management of a Hadoop cluster, including all of its services {{unless you are going to have specific Big Data DevOps roles for this}}
  • Ability to solve any ongoing issues with operating the cluster {{unless you are going to have specific Big Data DevOps roles for this}}
  • Proficiency with Hadoop v2, MapReduce, HDFS
  • Experience building stream-processing systems, using solutions such as Storm or Spark Streaming {{if stream-processing is relevant for the role}}
  • Good knowledge of Big Data querying tools, such as Pig, Hive, and Impala
  • Experience with Spark {{if you are including or planning to include it}}
  • Experience with integration of data from multiple data sources
  • Experience with NoSQL databases, such as HBase, Cassandra, MongoDB
  • Knowledge of various ETL techniques and frameworks, such as Flume
  • Experience with various messaging systems, such as Kafka or RabbitMQ
  • Experience with Big Data ML toolkits, such as Mahout, Spark MLlib, or H2O {{if you are going to integrate Machine Learning in your Big Data infrastructure}}
  • Good understanding of Lambda Architecture, along with its advantages and drawbacks
  • Experience with Cloudera/MapR/Hortonworks {{you can specify the distribution you are currently using or planning to use here}}
  • {{List any other technologies you are using or planning to use. Most Big Data Engineers will know some of the ones listed here: The Hadoop Ecosystem Table}}
  • {{List education level or certification you require}}