Spark Developer Job Description Template

Apache Spark has become one of the most widely used frameworks for distributed data processing. Its mature codebase, horizontal scalability, and resilience make it a great tool for processing huge amounts of data.

Spark’s power and flexibility call for a developer who knows more than the Spark API: they must also understand the pitfalls of distributed storage, know how to structure a data processing pipeline that handles the five Vs of Big Data (volume, velocity, variety, veracity, and value), and be able to turn all of that into maintainable code.

Spark Developer - Job Description and Ad Template

Copy this template and adapt it to your needs:

Company Introduction

{{ Write a short and catchy paragraph about your company. Make sure to provide information about the company’s culture, perks, and benefits. Mention office hours, remote working possibilities, and everything else that you think makes your company interesting. }}

Job Description

We are looking for a Spark developer who knows how to fully exploit the potential of our Spark cluster.

You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our feature developers and business analysts.

This involves both ad-hoc requests and data pipelines that are embedded in our production environment.

Responsibilities

Create Scala/Spark jobs for data transformation and aggregation
Produce unit tests for Spark transformations and helper methods
Write Scaladoc-style documentation for all code
Design data processing pipelines

Skills

Scala (with a focus on the functional programming paradigm)
ScalaTest, JUnit, Mockito {{ , Embedded Cassandra }}
Apache Spark 2.x
{{ Apache Spark RDD API }}
{{ Apache Spark SQL DataFrame API }}
{{ Apache Spark MLlib API }}
{{ Apache Spark GraphX API }}
{{ Apache Spark Streaming API }}
Spark query tuning and performance optimization
SQL database integration {{ Microsoft SQL Server, Oracle, PostgreSQL, and/or MySQL }}
Experience working with {{ HDFS, S3, Cassandra, and/or DynamoDB }}
Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)