Spark Developer Job Description Template
Apache Spark has become one of the most widely used frameworks for distributed data processing. Its mature codebase, horizontal scalability, and resilience make it a great tool for processing huge amounts of data.
Spark’s great power and flexibility require a developer who not only knows the Spark API well, but also understands the pitfalls of distributed storage, knows how to structure a data processing pipeline that must handle the five V’s of big data—volume, velocity, variety, veracity, and value—and can turn all of that into maintainable code.
Spark Developer - Job Description and Ad Template
Copy this template and modify it to make it your own:
Company Introduction
{{ Write a short and catchy paragraph about your company. Make sure to provide information about the company’s culture, perks, and benefits. Mention office hours, remote working possibilities, and everything else that you think makes your company interesting. }}
Job Description
We are looking for a Spark developer who knows how to fully exploit the potential of our Spark cluster.
You will clean, transform, and analyze vast amounts of raw data from various systems using Spark to provide ready-to-use data to our feature developers and business analysts.
This involves both ad hoc requests and data pipelines embedded in our production environment.
Responsibilities
- Create Scala/Spark jobs for data transformation and aggregation
- Produce unit tests for Spark transformations and helper methods
- Write Scaladoc-style documentation with all code
- Design data processing pipelines
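To make the first two responsibilities concrete, here is a minimal sketch of what such a Scala/Spark job might look like. The job name, dataset paths, and column names (`status`, `order_date`, `amount`) are illustrative assumptions, not part of the template; the point is the structure—keeping the transformation a pure `DataFrame => DataFrame` function so it can be unit tested separately from the I/O:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

object DailyRevenueJob {

  // Pure DataFrame-to-DataFrame transformation: easy to cover with a
  // unit test by feeding in a small, locally created DataFrame.
  def dailyRevenue(orders: DataFrame): DataFrame =
    orders
      .filter(col("status") === "COMPLETED")
      .groupBy(col("order_date"))
      .agg(sum(col("amount")).as("revenue"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("daily-revenue")
      .getOrCreate()

    // Input and output paths are supplied by the surrounding pipeline.
    val orders = spark.read.parquet(args(0))
    dailyRevenue(orders).write.parquet(args(1))

    spark.stop()
  }
}
```

Separating the aggregation logic from the `main` entry point is what makes the "produce unit tests for Spark transformations" responsibility practical.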
Skills
- Scala (with a focus on the functional programming paradigm)
- Scalatest, JUnit, Mockito {{ , Embedded Cassandra }}
- Apache Spark 2.x
- {{ Apache Spark RDD API }}
- {{ Apache Spark SQL DataFrame API }}
- {{ Apache Spark MLlib API }}
- {{ Apache Spark GraphX API }}
- {{ Apache Spark Streaming API }}
- Spark query tuning and performance optimization
- SQL database integration {{ Microsoft, Oracle, Postgres, and/or MySQL }}
- Experience working with {{ HDFS, S3, Cassandra, and/or DynamoDB }}
- Deep understanding of distributed systems (e.g. CAP theorem, partitioning, replication, consistency, and consensus)
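As a hedged illustration of how the ScalaTest and Spark skills above combine in practice, a unit test for a Spark transformation typically runs against a local-mode `SparkSession`, so no cluster is needed. The suite name, columns, and filter logic here are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class ActiveUsersSpec extends AnyFunSuite {

  // local[2] runs Spark on two local threads inside the test JVM.
  private lazy val spark = SparkSession.builder
    .master("local[2]")
    .appName("unit-tests")
    .getOrCreate()

  import spark.implicits._

  test("keeps only users with at least one event") {
    val events = Seq(("alice", 3), ("bob", 0)).toDF("user", "eventCount")
    val active = events.filter($"eventCount" > 0)
    assert(active.collect().map(_.getString(0)).toSeq == Seq("alice"))
  }
}
```

A lazy, shared local session keeps test suites fast, since starting a `SparkSession` is the most expensive step in a Spark test run.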
Recent Spark Articles by Toptal Engineers
Introduction to Apache Spark With Examples and Use Cases
Apache Spark Streaming Tutorial: Identifying Trending Twitter Hashtags
How I Used Apache Spark and Docker in a Hackathon to Build a Weather App
Will JS Frameworks Spark a Front-end Revolution?
Python vs. R: Syntactic Sugar Magic
Big Data Architecture for the Masses: A ksqlDB and Kubernetes Tutorial
Find the right Spark interview questions
Read a list of great community-driven Spark interview questions.
Read them, comment on them, or even contribute your own.
Hire a Top Apache Spark Developer Now
Toptal is a marketplace for top Apache Spark developers, engineers, programmers, coders, architects, and consultants. Top companies and startups choose Toptal Apache Spark freelancers for their mission-critical software projects.
Mateusz Cieślak
Mateusz is an experienced data engineer with more than 20 projects delivered in areas of data analytics and IT implementations. He is an expert in big data technologies (Hadoop, Python, Apache Spark, Azure) and SQL (T-SQL) and is known for building high-performing ETL/ELT data pipelines. At PwC, Mateusz developed a large volume data mart to profile over 300 million citizens by more than 500 variables and an analytics engine to recommend optimal promotional actions with more than 20,000 products.
Sebastian Brestin
Since 2012, Sebastian has been developing distributed systems for various platforms ranging from Solaris, IBM AIX, HP-UX to Linux and Windows. He's worked with various technologies such as Apache Spark, Elasticsearch, PostgreSQL, RabbitMQ, Django, and Celery to build data-intensive scalable software. Sebastian is passionate about delivering high-quality solutions and is extremely interested in big data challenges.
Radek Ostrowski
Radek is a certified Toptal blockchain engineer particularly interested in Ethereum and smart contracts. In the fiat world, he is experienced in big data and machine learning projects. He is a triple winner in two different international IBM Apache Spark competitions, co-creator of PlayStation 4's back end, a successful hackathon competitor, and a speaker at conferences in Australia, Poland, and Serbia.
Discover More Apache Spark Developers in the Toptal Network
Toptal connects the top 3% of freelance talent all over the world.
Join the Toptal community.