Peter Jakubčo, Developer in Prague, Czech Republic

Peter Jakubčo

Verified Expert in Engineering

Scala Developer

Location
Prague, Czech Republic
Toptal Member Since
June 8, 2020

Peter is a skilled software craftsman with more than nine years of experience. His main areas of expertise include big data engineering, ETL, and building complex data pipelines. His most notable project was a big data platform that aggregated various metrics from the clickstream of 100 million users (approximately 2 billion clicks a day). He has applied his strong drive for project success to core customer-facing projects and internal support tools. Peter enjoys working either individually or in small teams.

Portfolio

OpenBean
SBT, Scala, Akka, Akka HTTP, Amazon Web Services (AWS), Amazon S3 (AWS S3)...
Jumpshot
Azkaban, Cluster, Jupyter Notebook, Gradle, REST, Functional Programming...
ZOOM International, s.r.o.
Session Initiation Protocol (SIP), REST, Apache Maven, IntelliJ IDEA, Python...

Availability

Part-time

Preferred Environment

Scala, Linux, Git

The most amazing...

...thing I've developed was a big data platform showing eCommerce insights: various metrics derived from the clickstream of 100 million anonymous users.

Work Experience

Senior Software Engineer

2021 - PRESENT
OpenBean
  • Standardized several types of EMR cluster workflows throughout the company, including implementing tools for convenient cluster management: scripts, a Python library, and Airflow DAGs (e.g., for terminating inactive clusters).
  • Implemented an HTTP service, based on an event-sourcing architecture, for clients requesting EMR cluster and step states. The goal is to stay fast and responsive while avoiding the rate limits on AWS EMR API calls (ThrottlingExceptions).
  • Implemented a custom Airflow plugin for working with EMR clusters: operators such as find-or-start cluster, sensors for EMR clusters and steps based on the HTTP service mentioned above, and more.
  • Implemented a command-proxy Airflow DAG, which performs various actions based on its input and sends responses to an SQS queue.
Technologies: SBT, Scala, Akka, Akka HTTP, Amazon Web Services (AWS), Amazon S3 (AWS S3), Amazon Elastic MapReduce (EMR), Amazon EventBridge, Amazon Managed Workflows for Apache Airflow (MWAA), Apache Airflow, Bash, Python, Git, Pulumi, Amazon EKS, Amazon Simple Queue Service (SQS), Apache Spark

Senior Software Engineer

2017 - 2020
Jumpshot
  • Implemented a big data platform used for comparing eCommerce activities of various brands and domains, based on the anonymous daily clickstream of 100 million users.
  • Used a custom computation cluster provided by Avast and Jumpshot. The cluster consisted of two data centers in the Czech Republic, each with 250 nodes: 12,000 cores, 100 TB of RAM, and 10 PB of disk space. Data was stored in HDFS or Amazon S3.
  • Implemented a big data platform storing various meta-information found in analytic calls (Google Analytics, Facebook Analytics). The work included parsing and text normalization.
  • Implemented a REST back-end service for the Insights platform.
  • Implemented automated unit tests of big data computations for the Insights data platform using Cucumber, Scala, and Apache Spark.
  • Implemented a back-end REST service and an Apache Spark job for converting data stored in Apache Parquet format and loading it into MongoDB and into a custom "file-based" database (a "dumper").
Technologies: Azkaban, Cluster, Jupyter Notebook, Gradle, REST, Functional Programming, Concurrent Programming, Akka Actors, Cucumber, Data Analytics, Git, Big Data, ETL, Data Engineering, Spark SQL, Akka HTTP, MongoDB, Apache Hive, Cloud, Apache Spark, Scala, Akka

Senior Software Engineer

2011 - 2017
ZOOM International, s.r.o.
  • Implemented a real-time player for live screen monitoring, showing remote video of computer desktops recorded during active phone calls of call center agents (5,000+ agents).
  • Maintained a screen recording service for computer desktops: a Windows service communicating with the back-end server using a custom TCP protocol.
  • Maintained call recording software using VoIP protocols like SIP, SDP, RTP, or RTCP.
  • Maintained integrations of various call and agent meta-information management platforms like UCCE or UCCX.
  • Implemented a service for parsing and processing call data records (CDRs) used mainly for the reconstruction of call scenarios.
  • Implemented a parametrized call generator (metadata, audio, and video) used for preparing datasets during automated QA.
  • Migrated the whole platform for call and video recording from CentOS 6 to CentOS 7, in particular converting custom Bash startup scripts and configuration to systemd.
  • Implemented monitoring of the call recording platform using Bash, Nagios, and SNMP.
Technologies: Session Initiation Protocol (SIP), REST, Apache Maven, IntelliJ IDEA, Python, Concurrent Programming, PostgreSQL, Java 8, Git, Multithreading, SQL, Bash, Linux, C++, Java

ETL of Google and Facebook Analytics

Created a data platform for parsing and structuring Google and Facebook analytic calls to better understand the market and the various competitors that use these technologies. The platform was used to enrich the more general Insights platform with metadata. In production, it could handle around 200 million PII-stripped URLs per day. The platform was written in Scala and Apache Spark.

eCommerce Insights

Jumpshot Insights was a big data platform providing a comparison of various eCommerce activities of different brands around the world, including big players like Amazon, eBay, Walmart, and others. Those activities were found in the anonymous clicks of 100 million users per day. The most useful metrics included a basic funnel (the number of sessions with visits, interactions, add-to-carts, checkout starts, or conversions on a domain), cross-shopping sessions (sessions in which a user who buys brand X also buys brand Y, etc.), channel performance (how many referrals come from different "channels" like email, social, or ads), referring domains/keywords, average time to conversion, and more.
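
As a rough illustration of the funnel and cross-shopping metrics described above, here is a minimal Python sketch. It is not the platform's actual implementation (which ran on Scala and Apache Spark); the event schema `(session_id, event_type, brand)`, the stage names, and the function names are all assumptions made for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical event records: (session_id, event_type, brand).
# Stage names mirror the funnel described above; the schema is an assumption.
FUNNEL_STAGES = ["visit", "interaction", "add_to_cart", "start_checkout", "conversion"]


def funnel_counts(events):
    """Count the distinct sessions that contain each funnel stage."""
    sessions_by_stage = {stage: set() for stage in FUNNEL_STAGES}
    for session_id, event_type, _brand in events:
        if event_type in sessions_by_stage:
            sessions_by_stage[event_type].add(session_id)
    return {stage: len(sessions_by_stage[stage]) for stage in FUNNEL_STAGES}


def cross_shopping_counts(events):
    """Count the sessions in which each pair of brands was bought together."""
    brands_by_session = {}
    for session_id, event_type, brand in events:
        if event_type == "conversion":
            brands_by_session.setdefault(session_id, set()).add(brand)
    pair_counts = Counter()
    for brands in brands_by_session.values():
        # Sort so that (X, Y) and (Y, X) are counted as the same pair.
        for pair in combinations(sorted(brands), 2):
            pair_counts[pair] += 1
    return pair_counts


# Made-up sample data, for illustration only.
events = [
    ("s1", "visit", "X"),
    ("s1", "add_to_cart", "X"),
    ("s1", "conversion", "X"),
    ("s1", "conversion", "Y"),
    ("s2", "visit", "Y"),
    ("s2", "interaction", "Y"),
]
print(funnel_counts(events))
print(cross_shopping_counts(events))
```

At the scale mentioned above, the same grouping logic would be expressed as distributed aggregations rather than in-memory sets; the sketch only shows how the metrics are defined.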

EmuStudio

https://www.emustudio.net/
A free computer emulation platform and framework used in academia and by enthusiasts. Computer components are plugins, which can be combined into computer schemas and then emulated. It also includes a source code editor and various compilers. The emulated computers include the MITS Altair8800, the SSEM machine, and abstract machines (brainf*ck, RAM, RASP). The project includes thorough documentation and a website.

IntelliJ Cucumber+Scala Plugin

https://github.com/vbmacher/intellij-cucumber-scala
A plugin for IntelliJ IDEA that enables navigation between step definitions (written by the developer in Scala) and Gherkin steps. Besides navigation, it supports finding usages of step definitions, automated creation of step definitions (a template wizard), and more.

It is an open-source project used by roughly 2,000 people each month.

Languages

Scala, Java 8, Java, SQL, Python, C++, Bash, HTML

Frameworks

Apache Spark, Hadoop, Akka, Bootstrap

Tools

Spark SQL, Git, IntelliJ IDEA, Gradle, Apache Maven, Cucumber, Amazon Elastic MapReduce (EMR), Cluster, i3, Amazon Athena, SBT, Apache Airflow, Amazon EKS, Amazon Simple Queue Service (SQS)

Paradigms

ETL, Automation, Functional Programming, Concurrent Programming, REST

Platforms

Linux, Jupyter Notebook, Docker, Amazon Web Services (AWS)

Storage

MongoDB, Data Pipelines, Apache Hive, PostgreSQL, Azkaban, Amazon S3 (AWS S3)

Other

Akka HTTP, Akka Actors, Data Engineering, Data Warehousing, Big Data, SOAP, Multithreading, Data Warehouse Design, IntelliJ SDK, VoIP, Session Initiation Protocol (SIP), Data Analytics, Cloud, Amazon EventBridge, Amazon Managed Workflows for Apache Airflow (MWAA), Pulumi, EMR

2009 - 2011

Ph.D. in Computer Science

Technical University of Košice - Košice, Slovakia

2004 - 2009

Master of Science Degree in Computer Science

Technical University of Košice - Košice, Slovakia
