Peter Jakubčo
Verified Expert in Engineering
Scala Developer
Peter is a skilled software craftsman with more than nine years of experience. His top areas of expertise include big data engineering, ETL, and building complex data pipelines. His top project was a big data platform that aggregated various metrics from the clickstream of 100 million users (approximately 2 billion clicks a day). He has applied his strong commitment to project success to both core customer-facing projects and internal support tools. Peter enjoys working either individually or in small teams.
Preferred Environment
Scala, Linux, Git
The most amazing...
...thing I've developed was a big data platform showing eCommerce insights: various metrics from the clickstream of 100 million anonymous users.
Work Experience
Senior Software Engineer
OpenBean
- Standardized several types of EMR cluster workflows across the company, including sophisticated tools for convenient cluster management: scripts, a Python library, and Airflow DAGs (e.g., for terminating inactive clusters).
- Implemented an HTTP service, based on an event-sourcing architecture, that serves EMR cluster and step states to clients. Its goals were to avoid the rate limits on AWS EMR API calls (ThrottlingException errors) and to be fast and responsive.
- Implemented a custom Airflow plugin for working with EMR clusters, including a find-or-start cluster operator, sensors for EMR clusters and steps backed by the HTTP service above, and more.
- Implemented a command-proxy Airflow DAG that performs various actions based on its input and sends responses to an SQS queue.
Senior Software Engineer
Jumpshot
- Implemented a big data platform used for comparing the eCommerce activities of various brands and domains, based on the daily anonymous clickstream of 100 million users.
- Used a custom computation cluster provided by Avast and Jumpshot. The cluster consisted of two data centers in the Czech Republic, each with 250 nodes: 12,000 cores, 100 TB of RAM, and 10 PB of disk space. Data was stored in HDFS or Amazon S3.
- Implemented a big data platform storing various metadata extracted from analytics calls (Google Analytics, Facebook Analytics), including parsing and text normalization.
- Implemented a REST back-end service for the Insights platform.
- Implemented automated unit tests of big data computations for the Insights data platform using Cucumber, Scala, and Apache Spark.
- Implemented a back-end REST service and an Apache Spark job for converting data stored in Apache Parquet format and loading it into MongoDB and into a custom "file-based" database (a "dumper").
Senior Software Engineer
ZOOM International, s.r.o.
- Implemented a real-time player for live screen monitoring that streamed remote video of computer desktops, recorded during the active phone calls of call center agents (5,000+ agents).
- Maintained a desktop screen recording service: a Windows service that communicated with the back-end server over a custom TCP protocol.
- Maintained call recording software using VoIP protocols such as SIP, SDP, RTP, and RTCP.
- Maintained integrations with various call and agent metadata management platforms, such as UCCE and UCCX.
- Implemented a service for parsing and processing call data records (CDRs) used mainly for the reconstruction of call scenarios.
- Implemented a parametrized call generator (metadata, audio, and video) used for preparing datasets during automated QA.
- Migrated the entire call and video recording platform from CentOS 6 to CentOS 7, in particular converting custom Bash startup scripts and configuration to systemd.
- Implemented monitoring of the call recording platform using Bash, Nagios, and SNMP.
Experience
ETL of Google and Facebook Analytics
eCommerce Insights
EmuStudio
https://www.emustudio.net/
IntelliJ Cucumber+Scala Plugin
https://github.com/vbmacher/intellij-cucumber-scala
It is an open-source project used monthly by ~2,000 people.
Skills
Languages
Scala, Java 8, Java, SQL, Python, C++, Bash, HTML
Frameworks
Apache Spark, Hadoop, Akka, Bootstrap
Tools
Spark SQL, Git, IntelliJ IDEA, Gradle, Apache Maven, Cucumber, Amazon Elastic MapReduce (EMR), Cluster, i3, Amazon Athena, SBT, Apache Airflow, Amazon EKS, Amazon Simple Queue Service (SQS)
Paradigms
ETL, Automation, Functional Programming, Concurrent Programming, REST
Platforms
Linux, Jupyter Notebook, Docker, Amazon Web Services (AWS)
Storage
MongoDB, Data Pipelines, Apache Hive, PostgreSQL, Azkaban, Amazon S3 (AWS S3)
Other
Akka HTTP, Akka Actors, Data Engineering, Data Warehousing, Big Data, SOAP, Multithreading, Data Warehouse Design, IntelliJ SDK, VoIP, Session Initiation Protocol (SIP), Data Analytics, Cloud, Amazon EventBridge, Amazon Managed Workflows for Apache Airflow (MWAA), Pulumi, EMR
Education
Ph.D. in Computer Science
Technical University of Košice - Košice, Slovakia
Master of Science Degree in Computer Science
Technical University of Košice - Košice, Slovakia