Balint Kubik, Developer in Berlin, Germany

Verified Expert in Engineering

Data Engineer and Developer

Location
Berlin, Germany
Toptal Member Since
October 20, 2020

Balint is a versatile senior data engineer who has used cloud-hosted technologies to solve data problems in the corporate and academic sectors. He has implemented data architectures from scratch, enabled advanced reporting capabilities, and supported data science use cases on AWS, Microsoft Azure, and Google Cloud Platform. Driven to produce well-tested, maintainable software, Balint is experienced in the Python and Scala programming languages and several (No)SQL dialects.

Availability

Part-time

Preferred Environment

Linux, PyCharm, Visual Studio Code (VS Code)

The most amazing...

...role I've had was driving the build of a large-scale, fully cloud-hosted data warehousing system that hosts eCommerce data for large clients.

Work Experience

Senior Data Engineer

2020 - PRESENT
12traits
  • Streamlined the movement, processing, and transformation of rich behavioral big data from several large clients in the gaming and health industries with over 300 million EUR in revenue.
  • Enabled performant, scalable access to business-critical KPIs derived from hundreds of GBs of data through back-end APIs.
  • Introduced modern batch and stream-processing pipelines and workflow scheduling engines to the organization.
Technologies: BigQuery, Google Cloud Platform (GCP), Python, Go, Kubernetes, Apache Beam, Apache Airflow, ETL, Metabase, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Google Cloud, Apache Spark, Data Engineering, Google BigQuery, Data Modeling, Database Development, Data Visualization, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)

Senior Data Engineer

2020 - PRESENT
Cleverbridge AG (Freelance)
  • Supported the release of a Microsoft Azure-hosted reporting product to one of the company's top-three enterprise clients, a company generating $300+ million in annual revenue.
  • Assisted with scaling the calculation of exhaustive eCommerce KPIs, which increased speed by approximately 80%.
  • Implemented state-of-the-art security best practices in Microsoft Azure to protect business-sensitive information and share data with external parties.
  • Improved the scalability and monitorability of a large reporting system through consistent QA testing and efficient backfilling mechanisms.
Technologies: Kubernetes, Apache Airflow, Azure Data Factory, Microsoft Power BI, Azure Data Lake, Databricks, SQL, Microsoft Azure, Python, Database Management, DAX, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Amazon Web Services (AWS), Azure SQL, Data Engineering, ETL, Data Modeling, Database Development, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)

Data Engineer

2017 - 2019
Cleverbridge AG
  • Planned and implemented a data warehousing system for reporting and analytics on Microsoft Azure for enterprise clients that generated $400+ million in aggregate annual revenue.
  • Managed the fully cloud-hosted environment using infrastructure as code (IaC), designed and implemented the database schema, and built ETL pipelines for processing granular eCommerce datasets comprising hundreds of millions of rows of data.
  • Communicated product goals to internal and external stakeholders and managed the backlog of a three-person Agile development team.
Technologies: Databricks, Azure Data Lake, Azure Data Factory, Microsoft Azure, Python, Database Management, DAX, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Docker, Apache Spark, Kubernetes, Amazon Web Services (AWS), Azure SQL, Data Engineering, Apache Airflow, ETL, Data Modeling, Database Development, Data Visualization, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD), Hadoop

Software Developer

2015 - 2017
Starschema Ltd
  • Automated provisioning and recovery mechanisms of Hadoop and Tableau clusters hosted on AWS for Fortune 500 clients.
  • Deployed image classification for anomaly detection in power plants for one of the largest industrial companies in the world.
  • Implemented a solution to host containerized (Dockerized) Apache Kafka on Apache Mesos.
Technologies: Amazon Web Services (AWS), Image Processing, Microsoft Azure, Tableau, Hadoop, Apache Mesos, Apache Spark, R, Python, Database Management, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Docker, Kubernetes, Data Engineering, ETL, Data Modeling, Database Development, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)

Researcher

2014 - 2017
Hungarian Academy of Sciences
  • Collected, processed, and performed text analysis on large corpora consisting of millions of sentences derived from audio recordings covering more than five days.
  • Presented research results at the International Conference on Computational Social Science in 2018, the largest conference of its type in the world.
  • Mapped the network of pieces of Hungarian legislation using text mining techniques. The findings were published in a scientific publication.
Technologies: Research, Django, Elasticsearch, Python, R, Databases, Data Pipelines

Large-scale, Cloud-hosted Data Warehouse for eCommerce

Drove the planning and implementation of a data warehouse system for reporting and analytics hosted on Microsoft Azure. The product was released to top clients with $400+ million in aggregate annual revenue.

Role:
- Managed the cloud environment (IaC).
- Built ETL pipelines.
- Managed database schema.
- Built standardized reporting dashboards and performed ad hoc analytics.
- Developed Python microservices.
- Calculated eCommerce (subscription) KPIs.
- Communicated with internal and external stakeholders.

Languages

SQL, Python, R, Go

Frameworks

Apache Spark, Hadoop, Django

Tools

Microsoft Power BI, Apache Airflow, Tableau, PyCharm, Apache Mesos, BigQuery, Apache Beam

Paradigms

ETL, Database Development, Test-driven Development (TDD)

Platforms

Docker, Kubernetes, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Visual Studio Code (VS Code), Linux

Storage

Azure SQL, Data Lakes, Data Pipelines, Databases, Database Management, PostgreSQL, Google Cloud, Elasticsearch

Other

Data Analysis, Microsoft Azure, Azure Data Factory, Google BigQuery, Data Engineering, Data Quality, Data Modeling, Reports, Data Warehousing, Data Architecture, DAX, Data Warehouse Design, Azure Data Lake, Security, Data Visualization, Research, Image Processing, Metabase

2014 - 2018

Master's Degree in Economics

Eötvös Loránd University - Budapest, Hungary

2012 - 2015

Bachelor's Degree in Political Science

Corvinus University of Budapest - Budapest, Hungary
