Balint Kubik, Data Engineer and Developer in Berlin, Germany

Member since October 20, 2020
Balint is a versatile senior data engineer who has used cloud-hosted technologies to solve data use cases in the corporate and academic sectors. He has implemented data architectures from scratch, enabled advanced reporting capabilities, and supported data science use cases on AWS, Microsoft Azure, and Google Cloud Platform. Driven to produce well-tested, maintainable software, Balint is experienced in Python and Scala programming languages and several (No)SQL dialects.

Portfolio

  • 12traits
    BigQuery, Google Cloud Platform (GCP), Python, Go, Kubernetes, Apache Beam...
  • Cleverbridge AG (Freelance)
    Kubernetes, Apache Airflow, Azure Data Factory, Microsoft Power BI...
  • Cleverbridge AG
    Databricks, Azure Data Lake, Azure Data Factory, Microsoft Azure, Python...

Experience

Location

Berlin, Germany

Availability

Part-time

Preferred Environment

Linux, PyCharm, Visual Studio Code

The most amazing...

...role I've had was driving the build of a large-scale, fully cloud-hosted data warehousing system that hosts eCommerce data for large clients.

Employment

  • Senior Data Engineer

    2020 - PRESENT
    12traits
    • Streamlined the movement, processing, and transformation of rich behavioral big data for several large clients in the gaming and health industries with over 300 million EUR in revenue.
    • Enabled performant, scalable access through back-end APIs to business-critical KPIs derived from hundreds of GBs of data.
    • Introduced modern batch and stream-processing pipelines and workflow scheduling engines to the organization.
    Technologies: BigQuery, Google Cloud Platform (GCP), Python, Go, Kubernetes, Apache Beam, Apache Airflow, ETL, Metabase, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Google Cloud, Apache Spark, Data Engineering, Google BigQuery, Data Modeling, Database Development, Data Visualization, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)
  • Senior Data Engineer

    2020 - PRESENT
    Cleverbridge AG (Freelance)
    • Supported the release of a Microsoft Azure-hosted reporting product to one of the company's top-three enterprise clients, a company generating $300+ million in annual revenue.
    • Helped scale the calculation of a comprehensive set of eCommerce KPIs, making it approximately 80% faster.
    • Implemented security best practices in Microsoft Azure to protect business-sensitive information and share data securely with external parties.
    • Improved the scalability and monitorability of a large reporting system through consistent QA testing and efficient backfilling mechanisms.
    Technologies: Kubernetes, Apache Airflow, Azure Data Factory, Microsoft Power BI, Azure Data Lake, Databricks, SQL, Microsoft Azure, Python, Database Management, DAX, Data Architecture, Data Warehousing, Data Warehouse Design, Databases, Amazon Web Services (AWS), Azure SQL, Data Engineering, ETL, Data Modeling, Database Development, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)
  • Data Engineer

    2017 - 2019
    Cleverbridge AG
    • Planned and implemented a data warehousing system for reporting and analytics on Microsoft Azure for enterprise clients that generated $400+ million in aggregate annual revenue.
    • Managed the fully cloud-hosted environment using infrastructure as code (IaC), designed and implemented database schemas, and built ETL pipelines for processing granular eCommerce datasets comprising hundreds of millions of rows.
    • Communicated product goals to internal and external stakeholders and managed the backlog of a three-person Agile development team.
    Technologies: Databricks, Azure Data Lake, Azure Data Factory, Microsoft Azure, Python, Database Management, DAX, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Docker, Apache Spark, Kubernetes, Amazon Web Services (AWS), Azure SQL, Data Engineering, Apache Airflow, ETL, Data Modeling, Database Development, Data Visualization, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD), Hadoop
  • Software Developer

    2015 - 2017
    Starschema Ltd
    • Automated provisioning and recovery mechanisms of Hadoop and Tableau clusters hosted on AWS for Fortune 500 clients.
    • Deployed image classification for anomaly detection in power plants for one of the largest industrial companies in the world.
    • Implemented a solution to host containerized (Dockerized) Apache Kafka on Apache Mesos.
    Technologies: Amazon Web Services (AWS), Image Processing, Microsoft Azure, Tableau, Hadoop, Apache Mesos, Apache Spark, R, Python, Database Management, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Docker, Kubernetes, Data Engineering, ETL, Data Modeling, Database Development, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)
  • Researcher

    2014 - 2017
    Hungarian Academy of Sciences
    • Collected, processed, and performed text analysis on large corpora consisting of millions of sentences derived from audio recordings covering more than five days.
    • Presented research results at the International Conference on Computational Social Science in 2018, the largest conference of its type in the world.
    • Mapped the network of pieces of Hungarian legislation using text mining techniques. The findings were published in a scientific publication.
    Technologies: Research, Django, Elasticsearch, Python, R, Databases, Data Pipelines

Experience

  • Large-scale, Cloud-hosted Data Warehouse for eCommerce

    Drove the planning and implementation of a data warehouse system for reporting and analytics hosted on Microsoft Azure. The product was released to top clients with $400+ million in aggregate annual revenue.

    Role:
    - Managed the cloud environment (IaC).
    - Built ETL pipelines.
    - Managed database schema.
    - Built standardized reporting dashboards and performed ad hoc analytics.
    - Developed Python microservices.
    - Calculated eCommerce (subscription) KPIs.
    - Communicated with internal and external stakeholders.

Skills

  • Languages

    SQL, Python, R, Go
  • Frameworks

    Apache Spark, Hadoop, Django
  • Tools

    Microsoft Power BI, Apache Airflow, Tableau, PyCharm, BigQuery, Apache Beam
  • Paradigms

    ETL, Database Development, Test-driven Development (TDD)
  • Platforms

    Docker, Kubernetes, Amazon Web Services (AWS), Databricks, Google Cloud Platform (GCP), Visual Studio Code, Linux
  • Storage

    Azure SQL, Data Lakes, Data Pipelines, Databases, Database Management, PostgreSQL, Google Cloud, Elasticsearch
  • Other

    Data Analysis, Microsoft Azure, Azure Data Factory, Google BigQuery, Data Engineering, Data Quality, Data Modeling, Reports, Data Warehousing, Data Architecture, DAX, Data Warehouse Design, Azure Data Lake, Data Visualization, Apache Mesos, Research, Image Processing, Metabase
  • Industry Expertise

    Security

Education

  • Master's Degree in Economics
    2014 - 2018
    Eötvös Loránd University - Budapest, Hungary
  • Bachelor's Degree in Political Science
    2012 - 2015
    Corvinus University of Budapest - Budapest, Hungary
