Senior Data Engineer
2020 - PRESENT
12traits
- Streamlined the movement, processing, and transformation of rich behavioral big data for several large clients in the gaming and health industries with over 300 million EUR in revenue.
- Enabled performant, scalable access to business-critical KPIs derived from hundreds of GBs of data through back-end APIs.
- Introduced modern batch and stream-processing pipelines and workflow scheduling engines to the organization.
Technologies: BigQuery, Google Cloud Platform (GCP), Python, Go, Kubernetes, Apache Beam, Apache Airflow, ETL, Metabase, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Google Cloud, Apache Spark, Data Engineering, Google BigQuery, Data Modeling, Database Development, Data Visualization, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)

Senior Data Engineer
2020 - PRESENT
Cleverbridge AG (Freelance)
- Supported the release of a Microsoft Azure-hosted reporting product to one of the company's top-three enterprise clients, a company generating $300+ million in annual revenue.
- Assisted with scaling the calculation of exhaustive eCommerce KPIs, which increased speed by approximately 80%.
- Implemented state-of-the-art security best practices in Microsoft Azure to protect business-sensitive information and share data with external parties.
- Improved the scalability and monitorability of a large reporting system through consistent QA testing and efficient backfilling mechanisms.
Technologies: Kubernetes, Apache Airflow, Azure Data Factory, Microsoft Power BI, Azure Data Lake, Databricks, SQL, Microsoft Azure, Python, Database Management, DAX, Data Architecture, Data Warehousing, Data Warehouse Design, Databases, Amazon Web Services (AWS), Azure SQL, Data Engineering, ETL, Data Modeling, Database Development, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)

Data Engineer
2017 - 2019
Cleverbridge AG
- Planned and implemented a data warehousing system for reporting and analytics on Microsoft Azure for enterprise clients that generated $400+ million in aggregate annual revenue.
- Managed the fully cloud-hosted environment using infrastructure as code (IaC), designed and implemented database schemas, and built ETL pipelines for processing granular eCommerce datasets comprising hundreds of millions of rows.
- Communicated product goals to internal and external stakeholders and managed the backlog of a three-person Agile development team.
Technologies: Databricks, Azure Data Lake, Azure Data Factory, Microsoft Azure, Python, Database Management, DAX, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Docker, Apache Spark, Kubernetes, Amazon Web Services (AWS), Azure SQL, Data Engineering, Apache Airflow, ETL, Data Modeling, Database Development, Data Visualization, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD), Hadoop

Software Developer
2015 - 2017
Starschema Ltd
- Automated provisioning and recovery mechanisms of Hadoop and Tableau clusters hosted on AWS for Fortune 500 clients.
- Deployed image classification for anomaly detection in power plants for one of the largest industrial companies in the world.
- Implemented a solution to host containerized (Dockerized) Apache Kafka on Apache Mesos.
Technologies: Amazon Web Services (AWS), Image Processing, Microsoft Azure, Tableau, Hadoop, Apache Mesos, Apache Spark, R, Python, Database Management, Data Architecture, Data Warehouse Design, Data Warehousing, Databases, Docker, Kubernetes, Data Engineering, ETL, Data Modeling, Database Development, Data Pipelines, Data Lakes, PostgreSQL, Data Quality, Test-driven Development (TDD)

Researcher
2014 - 2017
Hungarian Academy of Sciences
- Collected, processed, and performed text analysis on large corpora consisting of millions of sentences derived from audio recordings covering more than five days.
- Presented research results at the International Conference on Computational Social Science in 2018, the largest conference of its type in the world.
- Mapped the network of pieces of Hungarian legislation using text mining techniques. The findings appeared in a scientific publication.
Technologies: Research, Django, Elasticsearch, Python, R, Databases, Data Pipelines