Pavol Knapek
Verified Expert in Engineering
Data Engineer and Systems Developer
Pavol is a passionate data engineer and integrator who helps businesses manage their products and data. For the past 10+ years, Pavol has focused on a wide range of integration approaches, internal and external databases, row and column stores, SQL/NoSQL databases, OLTP/OLAP use cases, and everything that touches data. Pavol decommissioned an obsolete AWS Redshift DWH and established an alternative solution using relational databases, column-stores, and S3 storage.
Preferred Environment
Amazon Web Services (AWS), Linux, Slack, Git
The most amazing...
...project I've developed was an internal ETL, DWH, and reporting solution, built from top to bottom, that enabled internal customers to grow the company's business by 150%.
Work Experience
Senior Data Platform Engineer
Emplifi (Acquired Socialbakers)
- Designed the architecture of an internal big data platform, leading to better data democratization across the whole company, including its core products.
- Implemented efficient Spark pipelines on top of social media data.
- Led an internal data engineering education group for active knowledge sharing.
- Built a custom and efficient checkpoint manager for using Amazon S3 as a source for Spark Structured Streaming pipelines.
- Owned the SQL lectures designed for colleagues in non-technical departments.
Big Data Solution Architect
Dashmote B.V.
- Migrated raw Python applications into dockerized PySpark applications on EMR.
- Optimized most of the internal Spark pipelines to achieve better cost efficiency.
- Introduced a Databricks platform to match more modern platform designs.
- Built resilient pipelines for data enrichment using internally developed ML models.
- Led a series of internal workshops aimed at best practices, specifically for more junior engineers (Git, Python, PySpark, SQL, CI/CD, and software testing).
- Implemented internal Airflow operators for triggering EMR Serverless jobs.
- Built a generic, dynamic, metadata-driven internal tool for efficient, transactional data migration from Amazon S3 to PostgreSQL.
- Cooperated with software and data engineers based in Amsterdam and Shanghai.
Data Platform Engineer
Socialbakers
- Initiated the process of designing the architecture of the internal big data platform.
- Established content-enrichment pipelines by applying internally developed AI/ML models for both batched and real-time streaming scenarios.
- Built an internal tool to support better CI/CD on Databricks projects.
- Deployed Airflow on internal infrastructure and established internal guidelines.
- Implemented a generic streaming framework in raw Python.
Back-end Platform Developer | Front-end R&D Developer
Edvisor
- Improved the interaction between the front end and back end by migrating the traditional REST architecture into a GraphQL back end.
- Implemented generic endpoints for data reporting purposes.
- Built a React widget pluggable on B2B clients' websites.
- Introduced the development team to data-related technologies during internal lunch-and-learn sessions.
Data Engineer and Integrator
Socialbakers
- Maintained an integration solution by using Pentaho ETL, Java, and Node.js.
- Built the foundations of an internal DWH solution (traditional Kimball-style dimension and fact tables).
- Integrated Salesforce, Mixpanel, and Zendesk with our main SaaS products.
- Implemented internal REST APIs for product, integration, and reporting uses.
- Migrated parts of the internal DWH to Redshift and promoted the main advantages of columnar-store database engines to the team.
Junior Big Data Specialist (Internship)
IBM
- Got selected for a summer internship, where I entered the world of big data.
- Worked on a scientific comparison of GPFS and HDFS, exploring the use of GPFS in place of HDFS.
- Completed a wide range of time management and soft skills training.
Java Developer and Integrator
Zitec
- Designed and implemented an integration architecture using SOAP and REST protocols (ERP, CRM, POS terminals, eCommerce, and BI).
- Customized ADempiere, an open-source ERP system, to match our client's needs.
- Managed an on-premises server using VMware ESXi and multiple virtual Linux server instances.
Experience
Data Lake at Socialbakers
Skills
Languages
SQL, Java, Python, JavaScript, Stored Procedure, T-SQL (Transact-SQL), Scala, GraphQL, PHP, Snowflake, C++, R
Frameworks
Apache Spark, Presto, AngularJS
Libraries/APIs
Node.js, PySpark, React
Tools
Git, RabbitMQ, VirtualBox, Apache Tomcat, Apache Airflow, AWS Glue, Amazon Athena, Jira, ADempiere, iDempiere ERP, Amazon Elastic MapReduce (EMR), Terraform, Slack, Sentry, Amazon Elastic Container Service (Amazon ECS)
Paradigms
Agile Software Development, REST, ETL, Distributed Computing
Platforms
Amazon Web Services (AWS), Linux, Pentaho, Databricks, Docker, Apache Kafka, AWS Lambda, Amazon EC2, CentOS, JBoss, Apache Pig
Storage
Redshift, PostgreSQL, Amazon S3 (AWS S3), MySQL, Data Pipelines, Databases, PL/SQL, Data Lakes, Relational Databases, MongoDB, MariaDB, HDFS, Apache Hive, Amazon DynamoDB, GPFS, NoSQL
Other
Processing & Threading, Software Architecture, OOP Designs, Amazon Kinesis, Streaming, Data Engineering, Big Data, Data Architecture, Solution Architecture, ELT, Web Scraping, API Design, Amazon RDS, Big Data Architecture, Data Warehousing, Message Queues, Data Transformation, Data Migration, Performance Tuning, ETL Tools, Mathematics, EMR, VMware ESXi, GlassFish, Data Build Tool (dbt), Machine Learning, Keen.io, IBM BigInsights, Delta Lake, Distributed Systems, Mentorship & Coaching, Team Mentoring, Troubleshooting, Teamwork
Education
Master's Degree in Web and Software Engineering, Focused on Information Systems and Management
Czech Technical University in Prague - Prague, Czech Republic
Bachelor's Degree in Informatics
Slovak University of Technology in Bratislava - Bratislava, Slovak Republic
Certifications
Machine Learning Foundations: A Case Study Approach
University of Washington | via Coursera