Verified Expert in Engineering
Data Engineer and Software Developer
Denys has spent the better part of his career extracting, transforming, integrating, and storing big data in all its forms (structured and unstructured, in real time and in batches). Along with being proficient in Python, Denys has successfully built data processing, storage, and analysis solutions using AWS and Google Cloud services.
Microsoft Open Source API, Unix
The most amazing...
...thing I've done was to reduce the processing time of several production computations from six hours to 20 minutes while experimenting with a Hadoop cluster.
Team Lead, Data Engineering and Analytics
- Led a team of data engineers and built an enterprise data lake on AWS with Amazon S3, Glue Data Catalog and Glue PySpark jobs, Athena, and Redshift—integrated data from multiple subsidiary companies into a single place for analytics and predictive modeling.
- Managed a team of engineers to build an event streaming system based on Apache Kafka with Schema Registry, unifying the transactions coming from different business entities and making data centrally available.
- Implemented a dbt setup across two Redshift clusters (one for ETLs and one solely for reporting) and two other teams (data analytics and data science) for managing data warehouse transformations and models.
- Made multiple data sources (raw and enriched) available in Elasticsearch for consumption.
- Implemented custom real-time alerts with Elasticsearch and Datadog for technical support and operations teams.
- Supported reporting and data self-service by managing a Tableau server and a Redash instance.
- Collaborated closely with the platform engineering team to keep up with best practices for automated deployments (GitHub Actions and Jenkins) and IaC (Terraform).
- Worked with the data science and analytics teams on best practices for model training and deployment, data modeling, organizing the development process, and automation.
Data Engineer, Consultant (Freelance)
- Restructured a monolithic ML model in PySpark into well-defined data load, processing, training, prediction, and output generation stages.
- Expressed the multiple stages of the model's lifecycle as an Airflow DAG; used parallelism, logging, and notification utilities; implemented data quality checks as part of the pipeline.
- Introduced data-processing speed improvements—mainly through adjusting data compression formats for I/O operations, partitioning data, and using PySpark native functions instead of UDFs.
- Gathered requirements for, designed, and built a PostgreSQL data warehouse focused on marketing and investment performance, adhering to Kimball's classic facts-and-dimensions principles.
- Built data pipelines to populate the data warehouse with marketing and market analysis data from a variety of sources.
- Supported the head of BI in setting up the Tableau reporting infrastructure.
- Helped to split the reporting requirements and implementation into two buckets—real-time reporting with Elasticsearch and batch reporting that requires pre-processing, joining reference data, and aggregation with BigQuery.
- Refactored and improved the performance of PySpark ML models predicting returns and cancellations.
- Automated the deployment process of models to EMR on-demand clusters.
- Remapped data sources from Exasol to a data lake built on top of Amazon S3 with Presto.
Senior Data Engineer
- Built data pipelines to ingest data from Kafka, relational databases, MongoDB, financial agencies' APIs, marketing platforms, and Salesforce.
- Consolidated ingested data sources into a centralized data lake on top of Amazon S3 (for the UK and US businesses) and a PostgreSQL data warehouse (for the EU business).
- Integrated an on-demand AWS EMR cluster with Hive and PySpark into the company's data warehousing, ETL, and reporting activities to replace long-running workloads inside the PostgreSQL relational database.
- Built data marts and models for automated reporting with PostgreSQL, Redshift, Hive, Amazon S3, and Athena (depending on the geography and stack) for C-level stakeholders and governmental agencies.
Data Engineer and Release Manager
- Developed ETL processes using IBM DataStage on top of an Oracle database and the SQL Server suite.
- Executed and supervised the release process and communicated it to stakeholders.
- Built and presented multiple prototypes, as a member of the pre-sales squad, with Hadoop, Hive, and Spark.
Rebuilt an ETL for Loading Master Data for a Business Unit of a Company in the Oil and Gas Industry
POC Project for Using Spark SQL
SQL, Python, Bash, Snowflake
Hadoop, Spark, Django, Flask
Amazon Athena, Amazon Elastic MapReduce (EMR), Kafka Streams, Jenkins, Apache Airflow, Tableau
Docker, Amazon EC2, Amazon Web Services (AWS), Apache Kafka, Linux, Unix, Oracle
PostgreSQL, Redshift, Database Architecture, Data Lakes, MySQL, Apache Hive, Amazon S3 (AWS S3), Microsoft SQL Server, DataStage, MongoDB, Elasticsearch
Data Engineering, Data Build Tool (dbt), Data Architecture, Data Aggregation, Data Modeling, ETL Tools, Data Warehousing, Architecture, Performance Tuning, APIs
Pandas, PySpark, Microsoft Open Source API, Spark ML
Master of Science Degree in Strategic Information Systems
University of East Anglia - Norwich, UK
Bachelor of Science Degree in Computer Science
National Technical University of Ukraine – Kiev Polytechnic Institute - Kiev, Ukraine
AWS Certified Solutions Architect Associate
Microsoft Certified Solutions Associate (MCSA): SQL Server 2012
Cloudera Certified Developer for Apache Hadoop
Oracle PL/SQL Developer Certified Associate
IBM Certified Solution Developer InfoSphere DataStage v8.5
Oracle Database SQL Certified Expert