
Denys Stetsenko

Verified Expert in Engineering

Data Engineer and Software Developer

Location
Berlin, Germany
Toptal Member Since
June 18, 2020

Denys has spent the better part of his career extracting, transforming, integrating, and storing all forms of big data: structured and unstructured, in real time and in batches. Along with being proficient in Python, Denys has successfully built data processing, storage, and analysis solutions using AWS and Google Cloud services.

Portfolio

Unzer
Redshift, Kafka Streams, Apache Kafka, Python, Data Build Tool (dbt), SQL...
Multiple clients
Python, PostgreSQL, SQL, Data Engineering, ETL, Data Architecture...
Funding Circle
Amazon Web Services (AWS), Kafka Streams, Amazon Elastic MapReduce (EMR)...

Experience

Availability

Part-time

Preferred Environment

Microsoft Open Source API, Unix

The most amazing...

...thing I've done was cutting the processing time of several production computations from six hours to 20 minutes while experimenting with a Hadoop cluster.

Work Experience

Team Lead, Data Engineering and Analytics

2020 - PRESENT
Unzer
  • Led a team of data engineers and built an enterprise data lake on AWS with Amazon S3, Glue Data Catalog and Glue PySpark jobs, Athena, and Redshift, integrating data from multiple subsidiaries into a single place for analytics and predictive modeling.
  • Managed a team of engineers to build an event streaming system on Apache Kafka with Schema Registry to unify the transactions coming from different business entities and make data centrally available.
  • Implemented a dbt setup across two Redshift clusters (one for ETLs, one solely for reporting) and two other teams (data analytics and data science) for managing data warehouse transformations and models.
  • Made multiple data sources (raw and enriched) available in Elasticsearch for consumption.
  • Implemented custom real-time alerts with Elasticsearch and Datadog for technical support and operations teams.
  • Supported reporting and data self-service by managing a Tableau server and a Redash instance.
  • Collaborated closely with the platform engineering team to keep up with the best practices for automated deployments (GitHub Actions and Jenkins) and IaC (Terraform).
  • Worked with the data science and analytics teams on the best practices for model training and deployment, data modeling, organizing development process, and automation.
Technologies: Redshift, Kafka Streams, Apache Kafka, Python, Data Build Tool (dbt), SQL, Data Engineering, ETL, Snowflake, Data Architecture, Database Architecture, Data Lakes, PySpark, Elasticsearch, Data Modeling, Data Aggregation, Amazon Athena, ETL Tools, Tableau, Apache Airflow, Amazon Web Services (AWS), Spark, Amazon EC2, Docker, Data Warehousing, Architecture, Performance Tuning
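
The data lake work above relies on the Hive-style key=value partition layout that Glue crawlers and Athena use for partition pruning. The following is a minimal, illustrative sketch of that layout in pure Python; the table name, column names, and local filesystem stand-in for Amazon S3 are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path

def write_partitioned(base: Path, table: str, rows: list[dict], partition_key: str) -> list[Path]:
    """Write rows into Hive-style key=value partition directories,
    the layout Glue and Athena rely on for partition pruning."""
    written = []
    # Group rows by partition value.
    by_partition: dict[str, list[dict]] = {}
    for row in rows:
        by_partition.setdefault(str(row[partition_key]), []).append(row)
    for value, part_rows in by_partition.items():
        part_dir = base / table / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        out = part_dir / "part-0000.csv"
        with out.open("w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(part_rows[0].keys()))
            writer.writeheader()
            writer.writerows(part_rows)
        written.append(out)
    return written

# Usage: two days of hypothetical transaction events land in two partitions,
# e.g. transactions/ingest_date=2024-01-01/part-0000.csv.
base = Path(tempfile.mkdtemp())
paths = write_partitioned(
    base,
    "transactions",
    [
        {"txn_id": 1, "amount": 10.0, "ingest_date": "2024-01-01"},
        {"txn_id": 2, "amount": 7.5, "ingest_date": "2024-01-02"},
    ],
    partition_key="ingest_date",
)
```

In production this layout would typically hold compressed Parquet rather than CSV, so Athena can prune whole partition directories from a scan based on a WHERE clause on the partition key.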

Data Engineer, Consultant (Freelance)

2019 - 2020
Multiple clients
  • Restructured a monolithic PySpark ML model into well-defined data load, processing, training, prediction, and output generation stages.
  • Expressed the multiple stages of the model's lifecycle as an Airflow DAG; used parallelism, logging, and notification utilities; implemented data quality checks as part of the pipeline.
  • Introduced data-processing speed improvements, mainly by adjusting data compression formats for I/O operations, partitioning data, and using PySpark native functions instead of UDFs.
  • Gathered requirements, designed, and built a PostgreSQL data warehouse focused on marketing and investment performance, adhering to Kimball's classic facts-and-dimensions principles.
  • Built data pipelines to populate the data warehouse with marketing and market analysis data from a variety of sources.
  • Supported the head of BI in setting up the Tableau reporting infrastructure.
  • Helped split the reporting requirements and implementation into two buckets: real-time reporting with Elasticsearch, and batch reporting that requires pre-processing, joining reference data, and aggregation with BigQuery.
  • Refactored and sped up the performance of PySpark ML models predicting returns and cancellations.
  • Automated the deployment process of models to EMR on-demand clusters.
  • Remapped data sources from Exasol to a data lake built on top of Amazon S3 with Presto.
Technologies: Python, PostgreSQL, SQL, Data Engineering, ETL, Data Architecture, Database Architecture, Spark ML, PySpark, Amazon Elastic MapReduce (EMR), Elasticsearch, Data Modeling, Data Aggregation, Amazon Athena, ETL Tools, Tableau, Apache Airflow, Amazon Web Services (AWS), Spark, Amazon EC2, Redshift, Docker, Apache Kafka, Data Warehousing, Architecture, Performance Tuning
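
Data quality checks like those mentioned above are commonly written as small gate functions that a pipeline task runs before downstream stages; if any check fails, the task raises and the Airflow run fails. A minimal sketch under that assumption follows; the check names and the `id` field are hypothetical.

```python
from typing import Callable, Iterable, Optional

# A check inspects a batch of rows and returns an error message, or None if it passes.
Check = Callable[[list], Optional[str]]

def check_not_empty(rows: list) -> Optional[str]:
    return "dataset is empty" if not rows else None

def check_no_null_ids(rows: list) -> Optional[str]:
    # Hypothetical rule: every row must carry a non-null "id".
    missing = sum(1 for r in rows if r.get("id") is None)
    return f"{missing} rows with null id" if missing else None

def run_checks(rows: list, checks: Iterable[Check]) -> list:
    """Run every check and collect failure messages; a pipeline task
    would raise (failing the run) if this list is non-empty."""
    failures = []
    for check in checks:
        msg = check(rows)
        if msg is not None:
            failures.append(msg)
    return failures

# Usage: one of two rows violates the null-id rule.
failures = run_checks(
    [{"id": 1}, {"id": None}],
    [check_not_empty, check_no_null_ids],
)
```

Running all checks before raising, rather than stopping at the first failure, gives operators the full picture of a bad batch in one alert.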

Senior Data Engineer

2016 - 2018
Funding Circle
  • Built data pipelines to ingest data from Kafka, relational databases, MongoDB, financial agencies' APIs, marketing platforms, and Salesforce.
  • Managed ingested data sources into a centralized data lake on top of Amazon S3 (for UK and US business) and a PostgreSQL data warehouse (for EU business).
  • Integrated an on-demand AWS EMR cluster with Hive and PySpark into the company's data warehousing, ETL, and reporting activities, replacing long-running workloads inside the PostgreSQL relational database.
  • Built data marts and models for automated reporting with PostgreSQL, Redshift, Hive, Amazon S3, and Athena (depending on the geography and stack) for C-level stakeholders and governmental agencies.
Technologies: Amazon Web Services (AWS), Kafka Streams, Amazon Elastic MapReduce (EMR), Redshift, Python, PostgreSQL, SQL, Data Engineering, ETL, Data Architecture, Database Architecture, PySpark, Data Modeling, Data Aggregation, Amazon Athena, ETL Tools, Tableau, Apache Airflow, Spark, Amazon EC2, Docker, Apache Kafka, Data Warehousing, Architecture, Performance Tuning
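
The reporting data marts above follow the star-schema pattern: a fact table of measures keyed to dimension tables, which reporting queries join and aggregate. A minimal sketch of the idea follows, using SQLite as a stand-in for PostgreSQL or Redshift; the table and column names are invented for illustration and are not the actual schema.

```python
import sqlite3

# Minimal star schema: one fact table keyed to one dimension (names hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_geography (
    geography_id INTEGER PRIMARY KEY,
    country TEXT NOT NULL
);
CREATE TABLE fact_loan (
    loan_id INTEGER PRIMARY KEY,
    geography_id INTEGER NOT NULL REFERENCES dim_geography(geography_id),
    amount REAL NOT NULL
);
INSERT INTO dim_geography VALUES (1, 'UK'), (2, 'US');
INSERT INTO fact_loan VALUES (10, 1, 5000.0), (11, 1, 2500.0), (12, 2, 8000.0);
""")

# A reporting query aggregates fact measures by a dimension attribute.
totals = dict(conn.execute("""
    SELECT g.country, SUM(f.amount)
    FROM fact_loan f
    JOIN dim_geography g USING (geography_id)
    GROUP BY g.country
""").fetchall())
```

Keeping measures in narrow fact tables and descriptive attributes in dimensions is what lets the same mart serve many report shapes with simple join-and-group queries.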

Data Engineer and Release Manager

2013 - 2016
EPAM Systems
  • Developed ETL processes using IBM Datastage on top of Oracle database and SQL Server suite.
  • Executed, supervised, and communicated the release process with the stakeholders.
  • Built and presented multiple prototypes, as a member of the pre-sales squad, with Hadoop, Hive, and Spark.
Technologies: Datastage, Microsoft SQL Server, Oracle, SQL, Data Engineering, ETL, Data Architecture, Database Architecture, Data Modeling, Data Aggregation, ETL Tools, Tableau, Apache Airflow, Amazon Web Services (AWS), Spark, Amazon EC2, Docker, Python, Data Warehousing, Architecture, Performance Tuning

Rebuilt an ETL Process for Loading Master Data for a Business Unit of an Oil and Gas Company

This was my first-ever work assignment. My initial step was to optimize the existing ETL process, which used to take up to six hours daily; I then decided to rebuild it completely, and it now finishes in under 20 minutes.

Proof-of-concept Project for Spark SQL

In a team of five, we built a four-node Hadoop cluster using Ambari, ingested public datasets into HDFS, defined Hive tables around the data, queried it with Spark SQL, and finally visualized the results with the then-new MicroStrategy 10.

Languages

SQL, Python, Bash, Snowflake

Frameworks

Hadoop, Spark, Django, Flask

Tools

Amazon Athena, Amazon Elastic MapReduce (EMR), Kafka Streams, Jenkins, Apache Airflow, Tableau

Paradigms

ETL, Agile

Platforms

Docker, Amazon EC2, Amazon Web Services (AWS), Apache Kafka, Linux, Unix, Oracle

Storage

PostgreSQL, Redshift, Database Architecture, Data Lakes, MySQL, Apache Hive, Amazon S3 (AWS S3), Microsoft SQL Server, Datastage, MongoDB, Elasticsearch

Other

Data Engineering, Data Build Tool (dbt), Data Architecture, Data Aggregation, Data Modeling, ETL Tools, Data Warehousing, Architecture, Performance Tuning, APIs

Libraries/APIs

Pandas, PySpark, Microsoft Open Source API, Spark ML

2011 - 2012

Master of Science Degree in Strategic Information Systems

University of East Anglia - Norwich, UK

2007 - 2011

Bachelor of Science Degree in Computer Science

National Technical University of Ukraine – Kiev Polytechnic Institute - Kiev, Ukraine

DECEMBER 2018 - DECEMBER 2020

AWS Certified Solutions Architect Associate

AWS

AUGUST 2015 - PRESENT

Microsoft Certified Solutions Associate — SQL Server 2012

Microsoft

NOVEMBER 2014 - PRESENT

Cloudera Certified Developer for Apache Hadoop

Cloudera

JULY 2014 - PRESENT

Oracle PL/SQL Developer Certified Associate

Oracle

APRIL 2014 - PRESENT

IBM Certified Solution Developer InfoSphere DataStage v8.5

IBM

APRIL 2014 - PRESENT

Oracle Database SQL Certified Expert

Oracle
