Denys Stetsenko

Data Engineer and Software Developer in Berlin, Germany

Member since June 18, 2020
Denys has spent the better part of his career extracting, transforming, integrating, and storing big data in all its forms: structured and unstructured, in real time and in batches. Along with being proficient in Python, Denys has successfully built data processing, storage, and analysis solutions using AWS and Google Cloud services.

Portfolio

  • Unzer
    Redshift, Kafka Streams, Apache Kafka, Python, Data Build Tool (dbt), SQL...
  • Multiple clients
    Python, PostgreSQL, SQL, Data Engineering, ETL, Data Architecture...
  • Funding Circle
    Amazon Web Services (AWS), Kafka Streams, AWS EMR, Redshift, Python...

Experience

Location

Berlin, Germany

Availability

Full-time

Preferred Environment

Microsoft Open Source API, Unix

The most amazing...

...thing I've done was reducing the processing time of a few production computations from six hours to 20 minutes while experimenting with a Hadoop cluster.

Employment

  • Team Lead, Data Engineering and Analytics

    2020 - PRESENT
    Unzer
    • Led a team of data engineers and built an enterprise data lake on AWS with Amazon S3, Glue Data Catalog and Glue PySpark jobs, Athena, and Redshift—integrated data from multiple subsidiary companies into a single place for analytics and predictive modeling.
    • Managed a team of engineers that built an event streaming system with Apache Kafka and Schema Registry to unify the transactions coming from different business entities and make the data centrally available.
    • Implemented a dbt setup across two Redshift clusters (one for ETLs and one solely for reporting) and two other teams (data analytics and science) for managing data warehouse transformations and models.
    • Made multiple data sources (raw and enriched) available in Elasticsearch for consumption.
    • Implemented custom real-time alerts with Elasticsearch and Datadog for technical support and operations teams.
    • Supported reporting and data self-service by managing a Tableau server and a Redash instance.
    • Collaborated closely with the platform engineering team to keep up with the best practices for automated deployments (GitHub Actions and Jenkins) and IaC (Terraform).
    • Worked with the data science and analytics teams on the best practices for model training and deployment, data modeling, organizing development process, and automation.
    Technologies: Redshift, Kafka Streams, Apache Kafka, Python, Data Build Tool (dbt), SQL, Data Engineering, ETL, Snowflake, Data Architecture, Database Architecture, Data Lakes, PySpark, Elasticsearch, Data Modeling, Data Aggregation, Amazon Athena, ETL Tools, Tableau, Apache Airflow, Amazon Web Services (AWS), Spark, Amazon EC2, Docker, Data Warehousing, Architecture, Performance Tuning
  • Data Engineer, Consultant (Freelance)

    2019 - 2020
    Multiple clients
    • Restructured a monolithic ML model in PySpark into well-defined data load, processing, training, prediction, and output generation stages.
    • Expressed the multiple stages of the model's lifecycle through an Airflow DAG; used parallelism, logging, and notification utilities; implemented data quality checks as part of the pipeline.
    • Introduced data-processing speed improvements—mainly through adjusting data compression formats for I/O operations, partitioning data, and using PySpark native functions instead of UDFs.
    • Gathered requirements, designed, and built a PostgreSQL data warehouse focused on marketing and investment performance adhering to Kimball's classic facts and dimensions principles.
    • Built data pipelines to populate the data warehouse with marketing and market analysis data from a variety of sources.
    • Supported the head of BI in setting up the Tableau reporting infrastructure.
    • Helped to split the reporting requirements and implementation into two buckets—real-time reporting with Elasticsearch and batch reporting that requires pre-processing, joining reference data, and aggregation with BigQuery.
    • Refactored and sped up the performance of PySpark ML models predicting returns and cancellations.
    • Automated the deployment process of models to EMR on-demand clusters.
    • Remapped data sources from Exasol to a data lake built on top of Amazon S3 with Presto.
    Technologies: Python, PostgreSQL, SQL, Data Engineering, ETL, Data Architecture, Database Architecture, Spark ML, PySpark, AWS EMR, Elasticsearch, Data Modeling, Data Aggregation, Amazon Athena, ETL Tools, Tableau, Apache Airflow, Amazon Web Services (AWS), Spark, Amazon EC2, Redshift, Docker, Apache Kafka, Data Warehousing, Architecture, Performance Tuning
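The Kimball-style facts-and-dimensions warehouse described above can be sketched as a minimal star schema. The snippet below is illustrative only: it uses SQLite in place of PostgreSQL, and the marketing tables, columns, and figures are all hypothetical, not taken from the actual project.

```python
import sqlite3

# A minimal Kimball-style star schema: one fact table referencing two
# dimension tables. SQLite stands in for PostgreSQL here, and every
# table and column name is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_channel (
        channel_key  INTEGER PRIMARY KEY,
        channel_name TEXT NOT NULL          -- e.g. 'paid_search', 'email'
    );
    CREATE TABLE dim_date (
        date_key  INTEGER PRIMARY KEY,      -- e.g. 20200115
        full_date TEXT NOT NULL,
        month     TEXT NOT NULL
    );
    CREATE TABLE fact_marketing_spend (
        channel_key INTEGER REFERENCES dim_channel (channel_key),
        date_key    INTEGER REFERENCES dim_date (date_key),
        spend_eur   REAL NOT NULL,
        conversions INTEGER NOT NULL
    );
""")

# Load a few illustrative rows.
conn.executemany("INSERT INTO dim_channel VALUES (?, ?)",
                 [(1, "paid_search"), (2, "email")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20200115, "2020-01-15", "2020-01"),
                  (20200116, "2020-01-16", "2020-01")])
conn.executemany("INSERT INTO fact_marketing_spend VALUES (?, ?, ?, ?)",
                 [(1, 20200115, 120.0, 3),
                  (1, 20200116, 80.0, 2),
                  (2, 20200115, 15.0, 5)])

# A typical reporting query: join the fact to its dimensions and aggregate.
rows = conn.execute("""
    SELECT c.channel_name, d.month, SUM(f.spend_eur), SUM(f.conversions)
    FROM fact_marketing_spend f
    JOIN dim_channel c ON c.channel_key = f.channel_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY c.channel_name, d.month
    ORDER BY c.channel_name
""").fetchall()
print(rows)  # [('email', '2020-01', 15.0, 5), ('paid_search', '2020-01', 200.0, 5)]
```

Keeping additive measures in the fact table and descriptive attributes in the dimensions is what makes queries like the one above cheap to aggregate and easy to extend with new dimensions.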
  • Senior Data Engineer

    2016 - 2018
    Funding Circle
    • Built data pipelines to ingest data from Kafka, relational databases, MongoDB, financial agencies' APIs, marketing platforms, and Salesforce.
    • Managed ingested data sources into a centralized data lake on top of Amazon S3 (for UK and US business) and a PostgreSQL data warehouse (for EU business).
    • Integrated on-demand AWS EMR cluster with Hive and PySpark into the company's data warehousing, ETL, and reporting activities—to replace the long-running workloads inside PostgreSQL relational database.
    • Built data marts and models for automated reporting with PostgreSQL, Redshift, Hive, Amazon S3, and Athena (depending on the geography and stack) for C-level stakeholders and governmental agencies.
    Technologies: Amazon Web Services (AWS), Kafka Streams, AWS EMR, Redshift, Python, PostgreSQL, SQL, Data Engineering, ETL, Data Architecture, Database Architecture, PySpark, Data Modeling, Data Aggregation, Amazon Athena, ETL Tools, Tableau, Apache Airflow, Spark, Amazon EC2, Docker, Apache Kafka, Data Warehousing, Architecture, Performance Tuning
  • Data Engineer and Release Manager

    2013 - 2016
    EPAM Systems
    • Developed ETL processes using IBM DataStage on top of an Oracle database and the Microsoft SQL Server suite.
    • Executed, supervised, and communicated the release process with the stakeholders.
    • As a member of the pre-sales squad, built and presented multiple prototypes with Hadoop, Hive, and Spark.
    Technologies: Datastage, Microsoft SQL Server, Oracle, SQL, Data Engineering, ETL, Data Architecture, Database Architecture, Data Modeling, Data Aggregation, ETL Tools, Tableau, Apache Airflow, Amazon Web Services (AWS), Spark, Amazon EC2, Docker, Python, Data Warehousing, Architecture, Performance Tuning

Experience

  • Rebuilt an ETL for Loading Master Data for a Unit of a Company in the Oil and Gas Industry

    In my first-ever work assignment, I initially set out to optimize an ETL process that took up to six hours daily; I then decided to rebuild it completely, and it now finishes in under 20 minutes.

  • POC Project for Using Spark SQL

    In a team of five, we built a Hadoop cluster of four nodes using Ambari, ingested public datasets into HDFS, defined Hive tables around the data, queried it with the help of Spark SQL, and finally visualized the results with the new MicroStrategy 10.

Skills

  • Languages

    SQL, Python, Bash, Snowflake
  • Frameworks

    Hadoop, Spark, Django, Flask, AWS EMR
  • Tools

    Amazon Athena, Kafka Streams, Jenkins, Apache Airflow, Tableau
  • Paradigms

    ETL, Agile
  • Platforms

    Docker, Amazon EC2, Amazon Web Services (AWS), Apache Kafka, Linux, Unix, Oracle
  • Storage

    PostgreSQL, Redshift, Database Architecture, Data Lakes, MySQL, Apache Hive, Amazon S3 (AWS S3), Microsoft SQL Server, Datastage, MongoDB, Elasticsearch
  • Other

    Data Engineering, Data Build Tool (dbt), Data Architecture, Data Aggregation, Data Modeling, ETL Tools, Data Warehousing, Architecture, Performance Tuning, APIs
  • Libraries/APIs

    Pandas, PySpark, Microsoft Open Source API, Spark ML

Education

  • Master of Science Degree in Strategic Information Systems
    2011 - 2012
    University of East Anglia - Norwich, UK
  • Bachelor of Science Degree in Computer Science
    2007 - 2011
    National Technical University of Ukraine – Kiev Polytechnic Institute - Kiev, Ukraine

Certifications

  • AWS Certified Solutions Architect Associate
    DECEMBER 2018 - DECEMBER 2020
    AWS
  • Microsoft Certified Solutions Associate — SQL Server 2012
    AUGUST 2015 - PRESENT
    Microsoft
  • Cloudera Certified Developer for Apache Hadoop
    NOVEMBER 2014 - PRESENT
    Cloudera
  • Oracle PL/SQL Developer Certified Associate
    JULY 2014 - PRESENT
    Oracle
  • IBM Certified Solution Developer InfoSphere DataStage v8.5
    APRIL 2014 - PRESENT
    IBM
  • Oracle Database SQL Certified Expert
    APRIL 2014 - PRESENT
    Oracle
