Igor Gorbenko, Developer in Dubai, United Arab Emirates

Igor Gorbenko

Verified Expert in Engineering

Bio

Igor is a seasoned data architect with 16+ years of experience in high-load systems, DWH, ETL, and ML pipelines. He has delivered innovative solutions for industry leaders such as TangoMe, Gazprombank, Stanford University, and Royal Mail. As a cloud-agnostic expert specializing in Flask, FastAPI, and database integration, Igor builds robust, scalable architectures. His passion for cloud-based systems empowers businesses to operate efficiently, gain flexibility, and achieve strategic advantages.

Portfolio

Omniverse
Amazon Web Services (AWS), Snowflake, SQL, Python 3, Apache Iceberg, ClickHouse...
Tango
Google Cloud Platform (GCP), Redis Clusters, Google Bigtable, Cloud Dataflow...
EPAM Systems
Scala, Apache NiFi, Apache Kafka, Pub/Sub, Machine Learning...

Experience

  • Data Pipelines - 16 years
  • SQL - 13 years
  • Python - 10 years
  • Amazon Web Services (AWS) - 8 years
  • Big Data Architecture - 6 years
  • Big Data - 6 years
  • Google Cloud Platform (GCP) - 5 years
  • Machine Learning Operations (MLOps) - 4 years

Availability

Part-time

Preferred Environment

PyCharm, Slack, Linux, Git

The most amazing...

...thing I've built: a lakehouse system on Apache Iceberg, processing 2PB of data with batch and real-time layers, boosting data processing efficiency and analytics.

Work Experience

Head of Data | Software Architect

2022 - PRESENT
Omniverse
  • Developed and implemented the company's data strategy (around 2PB of data and 300,000 RPS).
  • Built architectures for a DWH, DSP, and DMP using AWS, Kafka, RabbitMQ, Snowflake, ClickHouse, Apache Spark, Airflow, Kubernetes (K8s), Python/Go, and Aerospike.
  • Implemented a data lakehouse based on Apache Iceberg and Apache Spark.
  • Created a BI environment, driving data-driven decision-making across the organization.
  • Led the data team in adopting data and MLOps practices, enhancing skills and fostering innovation.
  • Launched an anti-fraud system and integrated ML pipelines.
Technologies: Amazon Web Services (AWS), Snowflake, SQL, Python 3, Apache Iceberg, ClickHouse, Apache Airflow, Apache Kafka, Spark, Machine Learning Operations (MLOps), Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Python, Tableau, Pandas, Cypher, Foundry, Palantir, Databases, Amazon RDS, Complex SQL Queries, Database Schema Design, PySpark, Distributed Systems, Data Migration, Data Modeling, Data Reporting, NoSQL, Large Language Models (LLMs), Artificial Intelligence (AI), NumPy

Head of ML Engineering | Big Data Architect

2021 - 2022
Tango
  • Implemented real-time recommendation and anti-fraud systems, resulting in a 25% increase in revenue.
  • Established data and MLOps best practices from the ground up.
  • Co-founded the data department, leading data initiatives and strategic direction.
  • Led the ML engineering department, managing a team of up to 30 engineers.
  • Optimized data loading into storage and refactored the legacy code.
  • Implemented a real-time image recognition system, leveraging GCP, Kafka, Kubernetes, Python, Dataflow, BigQuery, Airflow, Redis, and other technologies; a minimal sketch of such a streaming path follows this list.
  • Created a mechanism for monitoring all components of the recommendation system.
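For illustration only, here is a minimal Apache Beam sketch of the kind of streaming path such a Dataflow-based system could use; the Pub/Sub topic, BigQuery table, and payload handling are assumptions rather than the production pipeline.

# Hypothetical sketch of a Dataflow-style streaming path: Pub/Sub -> windowed parse -> BigQuery.
# The topic, dataset/table, and payload layout are illustrative only.
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadEvents" >> beam.io.ReadFromPubSub(topic="projects/example-project/topics/user-events")
        | "Parse" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "Window" >> beam.WindowInto(FixedWindows(60))  # one-minute windows
        | "WriteToBQ" >> beam.io.WriteToBigQuery(
            table="example-project:analytics.user_events",
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
        )
    )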
Technologies: Google Cloud Platform (GCP), Redis Clusters, Google Bigtable, Cloud Dataflow, Google BigQuery, Machine Learning Operations (MLOps), Apache Airflow, GitLab, Docker, Machine Learning, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Python, Google Cloud, Tableau, Pandas, Databases, Amazon RDS, Complex SQL Queries, Database Schema Design, PySpark, Distributed Systems, Databricks, Data Migration, Data Modeling, Data Reporting, NoSQL, Large Language Models (LLMs), Artificial Intelligence (AI), NumPy

Key Big Data Developer

2020 - 2021
EPAM Systems
  • Designed an apartment interior design recommendation system.
  • Developed the back end of the interior recommendation system, including a scraper that collected data for model training and all data processing pipelines.
  • Resolved Jira incidents related to data pipelines.
Technologies: Scala, Apache NiFi, Apache Kafka, Pub/Sub, Machine Learning, Google Cloud Platform (GCP), SQL, BigQuery, Apache Airflow, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Python, Google Cloud, Pandas, Databases, Amazon RDS, Complex SQL Queries, Database Schema Design, PySpark, Distributed Systems, Data Migration, Data Modeling, Data Reporting, NoSQL, Microsoft Power BI, Artificial Intelligence (AI), NumPy

Big Data Architect

2019 - 2020
Netwrix
  • Migrated anomaly calculation processes from Docker containers to an EMR Apache Spark cluster, speeding up calculations severalfold.
  • Reduced AWS costs severalfold through dynamic EMR cluster configuration; a sketch of this approach follows this list.
  • Developed the monitoring system with reports and alert mechanisms. Implemented the CI/CD process.
  • Provided technical leadership for the design of the cloud-based prediction system.
  • Implemented MLOps from scratch using AWS, Python, Redshift, EMR, and API Gateway.
  • Developed a User and Entity Behavior Analytics (UEBA) system for anomaly detection, which became the company’s flagship product and enhanced its competitiveness in the market.
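As an illustration of the dynamic cluster sizing idea, the sketch below launches a transient EMR cluster whose core-node count is derived from the input volume; the instance types, sizing heuristic, bucket, and role names are hypothetical.

# Hypothetical sketch: size a transient EMR cluster from the day's input volume,
# so compute is only paid for while the Spark anomaly jobs run.
import boto3

def core_node_count(input_gb: int) -> int:
    """Toy heuristic: roughly one core node per 500 GB of input, at least two."""
    return max(2, input_gb // 500)

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="anomaly-detection-transient",
    ReleaseLabel="emr-6.10.0",
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "master", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "r5.2xlarge",
             "InstanceCount": core_node_count(input_gb=4000)},
        ],
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate the cluster when the steps finish
    },
    Steps=[{
        "Name": "anomaly-calculation",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-bucket/jobs/anomalies.py"],
        },
    }],
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Cluster:", response["JobFlowId"])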
Technologies: Amazon Web Services (AWS), Apache Spark, Machine Learning, Redshift, Terraform, Amazon DynamoDB, Amazon Cognito, Dropbox API, Google APIs, Docker, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Python, Pandas, Databases, Amazon RDS, Complex SQL Queries, Database Schema Design, PySpark, Distributed Systems, Data Migration, Data Modeling, Data Reporting, NoSQL, Microsoft Power BI, Artificial Intelligence (AI), NumPy

Lead Big Data Developer

2018 - 2019
First Line Software
  • Developed the full ETL cycle for transforming customers' raw data into the OMOP Common Data Model (CDM) standard.
  • Developed and implemented a tool to automate data conversion using Python, SQL, and Spark.
  • Created and deployed a tool for visualizing the converted data using Python, Django, and JavaScript.
Technologies: Amazon Web Services (AWS), Google Cloud Platform (GCP), Apache Spark, SQL, Google BigQuery, Redshift, Django, Docker, Python 3, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Python, Google Cloud, Pandas, Databases, Amazon RDS, Complex SQL Queries, Database Schema Design, PySpark, Distributed Systems, Data Migration, Data Modeling, Data Reporting, NoSQL, Microsoft Power BI, NumPy

Senior Software Developer

2016 - 2018
Fujitsu Global
  • Built a system for distributing incident tickets among assignees.
  • Developed and implemented a tracking system on the project.
  • Migrated the billing reporting system to SQL Server Reporting Services (SSRS).
Technologies: SQL, Bash, Linux, Microsoft SQL Server, IBM Informix, C#.NET, Oracle, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Python, Pandas, Databases, Complex SQL Queries, Database Schema Design, Distributed Systems, Data Migration, Data Modeling, Data Reporting, NoSQL, Microsoft Power BI, NumPy

Chief Software Engineer

2008 - 2016
Gazprombank
  • Developed an analytical and management reporting system.
  • Built an automated system for setting retail exchange rates, which increased the bank's income from currency exchange operations severalfold while reducing currency risk.
  • Created a system for planning and for monitoring plan execution.
  • Built a system for combating fraudulent transactions through the Client Bank functionality.
Technologies: SQL, Excel VBA, C#.NET, Microsoft SQL Server, Investments, Stock Market, Analytics, Data Analysis, Data Visualization, Data Warehousing, Data Analytics, Python, Databases, Complex SQL Queries, Database Schema Design, Distributed Systems, Data Migration, Data Modeling, Data Reporting, Microsoft Power BI

Experience

LakeHouse for Omniverse

I designed and implemented a lakehouse system based on Apache Iceberg and S3, capable of processing 2PB of data. The architecture utilized Apache Spark as the primary ETL tool and featured two data ingestion paths: batch and real-time layers.

Every day, the system processes up to 10TB of new data, efficiently handling both historical and streaming data. The batch layer manages large-scale data processing tasks, enabling efficient transformation and loading of massive datasets. The real-time layer ingests streaming data, allowing for immediate analytics and up-to-the-minute insights. This dual ingestion approach significantly enhanced data processing efficiency and expanded analytics capabilities.

By leveraging Apache Iceberg for table formats and S3 for scalable storage, the system provides robust data storage solutions with support for ACID transactions and schema evolution. Apache Spark's powerful processing engine facilitates complex data transformations and computations across large clusters.
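A minimal PySpark sketch of how such dual ingestion could look; the catalog, bucket, topic, and table names are illustrative, and it assumes the Iceberg Spark runtime JAR is available on the cluster.

# Sketch of batch + streaming ingestion into Iceberg tables on S3.
# Catalog, bucket, topic, and table names are illustrative, not the production setup.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("lakehouse-ingestion")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://example-lakehouse/warehouse")
    .getOrCreate()
)

# Batch layer: load a daily partition of historical events and append it to an existing table.
batch_df = spark.read.parquet("s3a://example-raw/events/dt=2024-01-01/")
batch_df.writeTo("lake.analytics.events").append()

# Real-time layer: stream raw events from Kafka into a separate Iceberg table.
stream_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "events")
    .load()
)
query = (
    stream_df.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
    .writeStream.format("iceberg")
    .outputMode("append")
    .option("checkpointLocation", "s3a://example-lakehouse/checkpoints/events")
    .toTable("lake.analytics.events_raw")
)
# query.awaitTermination() would block here until the stream is stopped.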

Recommendation System for TangoMe

https://www.tango.me/live/recommended
A GCP-based recommendation system that delivers the most relevant content to users based on their interests.

I was the engineering team leader and owned the entire development process on the data and cloud sides.

An Apartment's Interior Design Recommendation System for EPAM

A GCP-based recommendation system for apartment interior design that suggests an optimal furniture arrangement for a given apartment plan.

I was the project architect, as well as a data engineer and back-end developer. I designed the system architecture and the interaction of all its components.

A Complex ETL of Medical Data with a Custom Conversion Kit for First Line Software

https://www.ohdsi.org/data-standardization/the-common-data-model/
The main task of this project was to convert raw data into a standardized format. The original datasets could be of various types and stored in different storages, such as AWS S3, GCP GCS, Hadoop HDFS, PostgreSQL, Amazon Redshift, and more. The project required a tool that prepared conversions automatically and minimized issues during the Spark SQL ETL process.

I was the tech lead on this project. My responsibilities included developing the core framework components in Python, which allowed us to automate scheduled ETL steps and run post-conversion tasks such as unit tests and statistics reports. I also performed code reviews and ran the ETL pipelines.
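As a hedged illustration, a single Spark SQL conversion step from raw patient records into the OMOP CDM person table could look like the sketch below; the source layout, column names, and simplified mapping are assumptions, not the project's actual conversion kit.

# Illustrative conversion step: raw patient records -> OMOP CDM "person" table.
# Source paths, column names, and the mapping itself are simplified assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("omop-cdm-conversion").getOrCreate()

# Raw data registered as a temporary view; it could equally come from HDFS, JDBC, Redshift, etc.
spark.read.parquet("s3a://example-raw/patients/").createOrReplaceTempView("raw_patients")

person = spark.sql("""
    SELECT
        CAST(patient_id AS BIGINT) AS person_id,
        CASE lower(gender)
            WHEN 'male' THEN 8507
            WHEN 'female' THEN 8532
            ELSE 0
        END AS gender_concept_id,
        year(birth_date) AS year_of_birth,
        month(birth_date) AS month_of_birth,
        day(birth_date) AS day_of_birth
    FROM raw_patients
""")

# Persist the converted table so downstream steps (unit tests, stats reports) can run against it.
person.write.mode("overwrite").parquet("s3a://example-cdm/person/")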

Education

2003 - 2008

Master's Degree in Information Technologies

Kazan National Research Technical University - Kazan, Russia

Certifications

OCTOBER 2024 - OCTOBER 2026

AWS Certified Machine Learning - Specialty

AWS

DECEMBER 2022 - DECEMBER 2024

SnowPro Core Certification

Snowflake

DECEMBER 2021 - DECEMBER 2024

AWS Certified Solutions Architect Associate

AWS

JANUARY 2021 - JANUARY 2023

Professional Cloud Architect

Google Cloud

JANUARY 2021 - JANUARY 2023

Professional Data Engineer

Google Cloud

NOVEMBER 2020 - NOVEMBER 2022

Associate Cloud Engineer

Google Cloud

DECEMBER 2019 - DECEMBER 2022

AWS Certified Developer

PSI

AUGUST 2019 - DECEMBER 2022

AWS Certified Cloud Practitioner

PSI

Skills

Libraries/APIs

Pandas, Complex SQL Queries, PySpark, NumPy, Dropbox API, Google APIs

Tools

PyCharm, Git, Apache Airflow, Terraform, BigQuery, Tableau, Microsoft Power BI, Apache Beam, Postman, Slack, Grafana, Amazon Cognito, Cloud Dataflow, GitLab, Apache NiFi, Google Kubernetes Engine (GKE), Spark SQL, Amazon Athena, Google Cloud Dataproc, Apache Iceberg, Amazon SageMaker, AWS Glue

Languages

SQL, Bash, Python, Snowflake, Cypher, Scala, C#.NET, Excel VBA, Python 3

Paradigms

REST, ETL, Database Design

Platforms

Linux, Amazon Web Services (AWS), Google Cloud Platform (GCP), Docker, Apache Kafka, New Relic, Oracle, Cloud Run, Kubernetes, Databricks

Storage

PostgreSQL, Microsoft SQL Server, Amazon DynamoDB, Data Pipelines, JSON, Databases, Google Cloud, NoSQL, Redshift, Google Bigtable, IBM Informix, Cloud Firestore, ClickHouse, Amazon S3 (AWS S3)

Frameworks

Flask, Apache Spark, Django, Locust, Spark, Trino

Other

IT Systems Architecture, Google BigQuery, Big Data, Big Data Architecture, Data Architecture, Data Engineering, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Foundry, Palantir, Amazon RDS, Database Schema Design, Distributed Systems, Data Migration, Data Modeling, Data Reporting, Large Language Models (LLMs), Artificial Intelligence (AI), FastAPI, Redis Clusters, Machine Learning Operations (MLOps), Machine Learning, Data Build Tool (dbt), Pub/Sub, Investments, Stock Market, Google Cloud Functions, EMR, Data Science
