
Igor Gorbenko
Verified Expert in Engineering
Database and Back-end Developer
Dubai, United Arab Emirates
Toptal member since October 18, 2021
Igor is a seasoned data architect with 16+ years of experience in high-load systems, DWH, ETL, and ML pipelines. He has delivered innovative solutions for industry leaders such as TangoMe, Gazprombank, Stanford University, and Royal Mail. As a cloud-agnostic expert specializing in Flask, FastAPI, and database integration, Igor builds robust, scalable architectures. His passion for cloud-based systems empowers businesses to operate efficiently, gain flexibility, and achieve strategic advantages.
Portfolio
Experience
- Data Pipelines - 16 years
- SQL - 13 years
- Python - 10 years
- Amazon Web Services (AWS) - 8 years
- Big Data Architecture - 6 years
- Big Data - 6 years
- Google Cloud Platform (GCP) - 5 years
- Machine Learning Operations (MLOps) - 4 years
Preferred Environment
PyCharm, Slack, Linux, Git
The most amazing...
...thing I've built: a lakehouse on Apache Iceberg that processes 2PB of data across batch and real-time layers, boosting data processing efficiency and analytics.
Work Experience
Head of Data | Software Architect
Omniverse
- Developed and implemented the company's data strategy (around 2PB of data and 300,000 RPS).
- Built architectures for a DWH, DSP, and DMP using AWS, Kafka, RabbitMQ, Snowflake, ClickHouse, Apache Spark, Airflow, Kubernetes (K8s), Python/Go, and Aerospike.
- Implemented a data lakehouse based on Apache Iceberg and Apache Spark.
- Created a BI environment, driving data-driven decision-making across the organization.
- Led the data team in adopting data and MLOps practices, enhancing skills and fostering innovation.
- Launched an anti-fraud system and integrated ML pipelines.
Head of ML Engineering | Big Data Architect
Tango
- Implemented real-time recommendation and anti-fraud systems, resulting in a 25% increase in revenue.
- Established data and MLOps best practices from the ground up.
- Co-founded the data department, leading data initiatives and strategic direction.
- Led the ML engineering department, managing a team of up to 30 engineers.
- Optimized data loading into storage and refactored legacy code.
- Implemented a real-time image recognition system, leveraging GCP, Kafka, Kubernetes, Python, Dataflow, BigQuery, Airflow, Redis, and more.
- Built monitoring for every component of the recommendation system.
Key Big Data Developer
EPAM Systems
- Designed an apartment interior design recommendation system.
- Developed the back end of the interior recommendation system, including a scraper that collected model-training data and all data processing workflows.
- Resolved Jira-reported incidents related to data pipelines.
Big Data Architect
Netwrix
- Migrated anomaly calculation processes from Docker containers to an Apache Spark cluster on EMR, speeding up calculations severalfold.
- Reduced AWS costs severalfold through dynamic EMR cluster configuration.
- Developed the monitoring system with reports and alert mechanisms. Implemented the CI/CD process.
- Provided technical leadership for the design of the cloud-based prediction system.
- Implemented MLOps from scratch using AWS, Python, Redshift, EMR, and API Gateway.
- Developed a User and Entity Behavior Analytics (UEBA) system for anomaly detection, which became the company’s flagship product and enhanced its competitiveness in the market.
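The UEBA-style anomaly detection described above ran on Apache Spark over EMR, but its core idea — flagging entities whose activity deviates sharply from their own baseline — can be illustrated with a minimal, self-contained sketch. All names and thresholds here are illustrative assumptions, not details of the actual product:

```python
from statistics import median

def detect_anomalies(counts, threshold=3.5):
    """Flag (day, count) pairs whose robust z-score exceeds `threshold`.

    Uses the median absolute deviation (MAD), which -- unlike mean/stdev --
    is not inflated by the very outliers we are trying to catch.
    """
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        return []  # no spread to measure against; a real system would fall back
    return [
        (day, c)
        for day, c in enumerate(counts)
        if 0.6745 * abs(c - med) / mad > threshold
    ]

# A user with a stable daily baseline and one burst of activity:
counts = [10, 12, 11, 9, 10, 11, 200, 10, 12, 11]
print(detect_anomalies(counts))  # → [(6, 200)]
```

In a production pipeline the same per-entity scoring would be expressed as a Spark aggregation over event logs rather than an in-memory list.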
Lead Big Data Developer
First Line Software
- Developed the full cycle of the ETL process for transforming customers' raw data into the OMOP Common Data Model (CDM) standard.
- Developed and implemented a tool to automate data conversion using Python, SQL, and Spark.
- Created a tool for visualizing the converted data with Python, Django, and JavaScript.
Senior Software Developer
Fujitsu Global
- Built a system for assigning incident tickets to the appropriate handlers.
- Developed and implemented a project tracking system.
- Migrated the billing reporting system to SQL Server Reporting Services (SSRS).
Chief Software Engineer
Gazprombank
- Developed an analytical and management reporting system.
- Built an automated system for setting retail exchange rates, which multiplied the bank's income from currency exchange operations while reducing currency risk.
- Created a system for planning and for monitoring plan execution.
- Built a system for combating fraudulent transactions through the Client Bank functionality.
Experience
LakeHouse for Omniverse
Every day, the system processes up to 10TB of new data, efficiently handling both historical and streaming data. The batch layer manages large-scale data processing tasks, enabling efficient transformation and loading of massive datasets. The real-time layer ingests streaming data, allowing for immediate analytics and up-to-the-minute insights. This dual ingestion approach significantly enhanced data processing efficiency and expanded analytics capabilities.
By leveraging Apache Iceberg for table formats and S3 for scalable storage, the system provides robust data storage solutions with support for ACID transactions and schema evolution. Apache Spark's powerful processing engine facilitates complex data transformations and computations across large clusters.
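The ACID guarantees mentioned above come from Iceberg's snapshot-based metadata: every commit writes a new immutable snapshot, and readers always see exactly one consistent version of the table. As a hedged illustration — a toy model of the mechanism, not the Iceberg implementation or its API — the idea can be sketched like this:

```python
class ToyIcebergTable:
    """Toy model of snapshot-based table metadata: each commit appends an
    immutable snapshot; readers pin one snapshot and are never affected by
    concurrent writes (the essence of Iceberg's consistent reads)."""

    def __init__(self):
        self._snapshots = [()]  # snapshot 0: the empty table

    def commit(self, new_rows):
        # A commit never mutates old data; it records a new snapshot that
        # references both the previous rows and the appended ones.
        current = self._snapshots[-1]
        self._snapshots.append(current + tuple(new_rows))
        return len(self._snapshots) - 1  # id of the new snapshot

    def scan(self, snapshot_id=None):
        # Readers see one consistent snapshot -- by default the latest --
        # and can also "time travel" to any earlier snapshot id.
        if snapshot_id is None:
            snapshot_id = len(self._snapshots) - 1
        return list(self._snapshots[snapshot_id])

table = ToyIcebergTable()
batch_id = table.commit([{"user": "a", "events": 3}])   # batch layer
stream_id = table.commit([{"user": "b", "events": 1}])  # real-time layer
print(table.scan())          # latest snapshot: both rows
print(table.scan(batch_id))  # time travel: only the batch rows
```

Both ingestion layers commit through the same snapshot log, which is what lets batch and streaming writers share one table without readers ever observing a half-written state.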
Recommendation System for TangoMe
https://www.tango.me/live/recommended
I was the engineering team leader and owned the entire development process on the data and cloud sides.
An Apartment's Interior Design Recommendation System for EPAM
I was a project architect, as well as a data engineer and back-end developer. I designed the architecture of the system and the interaction of all components.
A Complex ETL of Medical Data with a Custom Conversion Kit for First Line Software
https://www.ohdsi.org/data-standardization/the-common-data-model/
I was the tech lead on this project. I developed the core components of the framework in Python, automating scheduled ETL steps and post-conversion tasks such as unit tests and statistics reports. I also performed code reviews and ran the ETL pipelines.
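One step of the conversion described above can be sketched in miniature: mapping raw source records onto the OMOP CDM `person` table, followed by a post-conversion unit test of the kind the framework ran automatically. The OMOP column names and the gender concept IDs (8507 MALE, 8532 FEMALE) follow the CDM spec; the raw input schema and helper names are illustrative assumptions, not the project's actual code:

```python
# OMOP standard concepts for gender; the "M"/"F" source values are assumed.
GENDER_CONCEPTS = {"M": 8507, "F": 8532}

def to_omop_person(raw):
    """Convert one raw patient record into an OMOP CDM person row.
    Unmappable records are returned as None (and would be logged)."""
    gender = GENDER_CONCEPTS.get(raw.get("sex"))
    if gender is None or "birth_year" not in raw:
        return None
    return {
        "person_id": raw["patient_id"],
        "gender_concept_id": gender,
        "year_of_birth": int(raw["birth_year"]),
    }

def run_conversion(raw_records):
    rows = [r for r in (to_omop_person(raw) for raw in raw_records) if r]
    # Post-conversion unit test, as in the automated pipeline above:
    assert all(row["gender_concept_id"] in (8507, 8532) for row in rows)
    return rows

raw = [
    {"patient_id": 1, "sex": "F", "birth_year": "1984"},
    {"patient_id": 2, "sex": "U", "birth_year": "1990"},  # dropped: unknown sex
]
print(run_conversion(raw))
```

In the real framework the equivalent transforms ran as scheduled Spark and SQL steps, with checks like the assertion above executed after each conversion.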
Education
Master's Degree in Information Technologies
Kazan National Research Technical University - Kazan, Russia
Certifications
AWS Certified Machine Learning - Specialty
AWS
SnowPro Core Certification
Snowflake
AWS Certified Solutions Architect Associate
AWS
Professional Cloud Architect
Google Cloud
Professional Data Engineer
Google Cloud
Associate Cloud Engineer
Google Cloud
AWS Certified Developer
PSI
AWS Certified Cloud Practitioner
PSI
Skills
Libraries/APIs
Pandas, Complex SQL Queries, PySpark, NumPy, Dropbox API, Google APIs
Tools
PyCharm, Git, Apache Airflow, Terraform, BigQuery, Tableau, Microsoft Power BI, Apache Beam, Postman, Slack, Grafana, Amazon Cognito, Cloud Dataflow, GitLab, Apache NiFi, Google Kubernetes Engine (GKE), Spark SQL, Amazon Athena, Google Cloud Dataproc, Apache Iceberg, Amazon SageMaker, AWS Glue
Languages
SQL, Bash, Python, Snowflake, Cypher, Scala, C#.NET, Excel VBA
Paradigms
REST, ETL, Database Design
Platforms
Linux, Amazon Web Services (AWS), Google Cloud Platform (GCP), Docker, Apache Kafka, New Relic, Oracle, Cloud Run, Kubernetes, Databricks
Storage
PostgreSQL, Microsoft SQL Server, Amazon DynamoDB, Data Pipelines, JSON, Databases, Google Cloud, NoSQL, Redshift, Google Bigtable, IBM Informix, Cloud Firestore, ClickHouse, Amazon S3 (AWS S3)
Frameworks
Flask, Apache Spark, Django, Locust, Spark, Trino
Other
IT Systems Architecture, Google BigQuery, Big Data, Big Data Architecture, Data Architecture, Data Engineering, Analytics, Cloud, Data Analysis, Cloud Platforms, Data Visualization, Data Warehousing, Data Analytics, Foundry, Palantir, Amazon RDS, Database Schema Design, Distributed Systems, Data Migration, Data Modeling, Data Reporting, Large Language Models (LLMs), Artificial Intelligence (AI), FastAPI, Redis Clusters, Machine Learning Operations (MLOps), Machine Learning, Data Build Tool (dbt), Pub/Sub, Investments, Stock Market, Google Cloud Functions, EMR, Data Science