Verified Expert in Engineering
Data Warehouse Architect and Developer
Aleksei is a highly qualified data warehouse (DWH) architect with more than a decade of experience developing full-stack enterprise DWH solutions, both standalone and big data. He has extensive technical knowledge and experience in DWH implementations that require attention to detail. A proven team leader with exemplary communication skills and a drive to keep up to date with industry innovation, Aleksei thrives under pressure to build robust products and deliver projects on time.
Hadoop, Spark, Python, Microsoft SQL Server, PostgreSQL, Vertica
The most amazing...
...ETL framework I've built replaced 1,500+ manually built transformations with automated generation and made it easy to migrate from proprietary to open-source ETL tools.
Staff BI Data Architect
- Integrated a brand new in-memory cluster-based BI solution for the customer experience department.
- Tested different ETL approaches (Spark JDBC, a proprietary loader) and implemented the best variant with the required integration, performance, and monitoring capabilities.
- Designed CI/CD requirements and built the process of deploying ETL scripts into HDFS using Python; implemented version tracking using Git and TeamCity as a deployment tool.
- Implemented a backup-and-recovery process along with data, schema changes, and ETL flows (including disaster and recovery tests).
- Defined metrics for monitoring and logging and integrated with company monitoring solutions: InfluxDB, Telegraf, and Grafana.
- Built a data quality (DQ) solution to test consistency, freshness, and integrity between the new BI API and Hadoop storage.
- Constructed a pipeline for collecting data from the internal flow automation system to Hadoop storage using Spark and implemented monitoring and DQ.
- Built an ETL framework for loading files into the Vertica MPP system, scheduled via Airflow. The framework generates DAGs from YAML configuration files that control schema and ETL parameters, DQ, and logging.
- Built a BI uptime tracking solution: used Apache Airflow to collect usage stats from various storage types/engines and run DQ checks on the results; the calculated uptime metrics are compared to KPIs and reported back to users.
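The YAML-driven DAG generation mentioned above can be sketched roughly as follows. This is a minimal illustration, not the actual framework: the config keys (`load_mode`, `dq_checks`, `logging`) and task names are hypothetical, and in a real setup the dict would come from `yaml.safe_load()` and the task names would become Airflow operators.

```python
# Sketch: deriving an ordered list of pipeline tasks from a pre-parsed
# YAML configuration. All keys and task names here are illustrative.

def build_task_specs(config):
    """Derive an ordered list of task names from a pipeline config dict."""
    tasks = ["extract", "load_staging", config.get("load_mode", "merge")]
    if config.get("dq_checks"):        # optional data quality step
        tasks.append("run_dq_checks")
    if config.get("logging", True):    # logging/metrics step, on by default
        tasks.append("push_metrics")
    return tasks

config = {
    "table": "sales_daily",
    "load_mode": "overwrite",
    "dq_checks": ["freshness", "row_count"],
    "logging": True,
}
print(build_task_specs(config))
# ['extract', 'load_staging', 'overwrite', 'run_dq_checks', 'push_metrics']
```

The benefit of this style is that adding a new pipeline means adding a config file, not writing new DAG code.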
Leroy Merlin (Russia)
- Designed a data model for the ODS, DDS, and DM layers of a data lake.
- Implemented a monitoring and data quality process for distributed enterprise retail management systems covering over 100 retail stores using NiFi. Created a Python application for generating metadata-based NiFi templates.
- Built a data ingestion pipeline from Microsoft SQL databases to an MPP data lake using Debezium, Kafka, and NiFi.
- Designed NiFi templates for ingesting data from various data sources, including RDBMS, NoSQL, and API sources.
Senior Data Warehouse Architect
PJSC Mobile TeleSystems (Russia)
- Designed and implemented a security role model for the Hadoop, MPP, and RDBMS components of a data lake.
- Reduced the Hive query execution time by implementing LLAP technology. The top 20 most common queries got a 10x speed increase.
- Designed a logical and physical model for the DDS layer of the subscriber geo-event entity. Composed all the documentation for the ETL and DevOps teams.
- Designed and implemented a POC for a company website event tracking system using the Snowplow analytics framework. The work included estimating the current and 1-3 horizon workload, infrastructure planning, procurement, and designing a data model.
Business Intelligence Team Lead
JSC Europlan (Russia)
- Designed the architecture of, and created from scratch, a consistent, well-managed enterprise data warehouse (DWH) system that accumulates data from tens of external sources and meets all customer needs.
- Developed OLAP cubes (15+ databases). Reports and cubes were made for different internal customers from leasing, banking, and insurance divisions, including accounting, finance, risk management, personnel, financial, security, and sales departments.
- Managed a project that involved migrating part of the existing BI solution to open-source software.
- Developed a multithreaded Java application to automate ETL package creation using only metadata and templates, eliminating the need to manually create and maintain thousands of transformation packages.
- Led a team in the management of a data warehouse (DWH) that became not only a source for enterprise reports but also a source of high-grade, verified data for many other intercorporate systems, allowing us to offload algorithms from those systems.
Airflow Architecture Migration Project for Real Estate
• Conducted analysis of the current EC2 standalone solution, evaluating its strengths and weaknesses.
• Set up a new AWS EKS cluster to support the updated architecture.
• Configured Helm Charts to deploy the Airflow application within the Kubernetes environment.
• Implemented FluxCD to embrace the GitOps paradigm for managing Kubernetes infrastructure and application.
• Provided fully hybrid environments:
a) Two cloud-based environments for production and staging.
b) One local environment for DAG development and debugging within an IDE.
c) Docker Compose for automated unit testing and integration tests.
• Developed unit tests for Airflow DAGs, enabling execution both locally and in CI/CD pipelines.
• Introduced a CI/CD pipeline within the GitLab ecosystem to ensure a proper software development lifecycle (SDLC) with mandatory unit tests during merge requests.
• Secured external access through dynamic DNS provisioning for Kubernetes with Cloudflare.
• Configured hybrid logging to S3 and local storage.
• Implemented database change management using Liquibase.
The final solution featured a GitOps approach, scalability, flexible configuration, and proper SDLC support.
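One structural check the DAG unit tests above can perform is verifying that a task graph is acyclic. A real Airflow test would load DAGs through `airflow.models.DagBag` and assert there are no import errors; the stdlib-only sketch below stands in for that idea with a plain dict as the task graph.

```python
# Sketch: cycle detection over a task graph (task -> downstream tasks),
# the kind of structural assertion a DAG unit test can make.

def has_cycle(graph):
    """Return True if the directed graph contains a cycle (DFS coloring)."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:
                return True          # back edge found -> cycle
            if color.get(nxt, WHITE) == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in list(graph))

# A valid linear pipeline passes; a mutual dependency fails.
assert not has_cycle({"extract": ["transform"], "transform": ["load"], "load": []})
assert has_cycle({"a": ["b"], "b": ["a"]})
```

Running such checks in the merge-request pipeline catches broken DAG definitions before they reach the scheduler.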
Travel Agency Startup | Cloud Data Warehouse Architect
What was done:
• Analyzed data sources for structure and built S2T mapping.
• Selected BigQuery as the target data storage and integrated it with the customer's GCP environment.
• Set up data collectors to extract real-time changes from Firebase to BigQuery RAW schema.
• Created an ETL framework to transform data from RAW schema to the ODS layer (GCP Cloud Scheduled queries).
• Added metadata-based transformation of Firebase document JSON fields to the BigQuery schema.
• Created reporting layer in DWH with recent snapshots and historical data.
• Engineered a cloud setup of the BI tool Metabase.
• Migrated manual exports from Retool to five BigQuery-based Metabase dashboards with more than 30 reports.
• Designed data quality policy and processes:
- Freshness checks for each fact table
- Consistency checks between source Firebase collections and target raw tables, implemented as Python Cloud Functions.
- An uptime KPI metric calculation.
• Enabled alerting for DQ checks via the Metabase alert feature for the dev team and business users.
• Documented all solutions in Notion.
• Implemented BigQuery cost monitoring via a Cloud Logging sink to BigQuery, with alerting via a Metabase question subscription.
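A freshness check like the ones described above boils down to comparing a table's latest load timestamp against an allowed lag. The sketch below is illustrative only: in the real setup the timestamp would come from a BigQuery query over the fact table, and the threshold would live in the DQ policy.

```python
# Sketch: a per-table freshness check. The threshold and timestamps are
# illustrative; a real check would query MAX(loaded_at) from the table.

from datetime import datetime, timedelta, timezone

def check_freshness(last_loaded_at, max_lag):
    """Return (is_fresh, lag) for a fact table's latest load timestamp."""
    lag = datetime.now(timezone.utc) - last_loaded_at
    return lag <= max_lag, lag

recent = datetime.now(timezone.utc) - timedelta(minutes=30)
ok, lag = check_freshness(recent, max_lag=timedelta(hours=1))
print(ok)  # True: loaded 30 minutes ago, within the 1-hour threshold
```

Failing checks can then feed the alerting channel (here, Metabase alerts) so both the dev team and business users see stale data early.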
Real Estate Company | Data Warehouse Solution
1. Built the ETL process using Airflow for loading various sources—Amazon RDS MS SQL databases, S3 flat files (Excel, CSV), and API sources—into the ingestion layer of the data warehouse.
2. Built the data model and ETL DAGs for data layer storage—preprocessing and detailed data storage (DDS)—on PostgreSQL Amazon RDS.
3. Created DQ processes for controlling freshness and consistency between data sources and ingestion-layer tables, the ingestion and preprocessing layers, and the preprocessing and mart layers. This included the data structures for customizing checks, saving aggregated and detailed results, and the Airflow DAGs that run these checks.
4. Created the preprocessing layer's adjustment framework, which allowed the following:
• defining datasets for the target table;
• validating changesets before applying them (Alembic-style, but for data);
• reversing changes if required;
• logging and tracking changes for audit purposes.
5. Implemented SCD Type 4 for the DDS.
6. Built the mart layer with views and materialized views for reporting purposes.
7. Prepared guidelines and built the POC for the migration to Snowflake for future extensions.
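The SCD Type 4 approach used for the DDS keeps only the latest row per business key in the current table and appends every superseded version to a separate history table. The sketch below illustrates that mechanic with plain dicts; field names (`id`, `status`, `archived_at`) are hypothetical, and the real implementation ran as SQL on PostgreSQL.

```python
# Sketch: SCD Type 4 - current table holds the latest version per key,
# superseded versions are appended to a history table. Names illustrative.

from datetime import datetime, timezone

current = {}   # business_key -> latest row
history = []   # archived (superseded) row versions

def apply_scd4(row):
    """Upsert a row into the current table, archiving the old version."""
    key = row["id"]
    old = current.get(key)
    if old is not None and old != row:
        history.append({**old, "archived_at": datetime.now(timezone.utc)})
    current[key] = row

apply_scd4({"id": 1, "status": "new"})
apply_scd4({"id": 1, "status": "approved"})
print(current[1]["status"])  # approved
print(len(history))          # 1
```

Compared to SCD Type 2, this keeps the current table small and fast to query while the full change history remains available for audits.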
Agoda | Creation of an ETL Framework for Vertica on Airflow
The framework generates DAGs based on YAML configuration files. A typical file-storage pipeline:
• Grab a file from the remote file storage.
• Push to a staging table in Vertica with minimum transformation.
• Merge or overwrite to a destination table in Vertica.
• Apply business-specific transformation in Vertica using provided SQL (for the reporting layer).
• Push data lineage and statistical information to data metric storage.
• Push data freshness values.
• Validate source data and send an alert if it is not correct.
Additional task types:
• Run DQ checks to verify integrity and accuracy between data layers.
• CI/CD was done via TeamCity.
The resulting solution was scalable, monitorable, and more robust; CI/CD enabled tracking of source changes and the deployment process.
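The layer-integrity DQ checks mentioned above typically compare row counts between adjacent layers, such as a staging table and its destination. A minimal sketch of that comparison, with illustrative counts (the real framework would get them from Vertica queries):

```python
# Sketch: a layer-integrity check comparing row counts between a staging
# and a destination table, with an optional relative tolerance.

def check_integrity(staging_count, dest_count, tolerance=0.0):
    """Return True if the destination row count is within tolerance of staging."""
    if staging_count == 0:
        return dest_count == 0
    drift = abs(dest_count - staging_count) / staging_count
    return drift <= tolerance

print(check_integrity(1000, 1000))        # True: exact match
print(check_integrity(1000, 990))         # False: drift with zero tolerance
print(check_integrity(1000, 990, 0.02))   # True: within a 2% tolerance
```

Counts are the cheapest integrity signal; checksums or key-level diffs can follow when a count check fails.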
Agoda | Implementation of a Data Lake Integration with Customer Service Automation Software
• Interacted with vendor representatives to understand how to extract data from the API.
• Analyzed data available in the REST API.
• Designed transactional (ODS) and data mart (DM) data models for storing API data in Hadoop.
• Collaborated with the dev team responsible for system implementation on multi-datacenter HA deployment, accounting for future ETL and DQ tooling.
• Built an ETL Apache Spark-based solution for incremental and full data loads.
• Integrated ETL app metrics to the enterprise monitoring system based on Cassandra and Grafana.
• Built the ETL for transforming data from the transactional layer into business-specific data marts.
• Created Metabase reports for business owners, which helped analyze the effectiveness of the flows built by the flow interaction dev team.
• Established data quality processes to check consistency, accuracy, and freshness for each data pipeline stage.
This project created tremendous value for the analytical department, increased the transparency of agent interaction performance, and enabled real-time analysis.
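The incremental loads above rest on a watermark: each run picks up only records changed since the last successful load and advances the watermark. The real job used Apache Spark; the stdlib sketch below stands in for the DataFrame filter, with hypothetical field names.

```python
# Sketch: watermark-based incremental loading. `updated_at` and the record
# shape are illustrative; in Spark this would be a DataFrame filter.

def incremental_batch(records, watermark):
    """Select records newer than the watermark; return them and the new watermark."""
    batch = [r for r in records if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in batch), default=watermark)
    return batch, new_watermark

rows = [{"id": 1, "updated_at": 10}, {"id": 2, "updated_at": 20}]
batch, wm = incremental_batch(rows, watermark=15)
print(len(batch), wm)  # 1 20
```

Persisting the watermark only after a successful load makes the job safely re-runnable: a failed run simply reprocesses the same window.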
SQL, Python 3, Bash Script, Scala
ETL, ETL Implementation & Design, DevOps
Microsoft BI Stack, Google Cloud Platform (GCP), Firebase, Apache Kafka, Oracle, CentOS
Microsoft SQL Server, Apache Hive, PostgreSQL, JSON, Databases, Vertica, Greenplum, MongoDB, Amazon S3 (AWS S3)
Data Warehousing, Data Quality, ETL Development, Data Modeling, Data Warehouse Design, Metabase, Data Engineering, Big Data, ThoughtSpot, Data Analysis, Reporting, CI/CD Pipelines, Cloud, Analytics, quilliup, Computer Science, Amazon RDS, Google Cloud Functions, Catalog Data Entry Services, Web Scraping, Machine Learning, Cloudflare
Apache Spark, Hadoop, Scrapy, Flux
Apache Impala, Apache Airflow, BigQuery, Apache NiFi, Git, Vim Text Editor, Grafana, Amazon EKS, AWS ELB
Pandas, REST APIs, Liquibase
Master's Degree in Computer Systems and Networks, Faculty of Automated Systems
Tver State Technical University - Tver, Russia
Bachelor's Degree with Honors in Computer Systems and Networks, Faculty of Automated Systems
Tver State Technical University - Tver, Russia