Aleksei Ildiakov, Developer in Cancún, Mexico

Verified Expert in Engineering

Bio

Aleksei is a skilled software engineer with expertise in back-end development, including APIs, SQL, business intelligence, and big data. Committed to delivering high-performance data solutions that catalyze business achievements, he brings proficiency in Python, Scala, Go, and other leading-edge technologies. Aleksei is interested in pursuing a challenging role where he can harness his development and architecture skills to deliver outstanding outcomes and boost business value.

Portfolio

Workiz.com
Python, Apache Airflow, Spark, Snowflake, BigQuery, Google Cloud Storage, Cloud...
Mediascope
Scala, Spark, Apache Airflow, Hadoop, Apache Hive, HBase, Apache Kafka...
RT-labs
PostgreSQL, Python, Django, REST, GraphQL, MongoDB, AWS IoT, Neo4j, ClickHouse...

Experience

  • Python - 7 years
  • CI/CD Pipelines - 7 years
  • Apache Kafka - 5 years
  • Apache Airflow - 5 years
  • Spark - 5 years
  • Data Warehousing - 5 years
  • Google Cloud Storage - 4 years
  • Snowflake - 4 years

Availability

Part-time

Preferred Environment

Google Cloud Storage, Snowflake, Apache Airflow, Spark, AWS IoT

The most amazing...

...thing I've worked on is an ETL pipeline factory that allows non-engineering professionals to build and maintain high-performance data pipelines.

Work Experience

Senior Data Engineer

2022 - 2024
Workiz.com
  • Led the design and implementation of a cutting-edge Airflow ETL pipeline factory for a startup project comprising over 50 directed acyclic graphs (DAGs) with auto-scheduling and auto-building capabilities, using Kafka, GCP, Python, and PySpark (a minimal sketch of the pattern follows this role's technology list).
  • Designed a scalable and efficient Data Vault 2.0 data warehousing solution for the project, incorporating multiple table types and their relations, most of them historical insert-only tables.
  • Achieved a 99.99% data accuracy and match rate by implementing rigorous data validation and data governance practices using Python, BigQuery, and Spark.
  • Implemented calculation solutions with low resource utilization for batch and stream flows using BigQuery, Snowflake, and MySQL.
  • Developed Go microservices to improve data transformation and processing, resulting in a 30% increase in performance and throughput.
Technologies: Python, Apache Airflow, Spark, Snowflake, BigQuery, Google Cloud Storage, Cloud, CI/CD Pipelines, Git, Data Warehousing, Go, Data Engineering, Large Language Models (LLMs), SQL, Data Modeling, Data Analysis, PySpark, ETL, Apache Spark, API Databases, Data Pipelines, Google Cloud, Google Cloud Platform (GCP), Big Data, Data, Distributed Systems, Startups, Scalability, Data Flows, Automation, Git Repo, Data Build Tool (dbt), Orchestration
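The DAG factory itself is proprietary, but the pattern is straightforward to sketch. Below is a minimal, hypothetical config-driven Airflow DAG factory; the pipelines.yaml layout, the load_to_bigquery helper, and all names are illustrative assumptions, not the actual Workiz implementation.

```python
# Minimal, hypothetical sketch of a config-driven Airflow DAG factory.
# The pipelines.yaml layout and load_to_bigquery() are illustrative
# assumptions, not the actual production code.
from datetime import datetime

import yaml
from airflow import DAG
from airflow.operators.python import PythonOperator


def load_to_bigquery(source: str, target_table: str, **_) -> None:
    """Placeholder extract-and-load step; a real task would call the
    BigQuery client or submit a PySpark job here."""
    print(f"Loading {source} into {target_table}")


def build_dag(name: str, cfg: dict) -> DAG:
    """Create one DAG per pipeline entry in the config file."""
    dag = DAG(
        dag_id=f"etl_{name}",
        start_date=datetime(2023, 1, 1),
        schedule_interval=cfg.get("schedule", "@daily"),
        catchup=False,
    )
    with dag:
        PythonOperator(
            task_id="extract_load",
            python_callable=load_to_bigquery,
            op_kwargs={"source": cfg["source"], "target_table": cfg["target"]},
        )
    return dag


# Register every configured pipeline as a top-level DAG so the Airflow
# scheduler picks it up automatically, without engineers writing DAG code.
with open("pipelines.yaml") as f:
    for name, cfg in yaml.safe_load(f).items():
        globals()[f"etl_{name}"] = build_dag(name, cfg)
```

With this pattern, adding a pipeline means adding an entry to the config file rather than writing a new DAG module, which is what makes auto-building and auto-scheduling possible.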

Senior Big Data Engineer

2020 - 2022
Mediascope
  • Championed the design and implementation of an innovative Airflow auto DAG-building solution, utilizing Python, PostgreSQL, AWS, Hadoop, and Git, and reduced release time by up to 40% for a high-volume data processing system.
  • Implemented and optimized data processing pipelines handling over ten terabytes a day using Spark and PySpark, resulting in a 40% reduction in hardware utilization (a simplified batch step is sketched after the technology list below).
  • Boosted team efficiency by 20% by creating bespoke automation and CI/CD tools, empowering colleagues to streamline their workflows and focus on higher-value tasks.
Technologies: Scala, Spark, Apache Airflow, Hadoop, Apache Hive, HBase, Apache Kafka, Greenplum, CockroachDB, GitLab CI/CD, AWS IoT, Amazon Web Services (AWS), Data Engineering, SQL, Data Modeling, PySpark, ETL, Apache Spark, API Databases, Data Pipelines, Google Cloud, Google Cloud Platform (GCP), Big Data, Data, Distributed Systems, Scalability, Data Flows, Static Analysis, Automation, Git Repo, Orchestration, Apache NiFi
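The production pipelines are not public; the following is a simplified PySpark sketch of the kind of batch step multi-terabyte daily pipelines rely on. The paths, column names, and partition count are assumptions for illustration only.

```python
# Hypothetical PySpark batch step illustrating the kind of transformation
# used in multi-terabyte daily pipelines; paths and columns are made up.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("daily_events_rollup")
    .getOrCreate()
)

# Read one day of raw events from a partitioned Parquet dataset.
events = spark.read.parquet("hdfs:///data/raw/events/dt=2022-01-01")

# Aggregate before joining or writing to keep shuffle volumes small,
# which is where most of the hardware savings typically come from.
daily = (
    events
    .filter(F.col("event_type").isNotNull())
    .groupBy("user_id", "event_type")
    .agg(F.count("*").alias("events"), F.sum("duration_sec").alias("watch_time"))
)

# Repartition by the downstream join/read key to avoid skewed output files.
daily.repartition(200, "user_id").write.mode("overwrite").parquet(
    "hdfs:///data/marts/daily_events/dt=2022-01-01"
)
```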

Software Engineer

2019 - 2020
RT-labs
  • Designed and implemented a custom BI service with a multifunctional interface for analyzing geographic, graph, historical, and relational data sources, utilizing Python, PostgreSQL, ClickHouse, Neo4j, and MongoDB.
  • Carried out the migration from a self-hosted cluster to the AWS cloud.
  • Designed and implemented a geographic information system (GIS) platform supporting flexible data sources and use cases, such as emergency reachability or waste pollution, including computer vision and graph mathematics components.
  • Improved the performance of pre-processing and post-processing functions by 20-50% for an analytical web service and back end, utilizing Lua, Python, and SQL.
  • Implemented complex data warehouses and data lakes for analytical purposes and automatic ETL pipelines for parsing, sorting, and verifying data using Python, SQL, and Airflow (a parse-and-verify sketch follows the technology list below).
Technologies: PostgreSQL, Python, Django, REST, GraphQL, MongoDB, AWS IoT, Neo4j, ClickHouse, Data Engineering, SQL, Data Analysis, T-SQL (Transact-SQL), SQL Server Integration Services (SSIS), PySpark, ETL, API Databases, Data Pipelines, FastAPI, Big Data, Data, Distributed Systems, Data Flows, Automation, Git Repo, Redshift, Orchestration, Stitch Data
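As a rough illustration of the parse-and-verify ETL step mentioned in the last bullet, here is a small, hypothetical Python sketch; the input schema, validation rules, and names are assumptions rather than the actual RT-labs code.

```python
# Hypothetical parse-and-verify step of the kind used in those ETL
# pipelines; the CSV schema, checks, and field names are assumptions.
import csv
from dataclasses import dataclass


@dataclass
class SensorReading:
    device_id: str
    value: float
    ts: str


def parse_rows(path: str) -> list[SensorReading]:
    """Parse a raw CSV export, dropping rows that fail basic validation."""
    valid, rejected = [], 0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                reading = SensorReading(
                    device_id=row["device_id"].strip(),
                    value=float(row["value"]),
                    ts=row["timestamp"],
                )
            except (KeyError, ValueError):
                rejected += 1
                continue
            if reading.device_id and reading.value >= 0:
                valid.append(reading)
            else:
                rejected += 1
    print(f"{len(valid)} valid rows, {rejected} rejected")
    return valid
```

In a real pipeline, a step like this would run as an Airflow task, with the validated rows loaded into PostgreSQL or ClickHouse and the rejection count exported as a data quality metric.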

Software Engineer

2017 - 2019
Onduline
  • Conducted statistical analysis and data manipulation using SQL and Python (see the snippet after the technology list below).
  • Developed analytical functions and procedures in T-SQL.
  • Deployed and managed applications using Azure Kubernetes Service (AKS) for efficient and scalable application hosting.
  • Created dashboards and reports for data visualizations using Tableau and SQL Server Data Tools (SSDT).
Technologies: Microsoft SQL Server, Microsoft Power BI, Tableau, Python, SQL Server Data Tools (SSDT), Azure, SQL, ETL, Data Pipelines, Data
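A small, illustrative snippet of the SQL-plus-Python analysis workflow described above; the connection string, table, and columns are placeholders, not the actual Onduline data.

```python
# Illustrative pandas snippet for SQL Server plus Python analysis;
# connection string, table, and columns are placeholder assumptions.
import pandas as pd
from sqlalchemy import create_engine

# SQL Server connection via ODBC; credentials are placeholders.
engine = create_engine(
    "mssql+pyodbc://user:password@sql-host/sales?driver=ODBC+Driver+17+for+SQL+Server"
)

# Pull monthly sales and compute summary statistics per region.
sales = pd.read_sql("SELECT region, month, revenue FROM monthly_sales", engine)
summary = sales.groupby("region")["revenue"].agg(["mean", "std", "sum"])
print(summary)
```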

Experience

Pipeline Factory

I developed a framework for constructing batch and stream data pipelines, empowering non-engineers to independently create and manage high-performance pipelines. I leveraged Python and Scala for seamless integration with various data warehouse solutions.

I enabled capabilities for establishing connections and extracting data from external APIs and storage repositories.
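As a rough sketch of how such a factory can stay non-engineer-friendly, the snippet below shows a registry that maps declarative source specs to extractor functions. The spec format, the extractor decorator, and the example API URL are all hypothetical, not the actual framework.

```python
# Hypothetical sketch of the declarative side of a pipeline factory:
# a registry mapping source types to extractors, driven by a plain
# dictionary spec. Names and the spec format are assumptions.
from typing import Callable, Iterator

import requests

EXTRACTORS: dict[str, Callable[[dict], Iterator[dict]]] = {}


def extractor(kind: str):
    """Register an extractor under a source type used in pipeline specs."""
    def wrap(fn):
        EXTRACTORS[kind] = fn
        return fn
    return wrap


@extractor("http_api")
def extract_http(spec: dict) -> Iterator[dict]:
    """Page through a JSON API; 'url' and 'pages' come from the spec and
    the endpoint is assumed to return a JSON list per page."""
    for page in range(spec.get("pages", 1)):
        resp = requests.get(spec["url"], params={"page": page}, timeout=30)
        resp.raise_for_status()
        yield from resp.json()


def run_pipeline(spec: dict) -> list[dict]:
    """Run a single pipeline described entirely by a dictionary spec."""
    rows = list(EXTRACTORS[spec["source"]["type"]](spec["source"]))
    # A real factory would hand rows to configurable transform/load stages.
    return rows


# A spec a non-engineer could write (e.g. in YAML) without touching code.
example_spec = {
    "source": {"type": "http_api", "url": "https://example.com/api/items", "pages": 2}
}
```

Because every source type is registered behind the same interface, adding a new connector is an engineering task done once, while creating new pipelines stays a matter of writing specs.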

Education

2006 - 2013

Master's Degree in Statistics

Siberian State University of Telecommunications and Information Sciences - Novosibirsk, Russia

Skills

Libraries/APIs

PySpark

Tools

Apache Airflow, Git, GitLab CI/CD, Apache NiFi, BigQuery, Stitch Data, Microsoft Power BI, Tableau

Languages

Snowflake, Python, SQL, Scala, T-SQL (Transact-SQL), GraphQL, Go

Frameworks

Apache Spark, Spark, Hadoop, Django

Paradigms

ETL, REST, Automation

Platforms

Apache Kafka, Amazon Web Services (AWS), AWS IoT, Google Cloud Platform (GCP), Azure

Storage

Google Cloud Storage, PostgreSQL, MongoDB, API Databases, Data Pipelines, ClickHouse, Google Cloud, Redshift, Apache Hive, HBase, Greenplum, CockroachDB, Neo4j, Microsoft SQL Server, SQL Server Data Tools (SSDT), SQL Server Integration Services (SSIS)

Other

Cloud, CI/CD Pipelines, Data Warehousing, Data Engineering, Data Modeling, Data Analysis, Big Data, Data, Distributed Systems, Scalability, Data Flows, Git Repo, Orchestration, Data Science, Startups, Static Analysis, Data Build Tool (dbt), Large Language Models (LLMs), FastAPI
