Alexander Sokolov, Developer in Bucharest, Romania

Alexander Sokolov

Verified Expert in Engineering

Bio

Alex is a technology evangelist and entrepreneur specializing in data engineering, analytics, cloud computing, and DevOps. With extensive experience in building engineering teams and hands-on solution architecture, he excels in Modern Data Stack (MDS) implementations, MLOps pipelines, and cloud-native architectures using Kubernetes. Alex's expertise spans optimizing data workflows and cloud infrastructure for diverse clients, combining technical proficiency with strategic vision.

Portfolio

Virtido
Kubernetes, ETL, Amazon Web Services (AWS), Data Engineering, Python...
Private Consulting Services
Google Cloud Platform (GCP), Amazon Web Services (AWS), ETL, BigQuery...
Toptal
Python, Google Cloud, SQL, Apache Airflow, ETL, Kubernetes, Distributed Systems

Experience

  • SQL - 12 years
  • Python - 10 years
  • Linux - 10 years
  • Data Engineering - 10 years
  • Apache Parquet - 7 years
  • Docker - 7 years
  • Google Cloud - 7 years
  • Kubernetes - 5 years

Availability

Part-time

Preferred Environment

macOS, Linux, Visual Studio Code (VS Code), PyCharm, Slack

The most amazing...

...project I’ve developed is a cloud-native data platform that optimized real-time analytics for a global retailer, reducing processing time by 80%.

Work Experience

DevOps and Data Engineering Consultant

2022 - PRESENT
Virtido
  • Created technical drafts and proofs of concept (PoCs) for data engineering solutions and cloud architecture.
  • Bootstrapped software project frameworks, data engineering pipelines, testing approaches, and CI/CD pipelines.
  • Designed, developed, and deployed efficient data infrastructures and ETL/ELT processes on Google Cloud, AWS, and self-managed data platforms.
Technologies: Kubernetes, ETL, Amazon Web Services (AWS), Data Engineering, Python, Google Cloud, SQL, Amazon Athena, Amazon S3 (AWS S3), Terraform, Dagster, Apache Parquet, Apache Arrow, Pandas, Amazon Elastic Container Service (ECS), AWS Fargate, DuckDB, Meltano, PostgreSQL, Continuous Integration (CI), Continuous Delivery (CD), Data Lineage, Data Quality, Data Lakehouse, Data Warehouse Design, Docker, AWS Glue, AWS ELB, Amazon Virtual Private Cloud (VPC), Amazon Elastic Container Registry (ECR), Amazon CloudWatch, Amazon RDS, Distributed Systems
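As an illustrative sketch only (not actual client code), the kind of pipeline skeleton such bootstrapped frameworks standardize can be expressed as composable, named steps that an orchestrator like Dagster would then schedule. All names here are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Hypothetical record type: raw rows arrive as dicts from a source system.
Record = dict

@dataclass
class EtlStep:
    """One named stage of a pipeline; compose steps to form an ETL job."""
    name: str
    fn: Callable[[Iterable[Record]], list[Record]]

def run_pipeline(rows: Iterable[Record], steps: list[EtlStep]) -> list[Record]:
    """Feed rows through each step in order, as an orchestrator would."""
    data = list(rows)
    for step in steps:
        data = step.fn(data)
    return data

# Example: drop malformed rows, then normalize a column.
clean = EtlStep("clean", lambda rows: [r for r in rows if "id" in r])
normalize = EtlStep("normalize", lambda rows: [{**r, "id": int(r["id"])} for r in rows])

result = run_pipeline([{"id": "1"}, {"name": "x"}, {"id": "2"}], [clean, normalize])
# result == [{"id": 1}, {"id": 2}]
```

Standardizing steps this way is what makes testing and CI/CD straightforward: each stage is a pure function that can be unit-tested in isolation.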

Cloud & Data Architect

2017 - PRESENT
Private Consulting Services
  • Designed, developed, and deployed efficient data infrastructures and ETL/ELT processes on Google Cloud, AWS, Azure, and self-managed data platforms.
  • Shared my expertise in both streaming and batch analytics. Implemented Lambda and Kappa architectures.
  • Designed data warehouses (Kimball's dimensional model), data lakes, and data lakehouses with a Medallion architecture.
  • Implemented approaches to ensure and monitor data quality, data lineage, and data provenance.
  • Consulted and guided the implementation of DataOps and MLOps practices in teams of data engineers and data scientists.
  • Designed and developed CI/CD pipelines using GitHub Actions and Jenkins.
  • Performed platform engineering with a Kubernetes and CNCF stack for cloud, hybrid cloud, and on-premise environments.
  • Guided initiatives to improve developer productivity using DORA metrics, shift-left testing, and GitOps practices.
  • Developed REST APIs, microservices, authentication flows, and CLI automation tools.
Technologies: Google Cloud Platform (GCP), Amazon Web Services (AWS), ETL, BigQuery, Amazon Athena, Snowflake, Apache Iceberg, Apache Hudi, Delta Lake, Apache Airflow, Dagster, DuckDB, Airbyte, Meltano, Metabase, Apache Superset, Data Build Tool (dbt), PostgreSQL, ClickHouse, Apache Spark, Pandas, Scikit-learn, Keras, TensorFlow, Kubeflow, LangChain, Pgvector, Kubernetes, Helm, Flux CD, Docker, Terraform, GitHub Actions, Jenkins, Apache Parquet, Python, SQL, Data Engineering, Data Architecture, DataOps, Machine Learning Operations (MLOps), PySpark, Databricks, Distributed Systems, Azure
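To make the DORA-metrics work concrete, here is a minimal sketch (with made-up sample data) of two of the four standard metrics, deployment frequency and lead time for changes, computed from deployment records:

```python
from datetime import datetime, timedelta

def deployment_frequency(deploys: list[datetime], window_days: int) -> float:
    """Deployments per day over the observation window (a core DORA metric)."""
    return len(deploys) / window_days

def mean_lead_time(commit_to_deploy: list[timedelta]) -> timedelta:
    """Average time from code commit to production deployment."""
    return sum(commit_to_deploy, timedelta()) / len(commit_to_deploy)

# Illustrative data: six deploys across a 30-day window.
deploys = [datetime(2024, 1, d) for d in (2, 7, 12, 17, 22, 27)]
freq = deployment_frequency(deploys, window_days=30)              # 0.2 deploys/day
lead = mean_lead_time([timedelta(hours=h) for h in (4, 8, 12)])   # 8 hours
```

In practice these numbers would be pulled from CI/CD and Git history rather than hand-coded lists; the value of the metrics is in tracking their trend as practices like shift-left testing and GitOps take hold.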

Senior Data Engineer

2024 - 2024
Toptal
  • Developed API ingestion frameworks with built-in retry mechanisms and monitoring, achieving high data reliability.
  • Created data engineering frameworks adopted by more than 15 engineers, reducing new pipeline development time through standardized templates and reusable components.
  • Optimized Docker container builds for faster build times and smaller image footprints.
Technologies: Python, Google Cloud, SQL, Apache Airflow, ETL, Kubernetes, Distributed Systems
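A retry mechanism with exponential backoff is the heart of a reliable API ingestion framework. The following is a hedged, self-contained sketch of that pattern (function and variable names are hypothetical, not the actual framework's API):

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def with_retry(fn: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Call fn, retrying on failure with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to monitoring
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")

# Simulated flaky API: fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return {"status": "ok"}

result = with_retry(flaky_fetch)
# result == {"status": "ok"}; the wrapper absorbed two transient failures.
```

A production version would typically retry only on transient error classes and emit a metric per retry so monitoring can distinguish flaky sources from broken ones.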

CTO

2020 - 2024
WeOne
  • Participated in day-to-day development and cloud infrastructure tasks, ensuring hands-on involvement and oversight.
  • Led software engineering management, overseeing the entire development lifecycle.
  • Cultivated and managed a high-caliber technical talent pool.
  • Designed and executed effective technical talent-hiring and interview processes.
  • Orchestrated internal software architecture and development processes.
  • Upheld a robust DevOps culture and software development best practices.
Technologies: Kubernetes, Google Cloud, Amazon Web Services (AWS), Python, Data Engineering, Software Architecture, Agile, Tech Sales

Co-owner and CEO

2018 - 2020
Semicolon Lab
  • Headed an agile, focused team of dozens of highly skilled engineers and consultants.
  • Involved in technical sales and presales, as well as software and cloud architecture development.
  • Operated in various development and consulting areas, primarily focusing on DevOps and cloud infrastructure engineering, data science, data engineering, and software testing automation.
Technologies: Tech Sales, Software Architecture, Agile Delivery, Engineering Management

Data Engineer

2017 - 2019
Toptal
  • Served as a Toptal core team member on the data engineering and data science team.
  • Designed, developed, and maintained high-performance ETL, data processing, and data analytics solutions, data warehouses, and data lakes.
  • Maintained the stability of Google Cloud Platform data infrastructure and troubleshot data pipeline issues to minimize data downtime.
  • Designed and developed software for data quality, data observability, and data lineage.
Technologies: Python, Scala, Google Cloud, SQL, ETL, Luigi, Pandas, Jupyter Notebook, Distributed Systems

Senior Software Engineer

2014 - 2017
EPAM Systems
  • Developed big data solutions using Apache Hadoop and Apache Spark, improving data processing efficiency and scalability.
  • Applied machine learning algorithms, particularly XGBoost, to enhance predictive modeling and decision-making processes.
  • Conducted comprehensive data analysis using Python, Pandas, and Jupyter Notebooks, delivering actionable insights to stakeholders.
Technologies: SQL, Python, Scala, Java, Apache Spark, Hadoop, XGBoost, Machine Learning, Pandas, PySpark, Distributed Systems, Business Intelligence (BI), Microsoft SQL Server

Software Engineer

2012 - 2014
EPAM Systems
  • Oversaw the design and implementation of databases on the Microsoft SQL Server Platform, optimizing performance and ensuring data integrity.
  • Developed and maintained ETL processes using SSIS and T-SQL, enhancing data flow and integration across systems.
  • Collaborated with cross-functional teams to ensure seamless integration of Cloudera and Hortonworks platforms into existing workflows.
Technologies: SQL, Python, Scala, Java, Business Intelligence (BI), Microsoft SQL Server

Experience

AWS Data Lakehouse

Led the development of a modern data lakehouse architecture on AWS, implementing a scalable solution using AWS S3 as the primary storage layer for Parquet files. I designed and deployed ETL workflows using Dagster for reliable orchestration, running on Amazon ECS for optimal resource utilization. I also leveraged Amazon Athena for cost-effective serverless SQL querying of the data lake, enabling ad hoc analysis and reporting.

I implemented comprehensive data quality checks using Great Expectations to ensure data reliability and consistency throughout the pipeline. Finally, I built intuitive business intelligence dashboards using Metabase to provide stakeholders with self-service analytics capabilities and real-time insights.

This architecture significantly improved data accessibility while reducing query costs compared to previous warehouse solutions. I maintained high data quality standards with automated validation of schema changes and data integrity while enabling non-technical users to derive valuable insights through customizable Metabase visualizations.
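The kind of validation a tool like Great Expectations automates in this pipeline can be illustrated with a minimal, stdlib-only sketch (the data and check names below are invented for illustration):

```python
def check_schema(rows: list[dict], required_columns: list[str]) -> bool:
    """Every row must contain all required columns (schema-change guard)."""
    return all(set(required_columns) <= set(r) for r in rows)

def check_not_null(rows: list[dict], column: str) -> bool:
    """No row may carry a null in the given column (integrity check)."""
    return all(r.get(column) is not None for r in rows)

# Illustrative batch of records landing in the lakehouse.
orders = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.00},
]
schema_ok = check_schema(orders, ["order_id", "amount"])  # True
nulls_ok = check_not_null(orders, "amount")               # True
```

Running checks like these on every batch, and failing the pipeline when one returns False, is what keeps schema drift and bad records out of downstream Metabase dashboards.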

Certifications

DECEMBER 2023 - DECEMBER 2025

Certified Kubernetes Security Specialist

The Linux Foundation

OCTOBER 2023 - OCTOBER 2025

Google Cloud Certified Professional Cloud Architect

Google Cloud

JULY 2023 - PRESENT

DeepLearning.AI TensorFlow Developer

Coursera

AUGUST 2022 - AUGUST 2025

Certified Kubernetes Administrator

The Linux Foundation

DECEMBER 2014 - PRESENT

MCSA: SQL Server 2012/2014

Microsoft

Skills

Libraries/APIs

PySpark, TensorFlow, XGBoost, Pandas, Luigi, Scikit-learn, Keras

Tools

PyCharm, Slack, Amazon Athena, Amazon Elastic Container Service (ECS), AWS Glue, Apache Airflow, BigQuery, Apache Iceberg, Helm, Terraform, Jenkins, AWS Fargate, AWS ELB, Amazon Virtual Private Cloud (VPC), Amazon Elastic Container Registry (ECR), Amazon CloudWatch

Languages

SQL, Python, Scala, Java, Snowflake

Platforms

Docker, Kubernetes, Amazon Web Services (AWS), macOS, Linux, Visual Studio Code (VS Code), Jupyter Notebook, Google Cloud Platform (GCP), Apache Hudi, Airbyte, Meltano, Kubeflow, Apache Arrow, Databricks, Azure

Storage

Apache Parquet, Google Cloud, Amazon S3 (AWS S3), PostgreSQL, ClickHouse, Microsoft SQL Server

Frameworks

Apache Spark, Hadoop, Data Lakehouse

Paradigms

ETL, Agile, Continuous Integration (CI), Continuous Delivery (CD), Business Intelligence (BI)

Other

Data Engineering, Distributed Systems, Kubernetes Security, Machine Learning, Tech Sales, Software Architecture, Agile Delivery, Engineering Management, Dagster, Metabase, Delta Lake, DuckDB, Apache Superset, Data Build Tool (dbt), LangChain, Pgvector, Flux CD, GitHub Actions, Data Architecture, DataOps, Machine Learning Operations (MLOps), Data Lineage, Data Quality, Data Warehouse Design, Amazon RDS
