Alexander Sokolov
Verified Expert in Engineering
Software Developer
Bucharest, Romania
Toptal member since November 4, 2024
Alex is a technology evangelist and entrepreneur specializing in data engineering, analytics, cloud computing, and DevOps. With extensive experience in building engineering teams and hands-on solution architecture, he excels in Modern Data Stack (MDS) implementations, MLOps pipelines, and cloud-native architectures using Kubernetes. Alex's expertise spans optimizing data workflows and cloud infrastructure for diverse clients, combining technical proficiency with strategic vision.
Portfolio
Experience
- SQL - 12 years
- Python - 10 years
- Linux - 10 years
- Data Engineering - 10 years
- Apache Parquet - 7 years
- Docker - 7 years
- Google Cloud - 7 years
- Kubernetes - 5 years
Availability
Preferred Environment
macOS, Linux, Visual Studio Code (VS Code), PyCharm, Slack
The most amazing...
...project I’ve developed is a cloud-native data platform that optimized real-time analytics for a global retailer, reducing processing time by 80%.
Work Experience
DevOps and Data Engineering Consultant
Virtido
- Created technical drafts and proofs of concept (PoCs) for data engineering solutions and cloud architecture.
- Bootstrapped software project frameworks, data engineering pipelines, testing approaches, and CI/CD pipelines.
- Designed, developed, and deployed efficient data infrastructures and ETL/ELT processes on Google Cloud, AWS, and self-managed data platforms.
Cloud & Data Architect
Private Consulting Services
- Designed, developed, and deployed efficient data infrastructures and ETL/ELT processes on Google Cloud, AWS, Azure, and self-managed data platforms.
- Advised on both streaming and batch analytics and implemented Lambda and Kappa architectures.
- Designed data warehouses (Kimball's dimensional model), data lakes, and data lakehouses with a Medallion architecture.
- Implemented approaches to ensure and monitor data quality, data lineage, and data provenance.
- Consulted on and guided the implementation of DataOps and MLOps practices in teams of data engineers and data scientists.
- Designed and developed CI/CD pipelines using GitHub Actions and Jenkins.
- Performed platform engineering with a Kubernetes and CNCF stack for cloud, hybrid cloud, and on-premise environments.
- Guided initiatives to improve developer productivity using DORA metrics, shift-left testing, and GitOps practices.
- Developed REST APIs, microservices, authentication flows, and CLI automation tools.
Senior Data Engineer
Toptal
- Developed API ingestion frameworks with built-in retry mechanisms and monitoring, achieving high data reliability.
- Created data engineering frameworks adopted by more than 15 engineers, reducing new pipeline development time through standardized templates and reusable components.
- Optimized Docker container builds for shorter build times and smaller image footprints.
CTO
WeOne
- Participated in day-to-day development and cloud infrastructure tasks, ensuring hands-on involvement and oversight.
- Led software engineering management, overseeing the entire development lifecycle.
- Cultivated and managed a high-caliber technical talent pool.
- Designed and executed effective technical talent-hiring and interview processes.
- Orchestrated internal software architecture and development processes.
- Upheld a robust DevOps culture and software development best practices.
Co-owner and CEO
Semicolon Lab
- Headed an agile, focused team of dozens of highly skilled engineers and consultants.
- Took part in technical sales and presales, as well as software and cloud architecture development.
- Operated in various development and consulting areas, primarily focusing on DevOps and cloud infrastructure engineering, data science, data engineering, and software testing automation.
Data Engineer
Toptal
- Served as a Toptal core team member on the data engineering and data science team.
- Designed, developed, and maintained high-performance ETL, data processing, and data analytics solutions, data warehouses, and data lakes.
- Maintained the stability of Google Cloud Platform data infrastructure and troubleshot data pipeline issues to minimize data downtime.
- Designed and developed software for data quality, data observability, and data lineage.
Senior Software Engineer
EPAM Systems
- Developed big data solutions using Apache Hadoop and Apache Spark, improving data processing efficiency and scalability.
- Applied machine learning algorithms, particularly XGBoost, to enhance predictive modeling and decision-making processes.
- Conducted comprehensive data analysis using Python, Pandas, and Jupyter Notebooks, delivering actionable insights to stakeholders.
Software Engineer
EPAM Systems
- Oversaw the design and implementation of databases on the Microsoft SQL Server Platform, optimizing performance and ensuring data integrity.
- Developed and maintained ETL processes using SSIS and T-SQL, enhancing data flow and integration across systems.
- Collaborated with cross-functional teams to ensure seamless integration of Cloudera and Hortonworks platforms into existing workflows.
Experience
AWS Data Lakehouse
I implemented comprehensive data quality checks using Great Expectations to ensure data reliability and consistency throughout the pipeline. Finally, I built intuitive business intelligence dashboards using Metabase to provide stakeholders with self-service analytics capabilities and real-time insights.
This architecture significantly improved data accessibility while reducing query costs compared to previous warehouse solutions. I maintained high data quality standards with automated validation of schema changes and data integrity while enabling non-technical users to derive valuable insights through customizable Metabase visualizations.
Certifications
Certified Kubernetes Security Specialist
The Linux Foundation
Google Cloud Certified Professional Cloud Architect
Google Cloud
DeepLearning.AI TensorFlow Developer
Coursera
Certified Kubernetes Administrator
The Linux Foundation
MCSA: SQL Server 2012/2014
Microsoft
Skills
Libraries/APIs
PySpark, TensorFlow, XGBoost, Pandas, Luigi, Scikit-learn, Keras
Tools
PyCharm, Slack, Amazon Athena, Amazon Elastic Container Service (ECS), AWS Glue, Apache Airflow, BigQuery, Apache Iceberg, Helm, Terraform, Jenkins, AWS Fargate, AWS ELB, Amazon Virtual Private Cloud (VPC), Amazon Elastic Container Registry (ECR), Amazon CloudWatch
Languages
SQL, Python, Scala, Java, Snowflake
Platforms
Docker, Kubernetes, Amazon Web Services (AWS), macOS, Linux, Visual Studio Code (VS Code), Jupyter Notebook, Google Cloud Platform (GCP), Apache Hudi, Airbyte, Meltano, Kubeflow, Apache Arrow, Databricks, Azure
Storage
Apache Parquet, Google Cloud, Amazon S3 (AWS S3), PostgreSQL, ClickHouse, Microsoft SQL Server
Frameworks
Apache Spark, Hadoop, Data Lakehouse
Paradigms
ETL, Agile, Continuous Integration (CI), Continuous Delivery (CD), Business Intelligence (BI)
Other
Data Engineering, Distributed Systems, Kubernetes Security, Machine Learning, Tech Sales, Software Architecture, Agile Delivery, Engineering Management, Dagster, Metabase, Delta Lake, DuckDB, Apache Superset, Data Build Tool (dbt), LangChain, Pgvector, Flux CD, GitHub Actions, Data Architecture, DataOps, Machine Learning Operations (MLOps), Data Lineage, Data Quality, Data Warehouse Design, Amazon RDS