Anton Wolkov
Verified Expert in Engineering
Big Data Architect and ML Developer
San Gwann, Malta
Toptal member since December 14, 2022
Anton specializes in big data infrastructure architecture and machine learning operations (MLOps). He has worked with many high-profile Fortune 500 companies and startups. He holds a bachelor's degree in computer science, with experience in big data infrastructure development and DevOps. He can get you started with data lakes, onboard data scientists, set up automation, and create production-grade self-service batch pipelines. Anton prefers using Airflow, Presto, TensorFlow, Kubernetes, and Grafana.
Preferred Environment
Ubuntu, Amazon Web Services (AWS), Google Cloud Platform (GCP), macOS, Databricks, Jupyter Notebook, Zeppelin, IntelliJ IDEA, GitHub
The most amazing...
...thing I've created is a data pipeline infrastructure for six teams worldwide, generating a graph of all internet browsers and mobile phones.
Work Experience
MLOps Engineer
Toptal
- Established a CI/CD deployment pipeline for new AI-enabled apps to a new set of Kubernetes clusters.
- Integrated with existing networking, GitHub Actions, single sign-on (SSO), and monitoring tools and enabled rapid app development.
- Backported Dify and n8n for UI-based AI app deployment for less technical users and POCs.
DevOps Expert
Neobrain
- Engaged as a DevOps expert for an AI SaaS in development.
- Created Terraform and Helm infrastructures for a new production Kubernetes cluster in Azure.
- Provisioned and configured TPU machines for a one-off training session.
- Installed and configured a Prefect data pipeline with CI/CD in GitLab and monitoring through Prometheus and Grafana.
Lead MLOps | DevOps | Software Engineer
Proofpoint
- Integrated multiple teams' data into a natural language processing (NLP) oriented batch data pipeline.
- Used ETL for data exploration and integration tests on anonymized data.
- Designed microservices architecture using Python, Docker, Helm, and AWS Service Operator.
- Built a Jenkins-based CI/CD pipeline for Kubernetes deployment of the front and back ends.
- Merged Prometheus and Grafana dashboards from multiple Amazon EKS clusters using Thanos.
- Automated PagerDuty incident management with playbooks and CI/CD pipeline deployments.
Principal Software Engineer
Oracle
- Created a self-service process for data scientists to productize their data proofs of concept (POCs).
- Integrated metrics collection and reporting into all parts of the pipeline and GitHub pull requests.
- Migrated AWS EMR workloads using Spark and Kubernetes running on OCI infrastructure.
- Developed lightweight microservices to handle real-time pixel requests with strict service level agreements (SLAs).
- Extended the Python Amazon S3 (AWS S3) library to support Oracle Cloud and optimized it for high-latency operations.
Software Engineer II
Amazon.com
- Onboarded a new real-time database to sync annotators' inputs. Used JavaScript, ETL, a report generator, and data exploration tools for AI experiments and proofs of concept (POCs).
- Repurposed an internal voice annotation platform to be used for computer vision.
- Automated status reporting from an experiment management platform to Confluence.
- Created a Jira ticket templating system to simplify operational process status tracking.
Software Engineer II
Microsoft
- Integrated users and file APIs from Microsoft Office 365, Google, ServiceNow, Salesforce, and Okta. Used custom asynchronous distributed rate limiter logic.
- Created a data playground and scale test with automated CI/CD pipelines for data science proofs of concept (POCs). Utilized a large anonymized production data sample.
- Integrated data pipelines with Splunk monitoring. Continued with later iterations on Apache Flink, which were integrated with Prometheus and Grafana.
- Optimized MongoDB and Elasticsearch-based pipelines to scale for all of Microsoft's customers' data from Outlook and SharePoint.
Experience
Android Automation App
Hackathon Project
Education
Bachelor's Degree in Computer Science
Technion – Israel Institute of Technology - Haifa, Israel
Skills
Libraries/APIs
Luigi, Apache, Pandas, TensorFlow, Jenkins Pipeline, REST API, React.js, PySpark, PyTorch, Node.js, CatBoost
Tools
Jenkins, Apache Airflow, Spark, Grafana, Amazon OpenSearch, Jira, Kibana, Amazon Elastic MapReduce (EMR), System Security, Qubole, AWS, Terraform, Tableau Development, Business Intelligence Development, IntelliJ IDEA, CircleCI, AWS ELB, AWS IAM, Chef, RabbitMQ, Ansible, Amazon EKS, Helm, ELK (Elastic Stack), GitHub, Artifactory, Confluence, Amazon Elastic Container Service (ECS), Docker Hub, AWS CLI, Amazon Virtual Private Cloud (VPC), Nagios, Git, Postman, Amazon SageMaker, Azure Kubernetes Service (AKS), Sentry, SonarQube, Gradle, Google Kubernetes Engine (GKE), Azure Machine Learning, Bitbucket, Splunk, Apache, AWS Glue, Prefect, BigQuery, JW Player
Languages
Python, Go, Java, Bash, Snowflake, SQL, HTML, JavaScript, GraphQL, TypeScript, C, Scala, R
Frameworks
Presto, Big Data Architecture, Swagger, Spark, Django, Hadoop
Paradigms
DevOps, ETL, Anomaly Detection, Continuous Integration (CI), Microservices Development, Automated Testing, DevSecOps, Microservices Architecture, Agile Development
Platforms
Cloud Engineering, Jupyter Notebook, Zeppelin, Docker, Kubernetes, Azure, Apache Kafka, AWS, Linux, Ubuntu, Oracle Cloud Infrastructure (OCI), Amazon EC2, AWS Lambda, Harbor, Apache, Android, Apache Pig
Storage
Elasticsearch, MongoDB, Redis, Amazon S3, PostgreSQL, RethinkDB, Hadoop, Auto-scaling Cloud Infrastructure, Oracle Development, Data Lakes, Database, AWS, Azure Blobs, NoSQL, Relational Databases, MySQL, Google Cloud Development, Data Integration, Redshift, Amazon Aurora, Aerospike, OVH
Other
Machine Learning Operations (MLOps), Prometheus, MLflow, Cloudflare, Cloud Infrastructure, Cloud Security, AWS Auto Scaling, Back-end Developers, System Architecture, Scalability, Data Science, ETL Tools, ETL Testing, Apache Superset, Content Delivery Networks (CDN), NLP, AWS RDS, AWS Cloud, CI/CD Pipelines, Big Data Architecture, Data Protection, Cost Reduction & Optimization (Cost-down), AWS DevOps, DNS, API Gateways, Serverless, Identity & Access Management (IAM), Data Engineering, Artificial Intelligence, Machine Learning, Shell Scripting, HAProxy, Architecture, Cloud Architecture, Migration Engineering, Cloud Migration, Kubernetes Operations (kOps), Data Visualization, Cloud Engineering, Data Modeling, Algorithms, Mathematics, Data, Mathematical Analysis, Data Analysis, Generative Adversarial Networks (GANs), Image Processing, Generative Design, 3D Modeling, API Integration, Ads, Performance Optimization, Cost Management, DevOps, Personally Identifiable Information (PII), Infrastructure as Code (IaC), Site Reliability, Infrastructure, Argo CD, APIs, Proxies, Developer Portals, GPU Computing, NVIDIA TensorRT, Containers, Argo Workflow, IT Support, Software Architecture, GPT-4, Data Migration, EMR, LLM, FastAPI, GitHub Actions, GitOps, Apache Cassandra, Computer Vision, Computer Vision Algorithms, Generative Pre-trained Transformers (GPT), Dagster, Google BigQuery, Artificial Intelligence as a Service (AIaaS), TPU, Data Build Tool (dbt), Multimodal GenAI