Big Data Architect and ML Developer
Anton specializes in big data infrastructure architecture and machine learning operations (MLOps). He has worked with many high-profile Fortune 500 companies and startups. He holds a bachelor's degree in computer science and has experience in big data infrastructure development and DevOps. He can get you started with data lakes, onboard data scientists, set up automation, and create production-grade self-service batch pipelines. Anton prefers using Airflow, Presto, TensorFlow, Kubernetes, and Grafana.
Experience
- Amazon Web Services (AWS) - 9 years
- ETL Tools - 8 years
- Elasticsearch - 8 years
- Machine Learning Operations (MLOps) - 8 years
- Data Pipelines - 7 years
- Apache Airflow - 5 years
- Amazon Athena - 5 years
- Prometheus - 5 years
Ubuntu, IntelliJ IDEA, Amazon Web Services (AWS), Google Cloud Platform (GCP), MacOS, Databricks, Jupyter Notebook, Zeppelin, IntelliJ, GitHub
The most amazing...
...thing I've created is a data pipeline infrastructure for six teams worldwide, generating a graph of all internet browsers and mobile phones.
DevOps Expert for AI SaaS service in development
- Created Terraform and Helm infrastructure for a new production Kubernetes cluster in Azure.
- Provisioned and configured TPU machines for a one-off training session.
- Installed and configured a Prefect data pipeline with CI/CD in GitLab and monitoring in Prometheus with Grafana.
Lead MLOps, DevOps, Software Engineer
- Integrated multiple teams' data into a natural language processing (NLP) oriented batch data pipeline.
- Used ETL for data exploration and integration tests on anonymized data.
- Designed microservices architecture using Python, Docker, Helm, and AWS Service Operator.
- Built Jenkins-based CI/CD pipelines for Kubernetes deployment of the front and back end.
- Merged Prometheus and Grafana dashboards from multiple Amazon EKS clusters using Thanos.
- Automated PagerDuty incident management with playbooks and CI/CD pipeline deployments.
Principal Software Engineer
- Created a self-service process for data scientists to productize their data proof of concepts (POCs).
- Integrated metrics collection and reporting into all parts of the pipeline and GitHub pull requests.
- Migrated AWS EMR workloads to OCI infrastructure using Spark and Kubernetes.
- Developed lightweight microservices to handle real-time pixel requests with strict service level agreements (SLAs).
- Extended a Python Amazon S3 (AWS S3) library to support Oracle Cloud and optimized it for high-latency operations.
Software Engineer II
- Repurposed an internal voice annotation platform to be used for computer vision.
- Automated status reporting from an experiment management platform to Confluence.
- Created a Jira ticket templating system to simplify operational process status tracking.
Software Engineer II
- Integrated users and file APIs from Microsoft Office 365, Google, ServiceNow, Salesforce, and Okta. Used custom asynchronous distributed rate limiter logic.
- Created a data playground and scale test with automated CI/CD pipelines for data science proof of concepts (POCs), using a large anonymized production data sample.
- Integrated data pipelines with Splunk monitoring. Later iterations, built on Apache Flink, were integrated with Prometheus and Grafana.
- Optimized MongoDB and Elasticsearch-based pipelines to scale for all of Microsoft's customers' data from Outlook and SharePoint.
Android Automation App
Presto DB, Apache Spark, Swagger
Luigi, Spark ML, Pandas, TensorFlow, Jenkins Pipeline, OCI, REST APIs, PyTorch, Node.js
Jenkins, Apache Airflow, Spark SQL, Grafana, Jira, Kibana, Amazon Elastic MapReduce (EMR), Vault, Qubole, Amazon Athena, Terraform, Tableau, Superset, IntelliJ, CircleCI, AWS ELB, AWS IAM, Chef, RabbitMQ, Ansible, Amazon EKS, Amazon CloudFront CDN, AWS CloudFormation, Amazon CloudWatch, Helm, ELK (Elastic Stack), GitHub, Artifactory, Confluence, Amazon Elastic Container Service (Amazon ECS), Docker Hub, AWS CLI, Amazon Virtual Private Cloud (VPC), Amazon Elastic Container Registry (Amazon ECR), Nagios, Git, Postman, Amazon SageMaker, Azure Kubernetes Service (AKS), Sentry, SonarQube, Gradle, Microsoft Power BI, Splunk, Apache Ignite, AWS Glue, Flink, BigQuery, Google Kubernetes Engine (GKE)
DevOps, ETL, Anomaly Detection, Continuous Integration (CI), Microservices, Automated Testing, Data Science, DevSecOps, Microservices Architecture, Continuous Delivery (CD)
Google Cloud Platform (GCP), Jupyter Notebook, Zeppelin, Docker, Kubernetes, Azure, Apache Kafka, Amazon Web Services (AWS), Linux, Ubuntu, Azure IaaS, Amazon EC2, AWS Lambda, Harbor, Apache Flink, Android, Apache Pig
Elasticsearch, MongoDB, Redis, Amazon S3 (AWS S3), PostgreSQL, RethinkDB, Apache Hive, Auto-scaling Cloud Infrastructure, Oracle Cloud, Data Lakes, Data Pipelines, Amazon DynamoDB, Azure Blobs, NoSQL, Relational Databases, MySQL, Google Cloud, Data Integration, Database Security, Company Databases, Redshift, Amazon Aurora, Aerospike, ScyllaDB, OVH
Machine Learning Operations (MLOps), Prometheus, Amazon OpenSearch, MLflow, Cloudflare, Cloud Infrastructure, Cloud Security, AWS Auto Scaling, Back-end, System Architecture, Scalability, Data Analytics, ETL Tools, ETL Testing, Apache Superset, Content Delivery Networks (CDN), Amazon RDS, AWS Cloud Architecture, CI/CD Pipelines, Big Data Architecture, Data Protection, Cost Reduction & Optimization, AWS DevOps, DNS, API Gateways, Serverless, Identity & Access Management (IAM), Data Engineering, Artificial Intelligence (AI), Machine Learning, Big Data, Shell Scripting, HAProxy, Architecture, Cloud Architecture, Migration, Cloud Migration, Kubernetes Operations (Kops), Data Visualization, Statistical Analysis, Cloud, Data Modeling, Algorithms, Mathematics, Data, Mathematical Analysis, Data Analysis, Generative Adversarial Networks (GANs), Image Processing, Generative Design, 3D Modeling, Data Reporting, API Integration, Ads, Performance Optimization, Cost Management, DevOps Engineer, Personally Identifiable Information (PII), Infrastructure as Code (IaC), Site Reliability Engineering (SRE), Infrastructure, Argo CD, APIs, Proxies, Developer Portals, GPU Computing, TensorRT, Containers, Apache Cassandra, Computer Vision, Computer Vision Algorithms, Natural Language Processing (NLP), Prefect, GPT, Generative Pre-trained Transformers (GPT), Google BigQuery, Artificial Intelligence as a Service (AIaaS), TPU, Data Build Tool (dbt)
Bachelor's Degree in Computer Science
Technion – Israel Institute of Technology - Haifa, Israel