Alex Zhang
Verified Expert in Engineering
DevOps Engineer and AWS Developer
Toronto, ON, Canada
Toptal member since October 4, 2021
Alex is a DevOps engineer with over 10 years of experience in cloud infrastructure, infrastructure as code (IaC), containerization, CI/CD, monitoring and observability, configuration management, and shell scripting. His technologies include AWS, Terraform, Terragrunt, Kubernetes, Argo CD, GitHub, GitLab, Datadog, Prometheus, Grafana, Ansible, Python, and Linux. Alex learns fast and continuously welcomes a challenge.
Experience
- Linux - 10 years
- Python - 10 years
- Amazon Web Services (AWS) - 10 years
- Kubernetes - 8 years
- Terraform - 5 years
- GitHub Actions - 3 years
- Argo CD - 3 years
- Databricks - 3 years
Preferred Environment
Amazon Web Services (AWS), Terraform, Kubernetes, Python, Linux, GitHub, Datadog, Databricks
The most amazing...
...project I've implemented is an ML operations project on AWS using Kubeflow for model deployment.
Work Experience
Senior Infrastructure Engineer
Furniture
- Designed and managed AWS infrastructure that ensured scalability, availability, and security, supporting millions of users.
- Built and maintained infrastructure as code using Terraform, Terragrunt, AWS CloudFormation, and AWS CDK, automating the provisioning and lifecycle of AWS resources.
- Led efforts to deploy and maintain Kubernetes clusters, optimizing for resource efficiency, scalability, and high availability.
- Developed and maintained a robust CI/CD pipeline with GitHub Actions and Argo CD, enabling seamless deployment of containerized microservices to Kubernetes clusters.
- Created and enhanced monitoring and alerting solutions with Datadog, Amazon CloudWatch, and other tools, proactively identifying and resolving performance issues.
- Partnered with security teams to meet security standards and comply with regulations such as CIS and SOC 2.
- Supported the data engineering team by designing and implementing a scalable, secure data analytics and processing platform on Databricks.
- Collaborated closely with development, QA, and other teams to optimize CI/CD pipelines, accelerate deployments, and ensure code quality.
- Produced and maintained detailed documentation of infrastructure, workflows, and configurations to support knowledge sharing and best practices.
- Diagnosed and resolved complex system issues, troubleshot network problems, and provided technical support as needed.
Kubernetes and Terraform Specialist
Hub International
- Deployed and managed the production-grade infrastructure, including the network topology, orchestration tools, databases, caches, load balancers, CI/CD pipeline, monitoring, alerting, log aggregation, etc.
- Defined and managed infrastructure as code using Terraform, Terragrunt, and Gruntwork Service Catalog; also developed custom Terraform modules to meet the company's needs.
- Migrated services from Amazon ECS (hosted in a legacy AWS account) to Amazon EKS without downtime.
Site Reliability Engineer
Sweetgreen
- Built, tested, troubleshot, and maintained IaC code for software systems and infrastructure.
- Promoted strong design principles for scalability, disposable resources, automation, loose coupling, databases, removal of single points of failure, caching, cost, and security.
- Worked closely with lead engineers, software engineering managers, and principals to champion architectural decisions that facilitate the work of their teams and the business as a whole.
AWS Cloud Infrastructure Engineer
Alteryx
- Used current DevOps and cloud deployment tooling, including Amazon EKS, ECR, RDS, EC2, and S3, to create new products and integrate them into the company's platform.
- Built and deployed infrastructure as code using Terraform.
- Designed and tested high-security, leakproof back-end infrastructure to host multitenant and single-tenant deployments.
- Provided support to small-but-mighty engineering teams that utilized CI/CD principles in GitLab and Argo CD.
- Performed standard database administration duties using RDS, PostgreSQL, Snowflake, and other ANSI SQL platforms.
Senior DevOps Engineer
Pro Notary LLC
- Designed, improved, and monitored the cloud infrastructure on AWS and Azure.
- Developed and managed the infrastructure as code using Terraform.
- Created and maintained the CI/CD pipelines using Azure Pipelines, Docker, Kubernetes, Helm, and Argo CD.
- Set up and maintained monitoring and observability systems using ELK (Elastic Stack).
- Worked on configuration management and application deployment using Ansible, Python, and Bash.
DevOps Engineer
Neusoft
- Developed microservices and REST APIs using Python and Django.
- Built and deployed Docker containers to break up a monolithic app into microservices, which improved the developer workflow and increased scalability.
- Automated the deployment, scaling, and management of the Docker containers using Kubernetes.
- Designed, improved, and monitored the cloud infrastructure on AWS, including IAM, VPC, EC2, S3, RDS, ElastiCache, SNS, SQS, ECR, and EKS.
- Created and maintained the CI/CD pipelines using Bitbucket, Jenkins, and CircleCI.
- Set up and maintained the monitoring and observability systems using ELK (Elastic Stack).
Experience
MLOps Platform on AWS using Kubeflow for Model Deployment
The code for the ML model, ML pipeline, infrastructure, and dependencies is stored and versioned in Git. An ML orchestrator combines this code with a dataset from a centralized feature store to build the model, and each run generates logs, metrics, alerts, and data for storage and analysis.
Continuous delivery and experimentation workflows are integrated, so the same pipeline can start experiments either manually or automatically. Automatic runs are triggered by Git hooks and CI/CD tools for continuous integration and deployment.
Migrating AWS Infrastructure Using Gruntwork Reference Architecture
• Deployed and managed production-grade infrastructure, including the network topology, orchestration tools, databases, caches, load balancers, CI/CD pipeline, monitoring, alerting, log aggregation, etc.
• Defined and managed the infrastructure as code using Terraform, Terragrunt, and Gruntwork Service Catalog; also developed custom Terraform modules to meet the company's needs.
• Migrated services from Amazon ECS to Amazon EKS without any downtime.
Education
Bachelor's Degree in Telecommunications Engineering
Shenyang Institute of Engineering - Shenyang, Liaoning, China
Certifications
AWS Certified Solutions Architect - Professional
Amazon Web Services
AWS Certified Machine Learning - Specialty
Amazon Web Services
Databricks Platform Administrator
Databricks
Certified Kubernetes Administrator (CKA)
The Linux Foundation
Certified Jenkins Engineer
Jenkins
HashiCorp Certified: Terraform Associate
HashiCorp
Docker Certified Associate
Mirantis, Inc.
Skills
Libraries/APIs
Terragrunt, Node.js
Tools
Terraform, Jenkins, Amazon EKS, Amazon Elastic Container Registry (ECR), AWS IAM, Amazon Virtual Private Cloud (VPC), Ansible, Helm, CircleCI, GitLab CI/CD, Vault, GitHub, AWS Fargate, Amazon Elastic Container Service (ECS), Grafana
Languages
Bash, Bash Script, Python, Snowflake
Paradigms
DevOps, Microservices, Azure DevOps
Platforms
Docker, Kubernetes, Linux, Amazon Web Services (AWS), Amazon EC2, Databricks, AWS Lambda, Kubeflow
Storage
Cloud Deployment, Amazon S3 (AWS S3), PostgreSQL, Datadog, Database Administration (DBA)
Frameworks
Django
Other
Argo CD, Amazon RDS, Cloud Infrastructure, DevOps Engineer, GitHub Actions, Infrastructure as Code (IaC), CI/CD Pipelines, Configuration Management, Amazon API Gateway, Gruntwork, Computer Science, Amazon Machine Learning, Prometheus