Alex Zhang, Developer in Toronto, ON, Canada

Alex Zhang

Verified Expert in Engineering

DevOps Engineer and AWS Developer

Toronto, ON, Canada

Toptal member since October 4, 2021

Bio

Alex is a DevOps engineer with over 10 years of experience in cloud infrastructure, infrastructure as code (IaC), containerization, CI/CD, monitoring and observability, configuration management, and shell scripting. His technologies include AWS, Terraform, Terragrunt, Kubernetes, Argo CD, GitHub, GitLab, Datadog, Prometheus, Grafana, Ansible, Python, and Linux. Alex learns quickly and welcomes a challenge.

Portfolio

Furniture
Amazon Web Services (AWS), DevOps, Infrastructure as Code (IaC), Terraform...
Hub International
Kubernetes, Terraform, Amazon Web Services (AWS)
Sweetgreen
Terraform, Amazon Web Services (AWS), CI/CD Pipelines, Node.js, GitHub...

Experience

  • Linux - 10 years
  • Python - 10 years
  • Amazon Web Services (AWS) - 10 years
  • Kubernetes - 8 years
  • Terraform - 5 years
  • GitHub Actions - 3 years
  • Argo CD - 3 years
  • Databricks - 3 years

Availability

Full-time

Preferred Environment

Amazon Web Services (AWS), Terraform, Kubernetes, Python, Linux, GitHub, Datadog, Databricks

The most amazing...

...project I've implemented is an ML operations project on AWS using Kubeflow for model deployment.

Work Experience

Senior Infrastructure Engineer

2022 - 2024
Furniture
  • Designed and managed AWS infrastructure that ensured scalability, availability, and security, supporting millions of users.
  • Built and maintained infrastructure as code using Terraform, Terragrunt, AWS CloudFormation, and AWS CDK, automating the provisioning and lifecycle of AWS resources.
  • Led efforts to deploy and maintain Kubernetes clusters, optimizing for resource efficiency, scalability, and high availability.
  • Developed and maintained a robust CI/CD pipeline with GitHub Actions and Argo CD, enabling seamless deployment of containerized microservices to Kubernetes clusters.
  • Created and enhanced monitoring and alerting solutions with Datadog, AWS CloudWatch, and other tools, proactively identifying and resolving performance issues.
  • Partnered with security teams to meet security standards and comply with regulations such as CIS and SOC 2.
  • Supported the data engineering team by designing and implementing a scalable, secure data analytics and processing platform on Databricks.
  • Collaborated closely with development, QA, and other teams to optimize CI/CD pipelines, accelerate deployments, and ensure code quality.
  • Produced and maintained detailed documentation of infrastructure, workflows, and configurations to support knowledge sharing and best practices.
  • Diagnosed and resolved complex system issues, troubleshooted network problems, and provided technical support as needed.
Technologies: Amazon Web Services (AWS), DevOps, Infrastructure as Code (IaC), Terraform, Terragrunt, CI/CD Pipelines, GitHub Actions, Datadog, Python, Linux, Databricks

Kubernetes and Terraform Specialist

2022 - 2022
Hub International
  • Deployed and managed the production-grade infrastructure, including the network topology, orchestration tools, databases, caches, load balancers, CI/CD pipeline, monitoring, alerting, log aggregation, etc.
  • Defined and managed infrastructure as code using Terraform, Terragrunt, and Gruntwork Service Catalog; also developed custom Terraform modules to meet the company's needs.
  • Moved services from AWS ECS in an old AWS account to AWS EKS without downtime.
Technologies: Kubernetes, Terraform, Amazon Web Services (AWS)

Site Reliability Engineer

2022 - 2022
Sweetgreen
  • Built, tested, troubleshot, and maintained IaC code for software systems and infrastructure.
  • Promoted strong design principles as they relate to scalability, disposable resources, automation, loose coupling, databases, removing single points of failure, cache, cost, and security.
  • Worked closely with lead engineers, software engineering managers, and principals to champion architectural decisions that facilitate the work of their teams and the business as a whole.
Technologies: Terraform, Amazon Web Services (AWS), CI/CD Pipelines, Node.js, GitHub, AWS Fargate, AWS Lambda, Docker, Amazon API Gateway, Terragrunt, Amazon Elastic Container Service (ECS), CircleCI, Gruntwork

AWS Cloud Infrastructure Engineer

2021 - 2021
Alteryx
  • Used the latest DevOps and cloud deployment techniques—Amazon EKS and AWS ECR, RDS, EC2, and S3—to create and integrate innovative new products into the company's platform.
  • Built and deployed infrastructure as code using Terraform.
  • Designed and tested high-security, leakproof back-end infrastructure to host multitenant and single-tenant deployments.
  • Provided support to small-but-mighty engineering teams that utilized CI/CD principles in GitLab and Argo CD.
  • Performed standard database administration duties using RDS, PostgreSQL, Snowflake, and other ANSI SQL platforms.
Technologies: Terraform, Docker, Kubernetes, Helm, Argo CD, Vault, Bash, Bash Script, Amazon EKS, Amazon Web Services (AWS), DevOps, Cloud Deployment, Amazon Elastic Container Registry (ECR), Amazon RDS, Amazon EC2, Amazon S3 (AWS S3), Infrastructure as Code (IaC), Cloud Infrastructure, CI/CD Pipelines, GitLab CI/CD, Database Administration (DBA), PostgreSQL, Snowflake, DevOps Engineer

Senior DevOps Engineer

2018 - 2021
Pro Notary LLC
  • Designed, improved, and monitored the cloud infrastructure on AWS and Azure.
  • Developed and managed the infrastructure as code using Terraform.
  • Created and maintained the CI/CD pipelines using Azure Pipelines, Docker, Kubernetes, Helm, and Argo CD.
  • Set up and maintained monitoring and observability systems using ELK (Elastic Stack).
  • Worked on configuration management and application deployment using Ansible, Python, and Bash.
Technologies: Terraform, Ansible, Kubernetes, Docker, Argo CD, Bash, Infrastructure as Code (IaC), Helm, CI/CD Pipelines, DevOps, Bash Script, Azure DevOps, Cloud Infrastructure, Configuration Management, DevOps Engineer, Python, Amazon Web Services (AWS), Amazon EKS

DevOps Engineer

2015 - 2018
Neusoft
  • Developed microservices and REST APIs using Python Django.
  • Built and deployed Docker containers to break up a monolithic app into microservices, which improved the developer workflow and increased scalability.
  • Automated the deployment, scaling, and management of Docker containers using Kubernetes.
  • Designed, improved, and monitored the cloud infrastructure on AWS, including IAM, VPC, EC2, S3, RDS, ElastiCache, SNS, SQS, ECR, and EKS.
  • Created and maintained the CI/CD pipelines using Bitbucket, Jenkins, and CircleCI.
  • Set up and maintained the monitoring and observability systems using ELK (Elastic Stack).
Technologies: Terraform, Docker, Kubernetes, Jenkins, CircleCI, Microservices, CI/CD Pipelines, Amazon Web Services (AWS), Bash Script, Amazon EKS, Cloud Infrastructure, AWS IAM, Amazon EC2, Amazon Virtual Private Cloud (VPC), Amazon S3 (AWS S3), Amazon RDS, Amazon Elastic Container Registry (ECR), DevOps, DevOps Engineer

MLOps Platform on AWS using Kubeflow for Model Deployment

The project mainly focused on building and deploying the marketing attribution model on AWS using Kubeflow pipelines in Python.

The ML orchestrator takes the code for the ML model, the ML pipeline, the infrastructure, and their dependencies (all stored and versioned in Git), together with a dataset from a centralized feature store, and builds the model. Each run generates logs, metrics, alerts, and data for storage and analysis.

Continuous delivery and experimentation workflows are dynamically integrated. This setup allows the same pipeline to start experiments both manually and automatically. Runs are triggered from Git hooks and CI/CD tools for continuous integration and deployment.

Migrating AWS Infrastructure Using Gruntwork Reference Architecture

DELIVERABLES
• Deployed and managed production-grade infrastructure, including the network topology, orchestration tools, databases, caches, load balancers, CI/CD pipeline, monitoring, alerting, log aggregation, etc.
• Defined and managed the infrastructure as code using Terraform, Terragrunt, and Gruntwork Service Catalog; also developed custom Terraform modules to meet the company's needs.
• Moved services from AWS ECS to AWS EKS without any downtime.

Education

2011 - 2015

Bachelor's Degree in Telecommunications Engineering

Shenyang Institute of Engineering - Shenyang, Liaoning, China

Certifications

MAY 2024 - MAY 2027

AWS Certified Solutions Architect - Professional

Amazon Web Services

MAY 2024 - MAY 2027

AWS Certified Machine Learning - Specialty

Amazon Web Services

APRIL 2024 - APRIL 2025

Databricks Platform Administrator

Databricks

JULY 2021 - JULY 2024

Certified Kubernetes Administrator (CKA)

The Linux Foundation

JUNE 2021 - PRESENT

Certified Jenkins Engineer

Jenkins

MAY 2021 - MAY 2023

HashiCorp Certified: Terraform Associate

HashiCorp

APRIL 2021 - APRIL 2023

Docker Certified Associate

Mirantis, Inc.

Libraries/APIs

Terragrunt, Node.js

Tools

Terraform, Jenkins, Amazon EKS, Amazon Elastic Container Registry (ECR), AWS IAM, Amazon Virtual Private Cloud (VPC), Ansible, Helm, CircleCI, GitLab CI/CD, Vault, GitHub, AWS Fargate, Amazon Elastic Container Service (ECS), Grafana

Languages

Bash, Bash Script, Python, Snowflake

Paradigms

DevOps, Microservices, Azure DevOps

Platforms

Docker, Kubernetes, Linux, Amazon Web Services (AWS), Amazon EC2, Databricks, AWS Lambda, Kubeflow

Storage

Cloud Deployment, Amazon S3 (AWS S3), PostgreSQL, Datadog, Database Administration (DBA)

Frameworks

Django

Other

Argo CD, Amazon RDS, Cloud Infrastructure, DevOps Engineer, GitHub Actions, Infrastructure as Code (IaC), CI/CD Pipelines, Configuration Management, Amazon API Gateway, Gruntwork, Computer Science, Amazon Machine Learning, Prometheus
