Head of Site Reliability Engineering | Consultant, 2015 - PRESENT, HazelOps
Technologies: AWS ECS, AWS, AWS DevOps, GNU Make, Amazon Web Services (AWS), Grafana, Traefik, HAProxy, Python, WordPress, PHP, Java, Serverless, ECS, Docker Swarm, Docker, Ansible, AWS CloudFormation, Terraform, Nginx
- Built scalable AWS infrastructure for startups: multi-environment, self-healing, and predictable, all managed as infrastructure as code.
- Reworked legacy code to dockerize JVM, PHP, and Python apps.
- Audited and analyzed infrastructure performance, producing dozens of full-cycle reports covering key performance factors and proposed action items.
- Helped software engineers adopt DevOps practices, including close communication, strategy, and process improvement.
- Implemented site reliability practices by owning SLAs, SLOs, and SLIs, eliminating toil, and increasing observability through automation, monitoring, and error budgeting.
- Implemented CI/CD with GitLab, Jenkins, and CircleCI, streamlining deployment pipelines for dozens of different projects. Used Docker, Docker registries, and multi-stage builds.
- Created operational procedures in customers' environments, including service-based alerting, on-call rotations, and escalations.
- Deployed and maintained Apache Kafka, including full-cycle management via Terraform, Ansible, and Docker.
Lead Site Reliability Engineer, 2016 - 2019, Flo Technologies
Technologies: AWS DevOps, GNU Make, Amazon Web Services (AWS), Transport Layer Security (TLS), Linux, CircleCI, Docker, TICK Stack, ELK (Elastic Stack), GitLab, Apache Kafka, Ansible, AWS CloudFormation, AWS
- Designed and built a complex IoT infrastructure from scratch on AWS: a multi-tier, multi-subnet, scalable cloud infrastructure; a multi-application stateless stack on Elastic Beanstalk, ECS, and Docker; and platform-agnostic local workspaces with Docker.
- Created and administered Ansible infrastructure: wrote idempotent plays and roles to support infrastructure needs, as well as community-available roles for multiple platforms under the Apache Foundation.
- Designed and implemented CI/CD: complete application lifecycle with green deployments of high-traffic services, platform-agnostic framework to support SaaS or hosted CI servers, and hassle-free pipelines for software engineers.
- Built and administered monitoring solutions: log and data aggregation from multiple sources (ELK), on-prem monitoring via TICK and Grafana, and SaaS monitoring with Datadog and New Relic when needed.
- Devised and executed operational procedures: service-oriented OLAs, PagerDuty integration with monitoring solutions, and a PagerDuty "Service Owner First" policy.
- Created and maintained an upgrade procedure for critical distributed systems that allowed upgrades with no downtime and no data loss throughout the full three-year period.
Senior Member of Technical Staff, 2016 - 2017, Delphix
Technologies: AWS DevOps, Amazon Web Services (AWS), Python, AWS CloudFormation, Foreman, Ansible, ELK (Elastic Stack), Jenkins, AWS
- Architected and implemented a multi-tier hybrid cloud AWS infrastructure for a new project, a high-scale testing framework.
- Constructed log and data aggregation from multiple sources (ELK).
- Created a virtual and bare-metal host provisioning system (Foreman).
- Designed and implemented Nmap-based inventory software.
- Contributed to company-wide IT processes and improvements.
- Contributed major portions of the on-call rotation, monitoring, SOA, and OLA designs and implementations.
Senior DevOps Engineer, 2013 - 2016, Intuit
Technologies: AWS DevOps, Amazon Web Services (AWS), Foreman, Git, TeamCity, ELK (Elastic Stack), Puppet, AWS
- Managed a hybrid cloud with around 300 nodes: AWS, VMware, and bare metal.
- Implemented automation, configuration management, and provisioning: 90% of the environment was managed in Puppet and Git.
- Managed the lifecycle of legacy systems: .NET, C#, and automation of manually deployed systems.
- Provided CI for configuration management and infrastructure as code (IaC): GitFlow, reusable code, and open-source contributions.
- Managed and mentored junior IT staff, with a focus on separation of concerns and easy onboarding.
- Led most of the post-acquisition infrastructure integration projects.
DevOps Engineer, 2011 - 2013, Docstoc (Acquired by Intuit)
Technologies: Amazon Web Services (AWS), AWS, AWS DevOps, Nagios, Bash, Python, MongoDB, MySQL, LB, Juniper
- Supported a colocation environment with 180+ dedicated Windows and Linux servers, including new server deployments.
- Managed network security and performance (Juniper SSG and SRX firewalls, A10 Networks load balancers, RADIUS, IPsec, NAT, Amazon EC2 VPC).
- Implemented proactive monitoring using Nagios, ELK, and New Relic.
- Optimized Linux and Windows server performance for high scale.
- Deployed and maintained on-premise MySQL databases.
- Introduced and implemented the ELK stack (Elasticsearch, Logstash, Kibana).