Head of Site Reliability Engineering | Consultant
2015 - PRESENT
HazelOps
- Built scalable infrastructure for startups on AWS: multi-environment, self-healing, and predictable, managed as infrastructure as code.
- Handled the legacy-code work needed to dockerize JVM, PHP, and Python apps.
- Audited and analyzed infrastructure performance, producing dozens of full-cycle reports covering key performance factors and proposed action items.
- Helped software engineers adopt DevOps practices, including close communication, strategy, and process improvement.
- Introduced site reliability practices: owned SLAs, SLOs, and SLIs, eliminated toil, and increased observability through automation, monitoring, and error budgeting.
- Implemented CI/CD with streamlined deployment pipelines for dozens of projects on GitLab, Jenkins, and CircleCI, using Docker, container registries, and multi-stage builds.
- Created OPS procedures in customers' environments, including service-based alerting, on-call rotation, and escalations.
- Deployed and maintained Apache Kafka, including full-cycle management via Terraform, Ansible, and Docker.
Technologies: AWS ECS, AWS, AWS DevOps, GNU Make, Amazon Web Services (AWS), Grafana, Traefik, HAProxy, Python, WordPress, PHP, Java, Serverless, ECS, Docker Swarm, Docker, Ansible, AWS CloudFormation, Terraform, Nginx
Lead Site Reliability Engineer
2016 - 2019
Flo Technologies
- Designed and built a complex IoT infrastructure from scratch on AWS: a multi-tier, multi-subnet scalable cloud infrastructure; a multi-application stateless stack on Elastic Beanstalk, ECS, and Docker; and platform-agnostic local workspaces with Docker.
- Created and administered Ansible infrastructure: wrote idempotent plays and roles to support infrastructure needs, plus community-available roles for multiple platforms under Apache Foundation.
- Designed and implemented CI/CD: complete application lifecycle with green deployments of high-traffic services, platform-agnostic framework to support SaaS or hosted CI servers, and hassle-free pipelines for software engineers.
- Built and administered monitoring solutions: log and data aggregation from multiple sources (ELK), on-prem monitoring via TICK and Grafana, and SaaS monitoring with Datadog and New Relic when needed.
- Devised and executed operational procedures: service-oriented OLA, PagerDuty with monitoring solutions, and a PagerDuty "Service Owner First" policy.
- Created and maintained an upgrade procedure for critical distributed systems that allowed upgrades with no downtime and no data loss over the entire three-year span.
Technologies: AWS DevOps, GNU Make, Amazon Web Services (AWS), Transport Layer Security (TLS), Linux, CircleCI, Docker, TICK Stack, ELK (Elastic Stack), GitLab, Apache Kafka, Ansible, AWS CloudFormation, AWS
Senior Member of Technical Staff
2016 - 2017
Delphix
- Architected and implemented a multi-tier hybrid cloud AWS infrastructure for a new high-scale testing framework project.
- Constructed log and data aggregation from multiple sources (ELK).
- Created a virtual and bare-metal host provisioning system (Foreman).
- Designed and implemented Nmap-based inventory software.
- Contributed to company-wide IT processes and improvements.
- Contributed major portions of the on-call rotation, monitoring, SOA, and OLA designs and implementations.
Technologies: AWS DevOps, Amazon Web Services (AWS), Python, AWS CloudFormation, Foreman, Ansible, ELK (Elastic Stack), Jenkins, AWS
Senior DevOps Engineer
2013 - 2016
Intuit
- Managed a hybrid cloud with around 300 nodes: AWS, VMware, and bare metal.
- Implemented automation, config management, and provisioning, with 90% of the environment managed in Puppet and Git.
- Managed the lifecycle of legacy systems: .NET, C#, and automation of manually deployed systems.
- Provided CI in configuration management and infrastructure as code (IaC): GitFlow, reusable code, and open-source contribution.
- Managed and mentored junior IT staff, including separation of concerns and easy onboarding.
- Led most of the post-acquisition infrastructure integration projects.
Technologies: AWS DevOps, Amazon Web Services (AWS), Foreman, Git, TeamCity, ELK (Elastic Stack), Puppet, AWS
DevOps Engineer
2011 - 2013
Docstoc (Acquired by Intuit)
- Supported colocation with 180+ Windows and Linux dedicated servers as well as new server deployment.
- Managed network security and performance (Juniper SSG and SRX firewalls, A10 Networks load balancer, RADIUS, IPsec, NAT, Amazon EC2 VPC).
- Implemented proactive monitoring using Nagios, ELK, and New Relic.
- Optimized Linux and Windows server performance for high scale.
- Deployed and maintained on-premise MySQL databases.
- Introduced and implemented the ELK stack (Elasticsearch, Logstash, Kibana).
Technologies: Amazon Web Services (AWS), AWS, AWS DevOps, Nagios, Bash, Python, MongoDB, MySQL, LB, Juniper