Verified Expert in Engineering
DevOps Engineer and Developer
Robert is a senior infrastructure and DevOps engineer with over 20 years of experience in Unix/Linux system administration, server automation, software development, storage area networks, networking, and security. In addition, he has worked as a SAN administrator for the past seven years. Robert is highly professional and joined Toptal to work on innovative projects alongside exceptional talent.
Amazon Web Services (AWS), Kubernetes, Docker, Linux, Containers, OpenShift, Azure, Google Cloud Platform (GCP), Helm, Terraform
The most amazing thing I've developed is a custom Kubernetes controller, written in Go, that handles the interaction between Kubernetes services and the Cloudflare API.
AWS DevOps - Cloud Infra
Blue River Technology - Computer Vision and Machine Learning
- Managed resources in the AWS cloud using an IaC-first approach. The enterprise environment consisted of dedicated CI/CD and application environments, each set up in a separate AWS account to keep CI/CD and application workloads isolated.
- Set up cross-account IAM roles and policies, and created and updated AWS resources such as EKS, ECS, Kinesis, Fargate, Lambda functions, Transit Gateway, Athena, and others.
- Upgraded EKS clusters from version 1.18 to 1.21 without service disruption. Analyzed which containers would be impacted by the rollout. Managed EKS cluster node pools with mixed CPU/GPU instance types used for machine learning jobs.
- Upgraded Terraform to version 1.2.4 across all CI/CD pipelines, updated states outside of CI/CD, and performed the state file manipulations required to return each job to the automated CI/CD pipeline.
- Upgraded the MySQL RDS instance to version 8 and performed the related state manipulations.
- Built and updated Helm charts for internal applications, as well as external Helm charts. Managed the Kustomize code used by Kubeflow.
- Created a complete Terraform/Terragrunt IaC codebase for the Databricks ecosystem. The code was divided into individual modules deployed by Terragrunt at either the Databricks account or workspace level. Debugged notebook problems in Databricks.
- Built Jenkins pipelines to build and deploy one of the internal applications. Used Kaniko to avoid the security issues of Docker-in-Docker builds. Followed industry best practices such as artifact promotion, immutability, and versioning.
- Migrated CI/CD pipelines from Jenkins to GitHub Actions using reusable pipeline code blocks.
- Troubleshot hanging or unexpectedly failing Kubeflow jobs and worked with users on fixes. Introduced API key rotation by extending the API of one of the key in-house applications written in Python.
- Troubleshot out-of-memory (OOM) kills of application containers in a production Kubernetes cluster running on DigitalOcean. Performed in-depth root cause analysis of application outages in the production environment.
- Created custom Prometheus metrics that mirror exactly the metrics the Linux kernel uses when it invokes the OOM killer.
- Traced gRPC application calls using Wireshark. Set up ephemeral debugging containers to collect TCP dumps from all application containers. Attempted to recreate gRPC calls between all application layers using the collected data.
- Troubleshot and upgraded system-level applications and their respective Helm charts: Istio, the ingress controller, external DNS, and cert-manager.
- Built Terraform code and ARM templates for resources provisioned in the Azure cloud.
- Worked on architecture designs for clients and educated them on best practices. Configured network links between the client's data centers and Azure and configured landing zones and network security.
- Worked on an internal project, "Documentation as Code," in which project documentation is generated from information gathered from cloud deployments. Migrated the deployment from virtual machines to containers.
St. Jude Children's Research Hospital
- Developed IaC with Terragrunt/Terraform using a fully modularized approach that kept the code DRY. Deployed a Terraform operator for integrated deployment of application components that are not part of Kubernetes, and modified the operator's code.
- Designed and deployed an Nginx ingress controller with the ModSecurity WAF module. Created automation that gathered blacklist data from internet vendors and applied it to the running configuration, using Reloader to perform rolling updates.
- Developed a Node.js Azure Function that forwarded Kubernetes control plane logs to Splunk through Fluentd, augmenting them with additional information and tags.
- Developed CI/CD pipelines with Tekton Pipelines, using Lighthouse as the GitHub webhook handler and Git ChatOps bot. Extended the Lighthouse code to use Kustomize for provisioning and customizing pipelines versioned alongside the source code.
- Implemented OAuth2 authentication and authorization for services running inside Kubernetes using OAuth2-proxy.
- Developed a Cloudflare controller that maintained the configuration of Argo Tunnels between Cloudflare and Kubernetes clusters based on Ingress resource annotations.
- Developed a central operations dashboard in Splunk covering Kubernetes and Azure resources.
- Developed Helm chart unit tests, using Helm test hooks to automate deployment testing.
- Built and maintained CI/CD pipelines using GitHub Actions. Deployed the Keel operator for continuous deployment from GitHub to Kubernetes.
- Built a Jenkins X CI/CD pipeline for container builds and deployments, controlled through GitOps and following DevOps best practices.
- Developed infrastructure as code using Terragrunt and Terraform with a fully modularized approach. Built networking infrastructure, EKS (Kubernetes) clusters, and relational databases in AWS as code.
- Developed Helm charts and deployed them to Kubernetes with Helmfile, which declaratively defines sets of Helm releases, while adhering to twelve-factor app principles.
- Containerized the application. Deployed infrastructure containers such as external DNS, cert-manager, and an ingress controller to automate DNS name registration and the provisioning of SSL certificates for external endpoints.
- Load-tested the application with JMeter using different scenarios, ensuring that it met the requirements for a number of concurrent users performing various workflows. Worked with developers to identify and address bottlenecks.
- Developed CI/CD pipelines using serverless Jenkins X, controlled through GitOps and deployed in a cloud-native manner.
- Developed infrastructure as code using Terraform combining third-party modules with client-specific code. Deployed the infrastructure in the Amazon cloud, debugged all issues, and wrote the deployment documentation.
- Developed a Python script to automate building input values files and initializing Terraform modules and workspaces.
- Created a local Minikube environment that mimicked the AWS cloud environment, including a local dynamic DNS server.
- Used MetalLB as the load balancer, external DNS for DNS record updates, and cert-manager for auto-provisioning SSL certificates, so that Helm charts could be deployed both to the cloud and locally.
- Built Helm charts for deploying Kubernetes services such as an EFS persistent volume provisioner, an ingress controller, and external DNS, as well as charts for JupyterHub and client-specific services integrated with JupyterHub.
- Deployed cert-manager configured to issue Let's Encrypt SSL certificates for DNS domains and manage them on an ongoing basis.
- Developed Terraform code to build Kubernetes infrastructure and managed the clusters with kOps. Debugged issues and integrated Kubernetes with Amazon ECR and Amazon Route 53.
- Designed a Helm chart and built an in-house application in Go. Debugged and troubleshot gRPC issues.
- Developed Helm charts for infrastructure services inside Kubernetes: dynamic DNS names in Amazon Route 53, on-demand SSL certificates for Ingress endpoints, a persistent volume controller for EFS, and an ingress controller.
- Built a Jenkins X CI/CD pipeline for container builds and deployments with a GitOps-controlled pipeline.
- Troubleshot the Istio installation and its upgrade to the latest version.
International Financial Data Services (IFDS)
- Designed, installed, and troubleshot a Red Hat OpenShift cluster and migrated it from version 3.4 to 3.11.
- Created dynamic Jenkins CI/CD pipelines with all master and worker nodes running as containers within an OpenShift cluster, where each stage was represented by its own parameterized Docker image tailored to a specific purpose.
- Designed and installed HashiCorp Vault for secret management. Secrets required to run the application were retrieved from Vault at container startup and automatically renewed throughout the application's lifecycle.
- Developed liveness and readiness health-check probes for a JBoss (WildFly) cluster to automate OpenShift corrective actions when the node hosting a pod is under memory pressure or high load. Tested the probes by introducing network failures.
- Debugged Helm Tiller code and identified issues with security context constraint (SCC) deployment in Red Hat OpenShift 3.4.
- Built Docker images that initialize their configuration at startup, such as an EGM Nexus Docker image with Groovy scripts that fully initialize the configuration on first startup and retain it on subsequent starts of the same container.
- Deployed a Helm API that third-party applications could use to deploy Helm charts. Extended the Heketi Go code that handles volume deletion so that it deletes volume snapshots before the volume itself.
- Implemented MetalLB with dynamic DNS as an auto-provisioning load balancer solution on bare metal.
- Implemented Helm deployment profiling to identify system bottlenecks during larger deployments. Debugged and profiled OpenShift performance on bare metal to pinpoint bottlenecks.
- Created a Helm chart for an automated storage provisioner in Minishift. Installed and configured an automated Gluster provisioner with Heketi using StorageClass-based provisioning.
- Developed infrastructure as code using Terraform with a modular approach, deployed the infrastructure to AWS, migrated on-premises resources to the cloud, and debugged all migration issues.
- Built HashiCorp Packer code for an OpenVPN AMI with user authentication against AWS accounts.
- Developed Helm charts for a containerized version of the on-premises application and deployed it to Amazon EKS.
- Deployed OpenVPN into EKS with a self-service certificate authority (CA) backed by corporate central authentication.
- Deployed cert-manager configured to issue Let's Encrypt SSL certificates for DNS domains and manage them on an ongoing basis.
- Developed Grafana dashboards for all containerized environments, designed to scale dynamically with the environments without any dashboard code changes. Used templated dashboards as well as Boom Table panels.
- Wrote custom Prometheus queries to retrieve data. Modified Prometheus collectors and filters to ensure all relevant data reached Grafana.
- Handled service creation, deployment, and troubleshooting in Rancher.
- Modified Docker images to expose host-level disk devices so that Prometheus could gather their metrics.
- Built, deployed, and managed eight large Kubernetes clusters for development, user acceptance testing (UAT), and production environments with 25 nodes per cluster and load-based horizontal autoscaling.
- Implemented Jenkins as a continuous delivery tool using Groovy, Job DSL, pipelines, and Kubernetes to run Jenkins slaves on demand.
- Provisioned AWS and Azure services and resources using Terraform, including EC2, EBS, S3, VPC, Auto Scaling, CloudFormation, Elastic Load Balancing, RDS, Route 53, Memcached, Redis, OpsWorks, CloudWatch, CloudTrail, Identity and Access Management (IAM), SQS, Redshift, Lambda functions, Elastic Beanstalk, Batch, Elastic Container Service, Fargate, and Firehose.
- Migrated VMware servers to AWS and Azure using VMware OVF and Hyper-V VHD images.
- Designed Chef cookbooks to manage configurations and automate the installation process using the OpsWorks framework.
- Built resources on GCP to use the platform's machine learning capabilities.
- Deployed centralized logging with Kibana, using Elasticsearch as the storage engine.
- Configured a Gluu IAM server as a user authentication gateway for Kubernetes using OpenID.
- Built a Neo4j database of all resources in AWS, repositories in GitHub, jobs in Jenkins, and containers in Kubernetes, along with their relationships, continuously updated using Mercator as the framework.
- Developed TOSCA blueprints and deployed them to OpenStack.
- Built Jenkins CI/CD pipelines in Groovy to automate code releases and updates of Confluence pages through a REST API.
- Wrote Python scripts to interact with APIs for automated deployments.
- Created a JBoss blueprint, and installed, configured, and troubleshot JBoss.
DevOps Consultant | Architect
Bank of Montreal
- Designed, deployed, and configured Red Hat Satellite Server 6 (RHS6) and migrated 670 servers from RHS5 to RHS6.
- Designed and built a Puppet infrastructure for configuration management of 700+ servers.
- Created provisioning processes utilizing RHS6 with Puppet and coded supporting build classes and facts.
- Managed 700+ Solaris and Red Hat Enterprise Linux servers.
- Performed complex root cause analyses and debugged problems at the level of system library calls.
Resmor Trust (Royal Bank of Canada)
- Managed RHEL servers, Hitachi modular storage, Hitachi Virtual Storage Platform (VSP), Brocade FC switches, VMware ESX servers, and F5 load balancers.
- Designed, installed, and configured Hitachi AMS 2300 and associated SAN components.
- Built a Kickstart server for automated Linux builds using PXE boot.
- Designed and implemented the encapsulation of an Oracle RAC cluster into a Linux HA cluster, ensuring high availability of all Oracle failover components and ETL applications.
- Migrated Oracle RAC 11g to a new Hitachi storage frame.
Senior Unix, SAN Consultant
Intria Items | CIBC
- Managed Solaris, HP-UX, AIX, and Linux servers (200+ servers).
- Planned and executed a physical storage migration (25 terabytes) from Montreal to Markham with minimal downtime.
- Worked on application design and implementation using virtualization technologies such as LDoms, Solaris Zones, and HP VMware.
- Managed and implemented improvements to cluster technologies such as HP Serviceguard and Veritas Cluster Server.
Cloudflare Controller for Kubernetes
Streamlined Application Deployment to Kubernetes
A Jenkins Automated Job Creation Framework with Kubernetes Containers as Slaves
Bash, Java, Perl, C, Python, Groovy, HTML, Go
Terragrunt, Jenkins Job DSL, OpenID, GitHub API, Node.js
Helm, Amazon EKS, Terraform, Chef, Puppet, Jenkins, Veritas Cluster Server, Amazon Elastic Container Service (Amazon ECS), GitLab CI/CD, Google Kubernetes Engine (GKE), Azure Kubernetes Service (AKS), AWS Fargate, VPN, Amazon Virtual Private Cloud (VPC), GitHub, Vault, OpenVPN, Kibana, Grafana, VMware, Google Compute Engine (GCE), Git, HashiCorp, Packer, Red Hat Satellite, Hitachi HPLC, SonarQube, Apache JMeter, Istio, Ansible, GitLab
DevOps, Continuous Integration (CI), DevSecOps, Azure DevOps, Microservices
Docker, Linux, Kubernetes, Solaris, HP-UX, Azure, Amazon Web Services (AWS), Google Cloud Platform (GCP), AWS Lambda, Red Hat Linux, OpenShift, Oracle, Red Hat OpenShift, Azure Functions, Hortonworks Data Platform (HDP), WebSphere, JBoss, OpenStack, Rancher, Nexus, AWS Cloud Computing Services, AIX, Apache Kafka
DNS, CI/CD Pipelines, Kubernetes Operations (kOps), Autoscaling, Containers, Infrastructure, Containerization, Infrastructure as Code (IaC), Amazon RDS, AWS DevOps, Site Reliability Engineering (SRE), Load Balancers, Cloud, Networks, Groovy Scripting, Networking, Monitoring, Cloudflare, Cloud Architecture, Kustomize, Storage, IBM SAN, SAN Brocade, SAN Switches, LDoms, Prometheus, AWS Certificate Manager, Web Application Firewall (WAF), Argo CD, ARM, Debugging, AWS Cloud Architecture, Azure Databricks, Electronics, Computer, Architecture
Hadoop, CFEngine, Blueprint, OAuth 2, gRPC
MySQL, PostgreSQL, Database Management, Redis, GlusterFS, Cassandra, Elasticsearch, Datadog
Master's Degree in Computers and Electronics
Brno University of Technology, Department of Computers and Electronics - Brno, Czech Republic
Certified Kubernetes Security Specialist
Cloud Native Computing Foundation
Certified Kubernetes Administrator
Cloud Native Computing Foundation
Hitachi Data Systems Certified Professional
Sun Solaris 10 Certified System Administrator
Oracle Certified Professional
Sun Solaris 9 Certified System Administrator
HP-UX 11 Certified System Administrator
Check Point FireWall-1 Administrator
HP-UX 10.20 Certified System Administrator