Senior DevOps Engineer | 2019 - Present | Pypestream, Inc.
Technologies: Kubernetes, Elasticsearch, Ceph, Jenkins, Ansible, Prometheus, Grafana
- Created numerous Jenkins pipelines in Groovy for deploying both infrastructure and applications to Kubernetes.
- Provided 24/7 on-call support and was responsible for handling any operational issues that arose.
- Deployed and upgraded clusters and databases built on widely used technologies.
- Developed several solutions for backing up clusters and applications.
- Containerized multiple applications.
DevOps Consultant | 2018 - 2019 | Nezasa AG
Technologies: Amazon Web Services (AWS), Terraform, Jenkins, HAProxy, Kong, Heroku
- Migrated the deployment of multiple parts of their infrastructure, such as HAProxy, MongoDB, and Jenkins, to Terraform.
- Fixed an issue in which cloning the production MongoDB database to staging took almost a day, reducing the cloning time substantially.
- Set up a better API gateway for their Heroku deployments.
- Managed their application's lifecycle, deploying new releases and hotfixes to staging and promoting them to production once all tests passed.
- Monitored live logs and reported major bugs caught in the production environment.
Lead DevOps | 2018 - 2019 | Audsat
Technologies: Amazon Web Services (AWS), Kubernetes, Elasticsearch, Fluentd, GoCD, Datadog, PagerDuty, Java
- Set up three Kubernetes clusters for the development, staging, and production environments. The production cluster was configured as multi-AZ, with a private topology, autoscaling, per-user RBAC restrictions, daily backups, and continuous monitoring through Datadog and PagerDuty. I was responsible for guaranteeing the SLA of all three clusters.
- Set up GoCD with custom Elastic Agents to deploy the company’s applications to all three Kubernetes clusters. The agents ran on spot instances automatically provisioned by Kubernetes. All applications were containerized and deployed as Helm charts.
- Implemented automatic provisioning and renewal of Let’s Encrypt TLS certificates.
- Deployed Fluentd DaemonSets to collect logs from all applications and ship them to AWS-provisioned Elasticsearch clusters (one per Kubernetes cluster); also deployed Elasticsearch Curator to clean up old logs.
- Set up automatic monitoring of all Java applications deployed in the clusters by scraping Kubernetes pods that expose JMX ports.
- Spearheaded project Navalis, a web application intended to let developers easily deploy, monitor, and scale their applications across multiple Kubernetes clusters. It was under development at the time and built with Go and Vue.js.
- Scaled a Kubernetes cluster up to 250 nodes to process batch jobs within a few hours.
DevOps Engineer | 2017 - 2018 | TFG Co
Technologies: Amazon Web Services (AWS), PagerDuty, MongoDB, VyOS, Kubernetes, Helm, Jenkins, Elasticsearch, Datadog, Kafka, ZooKeeper, MirrorMaker, Burrow
- Participated in 24/7 on-call rotations.
- Deployed multiple MongoDB clusters for collecting data during a high-traffic event.
- Designed, in partnership with our data engineering team, a new Kafka cluster for the company, inspired by Netflix’s approach to orchestrating and monitoring Kafka. The cluster was provisioned entirely with Terraform and Chef, with a few components deployed to Kubernetes using Helm charts (Confluent’s REST API, MirrorMaker, and Burrow). All components scaled automatically and reported health metrics to Datadog.
- Developed a system for monitoring backups, consisting of a Python/Flask server and a client written in Go. The system centralized the status of all EBS and RDS snapshots, along with other backups stored in S3 (such as GitLab and Redis), in a single place. Whenever a backup failed, it triggered an alarm in Datadog and PagerDuty, alerting whoever was on call.
- Created a redundant VPN between AWS regions (US/AP) using VyOS.
- Developed a tool for cross-validating the Kubernetes network: it attempted to establish a route between every pair of machines in the cluster, either producing a complete graph or pointing out issues in the network.
- Solved an issue that caused our Elasticsearch cluster to crash at the beginning of each day. The cause was an excessive number of shards combined with several misconfigured Logstash instances that flooded the cluster with requests while those shards were being created. Fixed it by reducing the number of shards, increasing the batch size sent from Filebeat to Logstash, and reducing the number of open connections from Logstash to Elasticsearch.
- Helped instrument our most critical servers with Jaeger for distributed tracing.
- Deployed an autoscaling Kubernetes cluster as a proof of concept to test how well a Kafka cluster would scale on Kubernetes.
- Solved an issue in which our Kafka cluster crashed because of unexpected behavior in Netflix Exhibitor, a tool someone had installed to monitor ZooKeeper.
- Deployed a Kubernetes cluster "the hard way" (i.e., without tools like Kops or Kubeadm) to learn deeper concepts of its architecture.
DevOps Engineer | 2017 | MAV Technology
Technologies: BareMetal, Node.js, HAProxy, Consul, Datadog, MySQL, MongoDB, Ceph
- Centralized all incoming requests into an HAProxy cluster, giving the infrastructure a proper entry point (previously, DNS pointed to many different entry points) and avoiding single points of failure.
- Fixed multiple bugs in Node.js servers, including a critical one that caused progressive performance degradation and forced us to restart production containers periodically.
- Solved multiple bugs in Objective-C servers by building a system for debugging servers in real time: it attached GDB instances to processes distributed across nodes and captured any resulting stack traces, allowing us to quickly fix bugs that only occurred in production.
- Developed a Node.js fronting proxy that held thousands of connections open on behalf of a legacy server that could not handle many simultaneous connections.
- Stopped an ongoing brute-force password attack, which I detected through a significant increase in failed authentications in Datadog, by blocking the attacker’s IP addresses in HAProxy.
- Resolved a serious problem that caused Ceph to crash, tracing it to a bug specific to the Ceph version we were running.
Software Engineering Intern | 2015 - 2016 | Synopsys, Inc.
Technologies: Verilog, C++, Python, TCL, D3.js, EDA
- Developed a Python tool that automatically generated C++ code to bind hardware transactors written in C++ to TCL.
- Built a tool for extracting statistics from a hardware-emulating platform and generating D3.js charts.
- Fixed a major C++ bug caused by a race condition between GTK and a hardware transactor.
- Spent a month at Synopsys' headquarters in Mountain View, where I deepened my knowledge of electronic design automation (EDA).
Junior Back-end Engineer | 2012 - 2014 | MAV Technology
Technologies: C++, Lua, MongoDB, MySQL, Java, GWT, CakePHP, Bootstrap
- Developed a substantial part of the back end of a corporate email service, written in C++ with Lua bindings. Used MongoDB to store email metadata, GridFS to store message bodies, and MySQL to store relational user data. Worked with REST interfaces in a monolithic architecture.
- Built part of the front end, written in Java with Google Web Toolkit (GWT).
- Built IMAP and POP3 proxies that routed new users coming from other email providers to their old servers while capturing their passwords and transparently migrating their accounts to our servers.
- Developed HTTP and SMTP servers from scratch in C++.
- Supported development of the company’s ERP system, built with CakePHP and Bootstrap.