Caíque França, Developer in Belo Horizonte - State of Minas Gerais, Brazil

Caíque França

Platform Engineer and Developer

Belo Horizonte - State of Minas Gerais, Brazil

Toptal member since May 14, 2026

Bio

Caíque is a senior platform engineer with 14+ years of experience designing reliable, scalable cloud infrastructure. He specializes in AWS, Kubernetes, and Terraform, with deep expertise in DevOps, SRE, and observability tools such as Datadog, Prometheus, and Grafana. He's delivered high-impact platforms in international environments where uptime, automation, and developer experience drive business success.

Portfolio

TradeWeb Markets
Terraform, Kubernetes, Helm, Argo CD, GitLab CI/CD, Prometheus, Grafana...
babelforce
GitLab CI/CD, Prometheus, Grafana, OpenTelemetry, Kubernetes, Helm, Argo CD...
Venmo
Datadog, GitHub, Terraform, Python 3, PagerDuty, Jira, Confluence, Amazon EC2...

Experience

  • Linux - 14 years
  • Observability - 12 years
  • Bash Script - 12 years
  • Docker - 7 years
  • GitHub - 7 years
  • Kubernetes - 7 years
  • CI/CD Pipelines - 6 years
  • Terraform - 5 years

Preferred Environment

Linux, Docker, Kubernetes, GitHub, GitLab, CI/CD Pipelines, Terraform, Observability, Bash Script, Python 3

The most amazing...

...project I've delivered was a custom VPN observability platform that helped a major enterprise move from 59% of users reporting VPN issues to a permanent home-office model.

Work Experience

Senior Platform Engineer

2023 - PRESENT
TradeWeb Markets
  • Leveraged Argo CD for application deployment management on Kubernetes clusters.
  • Enhanced existing Terraform code to improve infrastructure-as-code.
  • Developed tests for Kubernetes clusters using Open Policy Agent (OPA) and Conftest.
  • Conducted a proof of concept for Linkerd on EKS and Rafay Kubernetes clusters.
  • Managed tasks involving AWS services (including RDS) and Logstash.
  • Improved logging for Jira and Confluence to support internal teams.
  • Created Python code to integrate PagerDuty contacts with Grafana.
  • Worked on GitLab CI pipelines for build and deployment workflows.
  • Implemented Elasticsearch and Kibana using Helm on EKS.
  • Handled tasks involving Atlantis and used Terratest to test Kubernetes modules on EKS.
Technologies: Terraform, Kubernetes, Helm, Argo CD, GitLab CI/CD, Prometheus, Grafana, Ansible, Karpenter, Open Policy Agent (OPA)

Senior Platform Engineer

2022 - 2023
babelforce
  • Managed the entire cloud infrastructure, ensuring reliability and scalability.
  • Participated in a 24/7 on-call rotation for critical system support.
  • Led a project to implement a tracing solution, successfully deploying Grafana Tempo.
  • Specialized in observability, leveraging Prometheus, Grafana Loki, and Grafana Tempo for metrics, logs, and tracing.
  • Managed CI/CD pipelines using GitLab CI for efficient deployment processes.
  • Orchestrated Kubernetes clusters using kOps on AWS, optimizing resource allocation.
  • Implemented GitLab Runner on Amazon EKS using Fargate for enhanced scalability.
  • Developed Grafana dashboards and alerts using Prometheus metrics and Grafana Loki logs.
  • Engineered Helm chart templates for streamlined application deployment on AWS.
  • Utilized Argo CD to manage application deployment and configuration in AWS.
Technologies: GitLab CI/CD, Prometheus, Grafana, OpenTelemetry, Kubernetes, Helm, Argo CD, Crossplane, Kustomize, Amazon RDS

Senior SRE Cloud Engineer

2021 - 2022
Venmo
  • Focused on observability and incident response for Venmo/PayPal.
  • Implemented the Vector observability pipeline with Datadog.
  • Created dashboards and alerts for Datadog cost management and developed cost reduction initiatives.
  • Developed Terraform modules on GitHub to streamline creation of alerts, SLOs, and dashboards.
  • Resolved Jira tickets related to Datadog and PagerDuty.
Technologies: Datadog, GitHub, Terraform, Python 3, PagerDuty, Jira, Confluence, Amazon EC2, Amazon EKS, Amazon S3 (AWS S3)

Specialist SRE, Observability

2021 - 2021
Dock
  • Led the SRE observability squad, overseeing Dock's entire observability ecosystem.
  • Developed strategic plans for short-, medium-, and long-term observability enhancements.
  • Led open-source observability projects, starting with Prometheus for metrics.
  • Created the "Observability Showroom" to inspire developers with model solutions.
  • Developed the "Observability Journey" to document setup steps for observability tools.
  • Architected the observability team's AWS infrastructure.
  • Provided technical support for Datadog and Splunk.
  • Managed project tasks in Jira and handled backlog construction and delivery deadlines.
  • Trained and mentored the team on SRE practices and DevOps culture.
Technologies: Datadog, Prometheus, Grafana, Thanos, Terraform, CircleCI, Splunk, GitHub, Kubernetes, Amazon S3 (AWS S3)

Senior Site Reliability Engineer

2020 - 2021
Banco Itaú
  • Led SRE efforts in a squad focused on observability.
  • Provided technical consultancy for monitoring applications using AppDynamics.
  • Planned and executed application migrations from PaaS (OpenShift), IaaS (OpenStack), AWS (ECS, EKS, EC2), and on-premises solutions to a new AppDynamics SaaS environment.
  • Promoted observability culture across product squads.
Technologies: AppDynamics, Splunk, GitLab, ServiceNow, Kubernetes, Terraform, Jira, Confluence, Python 3, Bash Script

Site Reliability Engineer

2019 - 2020
Localiza
  • Ensured system stability as part of a multidisciplinary team.
  • Provided technical consultancy for monitoring on-premises and cloud applications using AppDynamics and Datadog.
  • Monitored Windows servers with OpManager and WMI technology.
  • Automated disaster recovery processes for the network team using a shell script.
  • Managed, administered, and operated network assets, including Cisco and HP switches, Aruba Wi-Fi controllers and APs, Palo Alto, Fortinet, and ASA firewalls, Citrix NetScaler load balancers, and Aruba ClearPass NAC.
  • Managed, administered, and operated tools supporting operations, such as OpManager, Zabbix, TRAFip, SLAview, CFGtool, and Infoblox.
  • Led the project for network traffic management and monitoring at Localiza headquarters and key branches.
  • Handled Level 2 and Level 3 (N2/N3) support tickets for network infrastructure and information security.
  • Produced reports, metrics, and dashboards with strategic insights for the business.
Technologies: Ansible, Rundeck, Azure DevOps, Amazon Route 53, System Center Operations Management (SCOM), F5 Load Balancer, Palo Alto Networks, Cisco, Fortinet, Zabbix

Technical Consultant

2013 - 2019
Telcomanager
  • Provided technical consultancy in pre-sales, post-sales, and special projects, developing new business opportunities in the corporate market.
  • Analyzed client network infrastructure to provide optimal solutions and effectively communicated client needs to support and development teams.
  • Participated in corporate trade shows such as Futurecom, delivering speeches and presentations to promote Telcomanager solutions.
  • Coordinated the technical support team, ensuring high-quality service delivery aligned with the company's mission, vision, and philosophy.
  • Achieved significant cost savings, doubled software sales over three years, and substantially increased the client base through technical support advancements.
  • Achieved over 98% customer satisfaction ratings for support calls and implemented improvements in response time, incident resolution, and customer surveys.
  • Supervised, hired, and provided ongoing training for the technical support team, documenting processes, manuals, and guidelines for the department.
  • Conducted customized client training sessions and developed scripts in Shell and Lua to support Telcomanager's network management tools.
  • Identified and replicated bugs in network management tools, providing detailed reports and proposing system improvements to the development team.
  • Configured client network assets focusing on flow export protocols (NetFlow, sFlow, IPFIX) and SNMP, and participated in Telcomanager solutions implementation.
Technologies: Linux, Bash Script, Lua, NetFlow, SNMP, Cisco, Firewalls, Networks, MikroTik, Zabbix

Projects

Home Office VPN Observability Platform

During the pandemic, the company faced a critical remote-work challenge: around 59% of employees reported issues with the corporate VPN (frequent drops, slow connections, and difficulty connecting), which directly impacted customer service teams and business goals. As a solution, I designed and delivered an end-to-end VPN observability platform to determine whether problems originated in the company's infrastructure or in employees' home environments.

The solution combined three monitoring layers:
  • VPN firewalls, collecting CPU, memory, disk, traffic, simultaneous client connections, and UDP/ICMP/SSL/TCP session counts against device limits.
  • User VPN sessions, capturing connection/disconnection events, session durations, and contextual data (employee ID, department, ISP, public/private IP, state, city) in near real-time (5-minute intervals).
  • Home internet quality, measured by a custom service running on every remote workstation that executed ICMP tests against VPN peers, the telephony system, internal servers, and public DNS, and detected Wi-Fi versus wired connections and Wi-Fi signal strength.

I built Power BI dashboards and SQL-based reports that drove decisions across support, security, and leadership teams.
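
As an illustration of the home internet quality layer described above, here is a minimal Python sketch of how ICMP probe results might be summarized per target and used to guess where a problem originates. It is not the platform's actual code: the `ProbeResult` type, the target labels (`public_dns`, `vpn_peer`), the 20% loss threshold, and the classification rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class ProbeResult:
    target: str              # hypothetical label, e.g. "vpn_peer" or "public_dns"
    rtt_ms: Optional[float]  # round-trip time in ms; None means no ICMP reply


def summarize(results):
    """Group probe results by target and compute loss ratio and average RTT."""
    by_target = {}
    for r in results:
        by_target.setdefault(r.target, []).append(r)

    summary = {}
    for target, probes in by_target.items():
        rtts = [p.rtt_ms for p in probes if p.rtt_ms is not None]
        lost = len(probes) - len(rtts)
        summary[target] = {
            "loss": lost / len(probes),
            "avg_rtt_ms": mean(rtts) if rtts else None,
        }
    return summary


def likely_origin(summary, loss_threshold=0.2):
    """Assumed heuristic: loss to public DNS points at the home link,
    loss only toward the VPN peer points at the company side."""
    no_data = {"loss": 0.0}
    public_bad = summary.get("public_dns", no_data)["loss"] > loss_threshold
    vpn_bad = summary.get("vpn_peer", no_data)["loss"] > loss_threshold
    if public_bad:
        return "home"
    if vpn_bad:
        return "company"
    return "healthy"


# Example: public DNS answers every probe, the VPN peer drops half of them.
results = [
    ProbeResult("public_dns", 20.0),
    ProbeResult("public_dns", 22.0),
    ProbeResult("vpn_peer", None),
    ProbeResult("vpn_peer", 40.0),
]
summary = summarize(results)
origin = likely_origin(summary)
```

In this sketch, a workstation that reaches public DNS cleanly but loses packets toward the VPN peer would be flagged as a probable company-side issue. A real deployment would likely combine this with the firewall and session data from the other two layers.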

Education

2014 - 2018

Bachelor's Degree in Control and Automation Engineering

Federal Center for Technological Education of Rio de Janeiro Celso Suckow da Fonseca - Rio de Janeiro, Brazil

2009 - 2012

Technical Course in Telecommunications

Federal Center for Technological Education of Rio de Janeiro Celso Suckow da Fonseca - Rio de Janeiro, Brazil

Certifications

JANUARY 2023 - PRESENT

HashiCorp Certified: Terraform Associate

HashiCorp

OCTOBER 2022 - PRESENT

Certified Kubernetes Administrator

The Linux Foundation

MAY 2022 - PRESENT

LPIC-1: Linux Administrator

Linux Professional Institute

FEBRUARY 2022 - PRESENT

AppDynamics Certified Associate Performance Analyst

AppDynamics

SEPTEMBER 2021 - PRESENT

AWS Certified Solutions Architect

Amazon Web Services

JUNE 2021 - PRESENT

AWS Certified Cloud Practitioner

Amazon Web Services

MAY 2021 - PRESENT

GitLab Certified Associate

GitLab

FEBRUARY 2020 - PRESENT

Cisco Certified Network Associate

Cisco Systems

NOVEMBER 2019 - PRESENT

MikroTik Certified Network Associate (MTCNA)

MikroTik

Skills

Libraries/APIs

Thanos

Tools

Terraform, GitHub, GitLab, GitLab CI/CD, MATLAB, Helm, Grafana, Ansible, Kustomize, Jira, Confluence, Amazon EKS, CircleCI, Splunk, AppDynamics, Rundeck, F5 Load Balancer, Zabbix, Microsoft Power BI, ACL, VPN, Amazon CloudWatch, Kubectl, Git

Platforms

Linux, Docker, Kubernetes, PagerDuty, Amazon EC2, RouterOS

Languages

Bash Script, Python 3, C, Java, Lua, SQL

Frameworks

Crossplane

Paradigms

Azure DevOps

Storage

Datadog, Amazon S3 (AWS S3)

Other

Observability, CI/CD Pipelines, Robotics, Linear Algebra, Electronics, Networks, Physics, Mathematics, Mechanics, Telecommunication Engineering, TCP/IP, Programming, OSI Model, Argo CD, Prometheus, Karpenter, Open Policy Agent (OPA), OpenTelemetry, Amazon RDS, ServiceNow, Amazon Route 53, System Center Operations Management (SCOM), Palo Alto Networks, Cisco, Fortinet, NetFlow, SNMP, Firewalls, MikroTik, Data Analysis, Routing, Cisco Switches, VLANs, Virtual Private Cloud (VPC), Cloud Computing, Cloud, AWS Pricing, AWS Support, Cloud Security, Shell Scripting, System Administration, File Permissions, Networking, User Management, APM, Dashboards, Notification Center, Troubleshooting, Container Orchestration, Cluster Administration, Infrastructure as Code (IaC), HCL
