
Aghateymur Hasanzade
Verified Expert in Engineering
Site Reliability Engineer and Developer
Baku, Azerbaijan
Toptal member since June 22, 2026
Aghateymur is a site reliability engineer with 7 years of experience in observability and platform infrastructure. His expertise spans AWS, Azure, Kubernetes, and open-source observability stacks, with notable work at Kapital Bank, Ciklum, and Simbrella. At Kapital Bank, Aghateymur led a migration to an open-source observability stack for 700+ services across 7 clusters, saving $2.5 million annually and improving reliability.
Experience
- Go - 7 years
- Kubernetes - 6 years
- CI/CD Pipelines - 6 years
- PostgreSQL - 5 years
- Amazon Web Services (AWS) - 5 years
- OpenTelemetry - 4 years
- Terraform - 2 years
- Datadog - 2 years
Preferred Environment
Amazon Web Services (AWS), Azure, Kubernetes, Nomad, Consul, Vault, Linux, Windows Server, Azure DevOps, GitLab
The most amazing...
...migration I've led is moving 700+ services to an open-source observability stack, saving $2.5 million annually.
Work Experience
Software Engineer
Ciklum
- Built new API endpoints and modified existing ones for an integration layer between the client's internal API, MS SQL logic, and external healthcare providers (GoodRx and Honeybee Health), exposing an external API and event pub/sub for other healthcare companies.
- Implemented configurable exponential retry logic for external API calls to improve the reliability of cross-provider integrations.
- Instrumented application telemetry with OpenTelemetry. Deployed and operated the services on AKS with Azure Monitor.
- Maintained Azure DevOps CI/CD pipelines and Helm charts, with integration flows built against Honeybee Health Partners API specs in compliance with HIPAA and SOC 2.
Software Engineer
Ciklum
- Built an internal tool that generates business proposals from company documentation using local AI models (Mistral and GPT-OSS) via Ollama, including the Ollama serving infrastructure.
- Designed agentic AI and RAG flows using PostgreSQL (pgvector) and nomic-embed-text embeddings to use model context efficiently. Built Confluence and Jira integrations to ingest content as markdown.
- Built the React/TypeScript front end and demoed it to the company within a month.
Site Reliability Engineer
Ciklum
- Set up production readiness standards and a service catalog for 80+ services, later extended to cover vendors, libraries, and appliances. Each entry holds points of contact, SLAs, and runbooks, and the catalog became the single reference index during incidents.
- Developed a custom incident response Slack application in Go, integrated with Jira for ticketing, handling communication, data collection, and MTTx/KPI measurement.
- Migrated 50+ Datadog monitors and 20+ SLOs to Terraform using reusable modules and remote state, improving maintainability.
- Supported production Node.js services among the 80+ Tier 1 services covered by the catalog.
Senior Site Reliability Engineer
Kapital Bank
- Initiated and led migration from Dynatrace to an open-source observability stack (OpenTelemetry, Grafana, Mimir, Tempo, and Loki) across 700+ services on 7 Kubernetes clusters with no code changes or downtime, saving the bank $2.5 million annually.
- Researched and implemented a symptom-based SLO alerting framework with multiwindow, multi-burn-rate alerts on business-flow SLOs, replacing most cause-based alerting and cutting alert noise by around 80%.
- Deployed and operated the full platform on bare metal: a mesh of OpenTelemetry collectors with head and tail sampling, a MinIO cluster as the long-term store for Mimir, Tempo, and Loki, and NGINX ingress across the 7 clusters.
- Diagnosed CPU, memory, and IO contention on Linux nodes with perf, flamegraphs, eBPF, bpftrace, and strace. Debugged kubelet evictions and OOMKilled events and tuned eviction thresholds and requests/limits to keep workloads stable.
- Maintained logging, monitoring, and alerting controls and ran the vulnerability scanning and patch tracking that kept the bank's cardholder data environment PCI DSS compliant.
Site Reliability Engineer
Simbrella
- Operated 200+ services across multi-continental on-premises infrastructure (VMs, Kubernetes, and Nomad), serving hundreds of thousands of users under ISO 27001. Ran Consul for service discovery, Vault for secrets, and Kafka for event streaming.
- Provided on-call technical support to external customer engineering teams, including banks and telcos integrating with Simbrella's APIs. Joined real-time incident calls to debug integration issues and delivered written RCAs to client tech teams.
- Owned CI/CD pipelines in Azure DevOps for .NET and Java services. Migrated all pipelines from UI-based definitions to YAML, built core templates, and integrated SonarQube and Trivy as quality and security gates.
- Led development of an E2E testing framework using Vagrant and Ansible to provision Nomad clusters and their dependencies, cutting new cluster setup in production from 2 days to 2 hours. The framework was adopted by teams before his departure.
- Contributed to migrating 80+ services and a shared core library from .NET Framework to .NET Core, which let services run cleanly under Nomad and Kubernetes for the first time.
DevOps Engineer
WeTravel
- Managed AWS EC2, Route53, and ALB/NLB load balancers across local, staging, and pre-production environments.
- Improved CI/CD pipelines for Ruby on Rails and React applications in GitLab.
- Wrote automation scripts to set up unified development environments, speeding up the onboarding of new engineers.
Experience
Torrent Streaming Service for Homelab | Apple TV
https://github.com/ShapedTime/momoshtrem
The web UI lets users browse TMDB, add movies and shows to their library, search for torrents across multiple indexers, and assign them with one click. Content is immediately available for streaming over WebDAV, so any compatible player, including Infuse, VLC, nPlayer, or any WebDAV client, can play it on any device.
The back end uses a library-first virtual filesystem, with the folder structure driven by a library database rather than by how torrents happen to name their files. Users get clean Movie Name (Year)/Movie Name (Year).mkv paths and proper Show/Season 01/S01E01.mkv organization regardless of the torrent's internal structure.
Apdex Prometheus Rule Generator
https://github.com/ShapedTime/apdex-prometheus-generator
Education
Bachelor's Degree in Computer Science
University of Strasbourg - Baku, Azerbaijan
Certifications
AWS Certified Solutions Architect Associate
Amazon Web Services
Skills
Libraries/APIs
Thanos, Node.js, React, Amazon API
Tools
Helm, Dynatrace, Grafana, Loki, NGINX, Strace, Vagrant, Ansible, GitLab, Kibana, Amazon CloudWatch, Git, RabbitMQ, Amazon EKS, Claude Code, Azure Monitor, Confluence, Jira, Slack, Terraform, Vault, SonarQube, Jenkins, Azure Kubernetes Service (AKS), Amazon Elastic Container Service (ECS), Amazon Elastic Container Registry (ECR), AWS IAM, Amazon Elastic Block Store (EBS), AWS ELB, Amazon ElastiCache, Amazon CloudFront, Amazon CloudFront CDN, Amazon Simple Queue Service (SQS), Amazon Simple Notification Service (SNS), Amazon Kinesis Data Firehose, AWS Fargate, AWS CloudTrail, AWS Key Management Service (KMS), Amazon Virtual Private Cloud (VPC)
Languages
Go, YAML, Bash, SQL, TypeScript, Java, Python, JavaScript, C#, C, Ruby
Frameworks
Windows PowerShell, .NET
Paradigms
Azure DevOps, DevOps
Platforms
Kubernetes, Linux, Amazon Web Services (AWS), Docker, PagerDuty, Harbor, Windows Server, Codefresh, Nexus, Ollama, Azure, Amazon EC2, AWS ALB, AWS NLB, AWS Lambda
Storage
Redis, PostgreSQL, Datadog, Elasticsearch, Amazon FSx, Amazon S3 (AWS S3), Amazon Aurora, Amazon EFS, Amazon DynamoDB
Other
APIs, OpenTelemetry, CI/CD Pipelines, Mimir, MinIO, Performance Optimization, Prometheus, Computer Architecture, Data Structures, Algorithms, Opslevel, Caching, Linux Filesystem, AWS DevOps, DevOps Engineer, Infrastructure as Code (IaC), Static Application Security Testing (SAST), eBPF, VMS, Nomad, Consul, Kafka, Trivy, vCenter, Jaeger, IIS, Security, Mistral AI, Pgvector, ISO 27001, Argo CD, AWS IAM Identity Center, Amazon RDS, AWS Auto Scaling, Amazon Route 53, Amazon Kinesis, Amazon Kinesis Data Streams, AWS ECS Fargate, AWS Config