
Darshit Suratwala
Verified Expert in Engineering
Site Reliability Engineer (SRE) and Software Developer
Mumbai, Maharashtra, India
Toptal member since March 18, 2026
Darshit is a senior site reliability and platform engineer with more than seven years of experience spanning blockchain, AI, observability, and developer tooling. A CKA-certified engineer fluent in AWS, GCP, Azure, and bare-metal environments, he has managed more than 100 blockchain RPC node clusters at Coinbase, delivered over $700,000 in monthly cloud cost savings at Supra, and led SOC 2 compliance initiatives. Darshit thrives at the intersection of reliability, automation, and infrastructure at scale.
Experience
- Monitoring & Alerting - 7 years
- Ansible - 6 years
- Site Reliability Engineering (SRE) - 6 years
- CI/CD Pipelines - 6 years
- Google Cloud Platform (GCP) - 6 years
- Amazon Web Services (AWS) - 5 years
- Kubernetes - 5 years
- Python - 5 years
Preferred Environment
Slack, Zoom, Google, Jira, Confluence, Notion
The most amazing...
...cost optimization I've delivered saved over $700,000 per month in cloud expenses at Supra by migrating to a distributed multi-cloud and bare-metal architecture.
Work Experience
Site Reliability Engineer
Supra
- Cut cloud costs by more than $700,000 a month through a strategic migration from a cloud-only setup to a hybrid multi-cloud and bare-metal infrastructure model.
- Directed migration to a distributed multi-cloud and bare-metal architecture, improving resilience and decentralization for a high-throughput Layer 1 blockchain network.
- Drove SOC 2 compliance across all infrastructure, implementing security controls, CIS benchmarks, and vulnerability management processes to meet audit requirements.
Software Engineer
Scale3 Labs
- Led the full development of the Nodepilot product, from infrastructure architecture and design to customer onboarding, serving as the primary SRE owner of the blockchain observability platform.
- Implemented a fully automated GitOps deployment workflow using Argo CD, Helm, and Terraform, eliminating manual intervention and ensuring consistent, reproducible infrastructure across environments.
- Integrated VectorDB and LLM framework support into Python and TypeScript SDKs for Langtrace, extending observability capabilities for AI and machine learning workloads.
- Deployed scalable self-hosting solutions for Langtrace across Kubernetes, Azure, Docker Compose, and Railway App, enabling diverse customer deployment models.
- Automated blockchain binary release pipelines using serverless services, reducing release cycle time and minimizing human error in critical node software updates.
Site Reliability Engineer
Coinbase
- Managed blockchain node operations across more than 30 chains and more than 100 remote procedure call (RPC) node clusters, ensuring high availability for one of the world's largest publicly traded cryptocurrency exchanges.
- Defined service-level objectives (SLOs) and service-level indicators (SLIs) for critical blockchain infrastructure services, built monitoring dashboards, and reduced alert noise to improve on-call efficiency and signal-to-noise ratio.
- Built an in-house Opsbook service using Django to centralize runbooks and incident response procedures, reducing mean time to resolution (MTTR) by 15 minutes.
Senior DevOps Engineer
BrowserStack
- Implemented a disaster recovery strategy on an alternate cloud provider, achieving a 50% lower recovery time objective (RTO) and ensuring business continuity for a platform serving more than 1 million daily testing sessions.
- Migrated monolithic staging environments to Kubernetes, improving resource utilization and enabling faster, more reliable developer workflows across engineering teams.
- Maintained and operated global cloud and on-premises infrastructure spanning more than 20 data centers, supporting platform reliability at scale.
Platform Engineer
Quantiphi
- Built the v1 file-browser module from scratch using Django and GCP Cloud Storage, enabling end users to upload, organize, and retrieve media assets for AI-driven content analysis.
- Deployed and configured a multi-node Elastic Stack (ELK) cluster with Kibana dashboards, providing real-time log aggregation and search across platform microservices.
- Developed RESTful APIs using Django REST Framework and AWS Lambda to power core platform functionality, serving as the integration layer between the front-end and AI inference services.
- Automated CI/CD pipelines for microservices and AI model deployments, reducing manual release effort and accelerating delivery cycles.
Experience
Production-grade 3-tier AWS Infrastructure with IaC and CI/CD
https://github.com/DSdatsme/node-3tier-app2
I built six GitHub Actions workflows: three for PR validation (linting, security audits, Dockerfile linting, Terraform plan) and three for deployment with environment-gated approval flows. I also created operational runbook scripts for day-2 management, including service start/stop/scale and RDS backup operations. This project demonstrates end-to-end ownership from infrastructure design through CI/CD automation to production operations.
Blockchain Goes Kubernetes
https://youtu.be/5_dwKZ88G8w
Terraform GitOps CI/CD with Approval and Slack Notifications
https://github.com/DSdatsme/gh-terraform
Education
Bachelor's Degree in Information Technology
University of Mumbai - Mumbai, India
Certifications
Microsoft Azure Fundamentals
Microsoft
Google Cloud Professional Cloud DevOps Engineer
Google Cloud
Certified Kubernetes Administrator
The Linux Foundation
GCP Professional Cloud Architect
Google Cloud
GCP Associate Cloud Engineer
Google Cloud
Skills
Libraries/APIs
REST APIs, Node.js
Tools
Ansible, Jenkins, Terraform, Amazon CloudWatch, Google Kubernetes Engine (GKE), Slack, Zoom, Vault, NGINX, Helm, Grafana, OpenTofu, Google Compute Engine (GCE), Kubectl, Amazon EKS, AWS IAM, Observability Tools, AWS ELB, Logging, Docker Compose, Claude, Jira, Confluence, Notion, ELK (Elastic Stack), GitHub, Amazon CloudFront, BigQuery, AWS Fargate, AWS CloudFormation, GitLab CI/CD, Amazon Elastic Container Service (ECS), Amazon Virtual Private Cloud (VPC), MongoDB Atlas
Paradigms
DevOps, Continuous Integration (CI), Continuous Delivery (CD), Role-based Access Control (RBAC), HIPAA Compliance, Azure DevOps
Platforms
Amazon Web Services (AWS), Kubernetes, Google Cloud Platform (GCP), Azure, Docker, Blockchain, DigitalOcean, Amazon EC2, AWS Lambda, Linux, Ubuntu, PagerDuty, Bare-metal Server, Vercel, Apache Kafka, Cloud Run
Languages
Python, Bash, Bash Script, Groovy, Python Script, TypeScript, SQL, JavaScript, Go, Ruby
Storage
Google Cloud, Datadog, Microsoft SQL Server, Amazon S3 (AWS S3), MySQL, On-premise, PostgreSQL, Google Cloud Storage, Amazon DynamoDB, Redis, MongoDB
Frameworks
Django, Django REST Framework
Other
Infrastructure as Code (IaC), CI/CD Pipelines, GitHub Actions, Monitoring & Alerting, Cloud Architecture, Site Reliability Engineering (SRE), Cloud, Incident Response, Infrastructure, GCP DevOps, System Administration, Infrastructure Automation, SOC 2, Disaster Recovery (DR), Cloud Cost Management, Monitoring, Observability, Argo CD, GitOps, Prometheus, Compliance, Incident Management, Amazon RDS, Security, Networking, Troubleshooting, Identity & Access Management (IAM), Kubernetes Operations (kOps), Agile DevOps, Configuration Management, Debugging Tools, Disaster Recovery Plans (DRP), Software Development Lifecycle (SDLC), Cloudflare, Architecture, Linux Administration, IT Infrastructure, Virtual Machines, Self-hosted, Containers, AWS DevOps, AWS Cloud Architecture, AWS Cloud Operations, Cloud Infrastructure, Performance, Server Optimization, Hybrid Cloud Infrastructure, Multi-tenant Architecture, APIs, Transport Layer Security (TLS), Container Orchestration, Argo Workflows, Disaster Recovery Automation, Containerization, Scripting, SOC Compliance, Domain Migration, Domain DNS Setup, Web Hosting, Consulting, Cloud Migration, Google, Software Development, RESTful Microservices, Serverless, OpenTelemetry, AWS ECS Fargate, High Availability (HA), AWS Secrets Manager, GitHub Workflows, Slackbot, Virtualization, Cloud Security, Microsoft Azure, Railway, IT Security, Artificial Intelligence (AI), Data Migration, Large Language Models (LLMs), Virtual Private Cloud (VPC), Google Cloud Build, Pulumi, API Gateways, Migration