08.2022 - Present | Xendit | Singapore

Head of Infrastructure and Science
Spearheaded a comprehensive overhaul of infrastructure and engineering culture. Delivered 7-figure cost savings. Manage 7,000+ services across dozens of clusters and thousands of nodes, in a pure IaC state with zero-downtime EKS lifecycle management.
15 direct reports | SRE | Data | Security
Key wins
99.999% uptime
Pure IaC self-service
7-figure cost savings
Technologies
EKS, ArgoCD, GitOps, Terraform, Cloudflare, Split DNS, Multi-CDN
Responsibilities & achievements
Led a comprehensive overhaul of infrastructure and engineering culture at Xendit, managing 7,000+ services across dozens of Kubernetes clusters and thousands of nodes. Delivered 7-figure cost savings and turned a toil-focused squad into a high-performing SRE engineering unit.
Platform Engineering & Orchestration
- Fleet Re-Architecture: Redesigned and rebuilt the Kubernetes fleet, eliminating technical debt and implementing an Active/Active multi-cluster architecture for distributed workloads.
- Advanced GitOps & Control Plane: Engineered a custom ArgoCD macro-scale framework and plugin, enabling a DRY, layered YAML structure that manages thousands of deployments across a distributed fleet and is designed to scale with fleet growth.
- Lifecycle & Scaling: Achieved zero-downtime, zero-error-rate EKS lifecycle management. Used Karpenter for fast node provisioning and KEDA for cost efficiency, letting services scale to zero during low-demand periods.
- Deployment & Governance: Standardized canary deployments via Argo Rollouts and orchestrated complex QA/CI/CD pipelines with Argo Workflows, all governed by automated Kyverno policy enforcement.
Networking & Traffic Engineering
- Multi-Cluster Connectivity: Deployed Cilium Cluster Mesh to provide secure, observable connectivity between services across all clusters.
- Traffic Steering & DR: Designed a fault-tolerant, transparent DR strategy using split-horizon DNS on multi-region/multi-AZ topologies. Developed a DNS hierarchy for weighted blue/green load balancing between CDNs, clusters, and regions.
- Edge Migration: Executed a zero-downtime migration from Imperva to Cloudflare using a Multi-CDN topology.
Reliability & Cost Engineering
- SLA & Incident Management: Raised platform availability from 99.98% to 99.999%, maintaining a nearly two-year record of zero incidents caused by the team. Made RCAs and post-mortems a mission-critical practice.
- Strategic Cost Optimization: Delivered consistent 20% YoY cost reductions through a dual-track strategy:
  - Architectural Efficiency: Implementing scaling-to-zero (KEDA), right-sizing via Karpenter, and optimized compute architectures (arm64/amd64).
  - Financial Engineering: Orchestrating complex capacity planning involving Spot instances, Reserved Instances, and Savings Plans.
- Data Layer Resilience: Managed high-availability, self-hosted data layers including YugabyteDB, MongoDB, and PostgreSQL within the Kubernetes ecosystem.