Experience
08.2022 - Present | Xendit | Singapore

Head of Infrastructure and Science

Spearheaded a comprehensive overhaul of infrastructure and engineering culture. Delivered 7-figure cost savings. Manage 7,000+ services across dozens of clusters and thousands of nodes. Pure IaC state with zero-downtime EKS lifecycle management.

15 direct reports | SRE | Data | Security
Key wins
99.999% uptime
Pure IaC self-service
7-figure cost savings
Technologies
EKS | ArgoCD | GitOps | Terraform | Cloudflare | Split DNS | Multi-CDN
Responsibilities & achievements

Spearheaded a comprehensive overhaul of infrastructure and engineering culture at Xendit, managing 7,000+ services across dozens of Kubernetes clusters and thousands of nodes. Delivered 7-figure cost savings and transformed a toil-focused squad into a high-performing SRE engineering unit.

Platform Engineering & Orchestration

  • Fleet Re-Architecture: Completely reimagined and rebuilt the Kubernetes fleet, eliminating technical debt and implementing a true Active/Active Multi-Cluster architecture for hyper-distributed workloads.
  • Advanced GitOps & Control Plane: Engineered a custom ArgoCD Macro-Scale Framework and plugin, enabling a DRY, layered YAML structure to manage thousands of deployments across a distributed fleet with an infinite-scale design.
  • Lifecycle & Scaling: Achieved zero-downtime, zero-error-rate EKS lifecycle management. Leveraged Karpenter for high-performance node provisioning and KEDA to drive extreme cost efficiency, enabling services to scale to zero during low-demand periods.
  • Deployment & Governance: Standardized canary deployments via Argo Rollouts and orchestrated complex QA/CI/CD pipelines with Argo Workflows, all governed by automated Kyverno policy enforcement.

Networking & Traffic Engineering

  • Multi-Cluster Connectivity: Deployed Cilium MultiCluster Mesh to provide seamless, secure, and observable connectivity across the global service landscape.
  • Traffic Steering & DR: Designed a fault-tolerant, transparent DR strategy using split-horizon DNS across multi-region/multi-AZ topologies. Developed an advanced DNS hierarchy for weighted blue/green load balancing between CDNs, clusters, and regions.
  • Edge Migration: Executed a flawless, zero-downtime migration from Imperva to Cloudflare using a Multi-CDN topology.

Reliability & Cost Engineering

  • SLA & Incident Management: Raised the platform availability SLA from 99.98% to 99.999%, maintaining a near two-year record of zero team-led incidents. Made RCAs and post-mortems a mission-critical practice.
  • Strategic Cost Optimization: Delivered consistent 20% YoY cost reductions through a dual-track strategy:
    • Architectural Efficiency: Implemented scale-to-zero (KEDA), right-sizing via Karpenter, and optimized compute architectures (arm64/amd64).
    • Financial Engineering: Orchestrated complex capacity planning involving Spot Instances, Reserved Instances, and Savings Plans.
  • Data Layer Resilience: Managed high-availability, self-hosted data layers including YugabyteDB, MongoDB, and PostgreSQL within the Kubernetes ecosystem.