Experience
01.2021 - 08.2022 · foodpanda · Singapore

Principal DevOps Engineer

Ran APAC infrastructure for the Delivery Hero group on a macro-scale ArgoCD ecosystem, standing up new Local Business Units, countries and regions quickly and safely. Drove self-service GitOps that absorbed the bulk of Jira Service Desk hot-requests, contributed upstream to argo-rollouts, ingress-nginx and Atlantis, and ran a fleet of thousands of distributed services without a single self-inflicted incident.

7 direct reports · APAC SRE
Key wins
Macro-scale ArgoCD
Multiple upstream contributions
Self-service IaaC state
APAC LBU enablement
Technologies
Kubernetes · ArgoCD · ArgoRollouts · ArgoWorkflows · Terraform · Python · Go · Helm · Kustomize · ingress-nginx · Atlantis · AWS · EKS · Datadog
Responsibilities & achievements

The first Principal Engineer role of its kind at foodpanda, running APAC infrastructure not just for foodpanda but for the entire Delivery Hero group. Each cluster served a self-contained Local Business Unit: a country or region running its own business independently. The work meant large-scale computing of containerised environments, driven by IaaC, on high-availability and high-concurrency systems.

Platform & Orchestration

  • Kubernetes at large scale, with high concurrency and resilience: thousands of distributed services, thousands of nodes and tens of thousands of ingress resources under management
  • Cluster lifecycle on AWS/EKS: imported the existing clusters into Terraform under a blue/green blueprint, then spun up new clusters inside the same VPC and networking, communicating internally and presenting to the outside as a single perimeter entity, a "metacluster"
  • Macro-scale ArgoCD ecosystem: a custom plugin, a shared chart and a layered DRY footprint repo. Reproducible infrastructure built on meta-modular abstractions let the group stand up new regions and countries quickly and safely
  • ArgoCD and ArgoRollouts (Design, Deployment, Customisation, Workshops)
  • Kubernetes Tailoring (MPA, Controllers, Advanced Scheduling, Affinity, etc.)
  • Self-service GitOps absorbed the bulk of the Jira Service Desk hot-request types — engineers spent their time reviewing PRs instead of crafting them

Reliability, Cost & Resilience

  • Observability: Datadog across the whole fleet
  • Cost engineering: introduced Spot and hybrid capacity, with on-demand fallback to tolerate instance-exhaustion events
  • GameDays: twice-yearly drills exercising failure scenarios, including availability during an AZ outage

People & Practice

  • Took care of a small, talented APAC SRE squad of 7, with zero attrition and zero self-inflicted incidents during my tenure
  • Mentorship & upskilling: invested in team certifications (Terraform especially); the modules the team then shipped reached a quality it had not produced before, backed by continuous support so no engineer worked alone
  • Platform engineering at group scale: the macro-scale ArgoCD ecosystem and self-service GitOps enabled developer teams across many Local Business Units to ship to production quickly and safely
  • Hackathons: drove the hackathon programme with deliberate themes, including disability and accessibility, which surfaced features such as ingredient-to-condition filtering (diabetes, allergies, G6PD deficiency, gluten) and visual-impairment support; several of these shipped to production
  • Tooling & Automation (Python, Terraform, Go, JS, HTML, CSS, Helm, Kustomize)
  • Infrastructure Security and Hardening (Topology & Perimeter)

Upstream Contributions

  • argo-rollouts: advanced canary across multiple traffic-routing providers, so north-south and east-west traffic splits can be routed independently
  • kubernetes/ingress-nginx: incremental updates for fleets running tens of thousands of ingress resources, plus controller-runtime observability (timings, counts)
  • Atlantis: hardened the admin portal and added proper SemVer support, so Terraform modules automatically absorb minor Terraform binary updates across a vast IaaC estate

Share is Care