
Principal DevOps Engineer
Ran APAC infrastructure for the Delivery Hero group on a macro-scale ArgoCD ecosystem, standing up new Local Business Units, countries, and regions quickly and safely. Drove self-service GitOps that absorbed the bulk of Jira hot requests, contributed upstream to argo-rollouts, ingress-nginx, and Atlantis, and ran the fleet of thousands of distributed services without a self-inflicted incident.
First Principal Engineer of this kind at foodpanda, running APAC infrastructure not just for foodpanda but across the entire Delivery Hero group. Each cluster is a self-contained Local Business Unit: a country or region running its own business independently. Large-scale containerised computing, driven by IaC, on high-availability, high-concurrency systems.
Platform & Orchestration
- Kubernetes at scale, with high concurrency and resilience: thousands of distributed services, thousands of nodes, and tens of thousands of ingress resources under management
- Cluster lifecycle on AWS/EKS: Terraform-imported the existing clusters into a blue/green blueprint, then spun up new clusters inside the same VPC and networking, communicating internally and presenting as a single perimeter entity (a "metacluster")
- Macro-scale ArgoCD ecosystem: a custom plugin, a shared chart, and a layered DRY footprint repo. Reproducible infrastructure built on meta-modular abstractions, so the group could stand up new Regions and Countries quickly and safely
- ArgoCD and Argo Rollouts (Design, Deployment, Customisation, Workshops)
- Kubernetes Tailoring (MPA, Controllers, Advanced Scheduling, Affinity, etc.)
- Self-service GitOps absorbed the bulk of the Jira Service Desk hot-request types — engineers spent their time reviewing PRs instead of crafting them
Reliability, Cost & Resilience
- Observability: Datadog as the fleet-wide stack
- Cost engineering: introduced Spot and hybrid capacity, with on-demand fallback to tolerate instance-exhaustion events
- GameDays: twice-yearly drills exercising failure scenarios — AZ-outage availability among them
People & Practice
- Led and looked after a small, talented APAC SRE squad of seven, with zero attrition and zero self-inflicted incidents during my tenure
- Mentorship & upskilling: invested in team certifications (Terraform especially); the modules they shipped reached a level the team had not produced before, with continuous support so no engineer flew alone
- Platform engineering at group scale: the macro-scale ArgoCD ecosystem and self-service GitOps enabled developer teams across many Local Business Units to ship to production quickly and safely
- Hackathons: drove the hackathon program with deliberate themes — disabilities and accessibility among them, surfacing features like ingredient-to-condition filtering (diabetes, allergies, G6PD, gluten) and visual-impairment support, several of which shipped to production
- Tooling & Automation (Python, Terraform, Go, JS, HTML, CSS, Helm, Kustomize)
- Infrastructure Security and Hardening (Topology & Perimeter)
Upstream Contributions
- argo-rollouts — advanced canary across multiple rollout providers, so north-south and east-west traffic splits can be routed independently
- kubernetes/ingress-nginx — incremental update capacity for fleets running tens of thousands of ingress resources, plus controller-runtime observability (timings, counts)
- Atlantis: hardened the admin portal and added proper SemVer support, so Terraform modules absorb minor Terraform binary updates automatically across a large IaC estate
Sharing Is Caring