Every role's full responsibilities and every case study's full body — on a single page. The same content the modals render, served in long form so you can scan, search, or hand the URL to someone else.
01 · Person
Xavier Murias
I've been in infrastructure since 2003 — through sysadmin, security, virtualization, containers, SRE, platform, and now AI. The work kept changing names; the shape of the problem didn't. Whatever I'm good at now, I owe to the next thing always being harder than the last.
These days I run platform at Xendit — seven thousand services across dozens of clusters, the kind of fleet where you measure success in things that didn't happen. Together with my team, we keep iterating and advancing our stack: materializing our own utopia day by day.
I independently released models built with UNA and MGS, post-training methods I developed and applied across multiple transformer architectures. They reached #1 on the HuggingFace LLM Leaderboard several times, with TheBeagle, Juanako, miniClaus, and Cybertron, the last of which was served for nearly two years in Cloudflare Workers AI. Each iteration sharpened my intuition for deep neural networks, so I kept building AI.
I read more than I write, run more experiments than I publish, and contribute upstream when the fix belongs in the project itself, not a private fork. The systems I'm proudest of are the ones nobody notices, because they just keep running.
02 · Education & Certifications
13 entries
Academic and accredited.
Education
Massachusetts Institute of Technology
6.00.1x — Introduction to Computer Science and Programming Using Python
2018
Introduction to Computer Science and Programming Using Python, and Introduction to Computational Thinking and Data Science.
Spearheaded a comprehensive overhaul of infrastructure and engineering culture at Xendit, managing 7,000+ services across dozens of Kubernetes clusters and thousands of nodes. Delivered unprecedented efficiency, 7-figure cost savings, and a transition from a toil-focused squad to a high-performing SRE engineering unit.
Platform Engineering & Orchestration
Fleet Re-Architecture: Completely reimagined and rebuilt the Kubernetes fleet, eliminating technical debt and implementing a true Active/Active Multi-Cluster architecture for hyper-distributed workloads.
Advanced GitOps & Control Plane: Engineered a custom ArgoCD Macro-Scale Framework and plugin, enabling a DRY, layered YAML structure that manages thousands of deployments across a distributed fleet and is designed to keep scaling as the fleet grows.
Lifecycle & Scaling: Achieved zero-downtime, zero-error-rate EKS lifecycle management. Leveraged Karpenter for high-performance node provisioning and KEDA to drive extreme cost efficiency, enabling services to scale to zero during low-demand periods.
Deployment & Governance: Standardized canary deployments via Argo Rollouts and orchestrated complex QA/CI/CD pipelines with Argo Workflows, all governed by automated Kyverno policy enforcement.
Networking & Traffic Engineering
Multi-Cluster Connectivity: Deployed Cilium MultiCluster Mesh to provide seamless, secure, and observable connectivity across the global service landscape.
Traffic Steering & DR: Designed a fault-tolerant, transparent DR strategy via a Split DNS Horizon on multi-region/multi-AZ topologies. Developed an advanced DNS hierarchy for weighted blue/green load balancing between CDNs, clusters, and regions.
Edge Migration: Executed a flawless, zero-downtime migration from Imperva to Cloudflare using a Multi-CDN topology.
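The weighted blue/green steering mentioned above can be pictured as a weighted pick between targets. This is a minimal sketch, not the actual DNS setup: the target names `blue` and `green`, the 90/10 split, and the simulation itself are assumptions made for illustration.

```python
import random
from collections import Counter

def pick_target(weights, rng):
    """Pick one target name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def simulate(weights, lookups, seed=0):
    """Simulate many resolutions and count how often each target is returned."""
    rng = random.Random(seed)
    return Counter(pick_target(weights, rng) for _ in range(lookups))

if __name__ == "__main__":
    # Hypothetical canary-style split: 90% of lookups to blue, 10% to green.
    print(simulate({"blue": 90, "green": 10}, lookups=10_000))
```

Shifting traffic is then a matter of changing the weights, and setting a weight to zero drains a target entirely.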
Reliability & Cost Engineering
SLA & Incident Management: Elevated platform availability from 99.98% to 99.999% SLA, maintaining a near two-year record of zero team-led incidents. Elevated RCA and Post-Mortems to mission-critical status.
Strategic Cost Optimization: Delivered consistent 20% YoY cost reductions through a dual-track strategy:
Architectural Efficiency: Implementing scaling-to-zero (KEDA), right-sizing via Karpenter, and optimized compute architectures (arm64/amd64).
Data Layer Resilience: Managed high-availability, self-hosted data layers including YugabyteDB, MongoDB, and PostgreSQL within the Kubernetes ecosystem.
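The availability figures quoted above (99.98% and 99.999%) are easier to compare as a yearly downtime allowance. A small sketch of that arithmetic, assuming a 365-day year:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years

def downtime_minutes_per_year(availability_percent):
    """Downtime allowed per year at a given availability percentage."""
    return (100.0 - availability_percent) / 100.0 * MINUTES_PER_YEAR

if __name__ == "__main__":
    for target in (99.98, 99.999):
        print(f"{target}% -> {downtime_minutes_per_year(target):.2f} min/year")
```

At 99.98% the allowance is roughly 105 minutes a year; at 99.999% it is a little over 5 minutes.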
03.2024 - 08.2024 · Candy.AI · Remote
AI/ML Lead (Contract)
Candy.AI · Remote
Reports: 5 · Squads: AI, Infrastructure
Wins
Scale from thousands to millions · Built from scratch: pure IaaC · GKE GPU cost-efficient fleets
Single-handedly revolutionized a startup's AI capabilities, scaling from nascent to millions of users.
Platform & Orchestration
Fleet rebuild: from a handful of Azure VM GPUs to a fully orchestrated, hyperscale GKE on GCP environment with NVIDIA GPU autoscaling, supporting multi-GPU and diverse generative workloads
CloudNative IaC discipline: Terraform, Helm, ArgoCD, and GitOps
Just-in-time launch: delivered the GKE fleet ahead of media and TV exposure — fleet held the resulting ramp from hundreds of thousands to millions of users at 99.99% uptime
Dev → prod gating: containerized AI workflows and inference for consistent deployment and controlled promotion across environments
Inference & Generative Workloads
Data pipelines: engineered the foundations that fed training and inference workflows
'SuperBooga' inference broker: an event-driven bus fronting GPU inference — nodes pull from a pub/sub queue and serve one request at a time per GPU, avoiding the performance hit a single GPU takes from concurrent inference
GPU & instance right-sizing: through performance and load testing, matched the right GPU and instance family to each workload while keeping a cost-efficient footprint
Distributed weights & adapters: intra-cluster storage kept in sync with the upstream source and fanned out to inference containers on demand, accelerating cold-start times and cutting storage costs
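The pull-based broker described above (workers pull from a queue and serve one request at a time per GPU) can be sketched with the standard library. This is not the SuperBooga implementation: an in-process `queue.Queue` stands in for the pub/sub queue, threads stand in for GPU nodes, and the "inference" is a placeholder string operation.

```python
import queue
import threading

def gpu_worker(gpu_id, requests, results):
    """Serve requests strictly one at a time, so a GPU never runs two at once."""
    while True:
        item = requests.get()
        if item is None:              # sentinel: shut this worker down
            requests.task_done()
            break
        request_id, prompt = item
        # Placeholder for the real model call (hypothetical).
        output = f"gpu{gpu_id}:{prompt.upper()}"
        results.put((request_id, output))
        requests.task_done()

def run_broker(prompts, num_gpus=2):
    """Fan prompts out to one worker per GPU and collect the results by id."""
    requests, results = queue.Queue(), queue.Queue()
    workers = [
        threading.Thread(target=gpu_worker, args=(i, requests, results))
        for i in range(num_gpus)
    ]
    for w in workers:
        w.start()
    for request_id, prompt in enumerate(prompts):
        requests.put((request_id, prompt))
    for _ in workers:                 # one sentinel per worker
        requests.put(None)
    requests.join()
    for w in workers:
        w.join()
    collected = {}
    while not results.empty():
        request_id, output = results.get()
        collected[request_id] = output
    return collected

if __name__ == "__main__":
    print(run_broker(["hello", "world", "again"]))
```

Because workers pull rather than receive pushed work, a busy GPU simply does not take the next request, which is what keeps concurrency at one per device.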
ML Engineering & Practice
End-to-end product delivery: translated aspirational & conceptual product requirements into shipped features — both the supporting software and custom-trained PyTorch/Transformers models
Performance discipline: custom stress-testing for LLM and Stable Diffusion, anchored in a K6-driven load-testing and observability culture
First Principal Engineer of its kind at foodpanda — running APAC infrastructure not just for foodpanda but across the entire Delivery Hero group. Each cluster a self-contained Local Business Unit: a country or region running its own business, independently. Large-scale computing of containerised environments, driven by IaaC on high-availability and high-concurrency systems.
Platform & Orchestration
Kubernetes at large scale, concurrency, resilience — thousands of distributed services, thousands of nodes, tens of thousands of ingress resources under management
Cluster lifecycle on AWS/EKS: Terraform-imported the existing clusters into a blue/green blueprint, then spun new clusters inside the same VPC/networking — communicating internally and presenting as a single perimeter entity: a "metacluster"
Macro-scale ArgoCD ecosystem: a custom plugin, a shared chart, and a layered DRY footprint repo. Reproducible infrastructure built on meta-modular abstractions, so the group could stand up new Regions and Countries quickly and safely
ArgoCD and ArgoRollouts (Design, Deployment, Customisation, Workshops)
Self-service GitOps absorbed the bulk of the Jira Service Desk hot-request types — engineers spent their time reviewing PRs instead of crafting them
Reliability, Cost & Resilience
Observability: Datadog as the observability stack across the fleet
Cost engineering: introduced Spot and hybrid capacity, with on-demand fallback to tolerate instance-exhaustion events
GameDays: twice-yearly drills exercising failure scenarios — AZ-outage availability among them
People & Practice
Taking care of a small (7), talented APAC SRE squad — zero attrition and zero self-inflicted incidents during my tenure
Mentorship & upskilling: invested in team certifications (Terraform especially); the modules they shipped reached a level the team had not produced before, with continuous support so no engineer flew alone
Platform engineering at group scale: the macro-scale ArgoCD ecosystem and self-service GitOps enabled developer teams across many Local Business Units to ship to production fast and safely
Hackathons: drove the hackathon program with deliberate themes — disabilities and accessibility among them, surfacing features like ingredient-to-condition filtering (diabetes, allergies, G6PD, gluten) and visual-impairment support, several of which shipped to production
Infrastructure Security and Hardening (Topology & Perimeter)
Mainstream Contributions
argo-rollouts — advanced canary across multiple rollout providers, so north-south and east-west traffic splits can be routed independently
kubernetes/ingress-nginx — incremental update capacity for fleets running tens of thousands of ingress resources, plus controller-runtime observability (timings, counts)
Atlantis — hardened the admin portal, and added proper SEMVER support so Terraform modules absorb minor tf-binary updates automatically across vast IaaC
Sharing is Caring
06.2019 - 01.2021 · Prudential PACS · Singapore
SRE Manager (Principal Engineer)
Prudential PACS · Singapore
Reports: 9 · Squads: SRE, Architecture
Wins
OnPrem to cloud-native · Full DevOps implementation · Complete containerization · MAS TRM 2021-aligned SDLC
Drove a 100-year-old regulated financial institution from on-prem and early DevOps into a cloud-native, containerised operating model in almost two years. End-to-end PMO ownership delivered structural savings across cloud, licensing, and augmentation:
Leading a team of nine across SRE and Architecture squads — zero attrition during my tenure, and a materially shortened SLA turnaround on infrastructure tasks, driven by upskilling the team through certifications (CKA, CKAD, Terraform Associate, etc.)
OnPrem to Cloud Migrations Expertise — Planning, Architecture, End-to-End Execution
Containerisation and migration of legacy platforms — moved IBM WebSphere and JBoss workloads off OpenShift and VMs into AKS, containers only
Evolved Jenkins to behave like Drone-CI — declarative-YAML pipelines
TRM-aligned SDLC: crafted Prudential's SDLC covering DevOps, Agile, Pipelines, Artifact Lineage, and CAB, aligned with the MAS Technology Risk Management (TRM) Guidelines 2021 (Monetary Authority of Singapore)
COVID WFH enablement — played a crucial role making sure thousands of employees could work from home safely as the pandemic started
Resilient stateful workloads — introduced Kafka self-hosted via the Confluent Kubernetes Operator and Stolon-managed self-hosted Postgres clusters; production StatefulSets with CSI-backed persistence
Production-grade MVP on AKS — OpenFaaS + Kafka + Cassandra, with autoscaling and full instrumentation through ElasticSearch + Prometheus + Grafana
Performance and production incident troubleshooting
Solutions Architecture — re-engineering and new platforms toward cloud-native and containerised, while maintaining a cost-efficient footprint
02.2019 - 07.2019 · DBS Bank · Singapore
VP of Reliability Engineering
DBS Bank · Singapore
Wins
Observability Platform · Tooling and Toil Reduction · SLO/Error Budget compositions
Stack
Python · Prometheus · Grafana
First SRE on board, helping the organization understand and implement SRE practices on legacy structures and systems:
Toil reduction by Python automation
Implementation of Monitoring platform with Prometheus and evangelisation of better monitoring practices
Definition of SLO, Error budget, Monitoring Dashboards
Development of Prometheus exporters for databases and applications
Helping the organization understand what SRE is and how to implement SRE practices
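The SLO and error-budget definitions listed above come down to simple arithmetic on request counts. A minimal sketch, with the 99.9% target and the request numbers chosen purely for illustration:

```python
def error_budget(slo, total_requests, failed_requests):
    """Return allowed failures, failures left, and the fraction of budget burned."""
    allowed = (1.0 - slo) * total_requests
    if allowed > 0:
        burned = failed_requests / allowed
    else:
        # A 100% SLO leaves no budget at all: any failure exhausts it.
        burned = 0.0 if failed_requests == 0 else float("inf")
    return {
        "allowed_failures": allowed,
        "remaining_failures": allowed - failed_requests,
        "burned_fraction": burned,
    }

if __name__ == "__main__":
    # Illustrative numbers: 99.9% SLO, one million requests, 250 failures.
    print(error_budget(0.999, 1_000_000, 250))
```

With those inputs the budget is about 1,000 failed requests, of which a quarter has been spent.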
Engineering process for design, build, and implementation of a high-availability/disaster-recovery infrastructure model for Tier-1 applications across two datacenters:
Consistently delivered survey documentation packages ahead of schedule
Prepared thorough checklists, reports, and Visio drawings documenting circuit and network equipment
Development of technology roadmaps for production environments
Pivotal role in SDLC CI/CD on the operations side, automating with Puppet or Chef
Coordinated across multiple teams to complete assigned tasks and projects driven by business needs
Troubleshot production performance issues and suggested architecture changes
04.2010 - 04.2012 · Independent · China
Chinese Student & Freelance
Independent · China
Career break for language studies and cultural immersion:
Chinese Mandarin Student
English Student
Independent Freelance consulting
Traveler across China
03.2008 - 02.2010 · Endesa · Madrid, Spain
Infrastructure & CyberSecurity Architect
Endesa · Madrid, Spain
Wins
Security architecture roadmap · Lab & test scenario design
Participated in migrations of bare-metal AIX and Linux to virtualised environments with ESXi and vCenter
Design and Implementation of new software solutions
Administration of large AIX LPAR pSeries compute servers
Administration of company storage (FAStT and Hitachi) with McData and Brocade fibre switches
Tuning and troubleshooting of platforms under development
Patching and updating production environments; deploying new platforms
07.2003 - 02.2004 · Telefonica · Madrid, Spain
Systems Operations
Telefonica · Madrid, Spain
Stack
Apache · DNS · LDAP · TACACS+ · DHCP
Infrastructure Operations:
Administration of Apache and Application Servers, performance troubleshooting, hardening and daily maintenance
Development of monitoring scripts
Administration of corporate DNS, LDAP, TACACS+ and DHCP servers
02.2003 - 07.2003 · Orange · Madrid, Spain
Technical Support Specialist
Orange · Madrid, Spain
Stack
RADIUS · ADSL · Linux
At Ya.com, before its acquisition by Orange:
Platform troubleshooting and escalation of incidents
Maintenance of platforms, patching, user management
Support for RIMA Network circuits (ADSL)
Administration of RADIUS ACLs and users
04 · Portfolio
15 entries
Every case study, fully expanded.
01 · 2023-2024 · Leaderboard
8x HuggingFace #1 Champion
Multiple #1 positions across two leaderboard eras
★8x #1 HuggingFace Open LLM Leaderboard
Eight #1 positions on the HuggingFace Open LLM Leaderboard across both v1 and v2 eras, competing with models from major tech firms and AI labs using original post-training techniques (UNA and MGS) applied systematically across different base architectures. Competed against 70B models with 7B, and maintained contamination-free benchmarks.
8 separate #1 positions across 2023-2024. Displaced Intel's neural-chat from #1 in November 2023. Reached #8 ALL SIZES with Cybertron v2 — a 7B model competing against 70B+ models.
#1 across ALL model sizes with TheBeagle in January 2024. Contamination-free verification with 5-gram analysis. Consistent results across Mistral, Intel, Yi/Smaug, and Qwen bases.
#1 Leaderboard · LLM · Post-Training · UNA · MGS · Open Source
02 · 2021-2025 · Open Source
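The 5-gram contamination check mentioned in this case study can be illustrated with a plain n-gram overlap ratio. This is a generic sketch, not the verification procedure actually used: lowercase whitespace tokenization and the function names are assumptions.

```python
def ngrams(text, n=5):
    """Set of word n-grams in a text, using lowercase whitespace tokenization."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def contamination_ratio(train_text, benchmark_text, n=5):
    """Fraction of the benchmark's n-grams that also appear in the training text."""
    bench = ngrams(benchmark_text, n)
    if not bench:
        return 0.0
    return len(bench & ngrams(train_text, n)) / len(bench)

if __name__ == "__main__":
    benchmark = "the quick brown fox jumps over the lazy dog"
    training = "he said the quick brown fox jumps high"
    print(contamination_ratio(training, benchmark))
```

A ratio near zero across a benchmark suggests the training text does not reproduce its items; a high ratio flags samples worth inspecting by hand.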
Enterprise Infrastructure Contributions
Merged contributions to Kubernetes, Argo, and Atlantis
★Merged PRs in Kubernetes, Argo, Atlantis & More
Contributions to mainstream infrastructure projects used in enterprise deployments. Kubernetes ingress-nginx, Argo Rollouts, Atlantis, SurfSense. All PRs merged into mainline repositories, addressing production-scale problems.
UNAVision is a compact neural vision codec and visual tokenizer. It compresses arbitrary RGB imagery into a dense latent at a fixed 16:1 spatial ratio and reconstructs at 1–4% fidelity loss — and the loss shrinks as resolution grows (inverse of typical codecs). I can batch 6x 40MP images on a single RTX 4090. Under 150K trainable parameters. 100% codebook utilization (zero dead codes). Dual continuous/discrete bottleneck on same weights with <0.10% gap.
•Loss decreases with resolution (inverse of typical codecs)
•Dual continuous/discrete bottleneck
•100% codebook utilization (zero dead codes)
•UNA Audio prototype also developed
[Figure: reconstruction fidelity comparison, original vs. reconstructed (Wildlife, 2560px)]
01
What it is
A compact neural vision codec and visual tokenizer. Compresses arbitrary RGB imagery into a dense, well-structured latent at a fixed 16:1 spatial ratio and reconstructs at 1–4% fidelity loss on natural imagery.
Loss shrinks as input grows: 4–6K photos land in the 1–2% band; 40 MP cases hold there comfortably. 100% active visual vocabulary utilization — zero dead codes.
02
Memory envelope
A batch of half a dozen 40 MP images fits in a single forward pass on one RTX 4090 — no tiling, no sharding, no gradient checkpointing acrobatics, no OOM.
Possible because activation memory is dominated by the 16:1 bottleneck and the network sits under 150K trainable parameters.
Classification, Retrieval, and NLP from Embeddings Geometry
★93% Classification, 28x Faster Inference
HarEmb performs classification, retrieval, and NLP tasks by exploiting the geometry of LLM embedding matrices. Results achieved using Qwen2.5-0.5B, a very small model — demonstrating that embeddings geometry carries significant semantic information even at minimal scale. Lightweight components run 28x faster than conventional transformers.
Client
Independent — Author-attested
Role
Sole author
Duration
2025
Team
Solo
Outcomes
93.16%
AG News
90.75%
Emotion
86.01%
IMDB
83.72%
SST-2
0.941
MS MARCO MRR@10
28x
Speedup
<150M
Total Network
<20M
Trainable Params
Highlights
•Only lightweight forward pass components
•Retrieval extension with MRR@10 >0.9
•Throughput: thousands of samples per second
•Exploits embeddings geometry with lightweight components
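For readers unfamiliar with the retrieval metric reported above, MRR@10 averages the reciprocal rank of the first relevant result within the top 10, counting zero when none appears. A small sketch with made-up document ids:

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant hit within the top k results."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, relevant_ids in zip(rankings, relevant):
        for rank, doc_id in enumerate(ranking[:k], start=1):
            if doc_id in relevant_ids:
                total += 1.0 / rank
                break                 # only the first relevant hit counts
    return total / len(rankings)

if __name__ == "__main__":
    rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
    relevant = [{"a"}, {"y"}, {"missing"}]
    print(mrr_at_k(rankings, relevant))  # (1 + 1/2 + 0) / 3
```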
★Only Independent Developer in Cloudflare's AI Catalog
Cloudflare hosted Cybertron 7B v2 on their global Workers AI inference platform as a first-party model — served at the edge with OpenAI-compatible endpoints, a 15,000-token context window, and a public playground. The only third-party fine-tune in their catalog under an independent-developer namespace. Hosted for nearly two years.
UNA is Uniform Neural Alignment: a transformer architecture change that introduces an auxiliary loss, applied as a patch to HuggingFace Transformers models. It operates during SFT and RLHF training and can be applied to attention layers, MLP layers, or both. It is memory intensive but compatible with LoRA. Training data does not need to be novel, but must not have been previously overfitted. Applied across Mistral, Intel, Yi/Smaug, Qwen2.5, LLaMA 1 & 2, Pythia, and Luxa architectures.
Where the agent, the spec, and the codebase share one surface.
★Every task lands as a reviewed spec before it lands as code.
Agentic coding on greenfield demos is easy. Doing it on the kind of code a business actually runs on — years of history, multiple owners, no authoritative map, and a context window that runs out before the work does — is where most agentic workflows fall apart. DIL exists to make that second case tractable.
Client
Independent — Ecosystem project
Role
Creator · Ecosystem author
Duration
2025
Team
Solo
Outcomes
100K+ LOC
Proven Scale
Claude Code · Codex · Kiro
Host Agents
Language · Server · UI · MCP · CLI
Surfaces
SWE Approval Gates
Review Model
01
The substrate
DIL is three things welded into one surface: a spec layer the agent authors against, a graph of the project's structure and relationships, and an agent integration that reaches into the host coding tool — Claude Code, Codex, or Kiro — through MCP. The server hosts all of it (database, web UI, MCP endpoints), and a CLI sits alongside for humans who prefer the terminal.
02
The loop
An agent onboards a project fly-solo — crawling, building the graph, and registering itself without supervision, while a human watches progress through the CLI or the UI. From there, every task runs the same shape: the agent produces a DIL-SPEC through a workflow pipeline, the human reviews and approves at the gates built into the flow, and implementation proceeds against the approved spec.
During the work, the agent searches and reasons across the graph, the spec layer, and the source code in a single query — the three surfaces are one. When code inevitably drifts away from the spec, the ecosystem self-heals, either through a direct command or as a native step inside the SWE workflow. Skills, SubAgents, and Commands extend the reach inside the host agent, so the integration isn't a thin adapter — it's first-class behavior.
03
Lineage
DIL is what Tree-of-Knowledge symbolic tuning becomes when you push it into the software-engineering domain and make the symbolic structure load-bearing, not academic.
MGS is MultiGumbelSampling — a regularization technique introducing Gumbel-sampled noise across signal paths during SFT/RLHF training. Combinable with UNA (UNAMGS releases) for additive performance gains.
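As background for the term, standard Gumbel(0, 1) noise can be drawn by inverse-transform sampling, g = -log(-log(u)) with u uniform on (0, 1). The sketch below shows only that textbook step, added to a list of values. How and where MGS injects such noise during training is not described here, so the `perturb` helper and its `scale` parameter are purely illustrative assumptions.

```python
import math
import random

def gumbel_noise(rng):
    """Draw one standard Gumbel(0, 1) sample by inverse-transform sampling."""
    u = rng.random()
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # keep both logarithms finite
    return -math.log(-math.log(u))

def perturb(signal, scale, rng):
    """Hypothetical helper: add scaled Gumbel noise to every value of a signal."""
    return [x + scale * gumbel_noise(rng) for x in signal]

if __name__ == "__main__":
    rng = random.Random(0)
    samples = perturb([0.0] * 10_000, scale=1.0, rng=rng)
    # The mean of Gumbel(0, 1) is the Euler-Mascheroni constant, about 0.577.
    print(sum(samples) / len(samples))
```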
SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable parameter cost, while enabling zero-overhead expert switching at inference. Early experiments are encouraging — there's room to better understand its expressiveness, behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. RLHF track on LLaMA-3 explored per-expert datasets across language, conversational style, formatting, text-to-SQL, and structured output, with experts being combinable at inference (e.g. German × humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
A Redis-first, event-driven workbench with swarm intelligence for decomposing complex tasks into specialist-assigned subtasks. Features JSON-RPC 2.0 + WebSocket communication, MCP integration, and a React dashboard with Kanban. Architecture anticipated Anthropic's published long-running-agent harness pattern.
When you're running long coding sessions with Claude as the executor, you run out of context window before you run out of work, and the next session starts blind.
I solved that my way: a Redis-first, event-driven workbench with swarm intelligence for decomposing complex tasks into specialist-assigned subtasks — observable through a real-time dashboard.
02
Timeline
On 26-Nov-2025 Anthropic published 'Effective harnesses for long-running agents'. ClaudeBench was released approximately 8 weeks prior.
The architectural pattern ClaudeBench implements aligns with the concepts later published in that document.
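The workbench description above mentions JSON-RPC 2.0 as its message format. A minimal sketch of that envelope follows, using only the fields the JSON-RPC 2.0 specification defines. The method name `task.create` and its parameters are hypothetical and not taken from ClaudeBench.

```python
import json
from itertools import count

_ids = count(1)  # simple monotonically increasing request ids

def make_request(method, params):
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": next(_ids)}

def parse_response(raw):
    """Validate a JSON-RPC 2.0 response and return (id, result)."""
    msg = json.loads(raw)
    if msg.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if ("result" in msg) == ("error" in msg):
        raise ValueError("response must carry exactly one of result or error")
    if "error" in msg:
        raise RuntimeError(f"{msg['error']['code']}: {msg['error']['message']}")
    return msg["id"], msg["result"]

if __name__ == "__main__":
    request = make_request("task.create", {"title": "refactor parser"})  # hypothetical method
    print(json.dumps(request))
    reply = json.dumps({"jsonrpc": "2.0", "result": {"ok": True}, "id": request["id"]})
    print(parse_response(reply))
```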
Traditional distributed tracing shows what happened at runtime but can't reason about intent or surface contract mismatches. eLLMulator takes a different approach: LLM agents become your software components. Each agent studies its assigned source file, then interacts with other agents via synchronous MCP tool calls that mirror real function calls. The call graph emerges naturally from code control flow, producing traces that capture not just what happened, but why each component behaved as it did.
Claude Agent SDK · MCP · OpenTelemetry · Code Analysis · Distributed Tracing · Open Source
12 · 2026 · Product
Juanako — This Site
A portfolio that responds.
★Dual-audience agent, one codebase, no shared accounts.
Most portfolio sites are passive — a scroll of work. Most AI chatbots are ungrounded — they hallucinate away from whatever they're supposed to be about. Most job-search tools pick a side — they serve recruiters or candidates, rarely both. This site refuses those defaults. The visitor-facing agent is strictly grounded in what's on record, drives the interface rather than just describing it, and can produce a printable match report scoped to a recruiter's job description. A local-only companion surface turns the same product into a private career copilot — application tracking, candid notes that never leave the user's machine, and a suite of generators for the moments that matter before, during, and after a job hunt.
Client
Independent
Role
Sole author
Duration
2026 · Ongoing
Team
Solo
Outcomes
7
Deliverable Templates
Visitor + Candidate
Audiences
Self-Hostable · Free
Hosting
Local-First
Privacy Model
Highlights
•Agent drives the interface — navigation, highlighting, deep-dives happen through real actions, not claimed ones
•Seven printable deliverable templates spanning the full application arc
•Applications act as durable containers — JD plus every generated artefact pinned to the role
•Candid data stays on the user's machine; nothing personal is hosted or shared
•Generation reshapes style and wording, never substance — every claim traces to the source knowledge; voice rules cut hype and flattery without inventing facts
A portfolio that responds. The agent doesn't just answer questions — it navigates between pages, highlights the case studies and roles that map to a visitor's interest, opens deep-dives in a single step, and refuses to speculate outside what the site actually holds.
Recruiters can hand it a job description (dropped as a file or pasted as a URL) and get back a printable match report grounded in real work, real numbers, and honest gaps. One click, a new tab, a PDF ready to save.
02
The design stance
Grounded over clever. The agent is constrained to what exists on record; tailoring means picking depth from the source material, not inventing facts. Generation operates on style and wording (phrasing, cadence, register, ordering), never on substance: every claim, metric, project, and timeline traces back to the underlying knowledge. The voice rules cut hype, flattery, and unearned superlatives out of the first draft, but they reshape how the truth is told, not the truth itself. Security gets meaningful emphasis: strict isolation between the public and private surfaces, integrity checks on conversation history, and candid data that never crosses the network. One codebase, two audiences, no shared accounts, no SaaS layer, no tenants. Single user, free to run, self-hostable.
★5 Public Datasets · Powering #1 Models & RLHF Experiments
Custom datasets built for targeted training experiments. The simple-math family explores minimal arithmetic corpora for reasoning under SFT and DPO. Tree of Knowledge introduces symbolic knowledge structuring. The german-humanlike pair demonstrates downstream artifacts produced by composing SingleMoM RLHF experts.
Over 10,000 experiments tracked in Weights & Biases — sweeps, ablations, hyperparameter searches, and training runs. Each technique developed (UNA, MGS, SingleMoM, HarEmb, UNAVision) came from methodical experimentation across documented training runs.
Client
Independent — Juanako.AI
Role
Sole researcher
Duration
Ongoing
Team
Solo
Outcomes
10,000+
Total Experiments
5+
Techniques Developed
Weights & Biases
Tracking Platform
Systematic
Methodology
Highlights
•10,000+ total tracked experiments
•Systematic hyperparameter sweeps
•Architecture ablation studies
•Training dynamics analysis
•Reproducible experiment tracking
•Cross-technique comparison studies
01
Systematic ML Engineering
10,000+ total tracked experiments with systematic hyperparameter sweeps and architecture ablation studies.
Training dynamics analysis with reproducible experiment tracking. Cross-technique comparison studies across all developed methods.
MLOps · Experiment Tracking · WandB · Ablations · Systematic Research
15 · 2000s–Present · Hobby · Receipts
Engineering — Receipts Since the Early 2000s
Two Decades of Building in Public
★20+ Years Shipping — glFTPd, Docker, K8s, ML, IoT
Before AI became the headline, the craft was already there. Contributions to the glFTPd scene in the early 2000s in C, TCL, and SQL — networking primitives, sitebot tooling, and utilities. Docker images on the public registry since 2016, chasing scale, observability, and performance: MariaDB MaxScale, DBNinja, Rundeck, Cacti, and an HHVM repo-build image that packaged Facebook's top-performance PHP runtime into a container years before containerization of perf-first PHP was common. Today's tinkering continues in the same spirit — neural-net visualization, model-weight similarity analysis, declarative Jira, Kubernetes admission mutators driven by live Prometheus signals, Home Assistant + ESP smart-home glue, and ARM64/CUDA ports shared with the community. All hobby. At the job, I deliver more and better.
•glFTPd community contributions in C, TCL, and SQL since the early 2000s
•Docker Hub publisher since 2016 — performance and observability focus
•HHVM repo-build image (2016–2018) — top-performance PHP in a container before it was common
•Neural-net debugger (transviz) with time-travel replay of training sessions
•Kubernetes admission mutator driven by live Prometheus metrics (nemutator)
•Home Assistant + ESP smart-home with OTP-gated physical access
•ARM64/CUDA Viseron NVR port shared to save others the build time
•All hobby — at work, delivers more and better
Timeline
Early 2000s
glFTPd community — contributions in C (networking), TCL (sitebots and tooling), and SQL (utilities). Archived at grandis.nu/glftpd/Mr_V/.
2016
Docker Hub publishing begins — fblgit/maxscale-docker, fblgit/dbninja, fblgit/rundeck, fblgit/cacti. Early affinity for scale, observability, and performance.
2016–2018
fblgit/hhvm-repo-build — Facebook's top-performance PHP runtime (HHVM) packaged into a container, iterated across 2016, 2017, and 2018. Containerization applied to performance-first workloads before it was standard.
Jan 2021
fblgit/jarvis-iot-hassio — Home Assistant + ESP firmware + Tuya smart-home integration with ESP-powered smart gate and OTP-based access flow.
Feb 2021
fblgit/nemutator — Kubernetes admission mutation webhook that rewrites pod specs (resources, labels, env, images, selectors) from live Prometheus metrics, with Redis-backed mutation logs for rollback.
fblgit/viseron-arm64-cuda — ARM64 + CUDA port of the Viseron self-hosted NVR, shared with the community.
Feb 2024
fblgit/model-similarity — cosine-similarity analysis across transformer weights with CSV and interactive HTML reports, quantifying how close a fine-tune is to its base.
Feb 2025
fblgit/transviz — real-time neural-net visualization and debugging: tensor inspection, conditional breakpoints, training metrics, and time-travel replay of captured training sessions.
Oct 2025
fblgit/agentool — public release. Meta-framework for type-safe, composable AI workflows on top of pydantic-ai, with a three-layer architecture and state-driven execution.
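The model-similarity entry in the timeline above rests on cosine similarity between weight tensors. A dependency-free sketch of that measure follows, with weights represented as flat lists of floats keyed by layer name. The layer names and values are made up, and this is not the repository's code.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0                     # undefined for zero vectors; report 0
    return dot / (norm_a * norm_b)

def compare_models(base, tuned):
    """Per-layer similarity for every layer name present in both models."""
    return {
        name: cosine_similarity(base[name], tuned[name])
        for name in base
        if name in tuned
    }

if __name__ == "__main__":
    base = {"layer0.weight": [0.10, -0.20, 0.30], "layer1.weight": [1.0, 0.0, 0.0]}
    tuned = {"layer0.weight": [0.11, -0.19, 0.31], "layer1.weight": [0.0, 1.0, 0.0]}
    for name, score in compare_models(base, tuned).items():
        print(f"{name}: {score:.4f}")
```

Layers scoring close to 1.0 barely moved during fine-tuning; lower scores point to where the fine-tune diverged from its base.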
01
The long game
Contributions to the glFTPd community in the early 2000s — C for networking primitives, TCL for sitebot/tooling, SQL for backing stores. Mirror archived at grandis.nu/glftpd/Mr_V/.
Docker images published on the public registry since 2016 — always chasing scale, observability, and performance. MaxScale, DBNinja, Rundeck, Cacti. The HHVM repo-build image packaged Facebook's top-performance PHP runtime into a container across 2016, 2017, and 2018 — applying containerization to performance-first workloads well before it was common practice.
02
Still tinkering
Prototypes released as they mature: transviz (real-time neural-net visualization with tensor inspection and time-travel replay of training sessions), model-similarity (cosine-similarity analysis of transformer weights with interactive HTML reports), agentool (meta-framework for type-safe, composable AI workflows on top of pydantic-ai).
Concepts and tools: jira_as_a_code (declarative YAML planning — epics, tasks, SOP templates, per-env iteration), nemutator (Kubernetes admission mutation webhook that rewrites pod specs from live Prometheus metrics without redeploy).
Hardware and community saves: jarvis-iot-hassio (Home Assistant + ESP firmware + Tuya smart-home integration with ESP-powered smart gate and OTP-based physical access), viseron-arm64-cuda (ARM64 + CUDA port of the Viseron NVR shared to save others the porting time).
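The admission-mutation idea behind nemutator, described above, can be sketched as a function that turns a metric reading into a Kubernetes `AdmissionReview` response carrying a base64-encoded JSONPatch. The response envelope follows the `admission.k8s.io/v1` format. The sizing policy, the function names, and the choice to patch only the first container are assumptions for illustration, not nemutator's actual logic.

```python
import base64
import json

def cpu_request_from_usage(cores_p95, headroom=1.2):
    """Hypothetical policy: p95 usage in cores plus headroom, as millicores."""
    return f"{max(10, round(cores_p95 * headroom * 1000))}m"

def build_admission_response(uid, cores_p95):
    """Build an AdmissionReview response that patches the first container."""
    # Simplification: "add" on the whole resources object overwrites any
    # existing requests and limits on that container.
    patch = [{
        "op": "add",
        "path": "/spec/containers/0/resources",
        "value": {"requests": {"cpu": cpu_request_from_usage(cores_p95)}},
    }]
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": uid,               # must echo the uid of the incoming request
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }

if __name__ == "__main__":
    # In a real webhook the usage value would come from a Prometheus query.
    review = build_admission_response("example-uid", cores_p95=0.25)
    print(json.dumps(review, indent=2))
```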
03
The common thread
Every one of these is hobby. Weekend tinkering, personal itches, things that would have helped me if someone else had built them — so I built and published them instead.
At my job I deliver more and better. Same engineering instinct, wound tighter.