Eight #1 positions on the HuggingFace Open LLM Leaderboard across both the v1 and v2 eras, competing with models from major tech firms and AI labs using original post-training techniques (UNA and MGS) applied systematically across different base architectures. Competed against 70B models with a 7B, and kept every benchmark contamination-free.
★ 8x #1 HuggingFace Open LLM Leaderboard
Total #1 Positions: 8 · Leaderboard Eras: v1 & v2 · Model Sizes: 1.5B to 34B · Base Architectures: 4+
- 8 separate #1 positions across 2023-2024
- Displaced Intel's neural-chat from #1 (Nov 2023)
- #8 ALL SIZES with Cybertron v2 — 7B competing against 70B+
- #1 across ALL model sizes with TheBeagle (Jan 2024)
- Contamination-free verification with 5-gram analysis
- Consistent results across Mistral, Intel, Yi/Smaug, and Qwen bases
#1 Leaderboard · LLM · Post-Training · UNA · MGS · Open Source
Contributions to mainstream infrastructure projects used in enterprise deployments: Kubernetes ingress-nginx, Argo Rollouts, Atlantis, and SurfSense. All PRs were merged into mainline repositories, each addressing production-scale problems.
★ Merged PRs in Kubernetes, Argo, Atlantis & More
Projects: 4+ · PRs Merged: 7+ · Scope: Enterprise-grade · Status: Merged to mainline
- 7+ PRs merged across 4 mainstream projects
- All contributions merged into mainline repositories
- Focus on large-scale deployment challenges
Kubernetes · Argo · Atlantis · Open Source · Infrastructure · GitOps
UNAVision is a compact neural vision codec and visual tokenizer. It compresses arbitrary RGB imagery into a dense latent at a fixed 16:1 spatial ratio and reconstructs with 1–4% fidelity loss, and that loss shrinks as resolution grows (the inverse of typical codecs). It batches 6x 40MP images on a single RTX 4090, uses under 150K trainable parameters, achieves 100% codebook utilization (zero dead codes), and runs a dual continuous/discrete bottleneck on the same weights with a <0.10% gap.
★ 150K Params, 40MP Batching, 97.69% Fidelity
Spatial Compression: 16:1 · Avg Fidelity: 97.69% · Peak Fidelity: 99.42% · Parameters: <150K
- 16:1 spatial compression ratio
- 97.69% average reconstruction fidelity
- Under 150K trainable parameters
- Batches 6x 40MP images on single RTX 4090
- Loss decreases with resolution (inverse of typical codecs)
- Dual continuous/discrete bottleneck
- 100% codebook utilization (zero dead codes)
- UNA Audio prototype also developed
Vision · Image Codec · Visual Tokenizer · VAE Alternative · Compression
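The UNAVision architecture itself is not published; the distinctive dual continuous/discrete bottleneck claim can still be illustrated mechanically. A toy sketch, assuming a VQ-style nearest-neighbour codebook with a straight-through estimator — the codebook size, latent dimension, and quantizer choice are all assumptions, not the real design:

```python
import torch

class DualBottleneck(torch.nn.Module):
    """Toy dual continuous/discrete bottleneck sharing one set of weights.
    The continuous path passes the latent through untouched; the discrete
    path snaps it to the nearest codebook entry. Sizes and the quantizer
    are assumptions for illustration only."""
    def __init__(self, dim=8, codes=16):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(codes, dim))

    def forward(self, z, discrete=False):
        if not discrete:
            return z                                  # continuous path
        d = torch.cdist(z, self.codebook)             # (N, codes) L2 distances
        zq = self.codebook[d.argmin(dim=-1)]          # nearest codebook entries
        return z + (zq - z).detach()                  # straight-through gradient
```

Dead codes would show up as codebook rows the argmin never selects; tracking selection counts during training is one way a 100%-utilization claim can be verified.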
HarEmb performs classification, retrieval, and NLP tasks by exploiting the geometry of LLM embedding matrices. Results were achieved with Qwen2.5-0.5B, a very small model, demonstrating that embedding geometry carries significant semantic information even at minimal scale. The lightweight components run 28x faster than conventional transformer inference.
★ 93% Classification, 28x Faster Inference
AG News: 93.16% · Emotion: 90.75% · IMDB: 86.01% · SST-2: 83.72% · MS MARCO MRR@10: 0.941 · Speedup: 28x · Total Network: <150M · Trainable Params: <20M
- Only lightweight forward pass components
- Retrieval extension with MRR@10 >0.9
- Throughput: thousands of samples per second
- Exploits embedding geometry with lightweight components
Embeddings · Efficient Inference · NLP · Content Moderation · RAG
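HarEmb's exact pipeline is not described beyond "embedding geometry plus lightweight components", so the following is only a minimal sketch of that idea: a frozen LM token-embedding table plus a tiny trainable head, with no transformer forward pass. The random matrix stands in for real pretrained embeddings (e.g. Qwen2.5-0.5B's), and mean-pooling is an assumed choice:

```python
import torch

class EmbeddingGeometryClassifier(torch.nn.Module):
    """Classify text using only a frozen token-embedding table and a
    lightweight head, skipping the transformer stack entirely. The
    pooling strategy here (masked mean) is an assumption."""
    def __init__(self, embed_matrix, n_classes):
        super().__init__()
        # frozen embedding table, as borrowed from a pretrained LM
        self.embed = torch.nn.Embedding.from_pretrained(embed_matrix, freeze=True)
        self.head = torch.nn.Linear(embed_matrix.shape[1], n_classes)

    def forward(self, token_ids, mask):
        e = self.embed(token_ids)                        # (B, T, D) lookups only
        m = mask.unsqueeze(-1).float()
        pooled = (e * m).sum(1) / m.sum(1).clamp(min=1)  # mean over real tokens
        return self.head(pooled)                         # class logits
```

Because the forward pass is a table lookup, a pooling, and one matmul, high throughput is plausible on commodity hardware; the 28x and accuracy figures are the document's own measurements, not reproduced by this sketch.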
Cloudflare hosted Cybertron 7B v2 on their global Workers AI inference platform as a first-party model — served at the edge with OpenAI-compatible endpoints, a 15,000-token context window, and a public playground. The only third-party fine-tune in their catalog under an independent-developer namespace. Hosted for nearly two years.
★ Only Independent Developer in Cloudflare's AI Catalog
Deployment Duration: ~2 years · Context Window: 15,000 tokens · Model ID: @cf/fblgit/una-cybertron-7b-v2-bf16 · API: OpenAI-compatible
- First-party model in Cloudflare's curated catalog
- Only independent developer namespace in Workers AI
- Nearly two years of production hosting
Cloudflare · Edge Deployment · Production · Workers AI · Global Scale
UNA is Uniform Neural Alignment, a transformer architecture change that introduces an auxiliary loss, applied as a patch to HuggingFace Transformers models. It operates during SFT and RLHF training and can target attention layers, MLP layers, or both. It is memory intensive but compatible with LoRA. Training data does not need to be novel, but must not have been previously overfitted. Applied across Mistral, Intel, Yi/Smaug, Qwen2.5, LLaMA 1 & 2, Pythia, and Luxa architectures.
★ 8 Public Releases, Multiple #1 Positions
Public Releases: 18 · Base Architectures: 4+ · Model Sizes: 1.5B to 34B · #1 Positions: Multiple
- Consistent positive delta over base models
- Multiple #1 leaderboard positions
- Applicable to different network layers
transformers · deepspeed · accelerate · axolotl · torch · wandb · sft · rlhf · distributed-training
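The UNA auxiliary loss itself is unpublished; what can be sketched is the mechanical pattern a Transformers patch might use to attach an auxiliary loss to selected submodules (attention, MLP, or both) via forward hooks. The `aux_fn` below is a placeholder, not the UNA loss:

```python
import torch

def attach_aux_loss(model, layer_filter, aux_fn):
    """Register forward hooks on submodules selected by layer_filter
    (e.g. attention or MLP blocks). Each forward pass appends
    aux_fn(output) to `losses`; the trainer adds sum(losses) to the
    task loss, then clears the list. aux_fn is a stand-in here."""
    losses, handles = [], []
    for name, module in model.named_modules():
        if layer_filter(name):
            def hook(mod, inp, out, _l=losses):
                _l.append(aux_fn(out))
            handles.append(module.register_forward_hook(hook))
    return losses, handles
```

A training step would then compute something like `loss = task_loss + lam * sum(losses)` and clear the list afterwards; the returned handles let the patch be removed after training.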
Agentic coding on greenfield demos is easy. Doing it on the kind of code a business actually runs on — years of history, multiple owners, no authoritative map, and a context window that runs out before the work does — is where most agentic workflows fall apart. DIL exists to make that second case tractable.
★ Every task lands as a reviewed spec before it lands as code.
Proven Scale: 100K+ LOC · Host Agents: Claude Code, Codex, Kiro · Surfaces: Language, Server, UI, MCP, CLI · Review Model: SWE Approval Gates
Agentic · Spec-Driven · MCP · GraphRAG · Brown-Field · Enterprise · Claude Code · Codex · Kiro
MGS is MultiGumbelSampling, a regularization technique that introduces Gumbel-sampled noise across signal paths during SFT/RLHF training. It is combinable with UNA (the UNAMGS releases) for additive performance gains.
★ Compatible with UNA for Additive Gains
Public Releases: 5 · UNAMGS Releases: 4 · Model Sizes: 1.5B to 7B · #1 Positions: Multiple
- Compatible with UNA — UNAMGS combines both
- Operates on different network paths than UNA
- First public release: Oct 2024
transformers · deepspeed · accelerate · axolotl · torch · wandb · sft · rlhf · distributed-training
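The exact signal paths and scaling MGS uses are not specified here, so the following is only a rough sketch of the general idea: a layer that injects Gumbel-distributed noise into its activations during training only, with post-linear placement and the noise scale as assumptions:

```python
import torch

def gumbel_noise(shape, scale):
    """Sample scaled standard Gumbel noise: -log(-log(U)), U ~ Uniform(0,1)."""
    u = torch.rand(shape)
    eps = 1e-20
    return scale * -torch.log(-torch.log(u + eps) + eps)

class GumbelNoisedLinear(torch.nn.Module):
    """Linear layer whose activations are perturbed with Gumbel noise
    during training only. Placement and scale are illustrative
    assumptions, not the released MGS configuration."""
    def __init__(self, d_in, d_out, scale=0.05):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.scale = scale

    def forward(self, x):
        y = self.linear(x)
        if self.training:
            y = y + gumbel_noise(y.shape, self.scale)
        return y
```

In `.eval()` mode the layer is deterministic, so a regularizer of this shape costs nothing at serving time.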
SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable-parameter cost, while enabling zero-overhead expert switching at inference. Early experiments are encouraging; there is room to better understand its expressiveness, behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. An RLHF track on LLaMA-3 explored per-expert datasets across language, conversational style, formatting, text-to-SQL, and structured output, with experts combinable at inference (e.g. composing German and humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
★ Promising Early Results — More Research Underway
- Competes with LoRA on GLUE at a fraction of trainable params
- Zero-overhead expert switching at inference
- Experts can be combined (e.g. German × humanlike → real dataset output)
- Tested under SFT, DPO, and PPO setups
- Open directions: expressiveness, cross-domain behavior, image adapters
transformers · torch · wandb · sft · rlhf · lora-alternative · parameter-efficient
ClaudeBench · Agent Workbench · MIT Open Source
Claude Code Best Friend (2025)
A Redis-first, event-driven workbench with swarm intelligence for decomposing complex tasks into specialist-assigned subtasks. Features JSONRPC 2.0 + WebSocket communication, MCP integration, and a React dashboard with Kanban. The architecture anticipated Anthropic's published long-running-agent harness pattern.
★ Anticipated Anthropic's Harness Pattern
- Redis-first coordination with direct primitives
- Swarm intelligence for task decomposition
- Event-driven with JSONRPC 2.0 + WebSocket
- MCP integration from day one
- React dashboard with Kanban
- 579+ commits at time of writing
Agent · Claude · MCP · Redis · Task Management · Swarm Intelligence
Traditional distributed tracing shows what happened at runtime but can't reason about intent or surface contract mismatches. eLLMulator takes a different approach: LLM agents become your software components. Each agent studies its assigned source file, then interacts with other agents via synchronous MCP tool calls that mirror real function calls. The call graph emerges naturally from code control flow, producing traces that capture not just what happened, but why each component behaved as it did.
★ Open Source · Claude Agent SDK + MCP
Finding Types: 5 · Trace Modes: 3 · MCP Servers: 2 · License: Open Source
- Source files become autonomous Claude agents
- Agent communication mirrors real function calls via MCP
- Five finding types including contract mismatches and assumption bugs
- Three trace modes: Full, Targeted, and Lens
- OpenTelemetry export to standard observability platforms
Claude Agent SDK · MCP · OpenTelemetry · Code Analysis · Distributed Tracing · Open Source
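In eLLMulator the components are Claude agents reached over synchronous MCP tool calls, but the "call graph emerges from control flow" idea can be shown with a plain-Python stand-in that records caller-to-callee edges as calls nest:

```python
class Tracer:
    """Records caller -> callee edges as components invoke each other,
    so the call graph falls out of control flow rather than config.
    In the real system the callees are Claude agents reached via MCP
    tool calls; here they are plain callables."""
    def __init__(self):
        self.edges = []
        self._stack = ["entry"]  # innermost caller is always on top

    def call(self, name, fn, *args, **kwargs):
        self.edges.append((self._stack[-1], name))  # who called whom
        self._stack.append(name)
        try:
            return fn(*args, **kwargs)              # nested calls recurse here
        finally:
            self._stack.pop()
```

For example, `tracer.call("parser", lambda: tracer.call("lexer", lex))` records the edges `("entry", "parser")` and `("parser", "lexer")`, mirroring the synchronous call graph the real system exports via OpenTelemetry.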
Custom datasets built for targeted training experiments. The simple-math family explores minimal arithmetic corpora for reasoning under SFT and DPO. Tree of Knowledge introduces symbolic knowledge structuring. The german-humanlike pair demonstrates downstream artifacts produced by composing SingleMoM RLHF experts.
★ 5 Public Datasets · Powering #1 Models & RLHF Experiments
Public Datasets: 5 · Largest: 800K rows · Used in #1 Models: Yes · Coverage: Math, Knowledge, Style
dataset · huggingface · synthetic-data · rlhf · dpo · sft · data-engineering
Over 10,000 experiments tracked in Weights & Biases — sweeps, ablations, hyperparameter searches, and training runs. Each technique developed (UNA, MGS, SingleMoM, HarEmb, UNAVision) came from methodical experimentation across documented training runs.
★ 10,000+ WandB Tracked Experiments
Total Experiments: 10,000+ · Techniques Developed: 5+ · Tracking Platform: Weights & Biases · Methodology: Systematic
- 10,000+ total tracked experiments
- Systematic hyperparameter sweeps
- Architecture ablation studies
- Training dynamics analysis
- Reproducible experiment tracking
- Cross-technique comparison studies
MLOps · Experiment Tracking · WandB · Ablations · Systematic Research
Before AI became the headline, the craft was already there. Contributions to the glFTPd scene in the early 2000s in C, TCL, and SQL — networking primitives, sitebot tooling, and utilities. Docker images on the public registry since 2016, chasing scale, observability, and performance: MariaDB MaxScale, DBNinja, Rundeck, Cacti, and an HHVM repo-build image that packaged Facebook's top-performance PHP runtime into a container years before containerizing perf-first PHP was common. Today's tinkering continues in the same spirit — neural-net visualization, model-weight similarity analysis, declarative Jira, Kubernetes admission mutators driven by live Prometheus signals, Home Assistant + ESP smart-home glue, and ARM64/CUDA ports shared with the community. All hobby. At work, I deliver more and better.
★ 20+ Years Shipping — glFTPd, Docker, K8s, ML, IoT
Years Shipping: 20+ · Docker Hub Since: 2016 · Public Repos: Dozens · Domains: Net, Perf, ML, IoT
- glFTPd community contributions in C, TCL, and SQL since the early 2000s
- Docker Hub publisher since 2016 — performance and observability focus
- HHVM repo-build image (2016–2018) — top-performance PHP in a container before it was common
- Neural-net debugger (transviz) with time-travel replay of training sessions
- Kubernetes admission mutator driven by live Prometheus metrics (nemutator)
- Home Assistant + ESP smart-home with OTP-gated physical access
- ARM64/CUDA Viseron NVR port shared to save others the build time
- All hobby — at work, delivers more and better
C · TCL · SQL · Docker · Kubernetes · IoT · Open Source · Hobby