Eight #1 positions on the HuggingFace Open LLM Leaderboard across both the v1 and v2 eras, competing with models from major tech firms and AI labs using original post-training techniques (UNA and MGS) applied systematically across different base architectures. Competed against 70B models with a 7B, and kept every benchmark contamination-free.
★ 8x #1 HuggingFace Open LLM Leaderboard
Total #1 Positions: 8 · Leaderboard Eras: v1 & v2 · Model Sizes: 1.5B to 34B · Base Architectures: 4+
- 8 separate #1 positions across 2023-2024
- Displaced Intel's neural-chat from #1 (Nov 2023)
- #8 ALL SIZES with Cybertron v2 — 7B competing against 70B+
- #1 across ALL model sizes with TheBeagle (Jan 2024)
- Contamination-free verification with 5-gram analysis
- Consistent results across Mistral, Intel, Yi/Smaug, and Qwen bases
#1 Leaderboard · LLM · Post-Training · UNA · MGS · Open Source
Contributions to mainstream infrastructure projects used in enterprise deployments: Kubernetes ingress-nginx, Argo Rollouts, Atlantis, and SurfSense. All PRs were merged into mainline repositories, each addressing production-scale problems.
★ Merged PRs in Kubernetes, Argo, Atlantis & More
Projects: 4+ · PRs Merged: 7+ · Scope: Enterprise-grade · Status: Merged to mainline
- 7+ PRs merged across 4 mainstream projects
- All contributions merged into mainline repositories
- Focus on large-scale deployment challenges
Kubernetes · Argo · Atlantis · Open Source · Infrastructure · GitOps
UNAVision is a compact neural vision codec and visual tokenizer. It compresses arbitrary RGB imagery into a dense latent at a fixed 16:1 spatial ratio and reconstructs with 1–4% fidelity loss, and that loss shrinks as resolution grows (the inverse of typical codecs). It batches 6x 40MP images on a single RTX 4090, uses under 150K trainable parameters, achieves 100% codebook utilization (zero dead codes), and runs a dual continuous/discrete bottleneck on the same weights with a <0.10% gap.
★ 150K Params, 40MP Batching, 97.69% Fidelity
Spatial Compression: 16:1 · Avg Fidelity: 97.69% · Peak Fidelity: 99.42% · Parameters: <150K
- 16:1 spatial compression ratio
- 97.69% average reconstruction fidelity
- Under 150K trainable parameters
- Batches 6x 40MP images on single RTX 4090
- Loss decreases with resolution (inverse of typical codecs)
- Dual continuous/discrete bottleneck
- 100% codebook utilization (zero dead codes)
- UNA Audio prototype also developed
Vision · Image Codec · Visual Tokenizer · VAE Alternative · Compression
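The UNAVision architecture itself is not published; the distinctive dual continuous/discrete bottleneck claim can still be illustrated mechanically. A toy sketch, assuming a VQ-style nearest-neighbour codebook with a straight-through estimator — the codebook size, latent dimension, and quantizer choice are all assumptions, not the real design:

```python
import torch

class DualBottleneck(torch.nn.Module):
    """Toy dual continuous/discrete bottleneck sharing one set of weights.
    The continuous path passes the latent through untouched; the discrete
    path snaps it to the nearest codebook entry. Sizes and the quantizer
    are assumptions for illustration only."""
    def __init__(self, dim=8, codes=16):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(codes, dim))

    def forward(self, z, discrete=False):
        if not discrete:
            return z                                  # continuous path
        d = torch.cdist(z, self.codebook)             # (N, codes) L2 distances
        zq = self.codebook[d.argmin(dim=-1)]          # nearest codebook entries
        return z + (zq - z).detach()                  # straight-through gradient
```

Dead codes would show up as codebook rows the argmin never selects; tracking selection counts during training is one way a 100%-utilization claim can be verified.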
HarEmb performs classification, retrieval, and NLP tasks by exploiting the geometry of LLM embedding matrices. Results were achieved with Qwen2.5-0.5B, a very small model, demonstrating that embedding geometry carries significant semantic information even at minimal scale. The lightweight components run 28x faster than conventional transformer inference.
★ 93% Classification, 28x Faster Inference
AG News: 93.16% · Emotion: 90.75% · IMDB: 86.01% · SST-2: 83.72% · MS MARCO MRR@10: 0.941 · Speedup: 28x · Total Network: <150M · Trainable Params: <20M
- Only lightweight forward pass components
- Retrieval extension with MRR@10 >0.9
- Throughput: thousands of samples per second
- Exploits embedding geometry with lightweight components
Embeddings · Efficient Inference · NLP · Content Moderation · RAG
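HarEmb's exact pipeline is not described beyond "embedding geometry plus lightweight components", so the following is only a minimal sketch of that idea: a frozen LM token-embedding table plus a tiny trainable head, with no transformer forward pass. The random matrix stands in for real pretrained embeddings (e.g. Qwen2.5-0.5B's), and mean-pooling is an assumed choice:

```python
import torch

class EmbeddingGeometryClassifier(torch.nn.Module):
    """Classify text using only a frozen token-embedding table and a
    lightweight head, skipping the transformer stack entirely. The
    pooling strategy here (masked mean) is an assumption."""
    def __init__(self, embed_matrix, n_classes):
        super().__init__()
        # frozen embedding table, as borrowed from a pretrained LM
        self.embed = torch.nn.Embedding.from_pretrained(embed_matrix, freeze=True)
        self.head = torch.nn.Linear(embed_matrix.shape[1], n_classes)

    def forward(self, token_ids, mask):
        e = self.embed(token_ids)                        # (B, T, D) lookups only
        m = mask.unsqueeze(-1).float()
        pooled = (e * m).sum(1) / m.sum(1).clamp(min=1)  # mean over real tokens
        return self.head(pooled)                         # class logits
```

Because the forward pass is a table lookup, a pooling, and one matmul, high throughput is plausible on commodity hardware; the 28x and accuracy figures are the document's own measurements, not reproduced by this sketch.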
Cloudflare hosted Cybertron 7B v2 on their global Workers AI inference platform as a first-party model — served at the edge with OpenAI-compatible endpoints, a 15,000-token context window, and a public playground. The only third-party fine-tune in their catalog under an independent-developer namespace. Hosted for nearly two years.
★ Only Independent Developer in Cloudflare's AI Catalog
Deployment Duration: ~2 years · Context Window: 15,000 tokens · Model ID: @cf/fblgit/una-cybertron-7b-v2-bf16 · API: OpenAI-compatible
- First-party model in Cloudflare's curated catalog
- Only independent developer namespace in Workers AI
- Nearly two years of production hosting
Cloudflare · Edge Deployment · Production · Workers AI · Global Scale
UNA is Uniform Neural Alignment, a transformer architecture change that introduces an auxiliary loss, applied as a patch to HuggingFace Transformers models. It operates during SFT and RLHF training and can target attention layers, MLP layers, or both. It is memory intensive but compatible with LoRA. Training data does not need to be novel, but must not have been previously overfitted. Applied across Mistral, Intel, Yi/Smaug, Qwen2.5, LLaMA 1 & 2, Pythia, and Luxa architectures.
★ 8 Public Releases, Multiple #1 Positions
Public Releases: 18 · Base Architectures: 4+ · Model Sizes: 1.5B to 34B · #1 Positions: Multiple
- Consistent positive delta over base models
- Multiple #1 leaderboard positions
- Applicable to different network layers
transformers · deepspeed · accelerate · axolotl · torch · wandb · sft · rlhf · distributed-training
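The UNA auxiliary loss itself is unpublished; what can be sketched is the mechanical pattern a Transformers patch might use to attach an auxiliary loss to selected submodules (attention, MLP, or both) via forward hooks. The `aux_fn` below is a placeholder, not the UNA loss:

```python
import torch

def attach_aux_loss(model, layer_filter, aux_fn):
    """Register forward hooks on submodules selected by layer_filter
    (e.g. attention or MLP blocks). Each forward pass appends
    aux_fn(output) to `losses`; the trainer adds sum(losses) to the
    task loss, then clears the list. aux_fn is a stand-in here."""
    losses, handles = [], []
    for name, module in model.named_modules():
        if layer_filter(name):
            def hook(mod, inp, out, _l=losses):
                _l.append(aux_fn(out))
            handles.append(module.register_forward_hook(hook))
    return losses, handles
```

A training step would then compute something like `loss = task_loss + lam * sum(losses)` and clear the list afterwards; the returned handles let the patch be removed after training.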
Agentic coding on greenfield demos is easy. Doing it on the kind of code a business actually runs on — years of history, multiple owners, no authoritative map, and a context window that runs out before the work does — is where most agentic workflows fall apart. DIL exists to make that second case tractable.
★ Every task lands as a reviewed spec before it lands as code.
Proven Scale: 100K+ LOC · Host Agents: Claude Code, Codex, Kiro · Surfaces: Language, Server, UI, MCP, CLI · Review Model: SWE Approval Gates
Agentic · Spec-Driven · MCP · GraphRAG · Brown-Field · Enterprise · Claude Code · Codex · Kiro
MGS is MultiGumbelSampling, a regularization technique that introduces Gumbel-sampled noise across signal paths during SFT/RLHF training. It is combinable with UNA (the UNAMGS releases) for additive performance gains.
★ Compatible with UNA for Additive Gains
Public Releases: 5 · UNAMGS Releases: 4 · Model Sizes: 1.5B to 7B · #1 Positions: Multiple
- Compatible with UNA — UNAMGS combines both
- Operates on different network paths than UNA
- First public release: Oct 2024
transformers · deepspeed · accelerate · axolotl · torch · wandb · sft · rlhf · distributed-training
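The exact signal paths and scaling MGS uses are not specified here, so the following is only a rough sketch of the general idea: a layer that injects Gumbel-distributed noise into its activations during training only, with post-linear placement and the noise scale as assumptions:

```python
import torch

def gumbel_noise(shape, scale):
    """Sample scaled standard Gumbel noise: -log(-log(U)), U ~ Uniform(0,1)."""
    u = torch.rand(shape)
    eps = 1e-20
    return scale * -torch.log(-torch.log(u + eps) + eps)

class GumbelNoisedLinear(torch.nn.Module):
    """Linear layer whose activations are perturbed with Gumbel noise
    during training only. Placement and scale are illustrative
    assumptions, not the released MGS configuration."""
    def __init__(self, d_in, d_out, scale=0.05):
        super().__init__()
        self.linear = torch.nn.Linear(d_in, d_out)
        self.scale = scale

    def forward(self, x):
        y = self.linear(x)
        if self.training:
            y = y + gumbel_noise(y.shape, self.scale)
        return y
```

In `.eval()` mode the layer is deterministic, so a regularizer of this shape costs nothing at serving time.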
SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable-parameter cost, while enabling zero-overhead expert switching at inference. Early experiments are encouraging; there is room to better understand its expressiveness, behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. An RLHF track on LLaMA-3 explored per-expert datasets across language, conversational style, formatting, text-to-SQL, and structured output, with experts combinable at inference (e.g. composing German and humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
★ Promising Early Results — More Research Underway
- Competes with LoRA on GLUE at a fraction of trainable params
- Zero-overhead expert switching at inference
- Experts can be combined (e.g. German × humanlike → real dataset output)
- Tested under SFT, DPO, and PPO setups
- Open directions: expressiveness, cross-domain behavior, image adapters
transformers · torch · wandb · sft · rlhf · lora-alternative · parameter-efficient
ClaudeBench · Agent Workbench · MIT Open Source
Claude Code Best Friend (2025)
A Redis-first, event-driven workbench with swarm intelligence for decomposing complex tasks into specialist-assigned subtasks. Features JSONRPC 2.0 + WebSocket communication, MCP integration, and a React dashboard with Kanban. The architecture anticipated Anthropic's published long-running-agent harness pattern.
★ Anticipated Anthropic's Harness Pattern
- Redis-first coordination with direct primitives
- Swarm intelligence for task decomposition
- Event-driven with JSONRPC 2.0 + WebSocket
- MCP integration from day one
- React dashboard with Kanban
- 579+ commits at time of writing
Agent · Claude · MCP · Redis · Task Management · Swarm Intelligence
Traditional distributed tracing shows what happened at runtime but can't reason about intent or surface contract mismatches. eLLMulator takes a different approach: LLM agents become your software components. Each agent studies its assigned source file, then interacts with other agents via synchronous MCP tool calls that mirror real function calls. The call graph emerges naturally from code control flow, producing traces that capture not just what happened, but why each component behaved as it did.
★ Open Source · Claude Agent SDK + MCP
Finding Types: 5 · Trace Modes: 3 · MCP Servers: 2 · License: Open Source
- Source files become autonomous Claude agents
- Agent communication mirrors real function calls via MCP
- Five finding types including contract mismatches and assumption bugs
- Three trace modes: Full, Targeted, and Lens
- OpenTelemetry export to standard observability platforms
Claude Agent SDK · MCP · OpenTelemetry · Code Analysis · Distributed Tracing · Open Source
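In eLLMulator the components are Claude agents reached over synchronous MCP tool calls, but the "call graph emerges from control flow" idea can be shown with a plain-Python stand-in that records caller-to-callee edges as calls nest:

```python
class Tracer:
    """Records caller -> callee edges as components invoke each other,
    so the call graph falls out of control flow rather than config.
    In the real system the callees are Claude agents reached via MCP
    tool calls; here they are plain callables."""
    def __init__(self):
        self.edges = []
        self._stack = ["entry"]  # innermost caller is always on top

    def call(self, name, fn, *args, **kwargs):
        self.edges.append((self._stack[-1], name))  # who called whom
        self._stack.append(name)
        try:
            return fn(*args, **kwargs)              # nested calls recurse here
        finally:
            self._stack.pop()
```

For example, `tracer.call("parser", lambda: tracer.call("lexer", lex))` records the edges `("entry", "parser")` and `("parser", "lexer")`, mirroring the synchronous call graph the real system exports via OpenTelemetry.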
Custom datasets built for targeted training experiments. The simple-math family explores minimal arithmetic corpora for reasoning under SFT and DPO. Tree of Knowledge introduces symbolic knowledge structuring. The german-humanlike pair demonstrates downstream artifacts produced by composing SingleMoM RLHF experts.
★ 5 Public Datasets · Powering #1 Models & RLHF Experiments
Public Datasets: 5 · Largest: 800K rows · Used in #1 Models: Yes · Coverage: Math, Knowledge, Style
dataset · huggingface · synthetic-data · rlhf · dpo · sft · data-engineering
Over 10,000 experiments tracked in Weights & Biases — sweeps, ablations, hyperparameter searches, and training runs. Each technique developed (UNA, MGS, SingleMoM, HarEmb, UNAVision) came from methodical experimentation across documented training runs.
★ 10,000+ WandB Tracked Experiments
Total Experiments: 10,000+ · Techniques Developed: 5+ · Tracking Platform: Weights & Biases · Methodology: Systematic
- 10,000+ total tracked experiments
- Systematic hyperparameter sweeps
- Architecture ablation studies
- Training dynamics analysis
- Reproducible experiment tracking
- Cross-technique comparison studies
MLOps · Experiment Tracking · WandB · Ablations · Systematic Research
Before AI became the headline, the craft was already there. Contributions to the glFTPd scene in the early 2000s in C, TCL, and SQL — networking primitives, sitebot tooling, and utilities. Docker images on the public registry since 2016, chasing scale, observability, and performance: MariaDB MaxScale, DBNinja, Rundeck, Cacti, and an HHVM repo-build image that packaged Facebook's top-performance PHP runtime into a container years before containerizing perf-first PHP was common. Today's tinkering continues in the same spirit — neural-net visualization, model-weight similarity analysis, declarative Jira, Kubernetes admission mutators driven by live Prometheus signals, Home Assistant + ESP smart-home glue, and ARM64/CUDA ports shared with the community. All hobby. At work, I deliver more and better.
★ 20+ Years Shipping — glFTPd, Docker, K8s, ML, IoT
Years Shipping: 20+ · Docker Hub Since: 2016 · Public Repos: Dozens · Domains: Net, Perf, ML, IoT
- glFTPd community contributions in C, TCL, and SQL since the early 2000s
- Docker Hub publisher since 2016 — performance and observability focus
- HHVM repo-build image (2016–2018) — top-performance PHP in a container before it was common
- Neural-net debugger (transviz) with time-travel replay of training sessions
- Kubernetes admission mutator driven by live Prometheus metrics (nemutator)
- Home Assistant + ESP smart-home with OTP-gated physical access
- ARM64/CUDA Viseron NVR port shared to save others the build time
- All hobby — at work, delivers more and better
C · TCL · SQL · Docker · Kubernetes · IoT · Open Source · Hobby