Eight #1 positions on the HuggingFace Open LLM Leaderboard, competing with major tech firms and AI labs using original post-training techniques.
Eight #1 positions on the HuggingFace Open LLM Leaderboard across both v1 and v2 eras, competing with models from major tech firms and AI labs using original post-training techniques (UNA and MGS) applied systematically across different base architectures. Competed against 70B models with 7B, and maintained contamination-free benchmarks.
★ 8x #1 HuggingFace Open LLM Leaderboard
- 8 separate #1 positions across 2023-2024
- Displaced Intel's neural-chat from #1 (Nov 2023)
- #8 ALL SIZES with Cybertron v2 — 7B competing against 70B+
- #1 across ALL model sizes with TheBeagle (Jan 2024)
- Contamination-free verification with 5-gram analysis
- Consistent results across Mistral, Intel, Yi/Smaug, and Qwen bases
Total #1 Positions: 8Leaderboard Eras: v1 & v2Model Sizes: 1.5B to 34BBase Architectures: 4+
Client: Independent — Juanako.AIRole: Sole authorDuration: 2023-2024Team: SoloTags: #1 Leaderboard, LLM, Post-Training, UNA, MGS, Open Source
7+ PRs merged into mainline Kubernetes, Argo, Atlantis, and SurfSense — addressing real production-scale problems.
Contributions to mainstream infrastructure projects used in enterprise deployments. Kubernetes ingress-nginx, Argo Rollouts, Atlantis, SurfSense. All PRs merged into mainline repositories, addressing production-scale problems.
★ Merged PRs in Kubernetes, Argo, Atlantis & More
- 7+ PRs merged across 4 mainstream projects
- All contributions merged into mainline repositories
- Focus on large-scale deployment challenges
Projects: 4+PRs Merged: 7+Scope: Enterprise-gradeStatus: Merged to mainline
Client: Open Source CommunityRole: ContributorDuration: 2021-2025Team: Solo contributionsTags: Kubernetes, Argo, Atlantis, Open Source, Infrastructure, GitOps
Compact neural vision codec and visual tokenizer — 16:1 spatial compression at 97.69% fidelity, under 150K trainable parameters, batches 6× 40MP images on a single RTX 4090.
UNAVision is a compact neural vision codec and visual tokenizer. It compresses arbitrary RGB imagery into a dense latent at a fixed 16:1 spatial ratio and reconstructs at 1–4% fidelity loss — and the loss shrinks as resolution grows (inverse of typical codecs). I can batch 6x 40MP images on a single RTX 4090. Under 150K trainable parameters. 100% codebook utilization (zero dead codes). Dual continuous/discrete bottleneck on same weights with <0.10% gap.
★ 150K Params, 40MP Batching, 97.69% Fidelity
- 16:1 spatial compression ratio
- 97.69% average reconstruction fidelity
- Under 150K trainable parameters
- Batches 6x 40MP images on single RTX 4090
- Loss decreases with resolution (inverse of typical codecs)
- Dual continuous/discrete bottleneck
Spatial Compression: 16:1Avg Fidelity: 97.69%Peak Fidelity: 99.42%Parameters: <150K
Client: Independent · Eval repo publicRole: Sole author · Architecture, training, evalsDuration: OngoingTeam: SoloTags: Vision, Image Codec, Visual Tokenizer, VAE Alternative, Compression
Classification, retrieval, and NLP tasks from LLM embedding geometry — only lightweight forward pass components, 28x faster than conventional transformers.
HarEmb performs classification, retrieval, and NLP tasks by exploiting the geometry of LLM embedding matrices. Results achieved using Qwen2.5-0.5B, a very small model — demonstrating that embeddings geometry carries significant semantic information even at minimal scale. Lightweight components run 28x faster than conventional transformers.
★ 93% Classification, 28x Faster Inference
- Only lightweight forward pass components
- Retrieval extension with MRR@10 >0.9
- Throughput: thousands of samples per second
- Exploits embeddings geometry with lightweight components
AG News: 93.16%Emotion: 90.75%IMDB: 86.01%SST-2: 83.72%MS MARCO MRR@10: 0.941Speedup: 28x
Client: Independent — Author-attestedRole: Sole authorDuration: 2025Team: SoloTags: Embeddings, Efficient Inference, NLP, Content Moderation, RAG
Cybertron 7B v2 hosted as a first-party model in Cloudflare's Workers AI catalog — the only third-party fine-tune in the lineup, served at the edge for nearly two years.
Cloudflare hosted Cybertron 7B v2 on their global Workers AI inference platform as a first-party model — served at the edge with OpenAI-compatible endpoints, a 15,000-token context window, and a public playground. The only third-party fine-tune in their catalog under an independent-developer namespace. Hosted for nearly two years.
★ Only Independent Developer in Cloudflare's AI Catalog
- First-party model in Cloudflare's curated catalog
- Only independent developer namespace in Workers AI
- Nearly two years of production hosting
Deployment Duration: ~2 yearsContext Window: 15,000 tokensModel ID: @cf/fblgit/una-cybertron-7b-v2-bf16API: OpenAI-compatible
Client: Cloudflare — Independent DeveloperRole: Model authorDuration: Dec 2023 - Oct 2025Team: SoloTags: Cloudflare, Edge Deployment, Production, Workers AI, Global Scale
An auxiliary loss-based architecture patch for HuggingFace Transformers, applied during SFT/RLHF. 18 public releases across multiple base models, with multiple #1 leaderboard positions.
UNA is Uniform Neural Alignment — a transformers architecture change introducing an auxiliary loss, applied as a patch to HuggingFace Transformers models. Operates during SFT and RLHF training. Applicable to attention layers, MLP layers, or both. Memory intensive but compatible with LoRA. Training data does not need to be novel, but must not have been previously overfitted. Applied across Mistral, Intel, Yi/Smaug, Qwen2.5, LLaMA 1 & 2, Pythia, and Luxa architectures.
★ 8 Public Releases, Multiple #1 Positions
- Consistent positive delta over base models
- Multiple #1 leaderboard positions
- Applicable to different network layers
Public Releases: 18Base Architectures: 4+Model Sizes: 1.5B to 34B#1 Positions: Multiple
Client: Independent — Juanako.AIRole: Sole author · Method, training, releasesDuration: 2023 & 2024Team: SoloTags: transformers, deepspeed, accelerate, axolotl, torch, wandb, sft, rhlf, distributed-training
A spec-driven agentic ecosystem for long-horizon engineering on enterprise brown-field code.
Agentic coding on greenfield demos is easy. Doing it on the kind of code a business actually runs on — years of history, multiple owners, no authoritative map, and a context window that runs out before the work does — is where most agentic workflows fall apart. DIL exists to make that second case tractable.
★ Every task lands as a reviewed spec before it lands as code.
Proven Scale: 100K+ LOCHost Agents: Claude Code · Codex · KiroSurfaces: Language · Server · UI · MCP · CLIReview Model: SWE Approval Gates
Client: Independent — Ecosystem projectRole: Creator · Ecosystem authorDuration: 2025Team: SoloTags: Agentic, Spec-Driven, MCP, GraphRAG, Brown-Field, Enterprise, Claude Code, Codex, Kiro
Regularization technique using Gumbel-sampled noise during SFT/RLHF. Combines with UNA (UNAMGS) for additive performance gains.
MGS is MultiGumbelSampling — a regularization technique introducing Gumbel-sampled noise across signal paths during SFT/RLHF training. Combinable with UNA (UNAMGS releases) for additive performance gains.
★ Compatible with UNA for Additive Gains
- Compatible with UNA — UNAMGS combines both
- Operates on different network paths than UNA
- First public release: Oct 2024
Public Releases: 5UNAMGS Releases: 4Model Sizes: 1.5B to 7B#1 Positions: Multiple
Client: Independent — Juanako.AIRole: Sole authorDuration: 2024-2025Team: SoloTags: transformers, deepspeed, accelerate, axolotl, torch, wandb, sft, rhlf, distributed-training
Exploratory parameter-efficient adaptation that competes with LoRA on GLUE at <0.25M trainable params, with zero-overhead expert switching at inference.
SingleMoM is an exploratory parameter-efficient adaptation approach that competes with LoRA on GLUE benchmarks at a fraction of the trainable parameter cost, while enabling zero-overhead expert switching at inference. Early experiments are encouraging — there's room to better understand its expressiveness, behavior across domains, and potential extensions (e.g. image adapters). SFT experiments on RoBERTa reproduce the LoRA paper's evaluation setup. RLHF track on LLaMA-3 explored per-expert datasets across language, conversational style, formatting, text-to-SQL, and structured output, with experts being combinable at inference (e.g. German × humanlike experts produced the fblgit/german-humanlike-clean-1k dataset).
★ Promising Early Results — More Research Underway
- Competes with LoRA on GLUE at a fraction of trainable params
- Zero-overhead expert switching at inference
- Experts can be combined (e.g. German × humanlike → real dataset output)
- Tested under SFT, DPO, and PPO setups
- Open directions: expressiveness, cross-domain behavior, image adapters
Client: Independent — Author-attestedRole: Sole authorDuration: 2024Team: SoloTags: transformers, torch, wandb, sft, rlhf, lora-alternative, parameter-efficient
Redis-first, event-driven workbench with swarm intelligence for long-running Claude coding sessions. JSONRPC + WebSocket + MCP. Open source under MIT.
A Redis-first, event-driven workbench with swarm intelligence for decomposing complex tasks into specialist-assigned subtasks. Features JSONRPC 2.0 + WebSocket communication, MCP integration, and React dashboard with Kanban. Architecture anticipated Anthropic's published long-running-agent harness pattern.
★ Anticipated Anthropic's Harness Pattern
- Redis-first coordination with direct primitives
- Swarm intelligence for task decomposition
- Event-driven with JSONRPC 2.0 + WebSocket
- MCP integration from day one
- React dashboard with Kanban
- 579+ commits at time of writing
Client: Open source (MIT)Role: Creator & sole maintainerDuration: Ongoing since 2025Team: SoloTags: Agent, Claude, MCP, Redis, Task Management, Swarm Intelligence
Each source file becomes an autonomous Claude agent communicating via MCP — surfaces contract mismatches and assumption bugs through OpenTelemetry traces.
Traditional distributed tracing shows what happened at runtime but can't reason about intent or surface contract mismatches. eLLMulator takes a different approach: LLM agents become your software components. Each agent studies its assigned source file, then interacts with other agents via synchronous MCP tool calls that mirror real function calls. The call graph emerges naturally from code control flow, producing traces that capture not just what happened, but why each component behaved as it did.
★ Open Source · Claude Agent SDK + MCP
- Source files become autonomous Claude agents
- Agent communication mirrors real function calls via MCP
- Five finding types including contract mismatches and assumption bugs
- Three trace modes: Full, Targeted, and Lens
- OpenTelemetry export to standard observability platforms
Finding Types: 5Trace Modes: 3MCP Servers: 2License: Open Source
Client: Open sourceRole: CreatorDuration: 2025Team: SoloTags: Claude Agent SDK, MCP, OpenTelemetry, Code Analysis, Distributed Tracing, Open Source
This site. An agentic portfolio that doubles as a private career copilot — single-user, free, local-first by default.
Most portfolio sites are passive — a scroll of work. Most AI chatbots are ungrounded — they hallucinate away from whatever they're supposed to be about. Most job-search tools pick a side — they serve recruiters or candidates, rarely both. This site refuses those defaults. The visitor-facing agent is strictly grounded in what's on record, drives the interface rather than just describing it, and can produce a printable match report scoped to a recruiter's job description. A local-only companion surface turns the same product into a private career copilot — application tracking, candid notes that never leave the user's machine, and a suite of generators for the moments that matter before, during, and after a job hunt.
★ Dual-audience agent, one codebase, no shared accounts.
- Agent drives the interface — navigation, highlighting, deep-dives happen through real actions, not claimed ones
- Seven printable deliverable templates spanning the full application arc
- Applications act as durable containers — JD plus every generated artefact pinned to the role
- Candid data stays on the user's machine; nothing personal is hosted or shared
- Generation reshapes style and wording, never substance — every claim traces to the source knowledge; voice rules cut hype and flattery without inventing facts
- Meaningful emphasis on security — surface isolation, session integrity, clear public/private boundaries
Deliverable Templates: 7Audiences: Visitor + CandidateHosting: Self-Hostable · FreePrivacy Model: Local-First
Client: IndependentRole: Sole authorDuration: 2026 · OngoingTeam: SoloTags: Agentic, Grounded LLM, Single-User, Free, Local-First, Dual-Audience, Portfolio, Career Copilot
Five custom datasets across math, knowledge, and RLHF — used in #1 leaderboard models and SingleMoM expert composition experiments.
Custom datasets built for targeted training experiments. The simple-math family explores minimal arithmetic corpora for reasoning under SFT and DPO. Tree of Knowledge introduces symbolic knowledge structuring. The german-humanlike pair demonstrates downstream artifacts produced by composing SingleMoM RLHF experts.
★ 5 Public Datasets · Powering #1 Models & RLHF Experiments
Public Datasets: 5Largest: 800K rowsUsed in #1 Models: YesCoverage: Math · Knowledge · Style
Client: Independent — Juanako.AIRole: Dataset authorDuration: 2023-2025Team: SoloTags: dataset, huggingface, synthetic-data, rlhf, dpo, sft, data-engineering
Over 10,000 documented experiments in Weights & Biases — sweeps, ablations, and training runs underpinning every published technique.
Over 10,000 experiments tracked in Weights & Biases — sweeps, ablations, hyperparameter searches, and training runs. Each technique developed (UNA, MGS, SingleMoM, HarEmb, UNAVision) came from methodical experimentation across documented training runs.
★ 10,000+ WandB Tracked Experiments
- 10,000+ total tracked experiments
- Systematic hyperparameter sweeps
- Architecture ablation studies
- Training dynamics analysis
- Reproducible experiment tracking
- Cross-technique comparison studies
Total Experiments: 10,000+Techniques Developed: 5+Tracking Platform: Weights & BiasesMethodology: Systematic
Client: Independent — Juanako.AIRole: Sole researcherDuration: OngoingTeam: SoloTags: MLOps, Experiment Tracking, WandB, Ablations, Systematic Research
Two decades of building in public — from glFTPd community tools in C/TCL/SQL in the early 2000s, to performance-first Docker images in 2016, to neural-net debuggers, admission mutators, and smart-home IoT today.
Before AI became the headline, the craft was already there. Contributions to the glFTPd scene in the early 2000s in C, TCL, and SQL — networking primitives, sitebot tooling, and utilities. Docker images on the public registry since 2016, chasing scale, observability, and performance: MariaDB MaxScale, DBNinja, Rundeck, Cacti, and an HHVM repo-build image that packaged Facebook's top-performance PHP runtime into a container years before containerization of perf-first PHP was common. Today's tinkering continues in the same spirit — neural-net visualization, model-weight similarity analysis, declarative Jira, Kubernetes admission mutators driven by live Prometheus signals, Home Assistant + ESP smart-home glue, and ARM64/CUDA ports shared with the community. All hobby. At the job, I deliver more and better.
★ 20+ Years Shipping — glFTPd, Docker, K8s, ML, IoT
- glFTPd community contributions in C, TCL, and SQL since the early 2000s
- Docker Hub publisher since 2016 — performance and observability focus
- HHVM repo-build image (2016–2018) — top-performance PHP in a container before it was common
- Neural-net debugger (transviz) with time-travel replay of training sessions
- Kubernetes admission mutator driven by live Prometheus metrics (nemutator)
- Home Assistant + ESP smart-home with OTP-gated physical access
Years Shipping: 20+Docker Hub Since: 2016Public Repos: DozensDomains: Net · Perf · ML · IoT
Client: Independent — CommunityRole: Author / ContributorDuration: Early 2000s – OngoingTeam: SoloTags: C, TCL, SQL, Docker, Kubernetes, IoT, Open Source, Hobby