Portfolio
2024–Present · Research

UNAVision

Neural Image Codec & Visual Tokenizer

150K Params, 40MP Batching, 97.69% Fidelity

UNAVision is a compact neural vision codec and visual tokenizer. It compresses arbitrary RGB imagery into a dense latent at a fixed 16:1 spatial ratio and reconstructs at 1–4% fidelity loss — and the loss shrinks as resolution grows (the inverse of typical codecs). It batches 6x 40 MP images on a single RTX 4090, uses under 150K trainable parameters, achieves 100% codebook utilization (zero dead codes), and runs a dual continuous/discrete bottleneck on the same weights with a <0.10% fidelity gap between the two paths.
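A dual bottleneck sharing one set of weights can be sketched roughly as below. This is a minimal illustration of the idea, not UNAVision's actual architecture; the codebook size (512) and latent width (16) are made-up placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook: 512 entries of dimension 16 (sizes are
# illustrative, not UNAVision's real configuration).
codebook = rng.normal(size=(512, 16)).astype(np.float32)

def bottleneck(z, discrete):
    """Same weights serve both paths: the continuous path passes the
    latent through unchanged; the discrete path snaps each vector to
    its nearest codebook entry (vector quantization)."""
    if not discrete:
        return z, None
    # Squared distance from every latent vector to every codebook entry.
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    codes = d.argmin(axis=1)
    return codebook[codes], codes

z = rng.normal(size=(64, 16)).astype(np.float32)
z_cont, _ = bottleneck(z, discrete=False)   # continuous path
z_disc, codes = bottleneck(z, discrete=True)  # discrete path
```

Because both paths go through the same encoder/decoder weights, the "<0.10% gap" claim amounts to the quantized latents reconstructing almost as well as the continuous ones.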

Client
Independent · Eval repo public
Role
Sole author · Architecture, training, evals
Duration
Ongoing
Team
Solo
Outcomes
  • 16:1 · Spatial Compression
  • 97.69% · Avg Fidelity
  • 99.42% · Peak Fidelity
  • <150K · Parameters
Highlights
  • 16:1 spatial compression ratio
  • 97.69% average reconstruction fidelity
  • Under 150K trainable parameters
  • Batches 6x 40MP images on single RTX 4090
  • Loss decreases with resolution (inverse of typical codecs)
  • Dual continuous/discrete bottleneck
  • 100% codebook utilization (zero dead codes)
  • UNA Audio prototype also developed
[Image comparison: Original vs Reconstructed · Wildlife · 2560px]
01

What it is

A compact neural vision codec and visual tokenizer. Compresses arbitrary RGB imagery into a dense, well-structured latent at a fixed 16:1 spatial ratio and reconstructs at 1–4% fidelity loss on natural imagery.
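Assuming the 16:1 ratio applies per spatial side (the reading used by typical patch-16 tokenizers), the latent grid arithmetic works out as:

```python
def latent_grid(h, w, stride=16):
    """Latent grid size for an h x w input. Assumes "16:1 spatial"
    means each side shrinks by `stride`, as in patch-16 tokenizers;
    if the ratio instead referred to area, the per-side divisor
    would be 4."""
    return (h // stride, w // stride)

print(latent_grid(2560, 1440))  # → (160, 90)
```

So a 2560px-wide photo becomes a grid with 256x fewer spatial positions than the input under this reading.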

Loss shrinks as input grows: 4–6K photos land in the 1–2% band; 40 MP cases hold there comfortably. 100% active visual vocabulary utilization — zero dead codes.
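Vocabulary utilization can be checked by counting which codes ever fire over a token stream. A small sketch with synthetic indices (the vocabulary size and stream here are arbitrary, not UNAVision data):

```python
import numpy as np

def codebook_utilization(codes, vocab_size):
    """Fraction of codebook entries that appear at least once in
    `codes`. 100% utilization means zero dead codes."""
    return np.unique(codes).size / vocab_size

# Synthetic token stream over a 512-entry vocabulary (illustrative only).
rng = np.random.default_rng(0)
codes = rng.integers(0, 512, size=100_000)
print(f"{codebook_utilization(codes, 512):.2%}")  # prints 100.00%
```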

02

Memory envelope

A batch of half a dozen 40 MP images fits in a single forward pass on one RTX 4090 — no tiling, no sharding, no gradient checkpointing acrobatics, no OOM.

This fits because activation memory is dominated by the 16:1 bottleneck and the network sits under 150K trainable parameters.
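A back-of-envelope check of that memory envelope, assuming fp16 activations and a placeholder latent width (both are assumptions, not measured values):

```python
def activation_gb(batch, megapixels, bytes_per=2, latent_ch=32, stride=16):
    """Rough activation memory for input tensor + bottleneck latent, in GB.
    Assumes fp16 (2 bytes/value) and a placeholder latent width; real
    intermediate feature maps add a constant factor on top of this."""
    px = megapixels * 1e6
    input_bytes = batch * px * 3 * bytes_per                    # RGB input
    latent_bytes = batch * (px / stride**2) * latent_ch * bytes_per
    return (input_bytes + latent_bytes) / 1e9

print(f"{activation_gb(6, 40):.2f} GB")  # prints 1.50 GB
```

Even with a generous multiplier for intermediate feature maps, this sits far inside a 4090's 24 GB, consistent with the no-tiling, no-sharding claim.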

Vision · Image Codec · Visual Tokenizer · VAE Alternative · Compression