Whitepaper

From quaternion-augmented TurboQuant to lossless tensor-aware compression.

QATQ began as an investigation into quaternion structure for LLM tensor compression. The production path evolved into a lossless codec built on adaptive strategy selection with bit-identical restoration, QATC chunk transport, and scoped evidence against raw, zstd and lz4 baselines.

Current release scope

Bit-for-bit f32, f16 and bf16 tensor byte restoration
QATC v2 chunk container with aggregate checksum
Reversible quaternion-chain residual path when it wins
Public fixture and live-migration evidence
Live VRAM reduction remains experimental roadmap work
Canonical technical source remains the GitHub whitepaper

Abstract

QATQ is a Rust codec for exported LLM KV caches and runtime migration artefacts. Its present product surface is lossless: encoded artefacts restore the same native tensor bytes that entered the codec. The system uses a strategy search over byte-plane transforms, zstd-backed entropy coding, adjacent-bit residuals and reversible quaternion-chain residual coding, then stores large payloads in bounded QATC chunks.
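To make the byte-plane idea concrete, here is a minimal sketch of a reversible byte-plane transform for f32 values. It is an illustration under stated assumptions, not QATQ's actual code: the function names `to_byte_planes` and `from_byte_planes` are hypothetical, the layout (little-endian, one plane per byte position) is assumed, and no entropy coder is involved. The general motivation for such transforms is that bytes at the same position, such as the sign and exponent byte, tend to resemble each other across neighbouring values, so grouping them can help a downstream entropy coder.

```rust
/// Split f32 values into four byte planes: all byte-0s, then all byte-1s, and so on.
/// Hypothetical helper for illustration; the little-endian plane layout is an assumption.
pub fn to_byte_planes(values: &[f32]) -> Vec<u8> {
    let n = values.len();
    let mut planes = vec![0u8; n * 4];
    for (i, v) in values.iter().enumerate() {
        // to_bits exposes the raw bit pattern, so -0.0 and NaN payloads are carried as-is.
        let bytes = v.to_bits().to_le_bytes();
        for (p, b) in bytes.iter().enumerate() {
            planes[p * n + i] = *b;
        }
    }
    planes
}

/// Invert `to_byte_planes`. Returns None if the input length is not a multiple of 4.
pub fn from_byte_planes(planes: &[u8]) -> Option<Vec<f32>> {
    if planes.len() % 4 != 0 {
        return None;
    }
    let n = planes.len() / 4;
    let mut out = Vec::with_capacity(n);
    for i in 0..n {
        // Gather byte i from each of the four planes and rebuild the original word.
        let bytes = [planes[i], planes[n + i], planes[2 * n + i], planes[3 * n + i]];
        out.push(f32::from_bits(u32::from_le_bytes(bytes)));
    }
    Some(out)
}
```

Comparing the `to_bits()` patterns of the input and the restored values, rather than comparing floats with `==`, is what checks bit-for-bit restoration: float equality treats 0.0 and -0.0 as equal and never treats NaN as equal to itself.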

The result is a codec aimed at storage and transfer, not a transparent GPU memory layer. On the documented live-migration run, QATQ transferred 14,004,990 bytes versus 50,331,648 raw bytes, 20,405,381 zstd bytes and 28,739,217 lz4 bytes, while preserving measured continuation behaviour.
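The relative sizes implied by those byte counts can be worked out directly. The sketch below uses only the four figures quoted above; the constant and function names are illustrative and not part of QATQ.

```rust
// Byte counts quoted for the documented live-migration run.
pub const RAW_BYTES: u64 = 50_331_648;
pub const ZSTD_BYTES: u64 = 20_405_381;
pub const LZ4_BYTES: u64 = 28_739_217;
pub const QATQ_BYTES: u64 = 14_004_990;

/// Percentage of the baseline size that the encoded form saves.
pub fn percent_saved(encoded: u64, baseline: u64) -> f64 {
    100.0 * (1.0 - encoded as f64 / baseline as f64)
}

/// Compression ratio expressed as raw size divided by encoded size.
pub fn compression_ratio(encoded: u64, raw: u64) -> f64 {
    raw as f64 / encoded as f64
}
```

On these figures the QATQ transfer is roughly 3.6:1 against raw (about 72% smaller), about 31% smaller than the zstd transfer and about 51% smaller than the lz4 transfer. These ratios describe this one documented run only.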

Why the design changed

The original quaternion TurboQuant direction explored lossy compression and structured approximation. That remains valuable as research context, but it was not the right default surface for live runtime migration, where a single altered KV value can make downstream behaviour ambiguous. QATQ therefore centres on verifiable restoration: quaternion structure is used only when it is reversible, lowers entropy, and beats simpler byte-plane transforms after metadata is counted.
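The "after metadata is counted" criterion can be sketched as a selection over candidate encodings by total size. This is a simplified illustration, not QATQ's selection logic: the `Candidate` type, its fields and `select_strategy` are hypothetical, and every candidate is assumed to have already been verified as reversible.

```rust
/// One candidate encoding of the same tensor (hypothetical type for illustration).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub name: &'static str,
    /// Size of the encoded payload in bytes.
    pub payload_bytes: usize,
    /// Size of the headers or parameters needed to decode it, in bytes.
    pub metadata_bytes: usize,
}

impl Candidate {
    /// Total cost of the candidate: payload plus the metadata needed to restore it.
    pub fn total_bytes(&self) -> usize {
        self.payload_bytes + self.metadata_bytes
    }
}

/// Pick the candidate with the smallest total size.
/// On a tie the earlier entry wins, so listing simpler strategies first
/// means a more complex strategy must be strictly smaller to be chosen.
pub fn select_strategy(candidates: &[Candidate]) -> Option<&Candidate> {
    candidates.iter().min_by_key(|c| c.total_bytes())
}
```

With made-up sizes, a quaternion-chain candidate of 990 payload bytes plus 32 metadata bytes (1,022 total) loses to a byte-plane candidate of 1,000 plus 8 (1,008 total), even though its payload alone is smaller.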

Evidence boundaries

QATQ can claim bit-for-bit restore and measured compression wins on its public fixtures and its documented external runtime proof. It should not yet claim universal superiority across all models, runtimes, context lengths, dtypes or chunk layouts. The next evidence frontier is broader runtime capture, more model families and experimental live VRAM reduction through runtime-integrated KV paging.
