v0.1.1 released on crates.io
QATQ compresses exported KV caches and tensor migration blocks into portable QATC artefacts and restores them bit-for-bit. On the documented public-fixture and live-migration evidence, it produced smaller transfer sizes than raw, zstd, and lz4.
# exported KV tensor bytes in, lossless QATC out
qatq encode-chunked \
--mode qatq-exact \
--dtype bf16 \
--max-values-per-chunk 65536 \
kv-cache.bf16le kv-cache.qatc
# restore the same native bytes for runtime import
qatq decode kv-cache.qatc restored.bf16le
QATC v2 checksum validated
Corrupt or truncated payloads are rejected before restore.
Quaternion-chain path
Selected only when it beats simpler lossless transforms.
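The "only when it beats simpler transforms" rule can be sketched as a smallest-payload selection with a raw fallback. This is a minimal illustration, not QATQ's implementation: `zlib` stands in for zstd, the byte-wise `delta_xor` is an assumed form of a delta-XOR transform, and all function and candidate names are hypothetical. The quaternion-chain transform itself is not shown.

```python
import zlib


def delta_xor(data: bytes) -> bytes:
    """XOR each byte with the one before it; the first byte is kept as-is."""
    out = bytearray(data)
    for i in range(len(out) - 1, 0, -1):  # walk backwards so out[i - 1] is still original
        out[i] ^= out[i - 1]
    return bytes(out)


def undo_delta_xor(data: bytes) -> bytes:
    """Inverse of delta_xor: rebuild each byte from the restored previous one."""
    out = bytearray(data)
    for i in range(1, len(out)):
        out[i] ^= out[i - 1]
    return bytes(out)


# Hypothetical candidate table, ordered from simplest to most involved.
CANDIDATES = {
    "raw": (lambda b: b, lambda b: b),
    "deflate": (zlib.compress, zlib.decompress),
    "delta-xor+deflate": (
        lambda b: zlib.compress(delta_xor(b)),
        lambda b: undo_delta_xor(zlib.decompress(b)),
    ),
}


def encode_best(data: bytes):
    """Return (strategy name, payload) for the smallest candidate.

    A later candidate wins only if it is strictly smaller, so ties go to
    the simpler strategy and incompressible input stays raw.
    """
    best_name, best_payload = "raw", data
    for name, (encode, _decode) in CANDIDATES.items():
        payload = encode(data)
        if len(payload) < len(best_payload):
            best_name, best_payload = name, payload
    return best_name, best_payload


def decode(name: str, payload: bytes) -> bytes:
    """Invert the strategy recorded at encode time."""
    return CANDIDATES[name][1](payload)


ramp = bytes(i % 256 for i in range(4096))
name, payload = encode_best(ramp)
assert decode(name, payload) == ramp  # lossless round trip
```

Recording the winning strategy name next to the payload is what lets the decoder stay simple: it never has to guess which transform was applied.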
Measured evidence
The strongest production-facing result today is lossless exported-state compression. In the documented cloud live-migration proof, QATQ transferred fewer bytes than raw blocks, lz4, and zstd while preserving measured task continuation.
| Codec | Transferred bytes | Ratio vs raw | Reduction | Bit-for-bit restore |
|---|---|---|---|---|
| Raw streamed blocks | 50,331,648 | 1.0000 | baseline | yes |
| LZ4 baseline | 28,739,217 | 0.5710 | 42.9% smaller | yes |
| Zstd baseline | 20,405,381 | 0.4054 | 59.5% smaller | yes |
| QATQ | 14,004,990 | 0.2783 | 72.2% smaller | yes |
Source: QATQ external runtime evidence, 2026-06-22 cloud live-migration proof. This result covers the measured integration path only; it is not a universal claim across every model, runtime, context length, dtype, or chunk layout.
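The "Ratio vs raw" and "Reduction" columns follow directly from the transferred byte counts. The short sketch below recomputes them from the figures in the table; the function names are illustrative only.

```python
RAW_BYTES = 50_331_648  # raw streamed blocks, from the table above

transferred = {
    "lz4": 28_739_217,
    "zstd": 20_405_381,
    "qatq": 14_004_990,
}


def ratio_vs_raw(n: int, raw: int = RAW_BYTES) -> float:
    """Compressed size as a fraction of the raw size (lower is better)."""
    return n / raw


def reduction_percent(n: int, raw: int = RAW_BYTES) -> float:
    """Percentage of bytes saved relative to raw."""
    return 100.0 * (1.0 - n / raw)


for codec, n in transferred.items():
    print(f"{codec}: ratio {ratio_vs_raw(n):.4f}, {reduction_percent(n):.1f}% smaller")
```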
How it works
QATQ is not a transparent GPU memory layer yet. It is a production-shaped codec for exported tensor bytes: f32, f16, and bf16 data that can be chunked, compressed, transferred, checked, and restored bit-for-bit.
KV caches, migration blocks, and fixture captures leave the runtime as typed little-endian bytes.
QATQ chooses among raw, byte-plane, zstd-backed, delta-XOR, and reversible quaternion-chain candidates.
Large tensors are split into bounded chunks with ordered lengths and an aggregate checksum.
Decode returns the same native tensor bytes for runtime import or evidence comparison.
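The chunk, check, and restore steps can be sketched as a toy container. The layout below (a `DEMO` tag, a chunk count, a CRC-32 over the original bytes, then an ordered length table) is a hypothetical stand-in and is not the QATC format. `zlib` replaces the real per-chunk codec, and the chunk bound is in bytes here, whereas the CLI flag shown earlier bounds values per chunk.

```python
import struct
import zlib

MAGIC = b"DEMO"  # hypothetical tag, not the QATC magic
HEADER_SIZE = 12  # 4-byte tag + u32 chunk count + u32 aggregate CRC-32


def encode_chunked(data: bytes, max_chunk_bytes: int) -> bytes:
    """Split into bounded chunks, compress each, and prepend lengths and a checksum."""
    chunks = [
        zlib.compress(data[i:i + max_chunk_bytes])
        for i in range(0, len(data), max_chunk_bytes)
    ]
    header = MAGIC + struct.pack("<II", len(chunks), zlib.crc32(data))
    lengths = b"".join(struct.pack("<I", len(c)) for c in chunks)
    return header + lengths + b"".join(chunks)


def decode_chunked(blob: bytes) -> bytes:
    """Validate structure and checksum, then return the original bytes."""
    if len(blob) < HEADER_SIZE or blob[:4] != MAGIC:
        raise ValueError("bad or truncated header")
    count, expected_crc = struct.unpack_from("<II", blob, 4)
    table_end = HEADER_SIZE + 4 * count
    if len(blob) < table_end:
        raise ValueError("truncated length table")
    lengths = struct.unpack(f"<{count}I", blob[HEADER_SIZE:table_end])
    if table_end + sum(lengths) != len(blob):
        raise ValueError("payload size does not match length table")
    out = bytearray()
    pos = table_end
    try:
        for n in lengths:  # chunks are stored in order, so offsets accumulate
            out += zlib.decompress(blob[pos:pos + n])
            pos += n
    except zlib.error as exc:
        raise ValueError("corrupt chunk") from exc
    if zlib.crc32(bytes(out)) != expected_crc:
        raise ValueError("aggregate checksum mismatch")
    return bytes(out)  # only handed back after every check has passed


data = bytes((i * 7) % 256 for i in range(5000))
blob = encode_chunked(data, max_chunk_bytes=1024)
assert decode_chunked(blob) == data
```

In this sketch a truncated blob fails the length-table check and a flipped checksum byte fails the CRC comparison, so the caller never receives partially restored bytes.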
Public fixtures
Public fixture rows are generated inside the QATQ repository so anyone can reproduce the comparison without private runtime captures.
| Dataset | Winning QATQ strategy | QATQ | zstd | lz4 | Restore |
|---|---|---|---|---|---|
| bf16-kv-ramp | byte-plane-zstd | 0.3817 | 0.4665 | 0.6901 | yes |
| bf16-kv-wave | quaternion-chain-zstd | 0.1153 | 0.2900 | 0.4693 | yes |
| f32 noisy fixture | byte-plane-zstd | 0.6532 | 0.9061 | 1.0040 | yes |
| NaN/Inf stress | quaternion-chain-zstd | 0.0121 | 0.0413 | 0.0673 | yes |
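To make the `byte-plane` strategy name concrete, the sketch below assumes it refers to the common byte-shuffle layout: all low bytes of the values are grouped together, followed by all high bytes. That reading is an assumption, `zlib` stands in for zstd, and the ramp is synthetic data, not one of the fixtures above.

```python
import struct
import zlib


def split_planes(data: bytes, width: int = 2) -> bytes:
    """Group byte 0 of every value, then byte 1, and so on."""
    if len(data) % width:
        raise ValueError("length is not a multiple of the value width")
    return b"".join(data[k::width] for k in range(width))


def merge_planes(planes: bytes, width: int = 2) -> bytes:
    """Inverse of split_planes: interleave the planes back into values."""
    if len(planes) % width:
        raise ValueError("length is not a multiple of the value width")
    n = len(planes) // width
    out = bytearray(len(planes))
    for k in range(width):
        out[k::width] = planes[k * n:(k + 1) * n]
    return bytes(out)


def to_bf16le(x: float) -> bytes:
    """Truncate an f32 to its top 16 bits, little-endian (no rounding)."""
    return struct.pack("<f", x)[2:]


# Synthetic ramp in [1.0, 2.0): the high byte of every value is identical.
ramp = b"".join(to_bf16le(1.0 + i / 4096) for i in range(4096))

planes = split_planes(ramp)
assert merge_planes(planes) == ramp  # the transform is reversible

print("interleaved + deflate:", len(zlib.compress(ramp)))
print("byte-plane + deflate: ", len(zlib.compress(planes)))
```

Whether the split helps depends on the data, which is consistent with the table listing a different winning strategy per dataset.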
Lossless claims apply to QATQ and QATC. Lossy research paths remain comparators, not the product claim.
The current target is storage and transfer of exported KV tensors, with throughput gates and fuzzing in CI.
Live GPU VRAM reduction is a roadmap goal that needs runtime KV paging and latency proof before it becomes a claim.
QATQ is open source and Rust-native. It is ready for exported KV-cache storage and transfer experiments; larger production integrations are still needed to broaden the evidence base.