HEVC codec decomposition → sparse patch selection → OneVision-Encoder
Images, uniformly sampled frames, or codec-aligned tokens all feed the same Vision Transformer via RoPE3D
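
One way to picture how the three input types could share a single encoder is that each reduces to a list of (t, h, w) patch coordinates, which a 3D rotary position embedding can then consume. The sketch below is illustrative only: `grid_positions`, `select_sparse`, and the per-patch scores are hypothetical names and placeholder values, not the project's API, and how the real pipeline derives a selection from the HEVC decomposition is not specified here.

```python
from itertools import product

def grid_positions(num_frames, grid_h, grid_w):
    """Return every (t, h, w) patch coordinate of a dense clip.

    A single image is treated as a clip with num_frames=1 (assumption).
    """
    return list(product(range(num_frames), range(grid_h), range(grid_w)))

def select_sparse(positions, scores, keep):
    """Keep the `keep` highest-scoring patches, preserving (t, h, w) order."""
    ranked = sorted(range(len(positions)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])
    return [positions[i] for i in kept]

# Image: one frame, full 2x2 patch grid.
image_pos = grid_positions(1, 2, 2)

# Uniform frames: two frames, each with a full 2x2 patch grid.
video_pos = grid_positions(2, 2, 2)

# Codec-aligned tokens: a sparse subset of the same coordinate space.
# Placeholder scores, one per patch; purely hypothetical values.
scores = [0.1, 0.9, 0.0, 0.2, 0.8, 0.0, 0.7, 0.3]
codec_pos = select_sparse(video_pos, scores, keep=3)

print(len(image_pos), len(video_pos), codec_pos)
```

Because the sparse tokens keep their original (t, h, w) coordinates rather than being renumbered, the same position scheme covers all three cases, which is the property the tagline attributes to RoPE3D.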