Xiang An - LLaVA-OneVision2


Codec-Aligned Sparse Vision Encoding for Video Understanding

HEVC codec decomposition → sparse patch selection → OneVision-Encoder
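The pipeline above does not say how patches are chosen. The sketch below is one way sparse selection could work. It assumes each patch already has a codec-derived saliency score (for example, residual energy plus motion-vector magnitude, both hypothetical here). It also assumes that HEVC I-frames are kept whole and that only the top-scoring patches of other frames survive. `select_sparse_patches`, `keep_ratio`, and the scoring rule are illustrative names and choices, not the OneVision-Encoder implementation.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]  # (t, h, w) index of a patch in the video grid


def select_sparse_patches(
    scores: List[List[List[float]]],
    keyframes: List[bool],
    keep_ratio: float = 0.25,
) -> List[Coord]:
    """Keep every patch of keyframes and the top-scoring patches of other frames.

    Hypothetical inputs: scores[t][h][w] is a per-patch codec saliency
    (e.g. residual energy + motion-vector magnitude); keyframes[t] marks
    intra-coded (I) frames. This is an illustrative sketch, not the real method.
    """
    selected: List[Coord] = []
    for t, frame in enumerate(scores):
        coords = [(t, h, w) for h, row in enumerate(frame) for w, _ in enumerate(row)]
        if keyframes[t]:
            selected.extend(coords)  # I-frames: keep the full patch grid
            continue
        k = max(1, round(keep_ratio * len(coords)))  # always keep at least one patch
        # Rank by score, highest first; Python's sort is stable, so ties keep raster order.
        ranked = sorted(coords, key=lambda c: frame[c[1]][c[2]], reverse=True)
        selected.extend(sorted(ranked[:k]))  # restore raster order within the frame
    return selected
```

Each kept patch retains its original `(t, h, w)` coordinates, so a position-aware encoder can still place it correctly even though most of the grid is dropped.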

Images, uniformly sampled frames, and codec-aligned tokens all feed the same Vision Transformer, which encodes their positions with a 3D rotary position embedding (RoPE3D).
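A minimal sketch of a 3D rotary position embedding follows. It assumes the head dimension is split into three equal slices for the time, height, and width axes, and that standard 1D RoPE is applied to each slice using that axis's coordinate. The equal split, the base of 10000, and the name `apply_rope3d` are assumptions for illustration. The allocation actually used in OneVision-Encoder may differ.

```python
import math
from typing import List, Sequence, Tuple


def rope_1d_angles(pos: int, dim: int, base: float = 10000.0) -> List[float]:
    # One rotation angle per channel pair, as in standard 1D RoPE.
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]


def apply_rope3d(x: Sequence[float], pos: Tuple[int, int, int]) -> List[float]:
    """Rotate a query/key vector by its (t, h, w) position.

    Assumption: the head dim is split evenly into t, h, and w slices, so it
    must be divisible by 6 for each slice to hold whole channel pairs.
    """
    d = len(x)
    if d % 6 != 0:
        raise ValueError("head dim must be divisible by 6")
    part = d // 3
    out: List[float] = []
    for axis, p in enumerate(pos):  # axis 0 = t, 1 = h, 2 = w
        chunk = x[axis * part:(axis + 1) * part]
        for i, theta in enumerate(rope_1d_angles(p, part)):
            a, b = chunk[2 * i], chunk[2 * i + 1]
            c, s = math.cos(theta), math.sin(theta)
            out.extend([a * c - b * s, a * s + b * c])  # 2D rotation of one pair
    return out
```

Rotations preserve vector norms, and the query-key dot product depends only on the *difference* between the two `(t, h, w)` positions. This is why a single encoder can accept a dense image grid, uniformly sampled frames, or a sparse set of codec-selected patches without re-indexing them.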