Deep dive into Megatron-LM FP8 quantized training — E4M3 vs E5M2 number formats,
tensor scaling factors, delayed scaling strategy, and how FP8 GEMM achieves 1.5–2×
throughput gains on H100 GPUs with near-zero model quality loss.
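The two formats and the delayed-scaling recipe mentioned above can be sketched in a few lines. This is a simulation only (the real cast and GEMM happen in hardware via a library such as Transformer Engine); the helper names `delayed_scale` and `quantize_fp8_sim` are illustrative, not a real API, and the clipping here stands in for a true round-to-nearest FP8 cast.

```python
import numpy as np

# Largest finite values of the two FP8 formats (per the OCP FP8 spec):
E4M3_MAX = 448.0    # 4 exponent bits, 3 mantissa bits — more precision
E5M2_MAX = 57344.0  # 5 exponent bits, 2 mantissa bits — more dynamic range

def delayed_scale(amax_history, fmt_max=E4M3_MAX):
    """Delayed scaling: choose the per-tensor scale from a rolling
    window of *past* absolute maxima, so the scale is known before
    the current tensor is produced (no extra pass over the data)."""
    amax = max(amax_history)
    return fmt_max / amax if amax > 0 else 1.0

def quantize_fp8_sim(x, scale, fmt_max=E4M3_MAX):
    """Coarse FP8 simulation: scale into representable range and clip.
    (A real cast would also round the mantissa to 3 or 2 bits.)"""
    return np.clip(x * scale, -fmt_max, fmt_max)

# Usage: the scale comes from recent iterations, then is applied
# to the next tensor; after the FP8 GEMM the output is dequantized.
history = [3.1, 2.7, 3.4]              # amax of recent training steps
s = delayed_scale(history)
x = np.random.randn(4, 4).astype(np.float32)
x_q = quantize_fp8_sim(x, s)
x_deq = x_q / s
```

E4M3 is typically used for activations and weights (precision matters), E5M2 for gradients (range matters).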
A bilingual RL note page for fast lookup — covering what GRPO is, why group-relative
normalization works, how rollout/old/reference logprobs differ, how KL penalties are
computed, why credit assignment is hard, and how VERL-style rollout/training data flows.
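Two of the ideas listed above fit in a few lines each: group-relative normalization (GRPO's critic-free advantage) and a per-token KL penalty. A minimal sketch, assuming scalar rewards per rollout; the `k3` estimator shown is one common choice for the KL term, not necessarily the one a given trainer uses.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: for a group of G rollouts sampled from
    the *same* prompt, normalize each reward against the group's own
    mean and std — the group baseline replaces a learned critic."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

def kl_penalty_k3(logp, ref_logp):
    """Per-token 'k3' KL estimator: exp(d) - d - 1 with
    d = log p_ref - log p_policy; always >= 0, zero iff equal."""
    d = ref_logp - logp
    return np.exp(d) - d - 1.0

# One prompt, 4 sampled completions with scalar rewards:
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Above-mean completions get positive advantage, below-mean negative.
```

Note the three logprob roles the blurb distinguishes: rollout logprobs come from the sampling-time policy, "old" logprobs anchor the PPO-style ratio, and reference logprobs feed the KL penalty.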
A detailed bilingual deep-dive into Generalized Advantage Estimation (GAE) — what the
Critic is, TD residuals, Monte Carlo returns, and how the λ parameter controls the
bias-variance tradeoff, with an interactive canvas visualization and Python code.
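The λ tradeoff described above is easiest to see in the standard backward recursion over TD residuals. A minimal self-contained sketch (episodic case, bootstrap value of 0 past the final step):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via backward recursion:
    A_t = delta_t + gamma*lam*A_{t+1},
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    lam=0 -> one-step TD (low variance, biased by the critic);
    lam=1 -> Monte Carlo returns minus baseline (unbiased, noisy)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0  # V = 0 after episode end
        delta = rewards[t] + gamma * next_v - values[t]  # TD residual
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

With `lam=1.0` and `gamma=1.0` the result is exactly the Monte Carlo return-to-go minus the critic's value; with `lam=0.0` it collapses to the single-step TD residual at each timestep.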
Deep dive into RaBitQ's random rotation technique for binary quantization — how a
simple orthogonal transformation enables 32× compression with theoretical error bounds
for approximate nearest neighbor search, with 4 interactive canvas visualizations.
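The core trick above, rotate then binarize, can be sketched quickly. This is a SimHash-style sign sketch under a Haar-random rotation, which illustrates the 32× compression (one bit per float32 dimension) but not RaBitQ's exact distance estimator or its error bounds:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    """Random orthogonal matrix via QR of a Gaussian matrix;
    the sign fix makes the distribution uniform (Haar measure)."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))

def binarize(x, P):
    """Rotate, then keep only the sign of each coordinate:
    d float32 dims (32*d bits) -> d bits = 32x compression."""
    return (P @ x) > 0

d = 64
P = random_rotation(d)
a, b = rng.normal(size=d), rng.normal(size=d)
# Estimate the angle between a and b from the Hamming distance
# of their sign codes (expected Hamming rate = angle / pi).
ham = np.mean(binarize(a, P) != binarize(b, P))
angle_est = ham * np.pi
```

The rotation is what makes the per-bit error behave predictably for arbitrary inputs: it spreads each vector's mass evenly across coordinates before the signs are taken.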
Comprehensive deep dive into diffusion generative models — from DDPM's forward/reverse
process and noise schedules to Flow Matching's straight ODE paths and Rectified Flow,
with 8 interactive canvas animations and bilingual (ZH/EN) support.
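The two endpoints of that spectrum each have a one-line closed form: DDPM's forward corruption and Rectified Flow's straight interpolation path. A minimal sketch with a linear beta schedule (schedule values here are illustrative, not tuned):

```python
import numpy as np

def ddpm_forward(x0, t, alpha_bar, rng):
    """DDPM forward process in closed form:
    x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps,  eps ~ N(0, I)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

def rectified_flow_path(x0, x1, t):
    """Rectified Flow: straight-line path between data x0 and noise x1;
    the regression target is the constant velocity v = x1 - x0."""
    return (1 - t) * x0 + t * x1

# Linear beta schedule -> cumulative abar_t, which decays toward 0
# so that x_T is (nearly) pure noise.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)   # stands in for a data sample
x1 = rng.normal(size=8)   # pure noise endpoint
xt = ddpm_forward(x0, 500, alpha_bar, rng)
```

The straight ODE path is why Rectified Flow can sample in very few steps: integrating a near-constant velocity field needs far fewer solver steps than a curved one.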
Apr 1, 2026
LLaVA-OneVision2 · Interactive Viz · Multimodal LLM · Vision-Language · ~3 min
Interactive visualization of LLaVA-OneVision2 architecture —
codec-aligned sampling and RoPE3D token processing.