Deep dive into Megatron-LM FP8 quantized training — E4M3 vs E5M2 number formats,
tensor scaling factors, delayed scaling strategy, and how FP8 GEMM achieves 1.5–2×
throughput gains on H100 GPUs with near-zero model quality loss.
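The two formats and the delayed-scaling recipe mentioned above can be sketched in a few lines. This is a simulation only (the real cast and GEMM happen in hardware via a library such as Transformer Engine); the helper names `delayed_scale` and `quantize_fp8_sim` are illustrative, not a real API, and the clipping here stands in for a true round-to-nearest FP8 cast.

```python
import numpy as np

# Largest finite values of the two FP8 formats (per the OCP FP8 spec):
E4M3_MAX = 448.0    # 4 exponent bits, 3 mantissa bits — more precision
E5M2_MAX = 57344.0  # 5 exponent bits, 2 mantissa bits — more dynamic range

def delayed_scale(amax_history, fmt_max=E4M3_MAX):
    """Delayed scaling: choose the per-tensor scale from a rolling
    window of *past* absolute maxima, so the scale is known before
    the current tensor is produced (no extra pass over the data)."""
    amax = max(amax_history)
    return fmt_max / amax if amax > 0 else 1.0

def quantize_fp8_sim(x, scale, fmt_max=E4M3_MAX):
    """Coarse FP8 simulation: scale into representable range and clip.
    (A real cast would also round the mantissa to 3 or 2 bits.)"""
    return np.clip(x * scale, -fmt_max, fmt_max)

# Usage: the scale comes from recent iterations, then is applied
# to the next tensor; after the FP8 GEMM the output is dequantized.
history = [3.1, 2.7, 3.4]              # amax of recent training steps
s = delayed_scale(history)
x = np.random.randn(4, 4).astype(np.float32)
x_q = quantize_fp8_sim(x, s)
x_deq = x_q / s
```

E4M3 is typically used for activations and weights (precision matters), E5M2 for gradients (range matters).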
A bilingual RL note page for fast lookup — covering what GRPO is, why group-relative
normalization works, how rollout/old/reference logprobs differ, how KL penalties are
computed, why credit assignment is hard, and how VERL-style rollout/training data flows.
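Two of the ideas listed above fit in a few lines each: group-relative normalization (GRPO's critic-free advantage) and a per-token KL penalty. A minimal sketch, assuming scalar rewards per rollout; the `k3` estimator shown is one common choice for the KL term, not necessarily the one a given trainer uses.

```python
import numpy as np

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: for a group of G rollouts sampled from
    the *same* prompt, normalize each reward against the group's own
    mean and std — the group baseline replaces a learned critic."""
    r = np.asarray(rewards, dtype=np.float64)
    return (r - r.mean()) / (r.std() + eps)

def kl_penalty_k3(logp, ref_logp):
    """Per-token 'k3' KL estimator: exp(d) - d - 1 with
    d = log p_ref - log p_policy; always >= 0, zero iff equal."""
    d = ref_logp - logp
    return np.exp(d) - d - 1.0

# One prompt, 4 sampled completions with scalar rewards:
adv = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
# Above-mean completions get positive advantage, below-mean negative.
```

Note the three logprob roles the blurb distinguishes: rollout logprobs come from the sampling-time policy, "old" logprobs anchor the PPO-style ratio, and reference logprobs feed the KL penalty.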
A detailed bilingual deep-dive into Generalized Advantage Estimation (GAE) — what the
Critic is, TD residuals, Monte Carlo returns, and how the λ parameter controls the
bias-variance tradeoff, with an interactive canvas visualization and Python code.
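The λ tradeoff described above is easiest to see in the standard backward recursion over TD residuals. A minimal self-contained sketch (episodic case, bootstrap value of 0 past the final step):

```python
import numpy as np

def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation via backward recursion:
    A_t = delta_t + gamma*lam*A_{t+1},
    delta_t = r_t + gamma*V(s_{t+1}) - V(s_t).
    lam=0 -> one-step TD (low variance, biased by the critic);
    lam=1 -> Monte Carlo returns minus baseline (unbiased, noisy)."""
    T = len(rewards)
    adv = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0  # V = 0 after episode end
        delta = rewards[t] + gamma * next_v - values[t]  # TD residual
        last = delta + gamma * lam * last
        adv[t] = last
    return adv
```

With `lam=1.0` and `gamma=1.0` the result is exactly the Monte Carlo return-to-go minus the critic's value; with `lam=0.0` it collapses to the single-step TD residual at each timestep.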
Deep dive into RaBitQ's random rotation technique for binary quantization — how a
simple orthogonal transformation enables 32× compression with theoretical error bounds
for approximate nearest neighbor search, with 4 interactive canvas visualizations.
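The core trick above, rotate then binarize, can be sketched quickly. This is a SimHash-style sign sketch under a Haar-random rotation, which illustrates the 32× compression (one bit per float32 dimension) but not RaBitQ's exact distance estimator or its error bounds:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    """Random orthogonal matrix via QR of a Gaussian matrix;
    the sign fix makes the distribution uniform (Haar measure)."""
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))

def binarize(x, P):
    """Rotate, then keep only the sign of each coordinate:
    d float32 dims (32*d bits) -> d bits = 32x compression."""
    return (P @ x) > 0

d = 64
P = random_rotation(d)
a, b = rng.normal(size=d), rng.normal(size=d)
# Estimate the angle between a and b from the Hamming distance
# of their sign codes (expected Hamming rate = angle / pi).
ham = np.mean(binarize(a, P) != binarize(b, P))
angle_est = ham * np.pi
```

The rotation is what makes the per-bit error behave predictably for arbitrary inputs: it spreads each vector's mass evenly across coordinates before the signs are taken.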
Comprehensive deep dive into diffusion generative models — from DDPM's forward/reverse
process and noise schedules to Flow Matching's straight ODE paths and Rectified Flow,
with 8 interactive canvas animations and bilingual (ZH/EN) support.
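The two endpoints of that spectrum each have a one-line closed form: DDPM's forward corruption and Rectified Flow's straight interpolation path. A minimal sketch with a linear beta schedule (schedule values here are illustrative, not tuned):

```python
import numpy as np

def ddpm_forward(x0, t, alpha_bar, rng):
    """DDPM forward process in closed form:
    x_t = sqrt(abar_t)*x0 + sqrt(1 - abar_t)*eps,  eps ~ N(0, I)."""
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1 - alpha_bar[t]) * eps

def rectified_flow_path(x0, x1, t):
    """Rectified Flow: straight-line path between data x0 and noise x1;
    the regression target is the constant velocity v = x1 - x0."""
    return (1 - t) * x0 + t * x1

# Linear beta schedule -> cumulative abar_t, which decays toward 0
# so that x_T is (nearly) pure noise.
betas = np.linspace(1e-4, 0.02, 1000)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)   # stands in for a data sample
x1 = rng.normal(size=8)   # pure noise endpoint
xt = ddpm_forward(x0, 500, alpha_bar, rng)
```

The straight ODE path is why Rectified Flow can sample in very few steps: integrating a near-constant velocity field needs far fewer solver steps than a curved one.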
Apr 1, 2026
LLaVA-OneVision2 · Interactive Viz · Multimodal LLM · Vision-Language · ~3 min
Interactive visualization of LLaVA-OneVision2 architecture —
codec-aligned sampling and RoPE3D token processing.