Xiang An (Chinese: 安翔) is a Research Scientist and Team Lead of the Multimodal Large Model Group at GlintLab, specializing in computer vision and multimodal large models. His research is listed on Google Scholar, and his open-source projects are on GitHub (34,177+ stars in total). His current research focuses on building next-generation Vision Transformers (ViTs) to address the urgent needs of modern MLLMs. He is also the #2 contributor to the InsightFace ecosystem (~27k⭐).
Publications
The following is a selection of notable publications. For a complete list, see All Publications.
OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
Preprint 2026 · Paper · Code · Homepage · Bilibili · YouTube
Introduces codec-aligned sparsity as a foundational principle for multimodal intelligence. Attends exclusively to the 3.1%-25% of regions rich in signal entropy, achieving a 4.1% average improvement over Qwen3-ViT on video understanding tasks.
Feilong Tang, Xiang An, Yunyao Yan, Yin Xie, Bin Qin, Kaicheng Yang, Yifei Shen, Yuanhan Zhang, Chunyuan Li, Shikun Feng, Changrui Chen, Huajie Tan, Ming Hu, Manyuan Zhang, Bo Li, Ziyong Feng, Ziwei Liu, Zongyuan Ge, Jiankang Deng
LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training
Preprint 2025 · Paper · Code · Sohu Tech
Fully open-sources the code, data, checkpoints, and training logs; provides a stronger open-source ViT; and shows that simply scaling dense captions improves overall multimodal task performance.
Xiang An, Yin Xie, Kaicheng Yang, Wenkang Zhang, Xiuwei Zhao, Zheng Cheng, Changrui Chen, Zizhen Yan, Ziyong Feng, Ziwei Liu, Bo Li, Jiankang Deng, et al.
Unicom: Universal and Compact Representation Learning for Image Retrieval
ICLR 2023 · Code
A universal and compact representation learning framework for large-scale image retrieval, and the foundation for scalable image retrieval systems.
Xiang An, Jiankang Deng, Kaicheng Yang, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu
Awards
Ranked 1st in the NIST FRVT Competition, Visa Track 1:1
Nominated for the 2024 China Annual Power Figures (中国年度力量人物)
Ranked 1st in his major in the graduate entrance examination
First Place in Vehicle Re-Identification, PRCV 2019
Open Source
InsightFace · Open Source Library
#2 contributor to the open-source 2D & 3D deep face analysis library. Author of Glint360K (the largest open-source face recognition training dataset) and Partial FC, which enables training on 10 million identities on a single machine (see the sketch after this list). Also organized the ICCV 2021 workshop on the Masked Face Recognition Challenge.
LLaVA-OneVision-1.5 · Multimodal LLM Framework
Team Leader of this fully open framework designed to democratize multimodal training. Released mid-training and instruct data for community use, and developed offline data packing for efficient training (see the packing sketch after this list). Implemented RiceViT with native-resolution support.
OneVision-Encoder · Vision Encoder
Lead author of this next-generation vision encoder, which introduces codec-aligned sparsity as a foundational principle for multimodal intelligence. Achieves state-of-the-art performance on 16 image, video, and document understanding benchmarks while using substantially fewer visual tokens, and demonstrates a 4.1% average improvement over Qwen3-ViT on video understanding tasks (an illustrative entropy-selection sketch follows this list).
UNICOM · Image Retrieval Framework
Lead author and maintainer of the Universal and Compact Representation Learning framework for universal image representations. Designed its novel cluster-discrimination approach to representation learning, and developed the multi-label and region-based extensions (published at ECCV 2024 and as an ICCV 2025 Highlight).
LLaVA-NeXT · Large Multimodal Model
Vision-module contributor to the next-generation large multimodal model. Enhanced the OCR capability of the vision module for better text recognition in images, and optimized the visual encoder for processing text-rich and document images.
Urban Seg · Educational Project
Author and maintainer of this educational project for semantic segmentation on remote sensing and satellite imagery. Designed a simple single-file training approach for accessibility, integrated popular pretrained models, and created comprehensive tutorials and documentation for beginners.
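For readers curious how Partial FC scales softmax training to millions of identities, here is a minimal single-GPU sketch of the sampling idea: each step computes the loss over the batch's positive class centers plus a random subset of negatives, rather than over all centers. This is an illustration under simplifying assumptions (no model parallelism, no margin term); the names `PartialFC` and `sample_rate`, and the scale 64.0, are illustrative and not the InsightFace API.

```python
import torch
import torch.nn.functional as F

class PartialFC(torch.nn.Module):
    """Toy sampled-softmax head: keeps all positive centers in the batch
    plus random negatives, so per-step memory scales with the sample size
    rather than with the full number of identities."""

    def __init__(self, embedding_dim: int, num_classes: int, sample_rate: float = 0.1):
        super().__init__()
        self.weight = torch.nn.Parameter(0.01 * torch.randn(num_classes, embedding_dim))
        self.num_classes = num_classes
        self.num_sample = max(1, int(num_classes * sample_rate))

    def forward(self, embeddings: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        device = labels.device
        positives = labels.unique()  # centers that must stay in the subset
        # fill the rest of the subset with randomly chosen negative centers
        perm = torch.randperm(self.num_classes, device=device)
        is_neg = torch.ones(self.num_classes, dtype=torch.bool, device=device)
        is_neg[positives] = False
        n_neg = max(0, self.num_sample - positives.numel())
        negatives = perm[is_neg[perm]][:n_neg]
        sampled = torch.cat([positives, negatives])
        # remap full-range labels to positions inside the sampled subset
        remap = torch.full((self.num_classes,), -1, dtype=torch.long, device=device)
        remap[sampled] = torch.arange(sampled.numel(), device=device)
        # cosine logits over the sampled centers only; 64.0 is a typical scale
        logits = 64.0 * F.linear(F.normalize(embeddings), F.normalize(self.weight[sampled]))
        return F.cross_entropy(logits, remap[labels])

# usage: a loss over 1M identities while touching only ~100k centers per step
# head = PartialFC(512, 1_000_000, sample_rate=0.1)
# loss = head(torch.randn(8, 512), torch.randint(0, 1_000_000, (8,)))
```

The real Partial FC additionally shards the class centers across GPUs (model parallelism) and combines the sampled softmax with a margin-based loss such as ArcFace; the sketch omits both.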
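The offline data packing mentioned for LLaVA-OneVision-1.5 can be pictured with a short, hypothetical sketch: pack variable-length samples into fixed-length sequences ahead of training so batches carry almost no padding. The repo's actual pipeline is more involved; `pack_samples` and its greedy first-fit-decreasing strategy are assumptions for illustration only.

```python
from typing import List

def pack_samples(lengths: List[int], max_len: int) -> List[List[int]]:
    """Greedy first-fit-decreasing packing: returns groups of sample
    indices whose total token count fits within `max_len`."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    packs: List[List[int]] = []
    room: List[int] = []                # remaining capacity of each pack
    for i in order:
        if lengths[i] > max_len:        # oversized samples handled upstream
            continue
        for j, r in enumerate(room):    # first pack with enough room wins
            if lengths[i] <= r:
                packs[j].append(i)
                room[j] -= lengths[i]
                break
        else:                           # no pack fits: open a new one
            packs.append([i])
            room.append(max_len - lengths[i])
    return packs

# pack_samples([900, 300, 700, 100], max_len=1024) == [[0, 3], [2, 1]]
```

Because packing happens once, offline, the training loop itself stays simple: it just iterates over pre-packed sequences with near-full token utilization.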
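Finally, to make the "regions rich in signal entropy" idea behind OneVision-Encoder concrete, here is a toy sketch that ranks image patches by a simple intensity-entropy proxy and keeps only the top fraction as visual tokens. This is not the paper's codec-aligned method, which derives its sparsity from codec structure; every name here (`select_high_entropy_patches`, the 32-bin histogram) is an illustrative assumption.

```python
import torch

def select_high_entropy_patches(image: torch.Tensor, patch: int = 16,
                                keep_ratio: float = 0.25) -> torch.Tensor:
    """image: (C, H, W) in [0, 1], with H and W divisible by `patch`.
    Returns the flattened patches with the highest intensity entropy."""
    c, h, w = image.shape
    # cut the image into non-overlapping patch x patch tiles
    tiles = image.unfold(1, patch, patch).unfold(2, patch, patch)
    tiles = tiles.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)
    # per-patch Shannon entropy over a 32-bin intensity histogram
    # (plain loop kept for clarity; fine for a sketch)
    bins = (tiles.clamp(0, 1) * 31).long()
    ent = torch.zeros(tiles.size(0), device=tiles.device)
    for i in range(tiles.size(0)):
        p = torch.bincount(bins[i], minlength=32).float()
        p = p / p.sum()
        ent[i] = -(p[p > 0] * p[p > 0].log()).sum()
    k = max(1, int(keep_ratio * tiles.size(0)))
    keep = ent.topk(k).indices          # indices of the retained patches
    return tiles[keep]                  # (k, C*patch*patch) token inputs

# e.g. keep only the most informative quarter of a 224x224 image's patches:
# tokens = select_high_entropy_patches(torch.rand(3, 224, 224), keep_ratio=0.25)
```

Feeding an encoder only the retained patches is what makes the token budget small; the actual paper reports operating on as little as 3.1% of regions while still improving video understanding.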