Xiang An

Xiang An (Chinese: 安翔) is a Research Scientist and Team Lead of the Multimodal Large Model Group at GlintLab, specializing in computer vision and multimodal large models. His research is listed on Google Scholar, and his open-source projects are available on GitHub (34,177+ stars in total). His current work focuses on building a next-generation Vision Transformer (ViT) to address the needs of modern multimodal large language models (MLLMs). He is also the #2 contributor to the InsightFace ecosystem (~27k stars).


Publications

The following is a selection of notable publications. For a complete list, see All Publications.

  1. OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence. Preprint 2026. Links: Paper, Code, Homepage, Bilibili, YouTube. Introduces codec-aligned sparsity as a foundational principle for multimodal intelligence; the encoder attends exclusively to the 3.1%-25% of regions richest in signal entropy, achieving a 4.1% average improvement over Qwen3-ViT on video understanding tasks. Authors: Feilong Tang, Xiang An, Yunyao Yan, Yin Xie, Bin Qin, Kaicheng Yang, Yifei Shen, Yuanhan Zhang, Chunyuan Li, Shikun Feng, Changrui Chen, Huajie Tan, Ming Hu, Manyuan Zhang, Bo Li, Ziyong Feng, Ziwei Liu, Zongyuan Ge, Jiankang Deng.
  2. LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training. Preprint 2025. Links: Paper, Code, Sohu Tech. Fully open-sources code, data, checkpoints, and training logs; provides an improved open-source ViT; demonstrates that simply scaling dense captions improves overall performance on multimodal tasks. Authors: Xiang An, Yin Xie, Kaicheng Yang, Wenkang Zhang, Xiuwei Zhao, Zheng Cheng, Changrui Chen, Zizhen Yan, Ziyong Feng, Ziwei Liu, Bo Li, Jiankang Deng, et al.
  3. UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning. AAAI 2026 (Oral). Links: Code. Authors: Tiancheng Gu, Kaicheng Yang, Kaichen Zhang, Xiang An, Ziyong Feng, Yueyi Zhang, Weidong Cai, Jiankang Deng, Lidong Bing.
  4. Region-based Cluster Discrimination for Visual Representation Learning. ICCV 2025 (Highlight). Links: Code. A novel approach to self-supervised learning that introduces region-based cluster discrimination. Authors: Yin Xie, Kaicheng Yang, Xiang An (Project Leader), Kun Wu, Yongle Zhao, Weimo Deng, Zimin Ran, Yumeng Wang, Ziyong Feng, Jiankang Deng.
  5. Multi-label Cluster Discrimination for Visual Representation Learning. ECCV 2024. Links: Code, Transformers, MLCD-Seg. A multi-label cluster discrimination framework for self-supervised visual representation learning. Authors: Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng.
  6. Unicom: Universal and Compact Representation Learning for Image Retrieval. ICLR 2023. Links: Code. A universal and compact representation learning framework for large-scale image retrieval, and a foundation for scalable image retrieval systems. Authors: Xiang An, Jiankang Deng, Kaicheng Yang, Jiawei Li, Ziyong Feng, Jia Guo, Jing Yang, Tongliang Liu.
  7. Killing Two Birds with One Stone: Efficient and Robust Training of Face Recognition CNNs by Partial FC. CVPR 2022. Links: Code, MXNet, PyTorch, Zhihu. Enables training with 10 million identities on a single machine through the Partial FC sampling approach. Authors: Xiang An, Jiankang Deng, Jia Guo, Ziyong Feng, Xuhan Zhu, Jing Yang, Tongliang Liu.
  8. Partial FC: Training 10 Million Identities on a Single Machine. ICCVW 2021. Links: Code, MXNet, PyTorch, Zhihu. Authors: Xiang An, Xuhan Zhu, Yuan Gao, Yang Xiao, Yongle Zhao, Ziyong Feng, Lan Wu, Bin Qin, Ming Zhang, Debing Zhang, Ying Fu.
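
The Partial FC papers above (items 7-8) make softmax training over millions of identities tractable by computing the loss against only the positive class centers present in the batch plus a randomly sampled subset of the negative centers. A minimal NumPy sketch of that sampling idea, assuming cosine logits; the function name and details here are illustrative, not the released implementation:

```python
import numpy as np

def partial_fc_logits(features, labels, centers, sample_ratio=0.1, rng=None):
    """Sketch of Partial FC sampling: score features against the positive
    class centers in the batch plus a random subset of negative centers."""
    if rng is None:
        rng = np.random.default_rng(0)
    num_classes = centers.shape[0]
    positive = np.unique(labels)                      # classes present in the batch
    num_sample = max(int(num_classes * sample_ratio), positive.size)
    # sample negative class centers uniformly at random
    mask = np.ones(num_classes, dtype=bool)
    mask[positive] = False
    negatives = rng.permutation(np.nonzero(mask)[0])[: num_sample - positive.size]
    index = np.concatenate([positive, negatives])
    # remap original labels to positions within the sampled subset
    remap = np.full(num_classes, -1)
    remap[index] = np.arange(index.size)
    # cosine logits against the sampled centers only
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    c = centers[index] / np.linalg.norm(centers[index], axis=1, keepdims=True)
    return f @ c.T, remap[labels]
```

In the released MXNet/PyTorch code the class centers are additionally sharded across GPUs; the sketch above shows only the negative-sampling step.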

Awards & Competitions

Open Source

  1. InsightFace Open Source Library #2 contributor to the open-source 2D & 3D deep face analysis library. Author of Glint360K (the largest open-source face recognition training dataset) and Partial FC (enabling training of 10 million identities on a single machine). Also organized the Masked Face Recognition Challenge workshop at ICCV 2021.
  2. LLaVA-OneVision-1.5 Multimodal LLM Framework Team Leader of this fully open framework designed to democratize multimodal training. Released mid-training and instruct data for community use, and developed an offline sampling pack for efficient training. Implemented RiceViT with native-resolution support.
  3. OneVision-Encoder Vision Encoder Lead author of this next-generation vision encoder that introduces codec-aligned sparsity as a foundational principle for multimodal intelligence. Achieves state-of-the-art performance on 16 image, video, and document understanding benchmarks while using substantially fewer visual tokens. Demonstrates 4.1% average improvement over Qwen3-ViT on video understanding tasks.
  4. UNICOM Image Retrieval Framework Lead author and maintainer of this Universal and Compact Representation Learning framework for universal image representations. Designed the novel cluster-discrimination approach for representation learning. Developed the multi-label and region-based extensions (published at ECCV 2024 and as an ICCV 2025 Highlight).
  5. LLaVA-NeXT Large Multimodal Model Vision module contributor to the next-generation large multimodal model. Enhanced the OCR capability of the vision module for better text recognition in images. Optimized the visual encoder for processing text-rich and document images.
  6. Urban Seg Educational Project Author and maintainer of this educational project for semantic segmentation on remote sensing and satellite imagery. Designed a simple single-file training approach for accessibility and integrated popular pretrained models. Created comprehensive tutorials and documentation for beginners.
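
The codec-aligned sparsity principle behind OneVision-Encoder (item 3 above) keeps only the small fraction of regions that carry most of the signal entropy. As a loose, hypothetical illustration of that idea, the toy sketch below ranks image patches by intensity entropy and keeps the top fraction; it is not the actual OneVision-Encoder algorithm, and the function name is invented:

```python
import numpy as np

def select_high_entropy_patches(image, patch=16, keep_ratio=0.25):
    """Toy sketch: rank non-overlapping patches of a grayscale image by
    intensity entropy and keep only the top fraction, mimicking codec-style
    attention to information-dense regions."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # split the image into a flat list of (patch*patch,) pixel vectors
    patches = image[: gh * patch, : gw * patch].reshape(gh, patch, gw, patch)
    patches = patches.transpose(0, 2, 1, 3).reshape(gh * gw, -1)
    entropies = []
    for p in patches:
        hist, _ = np.histogram(p, bins=32, range=(0, 256))
        prob = hist / hist.sum()
        prob = prob[prob > 0]                      # avoid log2(0)
        entropies.append(-(prob * np.log2(prob)).sum())
    k = max(1, int(len(patches) * keep_ratio))
    keep = np.argsort(np.asarray(entropies))[::-1][:k]  # most informative first
    return keep, patches[keep]
```

A flat uniform region (sky, blank document margin) has near-zero entropy and is dropped, while textured or text-rich patches rank highest, which is the intuition the encoder's 3.1%-25% token budget relies on.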

This page is styled after Wikipedia.