I actively contribute to open-source projects in face recognition, representation learning, and multimodal large models. I am the #2 contributor to the InsightFace ecosystem (~27k⭐), and co-maintain several influential vision and multimodal repositories.
Major contributor (#2 by contributions) to the core InsightFace ecosystem for large-scale face recognition and analysis.
Project Overview
InsightFace is an open-source 2D & 3D deep face analysis library with more than 27k stars on GitHub. It provides state-of-the-art face recognition, detection, alignment, and analysis capabilities.
My Contributions
- Author of Glint360K, the largest open-source face recognition training dataset
- Organizer of the Masked Face Recognition Challenge & Workshop at ICCV 2021
- Author of Partial FC, enabling training on 10 million identities with a single machine
- Implemented arcface_torch, an efficient distributed training framework for face recognition
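The key idea behind Partial FC, computing the softmax over only a sampled subset of class centers, can be illustrated with a short NumPy sketch. The function name, defaults, and uniform negative sampling below are illustrative simplifications, not the arcface_torch implementation:

```python
import numpy as np

def partial_fc_sample(num_classes, positives, sample_rate=0.1, rng=None):
    """Pick the class centers to keep for one training step.

    Partial FC avoids materializing logits against all `num_classes`
    centers: it keeps every positive class present in the batch and fills
    the rest of the budget with randomly sampled negative classes.
    """
    if rng is None:
        rng = np.random.default_rng()
    budget = int(num_classes * sample_rate)      # columns kept this step
    pos = np.unique(positives)                   # positives are always kept
    neg_pool = np.setdiff1d(np.arange(num_classes), pos)
    n_neg = max(budget - len(pos), 0)
    neg = rng.choice(neg_pool, size=n_neg, replace=False)
    return np.concatenate([pos, neg])
```

In a model-parallel setup each rank would apply this to its own shard of the classifier weights, so memory and compute scale with the sample rate rather than with the total number of identities.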
Key Features
- State-of-the-art face recognition models (ArcFace, CosFace, etc.)
- Support for large-scale training with millions of identities
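The additive angular margin behind ArcFace can be sketched in a few lines of NumPy. This is a simplified illustration of the loss, not the arcface_torch implementation (which also handles easy-margin variants, fp16, and model parallelism); s=64 and m=0.5 are the commonly cited defaults:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace additive angular margin (simplified sketch).

    Features and class centers are L2-normalized so their dot products are
    cosines; the margin m is added to the angle of the target class only,
    and the result is scaled by s before the softmax.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)            # (N, C) cosine similarities
    theta = np.arccos(cos)
    onehot = np.zeros_like(cos)
    onehot[np.arange(len(labels)), labels] = 1.0
    return s * np.cos(theta + m * onehot)        # margin only on the target

def softmax_ce(logits, labels):
    """Standard cross-entropy over softmax probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()
```

Because the margin shrinks the target logit, the loss with m > 0 is strictly harder than plain normalized softmax, which pushes embeddings of the same identity into a tighter angular cluster.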
A fully open framework that democratizes multimodal training and advances large multimodal models (LMMs).
Project Overview
LLaVA-OneVision-1.5 is a fully open framework designed to democratize multimodal training. It provides a comprehensive pipeline for training and evaluating large multimodal models.
My Contributions
Key Features
- Fully open-source training framework for multimodal models
- Efficient mixed-precision and distributed training support
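The dynamic loss scaling that keeps fp16 mixed-precision training stable can be sketched as follows. This is a conceptual NumPy illustration of the mechanism, which frameworks such as torch.cuda.amp provide in practice; the class name and policy details here are simplifications:

```python
import numpy as np

class GradScaler:
    """Minimal dynamic loss-scaling logic (conceptual sketch).

    fp16 gradients underflow easily, so the loss is multiplied by a large
    scale before backprop and the gradients are divided back before the
    optimizer step. If any gradient overflowed (inf/nan), the step is
    skipped and the scale is reduced; otherwise the scale is grown.
    """

    def __init__(self, init_scale=2.0 ** 16, growth=2.0, backoff=0.5):
        self.scale = init_scale
        self.growth = growth
        self.backoff = backoff

    def step(self, grads, apply_update):
        if any(not np.isfinite(g).all() for g in grads):
            self.scale *= self.backoff           # overflow: back off, skip
            return False
        apply_update([g / self.scale for g in grads])
        self.scale *= self.growth                # stable: grow the scale
        return True
```

Production scalers such as torch.cuda.amp.GradScaler grow the scale only after a run of successful steps rather than on every one; the per-step growth above just keeps the sketch short.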
Contributed to the vision module of LLaVA-NeXT, enhancing its OCR capability and optimizing the visual encoder and training pipeline for text-rich images.
Project Overview
LLaVA-NeXT is the next-generation large multimodal model that significantly improves upon the original LLaVA. It features enhanced visual understanding capabilities, especially for document and text-rich images.
My Contributions
- Enhanced the OCR capability of the vision module for better text recognition in images
- Optimized the visual encoder for processing text-rich and document images
Key Features
- Enhanced visual understanding with higher resolution support
- Improved OCR and document understanding capabilities
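One common way to support higher resolutions for text-rich images is to tile the input into base-resolution crops plus a downscaled global view. The sketch below illustrates that idea in NumPy; the tile size, zero padding, and nearest-neighbor resize are illustrative choices, not the exact LLaVA-NeXT preprocessing:

```python
import numpy as np

def anyres_tiles(image, tile=336):
    """Split a high-resolution image into tiles plus a global view.

    image: (H, W, C) array. The image is zero-padded up to a multiple of
    `tile`, cut into non-overlapping tiles that preserve fine detail such
    as small text, and a crude nearest-neighbor global view keeps the
    overall layout visible to the model.
    """
    h, w, c = image.shape
    ph, pw = -h % tile, -w % tile                # padding to tile multiples
    padded = np.pad(image, ((0, ph), (0, pw), (0, 0)))
    gh, gw = padded.shape[0] // tile, padded.shape[1] // tile
    tiles = (padded.reshape(gh, tile, gw, tile, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(gh * gw, tile, tile, c))
    ys = (np.arange(tile) * h / tile).astype(int)   # nearest-neighbor rows
    xs = (np.arange(tile) * w / tile).astype(int)   # nearest-neighbor cols
    return tiles, image[ys][:, xs]
```

Each tile and the global view are then encoded separately by the vision tower, and their token sequences are concatenated before entering the language model.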
Author and maintainer of Unicom, a universal and compact representation learning framework for large-scale image retrieval.
Project Overview
UNICOM (Universal and Compact Representation Learning) is a framework I developed for learning universal image representations. It enables efficient and accurate image retrieval at scale.
My Contributions
- Lead author and maintainer of the entire project
- Designed the novel cluster discrimination approach for representation learning
- Developed the multi-label and region-based extensions (published at ECCV 2024 and ICCV 2025)
- Maintained pretrained models and provided comprehensive documentation
Key Features
- Universal image representations that transfer across domains
- State-of-the-art performance on image retrieval benchmarks
Publications
- ICLR 2023: Unicom: Universal and Compact Representation Learning for Image Retrieval
- ECCV 2024: Multi-label Cluster Discrimination for Visual Representation Learning
- ICCV 2025 (Highlight): Region-based Cluster Discrimination for Visual Representation Learning
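The pseudo-labeling step at the heart of cluster discrimination, clustering unlabeled features and treating the cluster ids as class labels, can be sketched with plain k-means. This is a small-scale illustration (farthest-point init, Lloyd updates), not the large-scale clustering pipeline used in Unicom:

```python
import numpy as np

def cluster_pseudo_labels(features, k, iters=10):
    """Cluster features and return cluster ids usable as pseudo-labels.

    A classifier trained against these cluster ids, instead of human
    labels, learns discriminative representations without annotation.
    """
    # farthest-point initialization: deterministic and well spread out
    centers = [features[0]]
    for _ in range(k - 1):
        d = np.min([((features - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        # assign every feature to its nearest center
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each non-empty center to the mean of its members
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(0)
    return labels, centers
```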
A beginner-friendly repository for remote sensing semantic segmentation. It allows training with pre-trained models using just a single code file.
Project Overview
Urban Seg is an educational project I created to help beginners get started with semantic segmentation for remote sensing and satellite imagery. It emphasizes simplicity and ease of use.
My Contributions
- Author and maintainer of the entire project
- Designed the simple single-file training approach for accessibility
- Integrated popular pretrained models for transfer learning
- Created comprehensive tutorials and documentation
Key Features
- Single-file training script for quick start
- Beginner-friendly with clear documentation and examples
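Under the hood, training a semantic segmentation model reduces to a per-pixel cross-entropy between predicted class scores and the label map. Below is a NumPy sketch of that loss with the usual ignore-index handling for unlabeled pixels; the function name and the 255 default are illustrative, not code from the repository:

```python
import numpy as np

def pixel_cross_entropy(logits, labels, ignore_index=255):
    """Per-pixel cross-entropy for semantic segmentation (sketch).

    logits: (N, C, H, W) raw class scores; labels: (N, H, W) integer class
    ids, where `ignore_index` marks unlabeled pixels excluded from the mean.
    """
    z = logits - logits.max(1, keepdims=True)            # stability shift
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))   # log-softmax
    mask = labels != ignore_index
    safe = np.where(mask, labels, 0)                     # valid indices only
    picked = np.take_along_axis(logp, safe[:, None], axis=1)[:, 0]
    return -(picked * mask).sum() / mask.sum()
```

With all-zero logits over C classes the loss is exactly log C, a handy sanity check that a freshly initialized model is wired up correctly.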