I actively contribute to open-source projects in face recognition, representation learning, and multimodal large models. I am the #2 contributor to the InsightFace ecosystem (~27k⭐), and co-maintain several influential vision and multimodal repositories.
Major contributor (#2 by contributions) to the core InsightFace ecosystem for large-scale face recognition and analysis.
Project Overview
InsightFace is an open-source 2D & 3D deep face analysis library with more than 27k stars on GitHub. It provides state-of-the-art face recognition, detection, alignment, and analysis capabilities.
My Contributions
- Author of Glint360K, the largest open-source face recognition training dataset
- Organizer of the Masked Face Recognition Challenge & Workshop at ICCV 2021
- Author of Partial FC, enabling training on 10 million identities with a single machine
- Implemented arcface_torch, an efficient distributed training framework for face recognition
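The key idea behind Partial FC, computing the softmax over only a sampled subset of class centers, can be illustrated with a short NumPy sketch. The function name, defaults, and uniform negative sampling below are illustrative simplifications, not the arcface_torch implementation:

```python
import numpy as np

def partial_fc_sample(num_classes, positives, sample_rate=0.1, rng=None):
    """Pick the class centers to keep for one training step.

    Partial FC avoids materializing logits against all `num_classes`
    centers: it keeps every positive class present in the batch and fills
    the rest of the budget with randomly sampled negative classes.
    """
    if rng is None:
        rng = np.random.default_rng()
    budget = int(num_classes * sample_rate)      # columns kept this step
    pos = np.unique(positives)                   # positives are always kept
    neg_pool = np.setdiff1d(np.arange(num_classes), pos)
    n_neg = max(budget - len(pos), 0)
    neg = rng.choice(neg_pool, size=n_neg, replace=False)
    return np.concatenate([pos, neg])
```

In a model-parallel setup each rank would apply this to its own shard of the classifier weights, so memory and compute scale with the sample rate rather than with the total number of identities.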
Key Features
- State-of-the-art face recognition models (ArcFace, CosFace, etc.)
- Support for large-scale training with millions of identities
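The additive angular margin behind ArcFace can be sketched in a few lines of NumPy. This is a simplified illustration of the loss, not the arcface_torch implementation (which also handles easy-margin variants, fp16, and model parallelism); s=64 and m=0.5 are the commonly cited defaults:

```python
import numpy as np

def arcface_logits(embeddings, weights, labels, s=64.0, m=0.5):
    """ArcFace additive angular margin (simplified sketch).

    Features and class centers are L2-normalized so their dot products are
    cosines; the margin m is added to the angle of the target class only,
    and the result is scaled by s before the softmax.
    """
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(e @ w.T, -1.0, 1.0)            # (N, C) cosine similarities
    theta = np.arccos(cos)
    onehot = np.zeros_like(cos)
    onehot[np.arange(len(labels)), labels] = 1.0
    return s * np.cos(theta + m * onehot)        # margin only on the target

def softmax_ce(logits, labels):
    """Standard cross-entropy over softmax probabilities."""
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()
```

Because the margin shrinks the target logit, the loss with m > 0 is strictly harder than plain normalized softmax, which pushes embeddings of the same identity into a tighter angular cluster.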
A fully open framework that democratizes multimodal training and advances large multimodal models (LMMs).
Project Overview
LLaVA-OneVision-1.5 is a fully open framework designed to democratize multimodal training. It provides a comprehensive pipeline for training and evaluating large multimodal models.
My Contributions
Key Features
- Fully open-source training framework for multimodal models
- Efficient mixed-precision and distributed training support
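The dynamic loss scaling that keeps fp16 mixed-precision training stable can be sketched as follows. This is a conceptual NumPy illustration of the mechanism, which frameworks such as torch.cuda.amp provide in practice; the class name and policy details here are simplifications:

```python
import numpy as np

class GradScaler:
    """Minimal dynamic loss-scaling logic (conceptual sketch).

    fp16 gradients underflow easily, so the loss is multiplied by a large
    scale before backprop and the gradients are divided back before the
    optimizer step. If any gradient overflowed (inf/nan), the step is
    skipped and the scale is reduced; otherwise the scale is grown.
    """

    def __init__(self, init_scale=2.0 ** 16, growth=2.0, backoff=0.5):
        self.scale = init_scale
        self.growth = growth
        self.backoff = backoff

    def step(self, grads, apply_update):
        if any(not np.isfinite(g).all() for g in grads):
            self.scale *= self.backoff           # overflow: back off, skip
            return False
        apply_update([g / self.scale for g in grads])
        self.scale *= self.growth                # stable: grow the scale
        return True
```

Production scalers such as torch.cuda.amp.GradScaler grow the scale only after a run of successful steps rather than on every one; the per-step growth above just keeps the sketch short.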
Contributed to the vision module of LLaVA-NeXT, enhancing its OCR capability and optimizing the visual encoder and training pipeline for text-rich images.
Project Overview
LLaVA-NeXT is the next-generation large multimodal model that significantly improves upon the original LLaVA. It features enhanced visual understanding capabilities, especially for document and text-rich images.
My Contributions
- Enhanced the OCR capability of the vision module for better text recognition in images
- Optimized the visual encoder for processing text-rich and document images
Key Features
- Enhanced visual understanding with higher resolution support
- Improved OCR and document understanding capabilities
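One common way to support higher resolutions for text-rich images is to tile the input into base-resolution crops plus a downscaled global view. The sketch below illustrates that idea in NumPy; the tile size, zero padding, and nearest-neighbor resize are illustrative choices, not the exact LLaVA-NeXT preprocessing:

```python
import numpy as np

def anyres_tiles(image, tile=336):
    """Split a high-resolution image into tiles plus a global view.

    image: (H, W, C) array. The image is zero-padded up to a multiple of
    `tile`, cut into non-overlapping tiles that preserve fine detail such
    as small text, and a crude nearest-neighbor global view keeps the
    overall layout visible to the model.
    """
    h, w, c = image.shape
    ph, pw = -h % tile, -w % tile                # padding to tile multiples
    padded = np.pad(image, ((0, ph), (0, pw), (0, 0)))
    gh, gw = padded.shape[0] // tile, padded.shape[1] // tile
    tiles = (padded.reshape(gh, tile, gw, tile, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(gh * gw, tile, tile, c))
    ys = (np.arange(tile) * h / tile).astype(int)   # nearest-neighbor rows
    xs = (np.arange(tile) * w / tile).astype(int)   # nearest-neighbor cols
    return tiles, image[ys][:, xs]
```

Each tile and the global view are then encoded separately by the vision tower, and their token sequences are concatenated before entering the language model.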
Author and maintainer of Unicom, a universal and compact representation learning framework for large-scale image retrieval.
Project Overview
UNICOM (Universal and Compact Representation Learning) is a framework I developed for learning universal image representations. It enables efficient and accurate image retrieval at scale.
My Contributions
- Lead author and maintainer of the entire project
- Designed the novel cluster discrimination approach for representation learning
- Developed the multi-label and region-based extensions (published at ECCV 2024 and ICCV 2025)
- Maintained pretrained models and provided comprehensive documentation
Key Features
- Universal image representations that transfer across domains
- State-of-the-art performance on image retrieval benchmarks
Publications
- ICLR 2023: Unicom: Universal and Compact Representation Learning for Image Retrieval
- ECCV 2024: Multi-label Cluster Discrimination for Visual Representation Learning
- ICCV 2025 (Highlight): Region-based Cluster Discrimination for Visual Representation Learning
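The pseudo-labeling step at the heart of cluster discrimination, clustering unlabeled features and treating the cluster ids as class labels, can be sketched with plain k-means. This is a small-scale illustration (farthest-point init, Lloyd updates), not the large-scale clustering pipeline used in Unicom:

```python
import numpy as np

def cluster_pseudo_labels(features, k, iters=10):
    """Cluster features and return cluster ids usable as pseudo-labels.

    A classifier trained against these cluster ids, instead of human
    labels, learns discriminative representations without annotation.
    """
    # farthest-point initialization: deterministic and well spread out
    centers = [features[0]]
    for _ in range(k - 1):
        d = np.min([((features - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(features[d.argmax()])
    centers = np.stack(centers)
    for _ in range(iters):
        # assign every feature to its nearest center
        d = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each non-empty center to the mean of its members
        for j in range(k):
            if (labels == j).any():
                centers[j] = features[labels == j].mean(0)
    return labels, centers
```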
A beginner-friendly repository for remote sensing semantic segmentation. It allows training with pre-trained models using just a single code file.
Project Overview
Urban Seg is an educational project I created to help beginners get started with semantic segmentation for remote sensing and satellite imagery. It emphasizes simplicity and ease of use.
My Contributions
- Author and maintainer of the entire project
- Designed the simple single-file training approach for accessibility
- Integrated popular pretrained models for transfer learning
- Created comprehensive tutorials and documentation
Key Features
- Single-file training script for quick start
- Beginner-friendly with clear documentation and examples
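Under the hood, training a semantic segmentation model reduces to a per-pixel cross-entropy between predicted class scores and the label map. Below is a NumPy sketch of that loss with the usual ignore-index handling for unlabeled pixels; the function name and the 255 default are illustrative, not code from the repository:

```python
import numpy as np

def pixel_cross_entropy(logits, labels, ignore_index=255):
    """Per-pixel cross-entropy for semantic segmentation (sketch).

    logits: (N, C, H, W) raw class scores; labels: (N, H, W) integer class
    ids, where `ignore_index` marks unlabeled pixels excluded from the mean.
    """
    z = logits - logits.max(1, keepdims=True)            # stability shift
    logp = z - np.log(np.exp(z).sum(1, keepdims=True))   # log-softmax
    mask = labels != ignore_index
    safe = np.where(mask, labels, 0)                     # valid indices only
    picked = np.take_along_axis(logp, safe[:, None], axis=1)[:, 0]
    return -(picked * mask).sum() / mask.sum()
```

With all-zero logits over C classes the loss is exactly log C, a handy sanity check that a freshly initialized model is wired up correctly.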