I actively contribute to open-source projects in face recognition, representation learning, and large multimodal models. I am the #2 contributor to the InsightFace ecosystem (~27k ⭐) and co-maintain several influential vision and multimodal repositories.

InsightFace · 2D & 3D Face Analysis Toolkit

⭐ 27k+ Stars

Major contributor (#2 by contributions) to the core InsightFace ecosystem for large-scale face recognition and analysis.

Project Overview

InsightFace is an open-source 2D & 3D deep face analysis library with more than 27k stars on GitHub. It provides state-of-the-art face recognition, detection, alignment, and analysis capabilities.

My Contributions

  • Author of Glint360K (≈360K identities, 17M images), the largest open-source face recognition training dataset
  • Organizer of the Masked Face Recognition Challenge & Workshop at ICCV 2021
  • Author of Partial FC, which enables training with 10 million identities on a single machine (the sampling idea is sketched below)
  • Implemented arcface_torch, an efficient distributed training framework for face recognition
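
The core idea behind Partial FC is that the full softmax over millions of identity centers is not needed at every step: each iteration keeps the positive centers of the current batch plus a random fraction of negatives and computes logits only over that subset. Below is a minimal single-GPU sketch of the sampling step (illustrative only, not the arcface_torch implementation; the function name and `sample_rate` default are assumptions):

```python
import torch
import torch.nn.functional as F

def partial_fc_logits(features, labels, class_centers, sample_rate=0.1):
    """Approximate softmax logits over a sampled subset of class centers.

    features:      (B, D) L2-normalized embeddings
    labels:        (B,)   global class ids
    class_centers: (C, D) full class-center matrix (C may be millions)
    """
    num_classes = class_centers.size(0)
    num_sample = max(int(num_classes * sample_rate), labels.numel())

    # Always keep the positive centers for identities present in this batch.
    positive = torch.unique(labels)

    # Fill the remaining budget with randomly drawn negative centers.
    perm = torch.randperm(num_classes, device=labels.device)
    is_negative = torch.ones(num_classes, dtype=torch.bool, device=labels.device)
    is_negative[positive] = False
    negatives = perm[is_negative[perm]][: num_sample - positive.numel()]

    sampled = torch.cat([positive, negatives])
    sampled_centers = F.normalize(class_centers[sampled], dim=1)

    # Remap each label to its index inside the sampled subset.
    remap = torch.full((num_classes,), -1, dtype=torch.long, device=labels.device)
    remap[sampled] = torch.arange(sampled.numel(), device=labels.device)

    logits = features @ sampled_centers.t()   # (B, num_sample) cosine logits
    return logits, remap[labels]
```

In the real framework the class-center matrix is additionally sharded across GPUs (model parallelism), and a margin-based softmax such as ArcFace is applied on top of the sampled logits.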

Key Features

  • State-of-the-art face recognition models (ArcFace, CosFace, etc.); the ArcFace margin is sketched below
  • Support for large-scale training with millions of identities
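
For reference, the ArcFace objective listed above adds an additive angular margin m to the target-class angle before a scaled softmax, pulling same-identity embeddings closer on the hypersphere. A minimal PyTorch sketch of the loss (s = 64 and m = 0.5 are the commonly used defaults, not project-specific settings):

```python
import torch
import torch.nn.functional as F

def arcface_loss(features, labels, weight, s=64.0, m=0.5):
    """ArcFace: additive angular margin on the target-class logit.

    features: (B, D) embeddings; weight: (C, D) class-center matrix.
    """
    # Cosine similarity between normalized embeddings and class centers.
    cosine = F.normalize(features, dim=1) @ F.normalize(weight, dim=1).t()
    cosine = cosine.clamp(-1 + 1e-7, 1 - 1e-7)

    # Apply the margin only to the target class: cos(theta + m).
    theta = torch.acos(cosine)
    target = F.one_hot(labels, num_classes=weight.size(0)).bool()
    logits = torch.where(target, torch.cos(theta + m), cosine)

    return F.cross_entropy(s * logits, labels)
```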

LLaVA-OneVision-1.5 · Multimodal Training Framework

⭐ 600+ Stars

Fully open framework for democratized multimodal training, advancing large multimodal models (LMMs).

Project Overview

LLaVA-OneVision-1.5 is a fully open framework designed to democratize multimodal training. It provides a comprehensive pipeline for training and evaluating large multimodal models.

My Contributions

Key Features

  • Fully open-source training framework for multimodal models
  • Efficient training with mixed-precision and distributed training support (a generic sketch follows below)
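
As a rough illustration of the last bullet, here is a generic PyTorch pattern for mixed-precision training under DistributedDataParallel. This is a sketch of the general technique, not the project's actual trainer, and the function names are my own:

```python
import torch
from torch.nn.parallel import DistributedDataParallel as DDP

def setup_distributed(model, local_rank):
    """Typical DDP setup: one process per GPU, launched via torchrun."""
    torch.distributed.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])

def train_step(model, batch, optimizer, scaler):
    """One fp16 mixed-precision step with gradient scaling."""
    images, targets = batch
    optimizer.zero_grad(set_to_none=True)

    # Run the forward pass in half precision where it is numerically safe.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(images, targets)

    # GradScaler rescales the loss to avoid fp16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.detach()
```

A typical launch is `torchrun --nproc_per_node=<num_gpus> <train_script>.py`, with one `torch.cuda.amp.GradScaler()` created per process.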

LLaVA-NeXT · Next-Generation LMMs

⭐ 4k+ Stars

Contributed to the vision module of LLaVA-NeXT, enhancing its OCR capability and optimizing the visual encoder and training pipeline for text-rich images.

Project Overview

LLaVA-NeXT is the next-generation large multimodal model that significantly improves upon the original LLaVA. It features enhanced visual understanding capabilities, especially for document and text-rich images.

My Contributions

  • Enhanced the OCR capability of the vision module for better text recognition in images
  • Optimized the visual encoder for processing text-rich and document images

Key Features

  • Enhanced visual understanding with higher-resolution input support (the tiling idea is sketched below)
  • Improved OCR and document understanding capabilities
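
The higher-resolution support is based on tiling ("AnyRes"): a text-rich image is split into a grid of encoder-sized crops plus a downscaled global view, and the visual tokens from all crops are concatenated before the projector. A simplified sketch of the idea (the fixed 2×2 grid and 336-pixel tile size here are illustrative; the actual pipeline selects the grid adaptively per image):

```python
from PIL import Image

def tile_image(image: Image.Image, tile: int = 336, grid=(2, 2)):
    """Split a high-resolution image into encoder-sized views.

    Returns one downscaled global view followed by grid[0] * grid[1]
    local tiles, each of size (tile, tile).
    """
    cols, rows = grid
    # Global view: the whole image resized to a single encoder input.
    views = [image.resize((tile, tile))]

    # Local views: resize to the grid resolution, then crop tile by tile.
    resized = image.resize((cols * tile, rows * tile))
    for r in range(rows):
        for c in range(cols):
            box = (c * tile, r * tile, (c + 1) * tile, (r + 1) * tile)
            views.append(resized.crop(box))
    return views
```

Each view is encoded independently by the vision tower, and the resulting token sequences are concatenated so the language model can attend to both global layout and fine-grained text.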

UNICOM · Universal Representation for Image Retrieval

⭐ 600+ Stars

Author and maintainer of UNICOM, a universal and compact representation learning framework for large-scale image retrieval.

Project Overview

UNICOM (Universal and Compact Representation Learning) is a framework I developed for learning universal image representations. It enables efficient and accurate image retrieval at scale.

My Contributions

  • Lead author and maintainer of the entire project
  • Designed the novel cluster discrimination approach for representation learning (sketched below)
  • Developed the multi-label and region-based extensions (published at ECCV 2024 and ICCV 2025)
  • Maintained pretrained models and provided comprehensive documentation
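
The cluster discrimination recipe can be summarized in two steps: cluster image embeddings offline into a large number of pseudo-classes, then train the encoder to classify each image into its assigned cluster. A compressed sketch of that loop (scikit-learn k-means here for brevity; the released method operates at a far larger scale and adds the sampling and margin refinements described in the papers):

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def assign_pseudo_labels(embeddings: torch.Tensor, num_clusters: int):
    """Offline step: cluster L2-normalized embeddings into pseudo-classes."""
    normed = F.normalize(embeddings, dim=1).cpu().numpy()
    kmeans = KMeans(n_clusters=num_clusters).fit(normed)
    centers = F.normalize(torch.from_numpy(kmeans.cluster_centers_).float(), dim=1)
    labels = torch.from_numpy(kmeans.labels_).long()
    return centers, labels

def cluster_discrimination_loss(features, labels, centers, temperature=0.05):
    """Online step: classify each embedding into its assigned cluster."""
    logits = F.normalize(features, dim=1) @ centers.t() / temperature
    return F.cross_entropy(logits, labels)
```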

Key Features

  • Universal image representations that transfer across domains
  • State-of-the-art performance on image retrieval benchmarks (a minimal retrieval sketch follows below)
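
At retrieval time, the compact embeddings are compared by cosine similarity; a minimal sketch of top-k search over a gallery:

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb: torch.Tensor, gallery_emb: torch.Tensor, k: int = 5):
    """Top-k nearest neighbours by cosine similarity over compact embeddings."""
    q = F.normalize(query_emb, dim=1)    # (Q, D) query embeddings
    g = F.normalize(gallery_emb, dim=1)  # (N, D) gallery embeddings
    scores = q @ g.t()                   # (Q, N) cosine similarity matrix
    return scores.topk(k, dim=1)         # similarity values and gallery indices
```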

Publications

  • ICLR 2023: Unicom: Universal and Compact Representation Learning for Image Retrieval
  • ECCV 2024: Multi-label Cluster Discrimination for Visual Representation Learning
  • ICCV 2025 (Highlight): Region-based Cluster Discrimination for Visual Representation Learning

Urban Seg · Remote Sensing Semantic Segmentation

⭐ 460+ Stars

A beginner-friendly repository for remote sensing semantic segmentation: it enables training with pretrained models from a single code file.

Project Overview

Urban Seg is an educational project I created to help beginners get started with semantic segmentation for remote sensing and satellite imagery. It emphasizes simplicity and ease of use.

My Contributions

  • Author and maintainer of the entire project
  • Designed the simple single-file training approach for accessibility
  • Integrated popular pretrained models for transfer learning
  • Created comprehensive tutorials and documentation

Key Features

  • Single-file training script for quick start (a generic sketch follows below)
  • Beginner-friendly with clear documentation and examples
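
To give a flavor of the single-file approach, here is a generic sketch of fine-tuning a pretrained segmentation model on remote-sensing tiles. The backbone, class list, and hyperparameters are illustrative stand-ins, not the repository's actual script:

```python
import torch
from torch.utils.data import DataLoader
from torchvision.models.segmentation import (
    deeplabv3_resnet50, DeepLabV3_ResNet50_Weights,
)

NUM_CLASSES = 6  # illustrative, e.g. building / road / water / vegetation / bare soil / background

def train(dataset, epochs=10, lr=1e-4, device="cuda"):
    """Fine-tune a pretrained segmentation model; `dataset` yields (image, mask)."""
    model = deeplabv3_resnet50(weights=DeepLabV3_ResNet50_Weights.DEFAULT)
    # Swap the final 1x1 conv so the head predicts the remote-sensing classes.
    model.classifier[4] = torch.nn.Conv2d(256, NUM_CLASSES, kernel_size=1)
    model = model.to(device).train()

    loader = DataLoader(dataset, batch_size=8, shuffle=True, num_workers=4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    for epoch in range(epochs):
        for images, masks in loader:
            images, masks = images.to(device), masks.to(device)
            logits = model(images)["out"]        # (B, NUM_CLASSES, H, W)
            loss = criterion(logits, masks.long())
            optimizer.zero_grad(set_to_none=True)
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")
```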