
Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation

This repository contains the code for our CVPR'2024 L3D-IVU Workshop paper, Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation.

Abstract: In Generalized Few-shot Segmentation (GFSS), a model is trained on a large corpus of base-class samples and then adapted to limited samples of novel classes. This paper focuses on the relevance between base and novel classes and improves GFSS in two aspects: 1) mining the similarity between base and novel classes to promote the learning of novel classes, and 2) mitigating the class-imbalance issue caused by the volume difference between the support set and the training set. Specifically, we first propose a similarity transition matrix to guide the learning of novel classes with base-class knowledge. Then, we apply the Label-Distribution-Aware Margin (LDAM) loss and transductive inference to the GFSS task to address class imbalance as well as overfitting to the support set. In addition, by extending the probability transition matrix, the proposed method can mitigate catastrophic forgetting of base classes when learning novel classes. With a simple training phase, our proposed method can be applied to any segmentation network trained on base classes. We validate our method on an adapted version of OpenEarthMap. Compared to existing GFSS baselines, our method outperforms them all by 3% to 7%, and it ranked second in the OpenEarthMap Land Cover Mapping Few-Shot Challenge at the time of writing.
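
Below is a minimal sketch of the transition idea, assuming hypothetical class counts and a randomly initialized similarity matrix; it only illustrates the mechanism, not the paper's exact implementation:

```python
import torch
import torch.nn.functional as F

n_base, n_novel = 7, 4  # hypothetical class counts

# Learnable similarity logits: each row maps one base class to a
# distribution over all (base + novel) classes; softmax makes rows stochastic.
sim_logits = torch.randn(n_base, n_base + n_novel, requires_grad=True)
T = F.softmax(sim_logits, dim=1)

# Probabilities from a segmenter trained on base classes: (batch, n_base, H, W).
base_probs = F.softmax(torch.randn(2, n_base, 64, 64), dim=1)

# Transition: redistribute base-class probability mass over all classes,
# so novel classes inherit support from similar base classes.
all_probs = torch.einsum('bchw,ck->bkhw', base_probs, T)
print(all_probs.shape)  # torch.Size([2, 11, 64, 64])
```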

🎬 Getting Started

1️⃣ Requirements

We used Python 3.9 in our experiments; the list of packages is available in `requirements.txt`. You can install them with `pip install -r requirements.txt`.

2️⃣ Download data

Pre-processed data from drive

We use an adapted version of the OpenEarthMap dataset. You can download the full .zip and extract it directly into the data/ folder.

From scratch

Alternatively, you can prepare the datasets yourself. Here is the structure of the data folder for you to reproduce:

```
data
├── trainset
│   ├── images
│   └── labels
├── valset
│   ├── images
│   └── labels
├── testset
│   ├── images
│   └── labels
├── train.txt
├── stage1_val.txt
├── test.json
└── val.json
```
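
If you build the folder yourself, a quick sanity check such as the following (a hypothetical helper, not part of the repo) can catch missing entries before training:

```python
from pathlib import Path

root = Path('data')
expected_dirs = [f'{split}/{sub}' for split in ('trainset', 'valset', 'testset')
                 for sub in ('images', 'labels')]
expected_files = ['train.txt', 'stage1_val.txt', 'test.json', 'val.json']

# Report anything that is missing from the expected layout above.
missing = [d for d in expected_dirs if not (root / d).is_dir()]
missing += [f for f in expected_files if not (root / f).is_file()]
print('Missing entries:', missing or 'none')
```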

🗺 Overview of the repo

Default configuration files can be found in config/. The data/ folder contains the train/val/test datasets. All the code is provided in src/. The testing script is located at the root of the repo.

⚙ Training

We use ClassTrans-Train to train models on base classes. We suggest skipping this step and directly using this checkpoint to reproduce our results.

🧪 Testing

```bash
# Create a soft link from `ClassTrans-Train/segmentation_models_pytorch` to `ClassTrans/segmentation_models_pytorch`
ln -s /your/path/ClassTrans-Train/segmentation_models_pytorch /your/path/ClassTrans
# Create a soft link from `ClassTrans-Train/weight` to `ClassTrans/weight`
ln -s /your/path/ClassTrans-Train/weight /your/path/ClassTrans
# Run the testing script
bash test.sh
```

🧊 Post-processing

In test.py, you can find some post-processing of the prediction masks that uses extra input files, which are obtained via the vision-language model APE and the class-agnostic mask-refinement model CascadePSP. We provide these files in the ClassTrans/post-process directory. If you want to reproduce our results step by step, refer to the following:
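
The exact merging logic lives in test.py; the sketch below only illustrates the general idea, with made-up file names and a made-up class index: an external binary mask from a foundation model simply overwrites the corresponding class in the predicted mask.

```python
import numpy as np
from PIL import Image

# Hypothetical inputs: a predicted label mask and a binary mask from APE.
pred = np.array(Image.open('pred_mask.png'))       # (H, W) class indices
ape_water = np.array(Image.open('ape_water.png'))  # (H, W) 0/255 binary

WATER_CLASS = 6  # hypothetical index of 'sea, lake, & pond'

# Overwrite the prediction wherever the foundation model fired.
pred[ape_water > 0] = WATER_CLASS
Image.fromarray(pred.astype(np.uint8)).save('pred_mask_refined.png')
```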

APE

APE is a vision-language model that can conduct open-vocabulary detection and segmentation. We directly use the released checkpoint APE-D to infer the base class sea, lake, & pond and the novel classes vehicle & cargo-trailer and sports field, using the following commands:

```bash
# sea, lake, & pond
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/cvpr2024_oem_ori_png/*.png --output output/cvpr2024_oem_ori_thres-0.12_water/ --confidence-threshold 0.12 --text-prompt 'water' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True

# vehicle & cargo-trailer
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/cvpr2024_oem_crop_256-128/*.png --output output/cvpr2024_oem_crop-256-128_thres-0.1_car/ --confidence-threshold 0.1 --text-prompt 'car' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True

# sports field
python demo/demo_lazy.py --config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py --input data/cvpr2024_oem_crop_256-128/*.png --output output/cvpr2024_oem_crop-256-128_thres-0.2_sportfield/ --confidence-threshold 0.2 --text-prompt 'sports field,basketball field,soccer field,tennis field,badminton field' --with-sseg --opts train.init_checkpoint=model_final.pth model.model_vision.select_box_nums_for_evaluation=500 model.model_vision.text_feature_bank_reset=True
```

Before executing the above commands, please make sure that you have successfully built the APE environment and sliced the original images into appropriate tiles:

  1. Refer here to build APE's inference environment; we highly recommend using Docker to build it.

  2. Convert the RGB images from '.tif' format to '.png' format and use the image2patch.py script to generate image tiles (a minimal tiling sketch follows this list).
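
A minimal tiling sketch consistent with the folder names above (256-pixel tiles with a 128-pixel stride, matching crop_256-128); the function and paths are illustrative, not the repo's image2patch.py:

```python
import numpy as np
from pathlib import Path
from PIL import Image

TILE, STRIDE = 256, 128

def image_to_patches(tif_path: str, out_dir: str) -> None:
    """Convert one RGB .tif into .png tiles on a sliding grid."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    img = np.array(Image.open(tif_path).convert('RGB'))
    h, w = img.shape[:2]
    stem = Path(tif_path).stem
    # Encode each tile's top-left corner in the file name for later stitching.
    for y in range(0, max(h - TILE, 0) + 1, STRIDE):
        for x in range(0, max(w - TILE, 0) + 1, STRIDE):
            patch = img[y:y + TILE, x:x + TILE]
            Image.fromarray(patch).save(out / f'{stem}_{y}_{x}.png')

image_to_patches('data/example.tif', 'data/cvpr2024_oem_crop_256-128')
```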

After running inference with APE, use the following commands to compose the results of the image tiles into whole-image masks:

```bash
# get semantic masks from instance masks
python tools/get_mask_from_instance.py
# get the complete result for the whole image
python tools/patch2image.py
```
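
The composition itself is done by tools/patch2image.py; the sketch below only shows one common strategy (per-pixel class voting across overlapping tiles), assuming the hypothetical `{stem}_{y}_{x}.png` naming from the tiling sketch above:

```python
import numpy as np
from pathlib import Path
from PIL import Image

def patches_to_image(patch_dir: str, height: int, width: int,
                     n_classes: int) -> np.ndarray:
    """Recompose tile predictions into one mask by per-pixel class voting."""
    votes = np.zeros((n_classes, height, width), dtype=np.int32)
    for p in Path(patch_dir).glob('*.png'):
        # Recover the tile's top-left corner from its file name.
        y, x = (int(v) for v in p.stem.split('_')[-2:])
        tile = np.array(Image.open(p))  # (t, t) array of class indices
        t_h, t_w = tile.shape[:2]
        for c in range(n_classes):
            votes[c, y:y + t_h, x:x + t_w] += (tile == c)
    return votes.argmax(axis=0).astype(np.uint8)

# Hypothetical usage: a 1024x1024 image with 11 classes.
mask = patches_to_image('output/tiles_pred', 1024, 1024, n_classes=11)
```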

Note: We have confirmed that using the foundation model is consistent with the challenge rules.

Mask Refinement

We use CascadePSP to refine the masks of building types 1 & 2:

```bash
# install segmentation_refinement
pip install segmentation_refinement
# get refined masks of building types 1 & 2
python tools/mask_refinement.py
```
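
For reference, the sketch below follows the standard usage pattern of the segmentation_refinement package from CascadePSP's documentation; the file names and the single-class mask are assumptions, and tools/mask_refinement.py may differ in detail:

```python
import cv2
import segmentation_refinement as refine

# Hypothetical inputs: the RGB image and a binary mask of one building class.
image = cv2.imread('image.png')                               # (H, W, 3) BGR
mask = cv2.imread('building_mask.png', cv2.IMREAD_GRAYSCALE)  # (H, W) 0/255

refiner = refine.Refiner(device='cuda:0')  # use device='cpu' if no GPU

# fast=False runs the full multi-scale cascade; L caps the working resolution.
refined = refiner.refine(image, mask, fast=False, L=900)
cv2.imwrite('building_mask_refined.png', refined)
```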

📊 Results

| Class | IoU |
|---|---|
| Tree | 68.94964 |
| Rangeland | 49.81997 |
| Bareland | 32.84904 |
| Agric land type 1 | 53.61771 |
| Road type 1 | 57.60924 |
| Sea, lake, & pond | 53.97921 |
| Building type 1 | 55.54934 |
| Vehicle & cargo-trailer | 37.24685 |
| Parking space | 32.26357 |
| Sports field | 49.98770 |
| Building type 2 | 52.10971 |
| mIoU for base classes | 53.19631 |
| mIoU for novel classes | 42.90196 |
| Weighted average of mIoU scores for base and novel classes | 47.01970 |

The first seven rows are base classes; the last four class rows are novel classes.

The weighted average is calculated with weights of 0.4 for base classes and 0.6 for novel classes (0.4 × 53.19631 + 0.6 × 42.90196 = 47.0197), following the state-of-the-art GFSS baseline.

🙏 Acknowledgments

We gratefully thank the authors of BAM, DIAM, APE, CascadePSP, and PyTorch Semantic Segmentation, whose code inspired parts of ours.

📚 Citation

If you find this project useful, please consider citing:

```bibtex
@article{wang2024class,
  title={Class Similarity Transition: Decoupling Class Similarities and Imbalance from Generalized Few-shot Segmentation},
  author={Wang, Shihong and Liu, Ruixun and Li, Kaiyu and Jiang, Jiawei and Cao, Xiangyong},
  journal={arXiv preprint arXiv:2404.05111},
  year={2024}
}
```