Skip to content

Latest commit

 

History

History
270 lines (206 loc) · 30.2 KB

README.md

File metadata and controls

270 lines (206 loc) · 30.2 KB

SSAV (CVPR2019-Oral)

Code for paper in CVPR2019, 'Shifting More Attention to Video Salient Object Detection'

Authors: Deng-Ping Fan, Wenguan Wang, Ming-Ming Cheng, Jianbing Shen.

[Supplementary material is also attached in the repo]

Paper with code: https://paperswithcode.com/task/video-salient-object-detection


Figure 1: OVerall architecture of the proposed SSAV model.

Table of Contents

Abstract

The last decade has witnessed a growing interest in video salient object detection (VSOD). However, the research community long-term lacked a well-established VSOD dataset representative of real dynamic scenes with high-quality annotations. To address this issue, we elaborately collected a visual-attention-consistent Densely Annotated VSOD (DAVSOD) dataset, which contains 226 videos with 23,938 frames that cover diverse realistic-scenes, objects, instances and motions. With corresponding real human eye-fixation data, we obtain precise ground-truths. This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., the video salient object(s) may dynamically change. To further contribute the community a complete benchmark, we systematically assess 17 representative VSOD algorithms over seven existing VSOD datasets and our DAVSOD with totally ~84K frames (largest-scale). Utilizing three famous metrics, we then present a comprehensive and insightful performance analysis. Furthermore, we propose a baseline model. It is equipped with a saliencyshift-aware convLSTM, which can efficiently capture video saliency dynamics through learning human attention-shift behavior. Extensive experiments1 open up promising future directions for model development and comparison.

Notion of saliency shift

The saliency shift is not just represented as a binary signal, w.r.t., whether it happens in a certain frame. Since we focus on an object-level task, we change the saliency values of different objects according to the shift of human attention. The rich annotations, including saliency shift, object-/instance-level ground-truths (GT), salient object numbers, scene/object categories, and camera/object motions, provide a solid foundation for VSOD task and benefit a wide range of potential applications.


Figure 2: Annotation examples of our DAVSOD dataset.

Statistics of DAVSOD


Figure 3: Statistics of the proposed DAVSOD dataset.

Figure 3 shows (a) Scene/object categories. (b, c) Distribution of annotated instances and image frames, respectively. (d) Ratio distribution of the objects/instances. (e) Mutual dependencies among scene categories in (a).

Downloads

  1. DAVSOD dataset.

    Note:

    • The DAVSOD-name-v2.xlsx deleted some description of missing videos (0124、0291、0413、0556). Some video like 0189_1 share with the same attributes with 0189. These shared videos including: 0064_1, 0066_1, 0182_1, 0189_1, 0256_1, 0616_1, 0675_1, 0345_1, 0590_2, 0318_1, 0328_1, 0590_1, 0194_1, 0321_1, 0590_3
    • we merge the small sequence of the training set. Thus, there are only 61 sequences which are different from the CVPR 2019 paper (90 training sequences).
  2. SSAV model.

  3. Popular Existing Datasets.

    Previous datasets have different formats, causing laborious data preprocessing when training or testing. Here we unified the format of all the datasets for easier use:

    • original images saved as *.jpg format
    • original images indexd of zero (e.g., 00000.jpg)
    • ground-truth images saved as *.png format
    Year Publisher Dataset Clips Download Link1 Download Link2
    2010 BMVC SegV1 5(11.9MB) Baidu Pan Google Driver
    2013 ICCV SegV2 14(84.9MB) Baidu Pan Google Driver
    2014 TPAMI FBMS 30 (790MB) Baidu Pan Google Driver
    2014 TPAMI FBMS-59 (overall) 59(817.52MB) Baidu Pan Google Driver
    2015 TIP MCL 9(308MB) Baidu Pan Google Driver
    2015 TIP ViSal 17(59.1MB) Baidu Pan Google Driver
    2016 CVPR DAVIS 50(842MB) Baidu Pan Google Driver
    2017 TCSVT UVSD 18(207MB) Baidu Pan Google Driver
    2018 TIP VOS 200(3.33GB) Baidu Pan Google Driver
    VOS_test 40(672MB) Baidu Pan Google Driver
    2019 CVPR DAVSOD 187(10.24G) Baidu Pan (fetch code: ivzo) Google Driver
    All datasets Google Driver

    * We do not include the “penguin sequence” of the SegTrack-V2 dataset due to its inaccurate segmentation. Thus the results of the MCL dataset only contains 13 sequences in our benchmark. For VOS dataset, we only benchmark the test sequence divided by our benchmark (Traning Set: Val Set: Test Set = 6:2:2).

  4. Papers & Codes & Results (continue updating).

    We have spent about half a year to execute all of the codes. You can download all the results directly for the convience of research. Please cite our paper if you use our results. The overall results link is here (Baidu|Google)  (Update: 2019-11-17)

Year & Pub & Paper Model DAVIS FBMS MCL SegV1 SegV2 UVSD ViSal VOS DAVSOD & Overall
2008 & CVPR & PQFT Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2009 & JOV & SST Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2010 & ECCV & SIVM Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2014 & CVPR & TIMP Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2014 & TCSVT & SPVM Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2015 & CVPR & SAG Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2015 & TIP & GFVM Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2015 & TIP & RWRV Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2015 & ICCV & MB+ Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2016 & CVPR & MST Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2017 & TCSVT & SGSP Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2017 & TIP & SFLR Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2017 & TIP & STBP Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2017 & TCYB & VSOP Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2017 & BMVC & DSR3D Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & TIP & STCRF Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & TIP & SCOM Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & TIP & DLVSD Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & TCSVT & SCNN Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & ECCV & MBNM Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & ECCV & PDB Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2018 & CVPR & FGRNE Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google
2019 & CVPR & SSAV Code Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google Baidu | Google

Usage

The code has been tested successfully on Ubuntu 16.04 with CUDA 8.0 and OpenCV 3.1.0

  1. Prerequisites:

    Use DAVSOD/mycaffe-convlstm/Makefile.config to build caffe. Follow official instructions.

    make all -j8
    make pycaffe
  2. Downloading necessary data:

    • downloading the training/testing dataset and move it into Datasets/.

    • downloading pre-trained caffemodel and move it into model/, which can be found in baidu pan(Fetch Code: pb0h) | google drive.

  3. Testing Configuration:

    • After you download the pre-trained model and testing dataset, just run generateTestList.py to get the test list, and SSAV_test.py to get the saliency maps in results/SSAV/.

    • Just enjoy it!

    • The prediction results of all competitors and our PNS-Net can be found at Google Drive (7MB).

  4. Training Configuration:

    • With the training dataset downloaded, you can download the basemodel from baidu pan (Fetch Code:0xk4) | google drive.
  5. Evaluating model:

    • One-key evaluation is written in MATLAB code main.m in DAVSOD/EvaluateTool/. Details of the evaluation metrics please refer to the following papers.
       [1]Structure-measure: A New Way to Evaluate the Foregournd Maps, ICCV2017.
       [2]Enhanced Alignment Measure for Binary Foreground Map Evaluation, IJCAI2018.
      

Note that: This version only provide the implicit manner for learning attention-shift. The explicit way to train this model will not be released due to the commercial purposes (Hua Wei, IIAI).


Results

  1. Leaderboard.


    Figure 4: Summarizing of 36 previous representative VSOD methods and the proposed SSAV model.

  2. Benchmark.


    Figure 5: Benchmarking results of 17 state-of-the-art VSOD models on 7 datasets.

  3. Performance Preview




    Figure 6: Blackswan sequence result generated by our SSAV model. More results can be found in supplemental materials.

  4. DAVSOD Dataset Profile


    Figure 7: Samples from our dataset, with instance-level ground truth segmentation and fixation map.


    Figure 8: Examples to show our rigorous standards for segmentation labeling.


    Figure 9: Example sequence of saliency shift considered in the proposed DAVSOD dataset. Different (5th row) from tradition work which labels all the salient objects (2th row) via static frames. The proposed DAVSOD dataset were strictly annotated according to real human fixation record (3rd row), thus revealing the dynamic human attention mechanism.

    Watch the video

    Watch the video for Supplemental Material

Citation

If you find this useful, please cite the following works:

@InProceedings{Fan_2019_CVPR,
   author = {Fan, Deng-Ping and Wang, Wenguan and Cheng, Ming-Ming and Shen, Jianbing}, 
   title = {Shifting More Attention to Video Salient Object Detection},
   booktitle = {IEEE CVPR},
   year = {2019}
}
@inproceedings{song2018pyramid,
  title={Pyramid dilated deeper ConvLSTM for video salient object detection},
  author={Song, Hongmei and Wang, Wenguan and Zhao, Sanyuan and Shen, Jianbing and Lam, Kin-Man},
  booktitle={ECCV},
  pages={715--731},
  year={2018}
}

Contact: Deng-Ping Fan ([email protected]).

⬆ back to top