Skip to content

Latest commit

 

History

History
67 lines (48 loc) · 14.4 KB

README.md

File metadata and controls

67 lines (48 loc) · 14.4 KB

CCNet

CCNet: Criss-Cross Attention for Semantic Segmentation

Introduction

Official Repo

Code Snippet

Abstract

Contextual information is vital in visual understanding problems, such as semantic segmentation and object detection. We propose a Criss-Cross Network (CCNet) for obtaining full-image contextual information in a very effective and efficient way. Concretely, for each pixel, a novel criss-cross attention module harvests the contextual information of all the pixels on its criss-cross path. By taking a further recurrent operation, each pixel can finally capture the full-image dependencies. Besides, a category consistent loss is proposed to enforce the criss-cross attention module to produce more discriminative features. Overall, CCNet is with the following merits: 1) GPU memory friendly. Compared with the non-local block, the proposed recurrent criss-cross attention module requires 11x less GPU memory usage. 2) High computational efficiency. The recurrent criss-cross attention significantly reduces FLOPs by about 85% of the non-local block. 3) The state-of-the-art performance. We conduct extensive experiments on semantic segmentation benchmarks including Cityscapes, ADE20K, human parsing benchmark LIP, instance segmentation benchmark COCO, video segmentation benchmark CamVid. In particular, our CCNet achieves the mIoU scores of 81.9%, 45.76% and 55.47% on the Cityscapes test set, the ADE20K validation set and the LIP validation set respectively, which are the new state-of-the-art results. The source codes are available at this https URL.

Results and models

Cityscapes

Method Backbone Crop Size Lr schd Mem (GB) Inf time (fps) Device mIoU mIoU(ms+flip) config download
CCNet R-50-D8 512x1024 40000 6 3.32 V100 77.76 78.87 config model | log
CCNet R-101-D8 512x1024 40000 9.5 2.31 V100 76.35 78.19 config model | log
CCNet R-50-D8 769x769 40000 6.8 1.43 V100 78.46 79.93 config model | log
CCNet R-101-D8 769x769 40000 10.7 1.01 V100 76.94 78.62 config model | log
CCNet R-50-D8 512x1024 80000 - - V100 79.03 80.16 config model | log
CCNet R-101-D8 512x1024 80000 - - V100 78.87 79.90 config model | log
CCNet R-50-D8 769x769 80000 - - V100 79.29 81.08 config model | log
CCNet R-101-D8 769x769 80000 - - V100 79.45 80.66 config model | log

ADE20K

Method Backbone Crop Size Lr schd Mem (GB) Inf time (fps) Device mIoU mIoU(ms+flip) config download
CCNet R-50-D8 512x512 80000 8.8 20.89 V100 41.78 42.98 config model | log
CCNet R-101-D8 512x512 80000 12.2 14.11 V100 43.97 45.13 config model | log
CCNet R-50-D8 512x512 160000 - - V100 42.08 43.13 config model | log
CCNet R-101-D8 512x512 160000 - - V100 43.71 45.04 config model | log

Pascal VOC 2012 + Aug

Method Backbone Crop Size Lr schd Mem (GB) Inf time (fps) Device mIoU mIoU(ms+flip) config download
CCNet R-50-D8 512x512 20000 6 20.45 V100 76.17 77.51 config model | log
CCNet R-101-D8 512x512 20000 9.5 13.64 V100 77.27 79.02 config model | log
CCNet R-50-D8 512x512 40000 - - V100 75.96 77.04 config model | log
CCNet R-101-D8 512x512 40000 - - V100 77.87 78.90 config model | log

Citation

@article{huang2018ccnet,
    title={CCNet: Criss-Cross Attention for Semantic Segmentation},
    author={Huang, Zilong and Wang, Xinggang and Huang, Lichao and Huang, Chang and Wei, Yunchao and Liu, Wenyu},
    booktitle={ICCV},
    year={2019}
}