A residual dense vision transformer for medical image super-resolution with novel general-purpose perceptual loss.
This paper proposes a vision transformer with residual dense connections and local feature fusion for efficient single-image super-resolution (SISR) of medical modalities. We also implement a general-purpose perceptual loss that incorporates prior knowledge from medical image segmentation, giving manual control over which aspects of image quality are improved. Compared with state-of-the-art methods on four public medical image datasets, the proposed method achieves the best PSNR on six of seven modalities, with an average improvement of +0.09 dB PSNR using only 38% of the parameters of SwinIR. In addition, the segmentation-based perceptual loss improves PSNR by +0.14 dB on average for SOTA methods, including CNNs and vision transformers. Finally, we conduct comprehensive ablation studies to discuss potential factors behind the superior performance of vision transformers over CNNs, and the impact of network and loss function components.
Framework of the proposed RDST network.
To set up:
git clone https://github.com/GinZhu/RDST.git
cd RDST
pip install -r requirements.txt
To train:
python -W ignore train.py --config-file config_files/RDST_E1_OASIS_example_SRx4.ini
To test:
python -W ignore test.py --config-file config_files/RDST_E1_OASIS_example_SRx4_testing.ini
Pre-trained models (trained on the OASIS dataset) are available for download:
- RDST-E1: +0.16 dB PSNR over SwinIR with only 38% of the parameters;
- RDST-HRL: [+0.0016, +0.0051, +0.0005, +0.0005] higher Dice coefficients than SwinIR;
- RDST-E: +0.02 dB PSNR over SwinIR with only 20% of the parameters, and 46% faster.
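The comparisons above are reported in PSNR and Dice coefficient. As a reference for how these two metrics are commonly defined, here is a minimal NumPy sketch. It is not taken from this repository: the function names `psnr` and `dice`, the default `data_range` of 1.0, and the handling of empty masks are assumptions made for illustration, and the evaluation scripts in this project may crop, normalise, or average differently.

```python
import numpy as np


def psnr(reference: np.ndarray, estimate: np.ndarray, data_range: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    data_range is the maximum possible pixel value (assumed 1.0 here,
    i.e. images normalised to [0, 1]).
    """
    reference = reference.astype(np.float64)
    estimate = estimate.astype(np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / mse))


def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice coefficient 2|A and B| / (|A| + |B|) between two binary masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        # Both masks empty: treated as perfect agreement (a convention, assumed here).
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


if __name__ == "__main__":
    hr = np.zeros((4, 4))
    sr = np.full((4, 4), 0.1)
    print(f"PSNR: {psnr(hr, sr):.2f} dB")

    seg_hr = np.array([1, 1, 0, 0])
    seg_sr = np.array([1, 0, 0, 0])
    print(f"Dice: {dice(seg_hr, seg_sr):.4f}")
```

For multi-class segmentation, a per-class Dice score can be obtained by calling `dice(labels_a == k, labels_b == k)` for each class label `k`.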
This work is available on arXiv. Please cite it as:
@article{zhu2023rdst,
title={A residual dense vision transformer for medical image super-resolution with segmentation-based perceptual loss fine-tuning},
author={Zhu, Jin and Yang, Guang and Lio, Pietro},
journal={arXiv preprint arXiv:2302.11184},
year={2023}
}
For a better understanding of this project, we refer to the following previous works: