Before using the code, install the following packages:
$ sudo apt-get install openslide-tools
$ sudo apt-get install libgeos-dev
$ pip install -r requirements.txt
Download all 50 zipped slides and the two CSV files, put them in ./data/SourceData, and unzip the slides by running:
$ cd preprocess
$ python unzip_slides.py
All slides will be unzipped into ./data/LiverImages.
Visualizing the whole and viable masks of a slide gives an intuitive feel for how the tumor looks. Run the following command to generate a side-by-side view of the masks with the corresponding slide image:
$ python check_mask.py
Moreover, tissueloc provides an algorithm to locate the boundary of real tissue in the slide. Run it with:
$ python locate_tissue.py
The located tissue regions may help in the slide-level prediction stage. Both the mask comparison and the tissue localization results are saved in the ./data/Visualization directory.
To check the provided viable tumor burden against the value calculated from the provided masks, run:
$ cd ../burden
$ python validate_burden.py
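As a rough illustration of what such a check involves, the sketch below computes a viable tumor burden from a pair of binary masks. It assumes the burden is the viable tumor area divided by the whole tumor area, expressed as a percentage. The function name `viable_tumor_burden` and the toy masks are hypothetical and are not taken from validate_burden.py.

```python
import numpy as np


def viable_tumor_burden(viable_mask, whole_mask):
    """Return the viable tumor burden in percent (hypothetical helper).

    Assumes both masks are arrays of the same shape in which nonzero
    pixels mark tumor, and that burden = viable area / whole area * 100.
    """
    viable_mask = np.asarray(viable_mask)
    whole_mask = np.asarray(whole_mask)
    if viable_mask.shape != whole_mask.shape:
        raise ValueError("masks must have the same shape")
    whole_area = np.count_nonzero(whole_mask)
    if whole_area == 0:
        return 0.0  # no tumor annotated, so define the burden as zero
    viable_area = np.count_nonzero(viable_mask)
    return 100.0 * viable_area / whole_area


# Toy example: a 4x4 whole tumor region containing a 2x2 viable region.
whole = np.zeros((8, 8), dtype=np.uint8)
whole[2:6, 2:6] = 1
viable = np.zeros((8, 8), dtype=np.uint8)
viable[3:5, 3:5] = 1
burden = viable_tumor_burden(viable, whole)
print(f"viable tumor burden: {burden:.1f}%")
```

Comparing this value with the provided burden for each slide, within a small tolerance, is one way to confirm that the masks and the CSV entries agree.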
Currently we use two splitting manners, half-overlap and self-overlap. half-overlap has more overlap between neighboring patches and thus yields more patches. self-overlap usually has only a small overlap between neighboring patches, so it generates far fewer patches than half-overlap. For both the viable and whole tumor types, we remove patches that lie entirely in non-tissue regions.
viable tumor patch splitting: We combine half-overlap with self-overlap. However, in half-overlap splitting, we select patches by their mask foreground ratio.
whole tumor patch splitting: We combine half-overlap with self-overlap, with no control on the mask foreground ratio. Compared with the viable tumor, whole tumor patch generation yields patches with a higher ratio of background, which helps avoid false positives in whole tumor prediction.
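The two splitting manners can be sketched in one dimension as below. This is an illustrative reading, not the code in gen_patches.py. It assumes half-overlap uses a stride of half the patch size, and self-overlap uses the fewest patches that cover the full length, spread evenly so that they overlap only as much as needed. The helper names, the patch size of 512, and the foreground threshold of 0.5 are all hypothetical.

```python
import math


def half_overlap_starts(length, patch_size):
    """Start offsets with a stride of half the patch size (assumed)."""
    if length <= patch_size:
        return [0]
    stride = patch_size // 2
    starts = list(range(0, length - patch_size + 1, stride))
    if starts[-1] != length - patch_size:
        starts.append(length - patch_size)  # cover the trailing edge
    return starts


def self_overlap_starts(length, patch_size):
    """Fewest evenly spaced patches that still cover the full length (assumed)."""
    if length <= patch_size:
        return [0]
    num = math.ceil(length / patch_size)
    step = (length - patch_size) / (num - 1)
    return [round(i * step) for i in range(num)]


def keep_patch(foreground_ratio, min_ratio=0.5):
    """Hypothetical filter on the mask foreground ratio (viable tumor only)."""
    return foreground_ratio >= min_ratio


length, patch_size = 2000, 512
half = half_overlap_starts(length, patch_size)
sparse = self_overlap_starts(length, patch_size)
print(len(half), half)
print(len(sparse), sparse)
```

Applying the same offsets along both axes gives the 2D patch grid. Under these assumptions half-overlap produces roughly twice as many offsets per axis as self-overlap.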
Use the following commands to generate patches for viable and whole by setting the parameter tumor_type.
$ cd patches
$ python gen_patches.py
We explore UNet and PSPNet on liver patch segmentation. Experimental results show that PSPNet achieves superior performance.
We compare SGD with an initial learning rate of 1.0e-2 and Adam with an initial learning rate of 1.0e-3. On both PSPNet and UNet, SGD performs better. We train the segmentation model for 50 epochs and step the learning rate down after each epoch until it reaches 0.0.
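One plausible form of such a schedule is a linear per-epoch decay, sketched below. The linear shape is an assumption, since the text only says the rate steps down each epoch until it reaches 0.0. The function name `epoch_lr` is hypothetical and is not taken from train_seg.py.

```python
def epoch_lr(epoch, init_lr=1.0e-2, total_epochs=50):
    """Learning rate for a 0-based epoch under a linear per-epoch decay (assumed)."""
    if not 0 <= epoch <= total_epochs:
        raise ValueError("epoch out of range")
    return init_lr * (1.0 - epoch / total_epochs)


# Index 50 is the value reached once all 50 training epochs are done.
schedule = [epoch_lr(e) for e in range(51)]
print(schedule[0], schedule[25], schedule[50])
```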
Binary cross-entropy (BCE) and dice loss are combined as the overall loss. BCE-0.1 achieves the most promising results.
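A minimal sketch of a combined loss is given below, written in NumPy so that it stays self-contained. It reads "BCE-0.1" as a weight of 0.1 on BCE and 0.9 on dice loss. That reading, the form w * BCE + (1 - w) * dice, and the name `bce_dice_loss` are assumptions rather than the repository's implementation.

```python
import numpy as np


def bce_dice_loss(probs, targets, bce_weight=0.1, eps=1e-7):
    """Weighted sum of BCE and soft dice loss (the weighting is an assumption)."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=np.float64)
    # Pixel-wise binary cross-entropy, averaged over the patch.
    bce = -np.mean(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    # Soft dice coefficient computed on probabilities, not thresholded labels.
    intersection = np.sum(probs * targets)
    dice = (2.0 * intersection + eps) / (np.sum(probs) + np.sum(targets) + eps)
    return bce_weight * bce + (1.0 - bce_weight) * (1.0 - dice)


target = np.array([[0, 1], [1, 1]])
good = np.array([[0.1, 0.9], [0.8, 0.9]])
bad = np.array([[0.9, 0.1], [0.2, 0.1]])
print(bce_dice_loss(good, target), bce_dice_loss(bad, target))
```

The prediction that agrees with the target receives the lower loss, and a perfect prediction drives both terms close to zero.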
We compare training with and without patch normalization. Prediction performance on validation patches shows that omitting normalization performs slightly better. Skipping normalization is also simpler to implement.
Patch training can be run as follows; we train viable and whole with the same settings:
$ cd seg
$ python train_seg.py
The Caffe-pretrained PSPNet needs to be downloaded from here and placed in seg/segnet/pspnet/.
Slide-level segmentation is also conducted patch-wise. Specifically, we first split the whole slide image into patches and predict each patch. Finally, we merge all patch predictions to generate the final tumor segmentation result.
The main issue here is how to split the whole slide image. To make slide-level segmentation more robust, we adopt a stride-wise patch splitting method with a small stride (64 is used). When the stride is small, each pixel lies in more patches and is therefore predicted more times. Because we average the predictions to get the final result, each pixel's segmentation is more robust when it is predicted multiple times in different contexts. However, the time cost increases linearly with the number of patches. In the current application, we prioritize segmentation accuracy.
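The averaging of overlapping predictions can be sketched as follows. This is a simplified illustration on an in-memory array, not the logic of pred_test_slide.py. The stride of 64 comes from the text, while the patch size of 256, the function name `predict_slide`, and the thresholding stand-in for the model are assumptions.

```python
import numpy as np


def predict_slide(image, predict_patch, patch_size=256, stride=64):
    """Average overlapping patch predictions into one probability map."""
    height, width = image.shape[:2]
    if height < patch_size or width < patch_size:
        raise ValueError("image must be at least one patch in size")
    prob_sum = np.zeros((height, width), dtype=np.float64)
    hits = np.zeros((height, width), dtype=np.float64)

    def starts(length):
        # Regular stride, plus one final patch flush with the border.
        offsets = list(range(0, length - patch_size + 1, stride))
        if offsets[-1] != length - patch_size:
            offsets.append(length - patch_size)
        return offsets

    for top in starts(height):
        for left in starts(width):
            patch = image[top:top + patch_size, left:left + patch_size]
            prob_sum[top:top + patch_size, left:left + patch_size] += predict_patch(patch)
            hits[top:top + patch_size, left:left + patch_size] += 1
    # Every pixel is covered at least once, so the division is safe.
    return prob_sum / hits, hits


def toy_model(patch):
    """Stand-in for the segmentation network: simple brightness thresholding."""
    return (patch > 0.5).astype(np.float64)


image = np.zeros((512, 512))
image[128:384, 128:384] = 1.0
prob_map, hits = predict_slide(image, toy_model)
print(prob_map.shape, int(hits.min()), int(hits.max()))
```

The `hits` array shows the trade-off described above: interior pixels are predicted many times while border pixels are predicted only a few times, and a smaller stride raises both the counts and the number of patches to evaluate.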
Before predicting on test slides, copy the best-performing model into the BestModel folder for both viable and whole, then run:
$ cd seg
$ python pred_test_slide.py
After the viable and whole tumor regions are predicted, we calculate the tumor burden with:
$ cd burden
$ python pred_burden.py