In this repository we provide the official PyTorch implementation of our network.
Author: Chengxin Wang
Supervisors: Juana Valeria Hurtado and Prof. Dr. Abhinav Valada
Given only 5 past RGB images as context, our network can predict future semantic segmentation for a large variety of scenarios:
Note that the model has no knowledge of the future RGB frames or the future semantic segmentation at test time; they are shown here only for comparison.
(From left to right: future RGB frame, oracle semantic segmentation, model prediction, prediction uncertainty)
(Scenarios shown: slight right turn; going straight; agent movement, with a pedestrian walking in the opposite direction; agent-object interaction, riding a bike)
Quantitatively, our network achieves state-of-the-art mIoU for both mid-term (5th future frame) and long-term (10th future frame) prediction (a sketch of the metric follows the table):
Experiment Name | mIoU 5 | mIoU 5 (RF) | mIoU 10 | mIoU 10 (RF) | Weight | Configuration |
---|---|---|---|---|---|---|
ResNet | 47.18 % | 42.17 % | 41.82 % | 35.69 % | pth | config |
ResNet + Discriminator | 49.57 % | 42.98 % | 43.64 % | 36.14 % | pth | config |
ResNet + Discriminator + F2MF | 49.62 % | 42.83 % | 44.36 % | 35.98 % | pth | config |
EfficientPS-FPN | 52.26 % | 44.68 % | 46.51 % | 37.41 % | pth | config |
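Here, mIoU denotes the mean intersection-over-union over semantic classes between the predicted segmentation and the ground truth of the target future frame. A minimal sketch of the metric (not the repository's evaluation code; function and variable names are illustrative):

```python
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int,
             ignore_index: int = 255) -> float:
    """Mean IoU between two integer label maps (illustrative sketch)."""
    valid = target != ignore_index
    pred, target = pred[valid].long(), target[valid].long()
    # Joint histogram -> confusion matrix of shape (num_classes, num_classes).
    conf = torch.bincount(num_classes * target + pred,
                          minlength=num_classes ** 2)
    conf = conf.reshape(num_classes, num_classes).float()
    intersection = conf.diag()
    union = conf.sum(0) + conf.sum(1) - intersection
    iou = intersection / union.clamp(min=1)  # guard against empty classes
    return iou[union > 0].mean().item()      # average over classes that occur
```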
For this project, the following single-frame perception encoders are used:
Encoder Name | Input HxW | Single-frame mIoU | Training Data | Model |
---|---|---|---|---|
Panoptic Deeplab (finetuned, OS=8) | 257x513 | 65.97 % | ImageNet + Cityscapes | pth, yaml |
EfficientPS (not finetuned, FPN) | 256x512 | 59.52 % | - | pth, ini |
First, clone the repository together with its submodules:

```bash
git clone --recurse-submodules git@github.com:cvcore/fr-panoptic-forecast.git
cd fr-panoptic-forecast
```
Then, use conda to install the required packages:

```bash
conda env create -f envs/env_pytorch_1_7_1_cupy.yml
```
Next, install the main package with pip:

```bash
pip install -e .
```
Finally, install the required third-party packages:

```bash
export CUDA_PATH=/usr/local/cuda # or your CUDA installation path
pushd thirdparty/correlation_package
pip install -e .
popd
pushd thirdparty/resample2d_package
pip install -e .
popd
pushd thirdparty/panoptic_deeplab
pip install -e .
popd
```
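A quick sanity check that the build succeeded (a sketch; the extension module names `correlation_cuda` and `resample2d_cuda` are assumptions based on NVIDIA's flownet2-pytorch packages, which these appear to mirror):

```python
# Sanity check: PyTorch sees the GPU and the compiled extensions import cleanly.
# The module names below are assumptions; adjust them if your build differs.
import torch
print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

import correlation_cuda   # built from thirdparty/correlation_package (assumed name)
import resample2d_cuda    # built from thirdparty/resample2d_package (assumed name)
print("CUDA extensions imported successfully")
```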
(Optional) To use the FPN network from EfficientPS as the perception encoder, install it with:

```bash
pushd thirdparty/efficientPS
pip install -e .
popd
```
Download the Cityscapes dataset from the official website. For our experiments, the following packages are required:
- gtFine_trainvaltest.zip (241MB)
- leftImg8bit_trainvaltest.zip (11GB)
- leftImg8bit_sequence_trainvaltest.zip (324GB)
By default, we assume the dataset is placed under `../../dataset/cityscapes` relative to this project folder. You can change this with the `DATASET.PATH` option.
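For example, to point the training script (see below) at a dataset stored elsewhere (the path is illustrative):

```bash
python scripts/train_forecast.py CONFIG_FILE --opts DATASET.PATH /data/cityscapes
```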
Because fine annotation is available only for the 20th frame of each sequence in the Cityscapes dataset, we have generated pseudo-groundtruth semantic segmentation labels with the EfficientPS network. They can be downloaded from [TODO](TODO) and should be placed under the `gtPseudoSeqEPS/` directory in the Cityscapes dataset folder.
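With the packages above extracted and the pseudo-labels in place, the dataset folder should look roughly as follows (a sketch; only the relevant top-level directories are shown):

```
../../dataset/cityscapes
├── gtFine/                  # from gtFine_trainvaltest.zip
├── leftImg8bit/             # from leftImg8bit_trainvaltest.zip
├── leftImg8bit_sequence/    # from leftImg8bit_sequence_trainvaltest.zip
└── gtPseudoSeqEPS/          # pseudo-groundtruth labels (see above)
```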
Download the single-frame perception backbones and unzip them under `pretrained_models/perception` relative to the project directory.
**Problem:** `error in correlation_forward_cuda_kernel: invalid device function`
**Solution:** Make sure the CUDA runtime version reported by `nvcc --version` agrees with the PyTorch build shown in `conda list`.
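For example, the two versions can be compared with:

```bash
nvcc --version                                       # CUDA toolkit used to build the extensions
python -c "import torch; print(torch.version.cuda)"  # CUDA version of the PyTorch build
```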
**Problem:** `correlation_package` fails to install.
**Solution:** Set `CUDA_HOME` to your CUDA installation path (usually `/usr/local/cuda-VERSION`).
Training is done with `scripts/train_forecast.py` using the following commands:
Single-GPU training:

```bash
python scripts/train_forecast.py CONFIG_FILE [--opts OPTIONAL_ARGUMENTS]
```
Multi-GPU distributed training:

```bash
python -m torch.distributed.launch --nproc_per_node=NUM_GPUS --use_env scripts/train_forecast.py CONFIG_FILE --opts MODEL.DISTRIBUTED True [OTHER_OPTIONAL_ARGUMENTS]
```
Each `CONFIG_FILE` defines an experiment setup; its documentation can be found in `config/forecast.py`, where the default setting for each key is defined.
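The `KEY VALUE` pairs accepted after `--opts` suggest a yacs-style configuration; below is a minimal sketch of how such overrides behave (an assumption about the config system, with illustrative keys; `config/forecast.py` defines the real keys and defaults):

```python
# Minimal sketch of yacs-style overrides, as suggested by the --opts syntax.
# This is an assumption about the config system, not the repository's code.
from yacs.config import CfgNode as CN

cfg = CN()
cfg.MODEL = CN()
cfg.MODEL.EVAL = False
cfg.MODEL.DISTRIBUTED = False
cfg.DATASET = CN()
cfg.DATASET.PATH = "../../dataset/cityscapes"

# Equivalent of: --opts MODEL.DISTRIBUTED True DATASET.PATH /data/cityscapes
cfg.merge_from_list(["MODEL.DISTRIBUTED", "True", "DATASET.PATH", "/data/cityscapes"])
print(cfg.MODEL.DISTRIBUTED, cfg.DATASET.PATH)  # True /data/cityscapes
```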
For our setup, we trained the model on 4x NVIDIA GeForce RTX 3090 GPUs (24 GB memory each) for around 4 days. For a good learning outcome, we recommend a minimum batch size of 12.
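Assuming the batch size is global, that amounts to at least 3 samples per GPU on 4 GPUs; a launch command matching that setup would be:

```bash
python -m torch.distributed.launch --nproc_per_node=4 --use_env scripts/train_forecast.py CONFIG_FILE --opts MODEL.DISTRIBUTED True
```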
Evaluation is done with `scripts/train_forecast.py` by loading the pretrained model and setting `MODEL.EVAL` to `True`:
Single-GPU evaluation:

```bash
python scripts/train_forecast.py CONFIG_FILE --opts MODEL.EVAL True MODEL.LOAD CHECKPOINT_FILE [OTHER_OPTIONAL_ARGUMENTS]
```
Multi-GPU evaluation:

```bash
python -m torch.distributed.launch --nproc_per_node=2 --use_env scripts/train_forecast.py CONFIG_FILE --opts MODEL.DISTRIBUTED True MODEL.EVAL True MODEL.LOAD CHECKPOINT_FILE [OTHER_OPTIONAL_ARGUMENTS]
```
If you find this implementation useful in your work, please consider citing it:
```bibtex
@misc{future-prediction,
  author = {Chengxin Wang},
  title = {Recurrent Future Prediction with Spatiotemporal GANs for Video Scene Understanding in PyTorch},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/cvcore/fr-panoptic-forecast}},
}
```
We thank the developers of the following projects:
This project is released under the Apache License, Version 2.0.