This repository provides implementations of the baseline method ST-GCN [1], its extension 2s-AGCN [2], and our proposed methods TA-GCN [3], PST-GCN [4], ST-BLN [5], and PST-BLN [6] for skeleton-based human action recognition. Our proposed methods build on ST-GCN and make it more efficient in terms of the number of model parameters and floating-point operations.
ST-BLN and PST-BLN are also evaluated on the landmark-based facial expression recognition task in our paper [6]; that implementation can be found in FER_PSTBLN_MCD.
This implementation is based on the OpenMMLAB toolbox and the 2s-AGCN repository.
This project is funded by the European OpenDR project, and the implementations are also integrated into the OpenDR toolkit, which will be publicly available soon.
- Download the raw data from NTU-RGB+D [7] and Skeleton-Kinetics [8]. Then put them under the data directory:

        -data\
          -kinetics_raw\
            -kinetics_train\
              ...
            -kinetics_val\
              ...
            -kinetics_train_label.json
            -kinetics_val_label.json
          -nturgbd_raw\
            -nturgb+d_skeletons\
              ...
            -samples_with_missing_skeletons.txt
- Preprocess the data with:

    `python data_gen/ntu_gendata.py`

    `python data_gen/kinetics-gendata.py`
- Generate the bone data with (see the Python sketch after this list for how the generated joint and bone arrays relate):

    `python data_gen/gen_bone_data.py`
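For orientation, the sketch below shows how the generated arrays can be inspected and how bone features relate to joint features. It assumes the output conventions of the 2s-AGCN code this repository is based on (joint data saved as `.npy` arrays of shape `(N, C, T, V, M)` plus pickled labels, and bones computed as differences between connected joints); the file paths and the joint-pair list are assumptions, so adjust them to your setup.

```python
import pickle
import numpy as np

# Assumed output of data_gen/ntu_gendata.py (2s-AGCN convention):
# data:  (N, C=3, T=300, V=25, M=2) -> samples, coordinates, frames, joints, bodies
# label: a list of sample names and a list of class indices
data = np.load('./data/ntu/xview/val_data_joint.npy')
with open('./data/ntu/xview/val_label.pkl', 'rb') as f:
    sample_names, labels = pickle.load(f)
print(data.shape, len(labels))

# Bone features (what gen_bone_data.py produces) are differences between pairs of
# connected joints. The pairs below follow the NTU RGB+D joint numbering used in the
# 2s-AGCN code and are listed here only for illustration.
ntu_pairs = [(1, 2), (2, 21), (3, 21), (4, 3), (5, 21), (6, 5), (7, 6), (8, 7),
             (9, 21), (10, 9), (11, 10), (12, 11), (13, 1), (14, 13), (15, 14),
             (16, 15), (17, 1), (18, 17), (19, 18), (20, 19), (22, 23), (21, 21),
             (23, 8), (24, 25), (25, 12)]
bones = np.zeros_like(data)
for child, parent in ntu_pairs:
    bones[:, :, :, child - 1, :] = data[:, :, :, child - 1, :] - data[:, :, :, parent - 1, :]
```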
Modify the config files according to your experimental setup and run the following scripts to train the joint and bone streams (a quick way to inspect a config's settings is sketched below the commands):
`python main.py --config ./config/nturgbd-cross-view/stgcn/train_joint_stgcn.yaml`
`python main.py --config ./config/nturgbd-cross-view/stgcn/train_bone_stgcn.yaml`
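If you want to see which settings a given training config exposes before editing it, a minimal sketch (assuming PyYAML is installed; the path is taken from the joint-stream command above) is:

```python
# Inspect the fields of a training config before editing it.
# Requires PyYAML (`pip install pyyaml`); the path matches the joint-stream command above.
import yaml

with open('./config/nturgbd-cross-view/stgcn/train_joint_stgcn.yaml') as f:
    cfg = yaml.safe_load(f)

# Print each top-level setting (e.g. data paths, model arguments, optimization hyper-parameters).
for key, value in cfg.items():
    print(f'{key}: {value}')
```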
To ensemble the results of the joint and bone streams, first run the tests to generate the softmax scores:
`python main.py --config ./config/nturgbd-cross-view/stgcn/test_joint_stgcn.yaml`
`python main.py --config ./config/nturgbd-cross-view/stgcn/test_bone_stgcn.yaml`
Then combine the generated scores with:
`python ensemble.py --datasets ntu/xview`
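For reference, the score-level ensemble simply adds the per-class scores of the two streams before taking the argmax. The sketch below assumes each test run (with score saving enabled) writes a pickle mapping sample names to score vectors, which is the convention of the 2s-AGCN code base this repository builds on; the file paths and names here are assumptions.

```python
import pickle
import numpy as np

# Paths are assumptions; point them at the score pickles written by the two test runs
# and at the label file of the split you evaluated.
with open('./work_dir/ntu/xview/stgcn_joint/epoch1_test_score.pkl', 'rb') as f:
    joint_scores = pickle.load(f)   # dict: sample name -> per-class score vector
with open('./work_dir/ntu/xview/stgcn_bone/epoch1_test_score.pkl', 'rb') as f:
    bone_scores = pickle.load(f)
with open('./data/ntu/xview/val_label.pkl', 'rb') as f:
    sample_names, labels = pickle.load(f)

correct = 0
for name, label in zip(sample_names, labels):
    fused = np.array(joint_scores[name]) + np.array(bone_scores[name])  # sum the two streams
    correct += int(fused.argmax() == int(label))
print('ensembled top-1 accuracy:', correct / len(labels))
```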
Shell scripts for training and testing each of the methods are also provided. For example, to train the ST-GCN method run:
`sh run_stgcn.sh`
All the aforementioned methods are also integrated into the OpenDR toolkit, along with a webcam demo. In this demo, we use Lightweight OpenPose [10], which is integrated in the toolkit as well, to extract a skeleton from each input frame, and we then feed a sequence of 300 skeletons to a pre-trained ST-GCN-based model from the toolkit.
Demo video: `demo.mp4`
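The demo logic is essentially a sliding window over per-frame skeletons. The sketch below illustrates that loop with OpenCV; the pose estimator and action model are stubs standing in for the toolkit's Lightweight OpenPose and ST-GCN learners (their real APIs are not shown here), and the window length of 300 frames follows the description above.

```python
from collections import deque
import numpy as np
import cv2

WINDOW = 300      # number of skeleton frames fed to the action model, as described above
NUM_JOINTS = 18   # COCO-style joint count produced by OpenPose; an assumption for this sketch


class PoseEstimatorStub:
    """Stand-in for the toolkit's Lightweight OpenPose wrapper (its real API is not shown here)."""

    def infer(self, frame):
        # Return dummy keypoints so the loop runs end to end without the real model.
        return np.zeros((NUM_JOINTS, 2), dtype=np.float32)


class ActionModelStub:
    """Stand-in for a pre-trained ST-GCN-based learner (its real API is not shown here)."""

    def infer(self, clip):
        # clip has shape (WINDOW, NUM_JOINTS, 2); a real model would return an action class.
        return 'unknown action'


pose_estimator = PoseEstimatorStub()
action_model = ActionModelStub()
buffer = deque(maxlen=WINDOW)      # rolling window of the most recent skeletons
capture = cv2.VideoCapture(0)      # default webcam

while capture.isOpened():
    ok, frame = capture.read()
    if not ok:
        break
    buffer.append(pose_estimator.infer(frame))
    if len(buffer) == WINDOW:
        prediction = action_model.infer(np.stack(buffer))
        cv2.putText(frame, str(prediction), (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('demo', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

capture.release()
cv2.destroyAllWindows()
```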
Please cite the following papers if you use any of the proposed methods implemented in this repository in your research:
@inproceedings{heidari2021tagcn,
title={Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition},
author={Heidari, Negar and Iosifidis, Alexandros},
booktitle={2020 25th International Conference on Pattern Recognition (ICPR)},
pages={7907--7914},
year={2021},
organization={IEEE}
}
@inproceedings{heidari2021pstgcn,
title={Progressive Spatio-Temporal Graph Convolutional Network for Skeleton-Based Human Action Recognition},
author={Heidari, Negar and Iosifidis, Alexandros},
booktitle={ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
pages={3220--3224},
year={2021},
organization={IEEE}
}
@inproceedings{heidari2021stbln,
title={On the spatial attention in spatio-temporal graph convolutional networks for skeleton-based human action recognition},
author={Heidari, Negar and Iosifidis, Alexandros},
booktitle={2021 International Joint Conference on Neural Networks (IJCNN)},
pages={1--7},
year={2021},
organization={IEEE}
}
@article{heidari2021pstbln,
title={Progressive Spatio-Temporal Bilinear Network with Monte Carlo Dropout for Landmark-based Facial Expression Recognition with Uncertainty Estimation},
author={Heidari, Negar and Iosifidis, Alexandros},
journal={arXiv preprint arXiv:2106.04332},
year={2021}
}
This work was supported by the European Union’s Horizon 2020 Research and Innovation Action Program under Grant 871449 (OpenDR).
For any questions, feel free to contact: [email protected]
[1] Yan, S., Xiong, Y., and Lin, D. "Spatial temporal graph convolutional networks for skeleton-based action recognition." Proceedings of the AAAI Conference on Artificial Intelligence, 2018.
[2] Shi, L., et al. "Two-stream adaptive graph convolutional networks for skeleton-based action recognition." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[3] Heidari, N., and Iosifidis, A. "Temporal attention-augmented graph convolutional network for efficient skeleton-based human action recognition." 25th International Conference on Pattern Recognition (ICPR), IEEE, 2021.
[4] Heidari, N., and Iosifidis, A. "Progressive spatio-temporal graph convolutional network for skeleton-based human action recognition." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021.
[5] Heidari, N., and Iosifidis, A. "On the spatial attention in spatio-temporal graph convolutional networks for skeleton-based human action recognition." arXiv preprint arXiv:2011.03833, 2020.
[6] Heidari, N., and Iosifidis, A. "Progressive spatio-temporal bilinear network with Monte Carlo dropout for landmark-based facial expression recognition with uncertainty estimation." arXiv preprint arXiv:2106.04332, 2021.
[7] Shahroudy, A., Liu, J., Ng, T. T., and Wang, G. "NTU RGB+D: A large scale dataset for 3D human activity analysis." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 1010-1019.
[8] Kay, W., Carreira, J., Simonyan, K., Zhang, B., Hillier, C., Vijayanarasimhan, S., et al. "The Kinetics human action video dataset." arXiv preprint arXiv:1705.06950, 2017.
[9] Cao, Z., Simon, T., Wei, S. E., and Sheikh, Y. "Realtime multi-person 2D pose estimation using part affinity fields." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7291-7299.
[10] Osokin, D. "Real-time 2D multi-person pose estimation on CPU: Lightweight OpenPose." arXiv preprint arXiv:1811.12004, 2018.