A comparative study of ViViT and CNN-GRU sequence models for video action recognition on the UCF101 dataset

denpalrius/sports_action_recognition

Temporal Sequence Modeling for Sports Action Recognition

This project focuses on fine-grained sports action recognition using two main architectures:

  1. CNN-based Sequence Models: These models pair a CNN feature extractor with an RNN (GRU layers) for temporal sequence modeling. The CNN backbones compared are:

    • VGG19
    • InceptionV3
    • InceptionV4-ResNet (hybrid model)
    • EfficientNetB4
  2. ViViT (Video Vision Transformer): A pure transformer-based approach for end-to-end video classification, capturing both spatial and temporal features.

Model Architectures

1. CNN-based Sequence Models

  • Feature Extractors: VGG19, InceptionV3, InceptionV4-ResNet, EfficientNetB4
  • Temporal Model: GRU layers
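
To illustrate how a GRU summarizes a sequence of per-frame CNN features, here is a minimal pure-Python sketch. It is not the project's implementation: the CNN backbone is omitted, the feature and hidden sizes are made up, the weights are random, and the helper names (`init_params`, `gru_step`, `encode_clip`) are hypothetical. The update rule follows the convention where the update gate `z` keeps the previous state.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_params(feature_dim, hidden_dim, seed=0):
    # Hypothetical random weights; a trained model would learn these.
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in ("z", "r", "h"):
        params["W" + gate] = mat(hidden_dim, feature_dim)  # input weights
        params["U" + gate] = mat(hidden_dim, hidden_dim)   # recurrent weights
        params["b" + gate] = [0.0] * hidden_dim            # biases
    return params

def gru_step(x, h, p):
    # x: feature vector of one frame, h: previous hidden state.
    z = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wz"], x), matvec(p["Uz"], h), p["bz"])]  # update gate
    r = [sigmoid(a + b + c) for a, b, c in
         zip(matvec(p["Wr"], x), matvec(p["Ur"], h), p["br"])]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(a + b + c) for a, b, c in
            zip(matvec(p["Wh"], x), matvec(p["Uh"], reset_h), p["bh"])]
    # Blend the previous state with the candidate state.
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

def encode_clip(frame_features, params, hidden_dim):
    # Run the GRU over all frames; the final state summarizes the clip.
    h = [0.0] * hidden_dim
    for x in frame_features:
        h = gru_step(x, h, params)
    return h

# Stand-in for CNN outputs: 5 frames with 8 features each (illustrative sizes).
rng = random.Random(1)
clip = [[rng.random() for _ in range(8)] for _ in range(5)]
state = encode_clip(clip, init_params(8, 4), hidden_dim=4)
```

In a full model, the final state would typically feed a dense softmax layer over the action classes.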

2. ViViT Model

  • Transformer-based model for video classification
  • Spatiotemporal attention and tubelet embedding
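
Tubelet embedding splits a clip into non-overlapping spatiotemporal blocks, each of which becomes one transformer token. The sketch below only does the shape arithmetic; the clip size and tubelet size are illustrative assumptions, not the settings used in this repository, and the function names are hypothetical.

```python
def tubelet_grid(frames, height, width, tubelet):
    # tubelet = (t, h, w): extent of one block in time, height, and width.
    t, h, w = tubelet
    if frames % t or height % h or width % w:
        raise ValueError("clip dimensions must be divisible by the tubelet size")
    return frames // t, height // h, width // w

def tubelet_token_shape(frames, height, width, channels, tubelet):
    # Returns (number of tokens, raw values per token before projection).
    nt, nh, nw = tubelet_grid(frames, height, width, tubelet)
    t, h, w = tubelet
    return nt * nh * nw, t * h * w * channels

# Assumed example: 32 RGB frames at 224x224 with 2x16x16 tubelets.
num_tokens, values_per_token = tubelet_token_shape(32, 224, 224, 3, (2, 16, 16))
print(num_tokens, values_per_token)  # 3136 1536
```

Because self-attention cost grows quadratically with the token count, larger tubelets trade spatial and temporal detail for cheaper attention.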

Evaluation

Each model is evaluated using:

  • Accuracy, Precision, Recall, F1-Score
  • Training/validation curves
  • Confusion matrix
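
The listed metrics can all be derived from the confusion matrix. The following is a minimal sketch that assumes macro averaging across classes; the README does not state which averaging the project uses, and the labels here are toy data, not results from the study.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    # Rows are true classes, columns are predicted classes.
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def macro_metrics(matrix):
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    precisions, recalls, f1s = [], [], []
    for i in range(n):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum
        actual = sum(matrix[i])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total if total else 0.0,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=3)
scores = macro_metrics(cm)
```

With 101 classes in UCF101, per-class precision and recall taken from the same matrix also show which actions are confused with each other.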

Acknowledgments

  • Dr. Lina Chato
  • UCF101 dataset
  • TensorFlow team
  • All the cited authors
