Taming Diffusion Models for Music-driven Conducting Motion Generation

Accepted at the AAAI 2023 Summer Symposium, with the Best Paper Award.

Overview


  • Conducting motion generated for the given music, Tchaikovsky's Piano Concerto No. 1:

    Tchaikovsky.Piano.Concerto.No.1.mp4

Features

  • Objective: We present Diffusion-Conductor, a novel DDIM-based approach for music-driven conducting motion generation.
  • Contributions:
    • First work to use a diffusion model for music-driven conducting motion generation.
    • We change the supervision signal from ε to x0, which improves performance and may inspire later research on motion generation.
  • Benchmark Performance: Outperforms state-of-the-art methods on all four metrics: MSE, FGD, BC, and Diversity.
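
The change of supervision signal listed under Contributions can be written out with the standard diffusion training objectives. This is a sketch in common DDPM notation, not necessarily the exact loss used in the paper; the symbols \bar{\alpha}_t (cumulative noise schedule), c (music condition), \epsilon_\theta, and f_\theta (the two network parameterizations) are assumed here for illustration.

```latex
% Forward (noising) process in standard DDPM notation:
x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,
\qquad \epsilon \sim \mathcal{N}(0, I)

% Noise-prediction objective (supervision signal \epsilon):
\mathcal{L}_{\epsilon} = \mathbb{E}_{x_0, \epsilon, t}
  \left[ \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert_2^2 \right]

% Sample-prediction objective (supervision signal x_0):
\mathcal{L}_{x_0} = \mathbb{E}_{x_0, \epsilon, t}
  \left[ \lVert x_0 - f_\theta(x_t, t, c) \rVert_2^2 \right]
```

With the second objective the network outputs a clean motion sequence directly, so the loss is measured in motion space rather than in noise space.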

News

  • 18/07/2023: Our paper won the Best Paper Award at the AAAI 2023 Inaugural Summer Symposium!

Getting Started

Installation

Please refer to install.md for detailed installation.

Training

Prepare the ConductorMotion100 dataset:

You can also access the dataset via Google Drive

ConductorMotion100 has 3 splits: train, val, and test, each distributed as its own .rar file. After extracting them to the <Your Dataset Dir> folder, the file structure will be:

tree <Your Dataset Dir>
<Your Dataset Dir>
    ├───train
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───5268
    │           mel.npy
    │           motion.npy
    ├───val
    │   ├───0
    │   │       mel.npy
    │   │       motion.npy
    │  ...
    │   └───290
    │           mel.npy
    │           motion.npy
    └───test
        ├───0
        │       mel.npy
        │       motion.npy
       ...
        └───293
                mel.npy
                motion.npy

Each mel.npy and motion.npy pair holds 60 seconds of Mel spectrogram and motion data, sampled at 90 Hz and 30 Hz respectively. The Mel spectrogram has 128 frequency bins, so mel.shape = (5400, 128). The motion data contains 13 2D keypoints, so motion.shape = (1800, 13, 2).
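
As a quick sanity check after extraction, the shapes above can be verified with NumPy. The sketch below is not part of this repository: the helper names `load_sample` and `mel_frames_for_motion_frame` are hypothetical, and the only assumptions are the file names, sampling rates, and shapes stated above.

```python
from pathlib import Path

import numpy as np

# Values taken from the dataset description above.
DURATION_S = 60
MEL_RATE_HZ = 90
MOTION_RATE_HZ = 30
N_MEL_BINS = 128
N_KEYPOINTS = 13

EXPECTED_MEL_SHAPE = (DURATION_S * MEL_RATE_HZ, N_MEL_BINS)            # (5400, 128)
EXPECTED_MOTION_SHAPE = (DURATION_S * MOTION_RATE_HZ, N_KEYPOINTS, 2)  # (1800, 13, 2)


def load_sample(sample_dir):
    """Load one sample folder (hypothetical helper) and check both array shapes."""
    sample_dir = Path(sample_dir)
    mel = np.load(sample_dir / "mel.npy")
    motion = np.load(sample_dir / "motion.npy")
    if mel.shape != EXPECTED_MEL_SHAPE:
        raise ValueError(f"unexpected mel shape {mel.shape} in {sample_dir}")
    if motion.shape != EXPECTED_MOTION_SHAPE:
        raise ValueError(f"unexpected motion shape {motion.shape} in {sample_dir}")
    return mel, motion


def mel_frames_for_motion_frame(i):
    """Return the Mel frame range [start, stop) that covers motion frame i.

    At 90 Hz versus 30 Hz, each motion frame spans 3 Mel frames.
    """
    ratio = MEL_RATE_HZ // MOTION_RATE_HZ
    return i * ratio, (i + 1) * ratio
```

For example, `load_sample("<Your Dataset Dir>/train/0")` returns the two arrays for the first training sample, or raises `ValueError` if a file was extracted incompletely.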

Train the music encoder and motion encoder in Contrastive_Stage with the following command:

cd Contrastive_Stage
python M2SNet_train.py --dataset_dir <Your Dataset Dir> 

Train the diffusion model in Diffusion_Stage with the following command:

cd Diffusion_Stage
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python3 -u tools/train.py \
    --name checkpoint_folder_name \
    --batch_size 32 \
    --times 25 \
    --num_epochs 400 \
    --dataset_name ConductorMotion100 \
    --data_parallel \
    --gpu_id 1 2

Inference and Visualization

cd Diffusion_Stage
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
python -u tools/visualization.py \
    --motion_length 6 \
    --gpu_id 5 \
    --result_path "conduct_example.mp4"

Download the pretrained model

For evaluation and inference, you may download the pretrained contrastive-stage model and the pretrained diffusion-stage model from Google Drive.

Acknowledgement

We would like to thank the great projects VirtualConductor and MotionDiffuse.

Papers

  1. Zhuoran Zhao, Jinbin Bai*, Delong Chen, Debang Wang, and Yubo Pan. Taming Diffusion Models for Music-driven Conducting Motion Generation. Proceedings of the AAAI Symposium Series, 1(1), 40-44, 2023.

    @inproceedings{zhao2023taming,
      title={Taming diffusion models for music-driven conducting motion generation},
      author={Zhao, Zhuoran and Bai, Jinbin and Chen, Delong and Wang, Debang and Pan, Yubo},
      booktitle={Proceedings of the AAAI Symposium Series},
      volume={1},
      number={1},
      pages={40--44},
      year={2023}
    }