
[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior


Environment

  • Linux
  • Ubuntu 20.04 (an RTX 4090 GPU works; Python 3.8 works)

Other necessary packages:

pip install -r requirements.txt

Red errors reported during installation can be ignored.

IMPORTANT: Please make sure to modify the site-packages/torch/nn/modules/conv.py file by commenting out the self.padding_mode != 'zeros' check, to allow replicated padding for ConvTranspose1d as shown here. (If this error never appears, you can skip this step.)
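
The lines to comment out look roughly like this (the exact wording varies across torch versions):

```python
# In site-packages/torch/nn/modules/conv.py, inside ConvTranspose1d.forward --
# comment out the padding-mode check so replicated padding is not rejected:
#
# if self.padding_mode != 'zeros':
#     raise ValueError('Only `zeros` padding mode is supported for ConvTranspose1d')
```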

After every reinstallation of the packages, go into the mesh repo and run its tests to verify the environment still works. If the test output ends with OK (skipped=5), everything should be fine.
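
As a quick smoke test (it does not replace the full mesh test suite), you can at least verify that the library imports and can build a trivial mesh:

```python
# Minimal import check for the MPI-IS mesh library (psbody-mesh).
import numpy as np
from psbody.mesh import Mesh

m = Mesh(v=np.zeros((3, 3)), f=np.array([[0, 1, 2]]))
print("psbody.mesh OK:", m.v.shape, m.f.shape)
```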

You also need to download wav2vec2-base-960h and place it in a Facebook/wav2vec2-base-960h directory created in the project root.
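
One way to fetch the checkpoint is via huggingface_hub (an assumption; downloading the files manually from the model page works just as well):

```python
# Download facebook/wav2vec2-base-960h into the directory the code expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="facebook/wav2vec2-base-960h",
    local_dir="Facebook/wav2vec2-base-960h",  # created in the project root
)
```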

Dataset Preparation

VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl and subj_seq_to_idx.pkl in the folder vocaset/.

Download "FLAME_sample.ply" from voca and put it in vocaset/. (这一步不训练也需要完成)

Read the vertices/audio data and convert them to .npy/.wav files stored in vocaset/vertices_npy and vocaset/wav:

cd vocaset
python process_voca_data.py
# You may get only 478 .npy files but 475 .wav files; this is likely an issue with the original .pkl files
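
A quick sanity check of the counts mentioned above (run from the vocaset/ directory):

```python
# Count the preprocessed files; expect roughly 478 .npy vs. 475 .wav (see note above).
import glob

print(len(glob.glob("vertices_npy/*.npy")), ".npy files")
print(len(glob.glob("wav/*.wav")), ".wav files")
```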

BIWI

Follow BIWI/README.md to preprocess the BIWI dataset, put the resulting .npy/.wav files into BIWI/vertices_npy and BIWI/wav, and put templates.pkl into BIWI/.

Demo

Download the pretrained models biwi_stage1.pth.tar & biwi_stage2.pth.tar and put them under BIWI/; download vocaset_stage1.pth.tar & vocaset_stage2.pth.tar and put them under the vocaset/ folder. You also need to download templates.pkl into the vocaset folder.

Given the audio signal,

  • to animate a mesh in FLAME topology, run:
sh scripts/demo.sh vocaset

You may need to enable a proxy first (source /etc/network_turbo); otherwise you will get a connection error.

If an osmesa error appears, run apt-get install -y python-opengl libosmesa6.

If you hit RuntimeError: The shape of the 3D attn_mask is torch.Size..., the cause is max_seq_len=600 in models/utils.py, which caps the maximum sequence length. You can raise 600 to a larger value to test longer sequences, but you will then have to train your own model, because the pretrained models were trained with 600. See Doubiiu#48.

The demo configuration file is config/vocaset/demo.yaml. The vocaset template must be a .ply file (these ship with the VOCASET dataset); put whichever template you want to use into the vocaset folder, then set subject under the demo section of demo.yaml to that template's name. condition appears to specify whose speaking style is used. The output is written to demo/, in a folder named after the .wav file.
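
For reference, an illustrative excerpt of the DEMO section (only the field names mentioned in this README are confirmed; the values are placeholders):

```yaml
# config/vocaset/demo.yaml -- DEMO section (illustrative values)
DEMO:
  demo_wav_path: demo/wav/<your_audio>.wav   # input audio
  condition: <style_subject>                 # whose speaking style to use
  subject: <template_name>                   # name of the .ply template in vocaset/
```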

After changing max_seq_len to 60000 and retraining, the results were poor, and the model still could only generate 10-second videos.

  • to animate a mesh in BIWI topology, run:
sh scripts/demo.sh BIWI

This script will automatically generate the rendered videos in the `demo/output` folder. You can also put your own test audio file (.wav format) under the `demo/wav` folder and specify the arguments in the `DEMO` section of `config/<dataset>/demo.yaml` accordingly (e.g., `demo_wav_path`, `condition`, `subject`, etc.).

Training / Testing

The training/testing operation shares a similar command:

sh scripts/<train.sh|test.sh> <exp_name> config/<vocaset|BIWI>/<stage1|stage2>.yaml <vocaset|BIWI> <s1|s2>

Please replace <exp_name> with your own experiment name, and <vocaset|BIWI> with the name of your target dataset, i.e., vocaset or BIWI. Change exp_dir in both scripts/train.sh and scripts/test.sh if needed. The commands below use the defaults as examples.

Training for Discrete Motion Prior

sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1

If a VQAutoEncoder error is raised during training, see Doubiiu#5 (in short: use different code for the stage-1 and stage-2 runs). Enable the proxy during stage-2 training.

Training for Speech-Driven Motion Synthesis

Make sure the paths of pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.
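
For example, the relevant lines in stage2.yaml look like this (the paths are placeholders; point them at your stage-1 checkpoint and the local wav2vec2 directory):

```yaml
# config/vocaset/stage2.yaml -- pre-trained model paths (illustrative)
vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/<checkpoint>.pth.tar
wav2vec2model_path: Facebook/wav2vec2-base-960h
```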

sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

Models from both training stages are saved in the RUN folder; once stage 1 finishes, you can proceed directly to stage-2 training.

Testing

sh scripts/test.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

This testing step does not seem to produce final outputs; it behaves like the validation pass during training. To get actual results, run inference through the demo instead: just change the model path in demo.yaml to point to the stage-2 (s2) checkpoint.

Visualization with Audio

Modify the paths in scripts/render.sh and run:

sh scripts/render.sh

Evaluation on BIWI

We provide the reference code for Lip Vertex Error & Upper-face Dynamics Deviation. Remember to change the paths in scripts/cal_metric.sh, and run:

sh scripts/cal_metric.sh
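
For reference, a minimal sketch of the Lip Vertex Error metric as commonly defined (the maximal L2 error over the lip vertices per frame, averaged over frames); the lip-vertex index list lip_idx is an assumption here, and the official one is what scripts/cal_metric.sh uses:

```python
# Sketch of Lip Vertex Error: max L2 error over lip vertices per frame, mean over frames.
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """pred, gt: (frames, n_vertices, 3) arrays; lip_idx: lip-vertex indices."""
    diff = pred[:, lip_idx, :] - gt[:, lip_idx, :]
    dist = np.linalg.norm(diff, axis=-1)         # per-vertex L2 error, shape (frames, lips)
    return float(np.mean(np.max(dist, axis=1)))  # max over lips, mean over frames
```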

Play with Your Own Data

Data Preparation

  • Create the dataset directory <dataset_dir> in the CodeTalker directory.

  • Place your vertices data (.npy files) and audio data (.wav files) in <dataset_dir>/vertices_npy and <dataset_dir>/wav folders, respectively.

  • Save the templates of all subjects to a templates.pkl file and put it in <dataset_dir>, as done for the BIWI and vocaset datasets (see the sketch below). Export an arbitrary template to .ply format and put it in <dataset_dir>/.
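
A minimal sketch of building templates.pkl, assuming the same layout as vocaset/BIWI (a dict mapping subject name to an (n_vertices, 3) neutral-face template); trimesh is used here for illustration, and any mesh loader works:

```python
# Build templates.pkl for a custom dataset: {subject_name: template_vertices}.
import pickle
import numpy as np
import trimesh

templates = {
    "subject1": np.asarray(trimesh.load("subject1.ply", process=False).vertices),
    "subject2": np.asarray(trimesh.load("subject2.ply", process=False).vertices),
}
with open("templates.pkl", "wb") as f:  # place this file in <dataset_dir>/
    pickle.dump(templates, f)
```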

Training, Testing & Visualization

  • Create the corresponding config files in config/<dataset_dir> and modify the arguments in the config files.

  • Check all the code segments related to dataset information.

  • Follow the training/testing/visualization pipeline as done for the BIWI and vocaset datasets.

Postscript

On eye blinking: Doubiiu#44
