
[CVPR 2023] CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior


Environment

  • Linux
  • Ubuntu 20.04 (an RTX 4090 GPU works; Python 3.8 works)

Other necessary packages:

pip install -r requirements.txt

Red errors reported during installation can be ignored.

IMPORTANT: Please make sure to modify the site-packages/torch/nn/modules/conv.py file by commenting out the self.padding_mode != 'zeros' check, to allow replicated padding for ConvTranspose1d as shown here. (If this error never appears, you can skip this step.)
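
The lines to comment out look roughly like this (the exact wording varies across torch versions):

```python
# In site-packages/torch/nn/modules/conv.py, inside ConvTranspose1d.forward --
# comment out the padding-mode check so replicated padding is not rejected:
#
# if self.padding_mode != 'zeros':
#     raise ValueError('Only `zeros` padding mode is supported for ConvTranspose1d')
```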

After every reinstallation of the packages, go into the mesh repo and run its tests to verify the environment still works. If the test output ends with OK (skipped=5), everything should be fine.
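
As a quick smoke test (it does not replace the full mesh test suite), you can at least verify that the library imports and can build a trivial mesh:

```python
# Minimal import check for the MPI-IS mesh library (psbody-mesh).
import numpy as np
from psbody.mesh import Mesh

m = Mesh(v=np.zeros((3, 3)), f=np.array([[0, 1, 2]]))
print("psbody.mesh OK:", m.v.shape, m.f.shape)
```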

You also need to download wav2vec2-base-960h and place it in a Facebook/wav2vec2-base-960h directory created in the project root.
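
One way to fetch the checkpoint is via huggingface_hub (an assumption; downloading the files manually from the model page works just as well):

```python
# Download facebook/wav2vec2-base-960h into the directory the code expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="facebook/wav2vec2-base-960h",
    local_dir="Facebook/wav2vec2-base-960h",  # created in the project root
)
```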

Dataset Preparation

VOCASET

Request the VOCASET data from https://voca.is.tue.mpg.de/. Place the downloaded files data_verts.npy, raw_audio_fixed.pkl, templates.pkl and subj_seq_to_idx.pkl in the folder vocaset/.

Download "FLAME_sample.ply" from voca and put it in vocaset/. (这一步不训练也需要完成)

Read the vertices/audio data and convert them to .npy/.wav files stored in vocaset/vertices_npy and vocaset/wav:

cd vocaset
python process_voca_data.py
# You may get only 478 .npy files but 475 .wav files; this is likely an issue with the original .pkl files
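
A quick sanity check of the counts mentioned above (run from the vocaset/ directory):

```python
# Count the preprocessed files; expect roughly 478 .npy vs. 475 .wav (see note above).
import glob

print(len(glob.glob("vertices_npy/*.npy")), ".npy files")
print(len(glob.glob("wav/*.wav")), ".wav files")
```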

BIWI

Follow BIWI/README.md to preprocess the BIWI dataset, put the resulting .npy/.wav files into BIWI/vertices_npy and BIWI/wav, and put templates.pkl into BIWI/.

Demo

Download the pretrained models biwi_stage1.pth.tar & biwi_stage2.pth.tar and put them under BIWI/; download vocaset_stage1.pth.tar & vocaset_stage2.pth.tar and put them under the vocaset/ folder. You also need to download templates.pkl into the vocaset folder.

Given the audio signal,

  • to animate a mesh in FLAME topology, run:
sh scripts/demo.sh vocaset

You may need to enable a proxy first (source /etc/network_turbo); otherwise you will get a connection error.

If an osmesa error appears, run apt-get install -y python-opengl libosmesa6.

If you hit RuntimeError: The shape of the 3D attn_mask is torch.Size..., the cause is max_seq_len=600 in models/utils.py, which caps the maximum sequence length. You can raise 600 to a larger value to test longer sequences, but you will then have to train your own model, because the pretrained models were trained with 600. See Doubiiu#48.

The demo configuration file is config/vocaset/demo.yaml. The vocaset template must be a .ply file (these ship with the VOCASET dataset); put whichever template you want to use into the vocaset folder, then set subject under the demo section of demo.yaml to that template's name. condition appears to specify whose speaking style is used. The output is written to demo/, in a folder named after the .wav file.
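
For reference, an illustrative excerpt of the DEMO section (only the field names mentioned in this README are confirmed; the values are placeholders):

```yaml
# config/vocaset/demo.yaml -- DEMO section (illustrative values)
DEMO:
  demo_wav_path: demo/wav/<your_audio>.wav   # input audio
  condition: <style_subject>                 # whose speaking style to use
  subject: <template_name>                   # name of the .ply template in vocaset/
```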

After changing max_seq_len to 60000 and retraining, the results were poor, and the model still could only generate 10-second videos.

  • to animate a mesh in BIWI topology, run:
sh scripts/demo.sh BIWI

This script will automatically generate the rendered videos in the `demo/output` folder. You can also put your own test audio file (.wav format) under the `demo/wav` folder and specify the arguments in the `DEMO` section of `config/<dataset>/demo.yaml` accordingly (e.g., `demo_wav_path`, `condition`, `subject`, etc.).

Training / Testing

The training/testing operation shares a similar command:

sh scripts/<train.sh|test.sh> <exp_name> config/<vocaset|BIWI>/<stage1|stage2>.yaml <vocaset|BIWI> <s1|s2>

Please replace <exp_name> with your own experiment name, and <vocaset|BIWI> with the name of your target dataset, i.e., vocaset or BIWI. Change exp_dir in both scripts/train.sh and scripts/test.sh if needed. The commands below use the defaults as examples.

Training for Discrete Motion Prior

sh scripts/train.sh CodeTalker_s1 config/vocaset/stage1.yaml vocaset s1

If a VQAutoEncoder error is raised during training, see Doubiiu#5 (in short: use different code for the stage-1 and stage-2 runs). Enable the proxy during stage-2 training.

Training for Speech-Driven Motion Synthesis

Make sure the paths of pre-trained models are correct, i.e., vqvae_pretrained_path and wav2vec2model_path in config/<vocaset|BIWI>/stage2.yaml.
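
For example, the relevant lines in stage2.yaml look like this (the paths are placeholders; point them at your stage-1 checkpoint and the local wav2vec2 directory):

```yaml
# config/vocaset/stage2.yaml -- pre-trained model paths (illustrative)
vqvae_pretrained_path: RUN/vocaset/CodeTalker_s1/model/<checkpoint>.pth.tar
wav2vec2model_path: Facebook/wav2vec2-base-960h
```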

sh scripts/train.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

Models from both training stages are saved in the RUN folder; once stage 1 finishes, you can proceed directly to stage-2 training.

Testing

sh scripts/test.sh CodeTalker_s2 config/vocaset/stage2.yaml vocaset s2

This testing step does not seem to produce final outputs; it behaves like the validation pass during training. To get actual results, run inference through the demo instead: just change the model path in demo.yaml to point to the stage-2 (s2) checkpoint.

Visualization with Audio

Modify the paths in scripts/render.sh and run:

sh scripts/render.sh

Evaluation on BIWI

We provide the reference code for Lip Vertex Error & Upper-face Dynamics Deviation. Remember to change the paths in scripts/cal_metric.sh, and run:

sh scripts/cal_metric.sh
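
For reference, a minimal sketch of the Lip Vertex Error metric as commonly defined (the maximal L2 error over the lip vertices per frame, averaged over frames); the lip-vertex index list lip_idx is an assumption here, and the official one is what scripts/cal_metric.sh uses:

```python
# Sketch of Lip Vertex Error: max L2 error over lip vertices per frame, mean over frames.
import numpy as np

def lip_vertex_error(pred, gt, lip_idx):
    """pred, gt: (frames, n_vertices, 3) arrays; lip_idx: lip-vertex indices."""
    diff = pred[:, lip_idx, :] - gt[:, lip_idx, :]
    dist = np.linalg.norm(diff, axis=-1)         # per-vertex L2 error, shape (frames, lips)
    return float(np.mean(np.max(dist, axis=1)))  # max over lips, mean over frames
```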

Play with Your Own Data

Data Preparation

  • Create the dataset directory <dataset_dir> in the CodeTalker directory.

  • Place your vertices data (.npy files) and audio data (.wav files) in <dataset_dir>/vertices_npy and <dataset_dir>/wav folders, respectively.

  • Save the templates of all subjects to a templates.pkl file and put it in <dataset_dir>, as done for the BIWI and vocaset datasets (see the sketch below). Export an arbitrary template to .ply format and put it in <dataset_dir>/.
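
A minimal sketch of building templates.pkl, assuming the same layout as vocaset/BIWI (a dict mapping subject name to an (n_vertices, 3) neutral-face template); trimesh is used here for illustration, and any mesh loader works:

```python
# Build templates.pkl for a custom dataset: {subject_name: template_vertices}.
import pickle
import numpy as np
import trimesh

templates = {
    "subject1": np.asarray(trimesh.load("subject1.ply", process=False).vertices),
    "subject2": np.asarray(trimesh.load("subject2.ply", process=False).vertices),
}
with open("templates.pkl", "wb") as f:  # place this file in <dataset_dir>/
    pickle.dump(templates, f)
```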

Training, Testing & Visualization

  • Create the corresponding config files in config/<dataset_dir> and modify the arguments in the config files.

  • Check all the code segments related to dataset information.

  • Follow the training/testing/visualization pipeline as done for the BIWI and vocaset datasets.

Postscript

On eye blinking: Doubiiu#44
