This repository contains the source code for the ACL 2024 demo of EmpathyEar: An Open-source Avatar Multimodal Empathetic Chatbot. EmpathyEar is a pioneering open-source, avatar-based multimodal empathetic chatbot that fills the gap left by traditional text-only empathetic response generation (ERG) systems.
Demo video: resized-video-demo-trimmed.mp4
- Download the ChatGLM3 checkpoints from https://huggingface.co/THUDM/chatglm3-6b/tree/main and place them in the ChatGLM-6B folder.
- Download the pre-trained ChatGLM3 LoRA checkpoints from https://pan.baidu.com/s/14zzdxyRZL3dqBmI2hJPlIw?pwd=qj4w and place them in the ChatGLM-6B folder. Alternatively, you can fine-tune ChatGLM3 yourself with the commands below (a loading sketch follows them):
```bash
cd chatglm
./scripts/finetune_lora.sh
```
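If you want to sanity-check the LoRA weights before running the full pipeline, the sketch below loads ChatGLM3 together with an adapter and generates one reply. It is only a minimal example under stated assumptions: the adapter path `ChatGLM-6B/lora_checkpoint` and the prompt are placeholders, not paths defined by this repository.

```python
# Minimal sketch: load ChatGLM3 with a LoRA adapter via PEFT and chat once.
# The adapter directory below is hypothetical; point it at wherever you placed
# the downloaded or fine-tuned LoRA weights.
from transformers import AutoTokenizer, AutoModel
from peft import PeftModel

base_id = "THUDM/chatglm3-6b"
tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = AutoModel.from_pretrained(base_id, trust_remote_code=True).half().cuda()

# Attach the empathetic-dialogue LoRA weights on top of the base model.
model = PeftModel.from_pretrained(base_model, "ChatGLM-6B/lora_checkpoint").eval()

# ChatGLM3 exposes a chat() helper through its trust_remote_code model class.
response, history = model.chat(tokenizer, "I failed my exam and I feel awful.", history=[])
print(response)
```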
- Download the StyleTTS2 model pre-trained on LibriTTS from https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main and place it in the StyleTTS2-LibriTTS folder.
- Download the pretrained EAT models and place them in the ckpt and Utils folders, respectively, using the following commands:
```bash
gdown --id 1KK15n2fOdfLECWN5wvX54mVyDt18IZCo && unzip -q ckpt.zip -d ckpt
gdown --id 1HGVzckXh-vYGZEUUKMntY1muIbkbnRcd && unzip -q Utils.zip -d Utils
```
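Before building the environment, it can help to confirm that every checkpoint ended up where the steps above place it. The folder names in this sketch are simply the ones mentioned in this README; adjust them if your local layout differs.

```python
# Quick sanity check that the checkpoint folders from the steps above exist
# and are non-empty. Folder names are taken from this README and may need
# adjusting to your local layout.
from pathlib import Path

expected = ["ChatGLM-6B", "StyleTTS2-LibriTTS", "ckpt", "Utils"]
missing = [name for name in expected if not any(Path(name).glob("*"))]
if missing:
    raise FileNotFoundError(f"Missing or empty checkpoint folders: {missing}")
print("All checkpoint folders are in place.")
```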
With the checkpoints in place, create the conda environment and run inference:

```bash
conda env create -f environment.yml
conda activate empathyear
python inference.py
```
The generated TTS wav files will be saved in TTS_audio, and the generated talking face videos will be saved in MP4_video.
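As a quick check after a run, you can list what was produced. This sketch only enumerates the two output folders named above and assumes inference.py has already been executed from the repository root.

```python
# List the outputs written by inference.py, using the folder names given above.
from pathlib import Path

for wav in sorted(Path("TTS_audio").glob("*.wav")):
    print("audio:", wav.name)
for mp4 in sorted(Path("MP4_video").glob("*.mp4")):
    print("video:", mp4.name)
```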
We acknowledge the following works for their publicly released code: ChatGLM3, ImageBind, StyleTTS2, and EAT.
This repository is released under the BSD 3-Clause License. EmpathyEar is a research project intended for non-commercial use only. The code must NOT be used for any illegal, harmful, violent, racist, or sexual purposes, and users are strictly prohibited from engaging in any activity that may violate these guidelines. Any potential commercial use of this code must be approved by the authors.