This repository contains the code and generated sound samples for our paper "CL4AC: A Contrastive Loss for Audio Captioning", which was accepted at the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 Workshop.
We propose a contrastive loss for the audio captioning task.
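As a rough illustration of the idea, the snippet below sketches a generic InfoNCE-style contrastive loss over paired audio and caption embeddings. This is for intuition only and is not the exact CL4AC formulation from the paper; all names (`audio_emb`, `text_emb`, `temperature`) are placeholders rather than identifiers from this repository.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Generic InfoNCE-style contrastive loss over paired embeddings.

    audio_emb, text_emb: (batch, dim) tensors where row i of each tensor
    comes from the same audio-caption pair. Illustrative sketch only;
    NOT the exact CL4AC loss.
    """
    # Normalize so dot products become cosine similarities.
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix: entry (i, j) compares audio i with caption j.
    logits = audio_emb @ text_emb.t() / temperature

    # Matched pairs sit on the diagonal; treat them as the positive class.
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)

    # Symmetric cross-entropy: audio-to-text and text-to-audio directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2
```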
- Clone the repository:

  git clone https://github.com/liuxubo717/contrastive_loss_for_audio_captioning
- Create a conda environment with the dependencies:

  conda env create -f environment.yml -n audio_captioning
- Activate the conda environment:

  conda activate audio_captioning
We use the Clotho V2 dataset. The audio data have been preprocessed into HDF5 (.h5) format and saved in the data/logspectrogram directory; the caption files are saved in the data/ directory.
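To peek at the preprocessed features, the h5 files can be opened with h5py. This is a sketch only: the file name below is an assumption, and the dataset keys it prints depend on how the preprocessing actually wrote the files.

```python
import h5py

# Hypothetical file name; substitute one of the files actually present
# under data/logspectrogram.
with h5py.File("data/logspectrogram/development.h5", "r") as f:
    f.visit(print)  # list every group/dataset stored in the file
```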
The configuration files for the training script are in the config/ directory.
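To inspect a configuration outside the training script, it can be read with PyYAML (a sketch; the parsing inside train.py may be organized differently):

```python
import yaml  # PyYAML

# Read the same config file used in the training command below.
with open("config/w2v-trainable-selection-loss-last-hidden.yml") as f:
    config = yaml.safe_load(f)

print(config)  # nested dict of training hyperparameters
```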
Then, run the training script:
python train.py --config=config/w2v-trainable-selection-loss-last-hidden.yml --lr=0.0005 --batch=16
During training, TensorBoard logs are written to the runs/ directory, which is created automatically when the program starts.
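To monitor these logs while training is in progress, point TensorBoard at that directory:

  tensorboard --logdir runs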
Meanwhile, the model from each epoch is saved in the saved_model/ directory.
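If you want to load one of these checkpoints afterwards, something like the following sketch should work; the filename and the saved format (full module vs. state_dict) are assumptions here, so check how train.py actually saves its models:

```python
import torch

# Hypothetical checkpoint name; the real per-epoch filenames are
# determined by train.py and may differ.
checkpoint = torch.load("saved_model/epoch_10.pt", map_location="cpu")
print(type(checkpoint))  # full module or a state_dict, depending on how it was saved
```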
Evaluation is executed automatically after each training epoch.
If you use our code, please kindly cite the following:
@article{liu2021cl4ac,
  title={CL4AC: A Contrastive Loss for Audio Captioning},
  author={Liu, Xubo and Huang, Qiushi and Mei, Xinhao and Ko, Tom and Tang, H Lilian and Plumbley, Mark D and Wang, Wenwu},
  journal={Proceedings of the Detection and Classification of Acoustic Scenes and Events 2021 Workshop (DCASE 2021)},
  year={2021}
}