Skip to content
/ CELA Public
forked from pepsi2222/CELA

Cost-efficient language model alighment for recommendation

Notifications You must be signed in to change notification settings

USTCLLM/CELA

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

CELA: Cost-Efficient Language Model Alignment for CTR

Language models have demonstrated remarkable versatility across various fields, with their extensive knowledge and reasoning capabilities showing promise for enhancing recommendation tasks. Researchers have found that fine-tuning language models for downstream tasks can further boost their effectiveness; however, applying this to large-scale recommendation data is often prohibitively time-consuming due to the volume of user-item interactions and the complexity of language models. Therefore, this repo aims to provide an efficient fine-tuning method to quickly enhance the PLM's compatibility with recommendation data while leveraging texts to improve CTR prediction performance.

For more details, please refer to our paper.

overview

Usage

Clone this repo and set DATA_MOUNT_DIR=[DOWNLOAD_PATH]/data in your environment.

Data preprocess

Download Amazon Sports dataset from here and then process data:

python build_dataset.py amazon-sports

Training

ID-Based Model

python run_ctr.py amazon-sports

Pre-training LM set model_name_or_path in config/mlm.yaml and then

python script/run_mlm.py amazon-sports

Fine-tuning LM set ctr_model/pretrained_dir in config/align.yaml and then

python script/run_align.py amazon-sports [PRE_TRAINED_LM_PATH]

Training recommendation backbone

python script/run_cotrain.py amazon-sports [OUPUT_PATH] [FT_LM_PATH] [TOKENIZER_PATH]

Citation

If you find this project useful in your research, please cite our research paper:

@article{wang2024cela,
  title={CELA: Cost-Efficient Language Model Alignment for CTR Prediction},
  author={Wang, Xingmei and Liu, Weiwen and Chen, Xiaolong and Liu, Qi and Huang, Xu and Yichao, Wang and Li, Xiangyang and Wang, Yasheng and Dong, Zhenhua and Lian, Defu and Tang, Ruiming},
  journal={arXiv preprint arXiv:2405.10596},
  year={2024}
}

About

Cost-efficient language model alighment for recommendation

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 98.9%
  • Jupyter Notebook 0.6%
  • Cuda 0.4%
  • Shell 0.1%
  • Dockerfile 0.0%
  • C++ 0.0%