# TransAct

This repository is the official implementation of the ACL 2024 Findings paper [Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations](https://aclanthology.org/2024.findings-acl.582).

| ![Pruning architecture](transact.png) | ![Latency on Xiaomi 14 mobile phone](latency.png) |
| :---: | :---: |
| Pruning architecture | Latency on a Xiaomi 14 mobile phone |

## Training and Evaluation

1. Prepare the environment following `transact.dockerfile` (see the build sketch after this list).
2. Create symlinks to your data, models, outputs, and HuggingFace cache (mainly for large datasets):

   ```shell
   ln -s /path/to/data data
   ln -s /path/to/models models
   ln -s /path/to/outputs outputs
   ln -s /path/to/hf-cache hf-cache
   ```

3. Tweak the training configs `train_config.yaml` and `deepspeed.json` (a DeepSpeed config sketch follows this list).
4. Run the training script `run_trainer.sh`, for example:

   ```shell
   bash run_trainer.sh -m all \
       -a 768 -f 1536 \
       -n 128 -k 8 -p acts \
       -l 4096 -t 50 \
       -g 64 -b 4 \
       -d togethercomputer/RedPajama-Data-1T \
       -x llama -y llama2 -z 7B
   ```

   Run `bash run_trainer.sh -h` for help.
5. Run the evaluation script `eval.sh`, for example:

   ```shell
   bash eval.sh -m all \
       -a 768 -f 1536 \
       -n 128 -k 8 -p acts \
       -l 4096 -t 50 \
       -d togethercomputer/RedPajama-Data-1T \
       -x llama -y llama2 -z 7B
   ```

   Run `bash eval.sh -h` for help.
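
For step 1, a minimal sketch of building and entering the environment with Docker; the image tag `transact` and the mount layout are illustrative assumptions, not part of the repository:

```shell
# Build the image from the provided dockerfile (the tag name is illustrative).
docker build -f transact.dockerfile -t transact .

# Enter the container with GPU access (requires the NVIDIA Container Toolkit).
# Note: the data/models/outputs/hf-cache symlink targets must also be visible
# inside the container, e.g. via additional -v mounts.
docker run --gpus all -it --rm \
    -v "$(pwd)":/workspace -w /workspace \
    transact bash
```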
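
For step 3, a minimal sketch of what `deepspeed.json` might contain, using standard DeepSpeed configuration keys; the batch settings mirror the `-g 64 -b 4` example above, and the repository's actual config may differ:

```json
{
  "train_micro_batch_size_per_gpu": 4,
  "gradient_accumulation_steps": 64,
  "bf16": { "enabled": true },
  "zero_optimization": { "stage": 2 }
}
```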

## Citations

Please cite the paper if you find this repository useful.

```bibtex
@inproceedings{shen-etal-2024-pruning,
    title = "Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations",
    author = "Shen, Bowen  and
      Lin, Zheng  and
      Zha, Daren  and
      Liu, Wei  and
      Luan, Jian  and
      Wang, Bin  and
      Wang, Weiping",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.582",
    doi = "10.18653/v1/2024.findings-acl.582",
    pages = "9781--9793",
}
```