TransAct

This repository is the official implementation of the ACL 2024 paper Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations.

Figures: the TransAct pruning architecture (transact.png) and measured latency on a Xiaomi 14 mobile phone (latency.png).

Training and Evaluation

  1. Prepare the environment following transact.dockerfile; see the environment sketch after this list.
  2. Create symlinks to your data, models, outputs, and HuggingFace cache (mainly for large datasets); see the download sketch after this list.
    ln -s /path/to/data data
    ln -s /path/to/models models
    ln -s /path/to/outputs outputs
    ln -s /path/to/hf-cache hf-cache
  3. Tweak the training configs train_config.yaml and deepspeed.json.
  4. Run the training script run_trainer.sh, for example:
    bash run_trainer.sh -m all \
      -a 768 -f 1536 \
      -n 128 -k 8 -p acts \
      -l 4096 -t 50 \
      -g 64 -b 4 \
      -d togethercomputer/RedPajama-Data-1T \
      -x llama -y llama2 -z 7B
    Run bash run_trainer.sh -h for help.
  5. Run the evaluation script eval.sh, for example:
    bash eval.sh -m all \
      -a 768 -f 1536 \
      -n 128 -k 8 -p acts \
      -l 4096 -t 50 \
      -d togethercomputer/RedPajama-Data-1T \
      -x llama -y llama2 -z 7B
    Run bash eval.sh -h for help.
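
A minimal sketch of the step 1 environment setup, assuming the image is built from the repository root; the transact image tag, /workspace mount, and run flags are illustrative assumptions, not part of the repo:

    # Build the image from the provided dockerfile (the tag name is an assumption)
    docker build -f transact.dockerfile -t transact .
    # Start an interactive GPU container with the repo mounted (flags are illustrative)
    docker run --gpus all -it -v "$PWD":/workspace -w /workspace transact bash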

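For step 2, a hedged example of populating the symlinked directories before training; HF_HOME and huggingface-cli download are standard HuggingFace tooling, but the local layout below is an assumption and should match whatever paths run_trainer.sh expects:

    # Route the HuggingFace cache through the symlink created above
    export HF_HOME="$PWD/hf-cache"
    # Hypothetical local path; meta-llama/Llama-2-7b-hf is gated, request access first
    huggingface-cli download meta-llama/Llama-2-7b-hf --local-dir models/Llama-2-7b-hf
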
Citations

Please cite the paper if you find this repository useful.

@inproceedings{shen-etal-2024-pruning,
    title = "Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations",
    author = "Shen, Bowen  and
      Lin, Zheng  and
      Zha, Daren  and
      Liu, Wei  and
      Luan, Jian  and
      Wang, Bin  and
      Wang, Weiping",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
    year = "2024",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-acl.582",
    doi = "10.18653/v1/2024.findings-acl.582",
    pages = "9781--9793",
}
