
Out of memory during fine-tuning #97

Open
eijix opened this issue Jul 16, 2023 · 1 comment

Comments

@eijix

eijix commented Jul 16, 2023

I'm running fine-tuning on an A100 40G and it reports insufficient GPU memory. Is there any way to work around this, or is an 80G card required?
The parameters are as follows:
#! /bin/bash
export CUDA_VISIBLE_DEVICES=0
GPUS_PER_NODE=1

NNODES=1
MASTER_ADDR="localhost"
MASTER_PORT=12345

OPTS=""
OPTS+=" --use-delta"
OPTS+=" --model-config ../cpm-bee-10b.json"
OPTS+=" --dataset /root/autodl-tmp/cpm_bee_training/voice_train/bin_data/adhoc_train_cpmbee0621_train_000000_0_claude2/"
#OPTS+=" --eval_dataset path/to/eval/dataset"
OPTS+=" --epoch 150"
OPTS+=" --batch-size 1"
#OPTS+=" --train-iters 100"
OPTS+=" --save-name cpm_bee_finetune"
OPTS+=" --max-length 128"
OPTS+=" --save results/"
OPTS+=" --lr 0.0001"
OPTS+=" --inspect-iters 100"
OPTS+=" --warmup-iters 1"
OPTS+=" --eval-interval 100"
OPTS+=" --early-stop-patience 5"
OPTS+=" --lr-decay-style noam"
OPTS+=" --weight-decay 0.01"
OPTS+=" --clip-grad 1.0"
OPTS+=" --loss-scale 32768"
OPTS+=" --start-step 0"
OPTS+=" --load /root/autodl-tmp/cpm-bee-10b/pytorch_model.bin"

CMD="torchrun --nnodes=${NNODES} --nproc_per_node=${GPUS_PER_NODE} ../finetune_cpm_bee.py ${OPTS}"

echo ${CMD}
$CMD

(screenshot of the out-of-memory error attached)
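
For context, a rough back-of-the-envelope estimate (a sketch for illustration, not a measurement of finetune_cpm_bee.py): CPM-Bee 10B has roughly 10 billion parameters, so the frozen weights alone occupy about 20 GB in half precision, before counting activations, the delta parameters, their optimizer state, and framework workspace. That is why a 40 GB card can run out of memory even with --use-delta, --batch-size 1, and --max-length 128.

# Rough memory arithmetic for a 10B-parameter model in half precision
# (an estimate for illustration, not a profile of the actual training run).
params = 10e9                 # ~10 billion parameters
bytes_per_param = 2           # fp16/bf16 storage per weight
weights_gb = params * bytes_per_param / 1e9
print(f"frozen weights alone: ~{weights_gb:.0f} GB")  # ~20 GB before activations and optimizer state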
@susimonxu

You need 60 GB+ of GPU memory for LoRA fine-tuning.
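
As a quick sanity check before launching the script above, the free and total memory of the visible GPU can be printed with standard PyTorch calls (a minimal sketch, independent of the CPM-Bee code):

import torch

# Report free/total memory of the first visible CUDA device in GB.
free_b, total_b = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free_b / 1e9:.1f} GB free of {total_b / 1e9:.1f} GB total")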
