YOLOv8 - INT4 Training #2346

Open

yoloyash opened this issue Jun 23, 2024 · 2 comments

@yoloyash
Hello, I'm trying to train YOLOv8-large in INT4 format. I took the training recipe for YOLOv8-large available on SparseZoo and changed num_bits to 4 everywhere. I also saw in #1679 that channel-wise quantization can be added, so I've added that as well. However, the performance is quite inferior ([email protected]). I will also be exporting the model to ONNX for inference on an FPGA (5-bit), so I need the model to be strictly 4-bit.

Recipe

version: 1.1.0

metadata:

# General Hyperparams

pruning_num_epochs: 90
pruning_init_lr: 0.01
pruning_final_lr: 0.0002
weights_warmup_lr: 0
biases_warmup_lr: 0.1
qat_init_lr: 1e-4
qat_final_lr: 1e-6

# Pruning Hyperparams

init_sparsity: 0.05
pruning_start_epoch: 4
pruning_end_epoch: 50
pruning_update_frequency: 1.0

# Quantization variables

qat_start_epoch: eval(pruning_num_epochs)
qat_epochs: 3
qat_end_epoch: eval(qat_start_epoch + qat_epochs)
observer_freeze_epoch: eval(qat_end_epoch)
bn_freeze_epoch: eval(qat_end_epoch)
qat_ft_epochs: 3
num_epochs: eval(pruning_num_epochs + qat_epochs + 2 * qat_ft_epochs)

# Modifiers
training_modifiers:

  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: eval(num_epochs)

  - !LearningRateFunctionModifier
    start_epoch: 3
    end_epoch: eval(pruning_num_epochs)
    lr_func: linear
    init_lr: eval(pruning_init_lr)
    final_lr: eval(pruning_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(weights_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [0, 1]

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(biases_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [2]

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_start_epoch)
    end_epoch: eval(qat_end_epoch)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch)
    end_epoch: eval(qat_end_epoch + qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch + qat_ft_epochs)
    end_epoch: eval(qat_end_epoch + 2 * qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

pruning_modifiers:

  - !ConstantPruningModifier
    start_epoch: eval(qat_start_epoch)
    params: ["re:^((?!dfl).)*$"]

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.46
    params:
      - model.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8999
    params:
      - model.1.conv.weight
      - model.4.m.1.cv1.conv.weight
      - model.4.m.4.cv2.conv.weight
      - model.6.m.1.cv1.conv.weight
      - model.21.m.1.cv1.conv.weight
      - model.21.m.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.514
    params:
      - model.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7675
    params:
      - model.2.cv2.conv.weight
      - model.12.m.0.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8117
    params:
      - model.3.conv.weight
      - model.8.cv2.conv.weight
      - model.12.m.1.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6457
    params:
      - model.4.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8627
    params:
      - model.4.cv2.conv.weight
      - model.5.conv.weight
      - model.8.m.1.cv1.conv.weight
      - model.22.cv3.1.1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8764
    params:
      - model.4.m.0.cv1.conv.weight
      - model.6.m.3.cv2.conv.weight
      - model.7.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9189
    params:
      - model.4.m.1.cv2.conv.weight
      - model.6.m.5.cv1.conv.weight
      - model.15.m.2.cv1.conv.weight
      - model.18.m.0.cv1.conv.weight
      - model.18.m.2.cv1.conv.weight
      - model.22.cv3.0.1.conv.weight
      - model.22.cv3.2.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8305
    params:
      - model.4.m.2.cv1.conv.weight
      - model.4.m.5.cv2.conv.weight
      - model.6.cv2.conv.weight
      - model.6.m.4.cv2.conv.weight
      - model.15.m.0.cv2.conv.weight
      - model.15.m.1.cv1.conv.weight
      - model.15.m.2.cv2.conv.weight
      - model.18.cv2.conv.weight
      - model.21.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7417
    params:
      - model.4.m.2.cv2.conv.weight
      - model.18.cv1.conv.weight
      - model.22.cv3.2.1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8888
    params:
      - model.4.m.3.cv2.conv.weight
      - model.6.m.3.cv1.conv.weight
      - model.15.m.1.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6063
    params:
      - model.6.cv1.conv.weight
      - model.12.cv1.conv.weight
      - model.12.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9468
    params:
      - model.6.m.0.cv1.conv.weight
      - model.21.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7907
    params:
      - model.6.m.0.cv2.conv.weight
      - model.8.m.0.cv1.conv.weight
      - model.12.m.0.cv2.conv.weight
      - model.12.m.1.cv1.conv.weight
      - model.22.cv2.2.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9409
    params:
      - model.6.m.1.cv2.conv.weight
      - model.18.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6811
    params:
      - model.8.cv1.conv.weight
      - model.15.cv1.conv.weight
      - model.15.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9343
    params:
      - model.8.m.0.cv2.conv.weight
      - model.8.m.1.cv2.conv.weight
      - model.18.m.0.cv2.conv.weight
      - model.18.m.1.cv1.conv.weight
      - model.21.m.0.cv1.conv.weight
      - model.21.m.1.cv2.conv.weight
      - model.22.cv3.0.0.conv.weight
      - model.22.cv3.1.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9771
    params:
      - model.8.m.2.cv1.conv.weight
      - model.22.cv2.0.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.989
    params:
      - model.8.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.5626
    params:
      - model.9.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.713
    params:
      - model.9.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9099
    params:
      - model.12.m.2.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.927
    params:
      - model.12.m.2.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9521
    params:
      - model.16.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9569
    params:
      - model.18.m.1.cv2.conv.weight
      - model.19.conv.weight
      - model.21.m.0.cv2.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8474
    params:
      - model.21.cv1.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9651
    params:
      - model.22.cv2.1.0.conv.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.4
    params:
      - model.22.cv3.0.2.weight
      - model.22.cv3.1.2.weight
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

quantization_modifiers:

  - !QuantizationModifier
    start_epoch: eval(qat_start_epoch)
    disable_quantization_observer_epoch: eval(observer_freeze_epoch)
    freeze_bn_stats_epoch: eval(bn_freeze_epoch)
    ignore: ['Upsample', 'Concat', 'model.22.dfl.conv']
    scheme_overrides:
      model.2.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.2.m.0.cv1.conv:
        input_activations: null
      model.2.m.0.add_input_0:
        input_activations: null
      model.4.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.4.m.0.cv1.conv:
        input_activations: null
      model.4.m.0.add_input_0:
        input_activations: null
      model.4.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.5.conv:
        input_activations: null
      model.6.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.6.m.0.cv1.conv:
        input_activations: null
      model.6.m.0.add_input_0:
        input_activations: null
      model.6.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.7.conv:
        input_activations: null
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.m.0.cv1.conv:
        input_activations: null
      model.8.m.0.add_input_0:
        input_activations: null
      model.8.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.0.cv1.conv:
        input_activations: null
      model.12.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.1.cv1.conv:
        input_activations: null
      model.12.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.2.cv1.conv:
        input_activations: null
      model.12.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.0.cv1.conv:
        input_activations: null
      model.15.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.1.cv1.conv:
        input_activations: null
      model.15.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.2.cv1.conv:
        input_activations: null
      model.15.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.16.conv:
        input_activations: null
      model.16.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.0.cv1.conv:
        input_activations: null
      model.18.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.1.cv1.conv:
        input_activations: null
      model.18.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.2.cv1.conv:
        input_activations: null
      model.18.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.19.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.0.cv1.conv:
        input_activations: null
      model.21.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.1.cv1.conv:
        input_activations: null
      model.21.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.2.cv1.conv:
        input_activations: null
      model.21.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.22.cv2.0.0.conv:
        input_activations: null
      model.22.cv3.0.0.conv:
        input_activations: null

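For reference (not part of the original report): a recipe like this is normally attached to the YOLOv8 training loop either through the sparseml.yolov8.train CLI or directly with SparseML's ScheduledModifierManager. The sketch below assumes that manager API plus PyTorch's FakeQuantize modules, which SparseML's QuantizationModifier inserts during QAT; the recipe path, model, optimizer, and dataloader are placeholders. It also shows one way to check that the observers actually end up with 4-bit integer ranges instead of falling back to 8-bit defaults.

# Minimal sketch, assuming the standard ScheduledModifierManager workflow and
# torch FakeQuantize modules; paths and objects below are placeholders.
import torch
from torch.ao.quantization import FakeQuantize
from sparseml.pytorch.optim import ScheduledModifierManager

model = ...              # the ultralytics YOLOv8-large nn.Module
optimizer = ...          # e.g. torch.optim.SGD(model.parameters(), lr=0.01)
steps_per_epoch = ...    # len(train_dataloader)

manager = ScheduledModifierManager.from_yaml("yolov8l_int4_recipe.yaml")
optimizer = manager.modify(model, optimizer, steps_per_epoch=steps_per_epoch)

# ... run the usual training loop here; the manager schedules pruning and QAT ...

# After qat_start_epoch, quantized ops carry FakeQuantize modules. For 4-bit
# symmetric weights the integer range is [-8, 7]; for 4-bit asymmetric
# activations it is [0, 15]. Anything reporting [0, 255] or [-128, 127] is
# still being fake-quantized at 8 bits.
for name, module in model.named_modules():
    if isinstance(module, FakeQuantize):
        print(f"{name}: quant_min={module.quant_min}, quant_max={module.quant_max}")

manager.finalize(model)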
@bfineran
Contributor

Hi @yoloyash, we haven't looked into taking YOLO models to 4-bit, but I agree that this drop in accuracy is unexpected. If you are interested, you can try our newer repo, which has better support for 4-bit + channel-wise quantization for PTQ: https://github.com/neuralmagic/compressed-tensors
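To make "4-bit + channel-wise" concrete, here is a plain-PyTorch sketch (not the compressed-tensors API) of symmetric, per-channel INT4 weight fake quantization: one scale per output channel, with values rounded to the 16 integer levels in [-8, 7]. Per-channel scales keep a few large-magnitude channels from forcing a coarse step size onto the whole tensor, which matters far more at 4 bits than at 8.

# Illustration only: the arithmetic of channel-wise symmetric INT4 weight
# quantization in plain PyTorch.
import torch

def fake_quant_int4_per_channel(weight: torch.Tensor) -> torch.Tensor:
    """Quantize-dequantize a conv weight [out_ch, in_ch, kH, kW] to symmetric INT4."""
    qmin, qmax = -8, 7                                    # 4-bit two's-complement range
    w = weight.reshape(weight.shape[0], -1)               # one row per output channel
    scale = w.abs().amax(dim=1, keepdim=True) / qmax      # per-channel scale
    scale = scale.clamp(min=1e-8)                         # avoid division by zero
    q = torch.clamp(torch.round(w / scale), qmin, qmax)   # integer levels
    return (q * scale).reshape(weight.shape)              # dequantized fake-quant weight

w = torch.randn(64, 32, 3, 3)
w_q = fake_quant_int4_per_channel(w)
print("mean abs error:", (w - w_q).abs().mean().item())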

@KozlovKY

> [quotes the original issue and recipe]

Hi, can you tell me which versions of PyTorch, ONNX, DeepSparse, and SparseML you used to get pruning and YOLOv8 quantization working? I'm running into issues with that.
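Not from the thread, but a small standard-library helper for reporting the exact environment when asking or answering version questions like this; the package list is an assumption about which distributions are installed.

# Print installed versions of the relevant packages (names are PyPI distributions).
from importlib.metadata import version, PackageNotFoundError

for pkg in ("torch", "onnx", "onnxruntime", "sparseml", "deepsparse", "ultralytics"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")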
