Hello, I'm trying to train YOLOv8-large in int4 format. I took the training recipe available on SparseZoo for training yolov8-large and modified num_bits to 4 everywhere. I also saw in #1679 that channel-wise quantization can be added, so I've added that as well. However, the performance is quite inferior ([email protected]). Also, I will be exporting the model to ONNX for inference on an FPGA (5-bit), so I need the model to be strictly 4-bit.
Hi @yoloyash, we haven't looked into taking YOLO models to 4-bit, but I agree that this drop in accuracy is unexpected. If you are interested, you can try our newer repo, which has better support for 4-bit + channel-wise quantization for PTQ: https://github.com/neuralmagic/compressed-tensors
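For reference on what "4-bit + channel-wise" means in the recipe below: with `num_bits: 4`, `symmetric: True`, `strategy: "channel"`, each output channel of a weight tensor gets its own scale, and the values snap to 16 integer levels. Here is a minimal PyTorch sketch of that computation (illustrative only, not SparseML's implementation; the helper name is made up):

```python
import torch

def fake_quant_per_channel(w: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Symmetric per-channel fake quantization of a conv weight (out-channels first).

    Illustrative only: it mirrors what the recipe's weight scheme
    (num_bits: 4, symmetric: True, strategy: "channel") computes,
    not SparseML's actual code.
    """
    qmax = 2 ** (num_bits - 1) - 1                 # 7 for signed 4-bit
    qmin = -(2 ** (num_bits - 1))                  # -8 for signed 4-bit
    # One scale per output channel, from that channel's max absolute weight.
    absmax = w.detach().abs().flatten(1).amax(dim=1).clamp(min=1e-8)
    scale = (absmax / qmax).view(-1, *([1] * (w.dim() - 1)))
    # Quantize, clamp to the 4-bit integer grid, then dequantize back to float.
    q = torch.clamp(torch.round(w / scale), qmin, qmax)
    return q * scale

w = torch.randn(64, 32, 3, 3)      # e.g. a conv weight with 64 output channels
wq = fake_quant_per_channel(w)
print(wq[0].unique().numel())      # at most 16 distinct values per channel (2**4)
```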
Hi, can you tell me which versions of PyTorch, ONNX, DeepSparse, and SparseML you used to get pruning and YOLOv8 quantization working? I'm having issues with that.
Recipe
```yaml
version: 1.1.0

metadata:

# General Hyperparams
pruning_num_epochs: 90
pruning_init_lr: 0.01
pruning_final_lr: 0.0002
weights_warmup_lr: 0
biases_warmup_lr: 0.1
qat_init_lr: 1e-4
qat_final_lr: 1e-6

# Pruning Hyperparams
init_sparsity: 0.05
pruning_start_epoch: 4
pruning_end_epoch: 50
pruning_update_frequency: 1.0

# Quantization variables
qat_start_epoch: eval(pruning_num_epochs)
qat_epochs: 3
qat_end_epoch: eval(qat_start_epoch + qat_epochs)
observer_freeze_epoch: eval(qat_end_epoch)
bn_freeze_epoch: eval(qat_end_epoch)
qat_ft_epochs: 3
num_epochs: eval(pruning_num_epochs + qat_epochs + 2 * qat_ft_epochs)
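# SparseML resolves the eval() expressions above when the recipe is loaded, so
# with these values: qat_start_epoch = 90, qat_end_epoch = 90 + 3 = 93, and
# num_epochs = 90 + 3 + 2 * 3 = 99.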
# Modifiers
training_modifiers:
  - !EpochRangeModifier
    start_epoch: 0
    end_epoch: eval(num_epochs)

  - !LearningRateFunctionModifier
    start_epoch: 3
    end_epoch: eval(pruning_num_epochs)
    lr_func: linear
    init_lr: eval(pruning_init_lr)
    final_lr: eval(pruning_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(weights_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [0, 1]

  - !LearningRateFunctionModifier
    start_epoch: 0
    end_epoch: 3
    lr_func: linear
    init_lr: eval(biases_warmup_lr)
    final_lr: eval(pruning_init_lr)
    param_groups: [2]

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_start_epoch)
    end_epoch: eval(qat_end_epoch)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch)
    end_epoch: eval(qat_end_epoch + qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)

  - !LearningRateFunctionModifier
    start_epoch: eval(qat_end_epoch + qat_ft_epochs)
    end_epoch: eval(qat_end_epoch + 2 * qat_ft_epochs)
    lr_func: cosine
    init_lr: eval(qat_init_lr)
    final_lr: eval(qat_final_lr)
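# Resolved LR schedule: epochs 0-3 warm up weights (from 0) and biases (from
# 0.1) to 0.01; epochs 3-90 decay linearly from 0.01 to 0.0002 during pruning;
# epochs 90-93 run QAT on a cosine schedule (1e-4 -> 1e-6); epochs 93-96 and
# 96-99 repeat that cosine schedule for two fine-tuning stages.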
pruning_modifiers:
  - !ConstantPruningModifier
    start_epoch: eval(qat_start_epoch)
    params: ["re:^((?!dfl).)*$"]

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.46
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8999
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.514
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7675
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8117
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6457
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8627
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8764
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9189
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8305
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7417
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8888
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6063
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9468
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.7907
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9409
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.6811
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9343
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9771
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.989
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.5626
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.713
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9099
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.927
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9521
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9569
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.8474
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.9651
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1

  - !GMPruningModifier
    init_sparsity: eval(init_sparsity)
    final_sparsity: 0.4
    params:
    inter_func: cubic
    global_sparsity: false
    start_epoch: eval(pruning_start_epoch)
    end_epoch: eval(pruning_end_epoch)
    update_frequency: 1
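# Each GMPruningModifier above ramps one group of layers from init_sparsity
# (0.05) at epoch 4 to its final_sparsity at epoch 50 along a cubic curve,
# updating masks once per epoch; the ConstantPruningModifier then holds the
# learned masks fixed from the start of QAT (epoch 90) onward.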
quantization_modifiers:
  - !QuantizationModifier
    start_epoch: eval(qat_start_epoch)
    disable_quantization_observer_epoch: eval(observer_freeze_epoch)
    freeze_bn_stats_epoch: eval(bn_freeze_epoch)
    ignore: ['Upsample', 'Concat', 'model.22.dfl.conv']
    scheme_overrides:
      model.2.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.2.m.0.cv1.conv:
        input_activations: null
      model.2.m.0.add_input_0:
        input_activations: null
      model.4.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.4.m.0.cv1.conv:
        input_activations: null
      model.4.m.0.add_input_0:
        input_activations: null
      model.4.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.5.conv:
        input_activations: null
      model.6.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.6.m.0.cv1.conv:
        input_activations: null
      model.6.m.0.add_input_0:
        input_activations: null
      model.6.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.7.conv:
        input_activations: null
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.8.m.0.cv1.conv:
        input_activations: null
      model.8.m.0.add_input_0:
        input_activations: null
      model.8.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.9.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.0.cv1.conv:
        input_activations: null
      model.12.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.1.cv1.conv:
        input_activations: null
      model.12.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.m.2.cv1.conv:
        input_activations: null
      model.12.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.12.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.0.cv1.conv:
        input_activations: null
      model.15.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.1.cv1.conv:
        input_activations: null
      model.15.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.m.2.cv1.conv:
        input_activations: null
      model.15.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.15.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.16.conv:
        input_activations: null
      model.16.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.0.cv1.conv:
        input_activations: null
      model.18.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.1.cv1.conv:
        input_activations: null
      model.18.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.18.m.2.cv1.conv:
        input_activations: null
      model.18.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.19.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.cv1.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.0.cv1.conv:
        input_activations: null
      model.21.m.0.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.1.cv1.conv:
        input_activations: null
      model.21.m.1.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.21.m.2.cv1.conv:
        input_activations: null
      model.21.m.2.cv2.act:
        output_activations:
          num_bits: 4
          symmetric: False
        weights:
          num_bits: 4
          symmetric: True
          strategy: "channel"
      model.22.cv2.0.0.conv:
        input_activations: null
      model.22.cv3.0.0.conv:
        input_activations: null
```
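One more note for the FPGA target: after exporting to ONNX, it is worth checking that the quantized weights really stay within the signed 4-bit range, since QDQ-style ONNX graphs typically store quantized tensors in int8 containers even when QAT used 4 bits. A minimal sketch with the `onnx` package (the `model.onnx` path and the [-8, 7] signed range are assumptions):

```python
import onnx
from onnx import numpy_helper

model = onnx.load("model.onnx")  # assumed path to the exported model
inits = {t.name: t for t in model.graph.initializer}

# QDQ exports keep fake-quantized weights as integer initializers feeding
# DequantizeLinear nodes; check each one stays inside the signed 4-bit range.
for node in model.graph.node:
    if node.op_type != "DequantizeLinear":
        continue
    name = node.input[0]
    if name not in inits:
        continue  # produced by a QuantizeLinear at runtime, not a stored tensor
    arr = numpy_helper.to_array(inits[name])
    if arr.min() < -8 or arr.max() > 7:
        print(f"{name}: values outside int4 range [{arr.min()}, {arr.max()}]")
```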