diff --git a/.idea/.gitignore b/.idea/.gitignore
deleted file mode 100644
index 13566b8..0000000
--- a/.idea/.gitignore
+++ /dev/null
@@ -1,8 +0,0 @@
-# Default ignored files
-/shelf/
-/workspace.xml
-# Editor-based HTTP Client requests
-/httpRequests/
-# Datasource local storage ignored files
-/dataSources/
-/dataSources.local.xml
diff --git a/.idea/encodings.xml b/.idea/encodings.xml
deleted file mode 100644
index fa9b8a7..0000000
diff --git a/.idea/inspectionProfiles/Project_Default.xml b/.idea/inspectionProfiles/Project_Default.xml
deleted file mode 100644
index 03d9549..0000000
diff --git a/.idea/inspectionProfiles/profiles_settings.xml b/.idea/inspectionProfiles/profiles_settings.xml
deleted file mode 100644
index 105ce2d..0000000
diff --git a/.idea/misc.xml b/.idea/misc.xml
deleted file mode 100644
index 4b2f238..0000000
diff --git a/.idea/modules.xml b/.idea/modules.xml
deleted file mode 100644
index 9c63b12..0000000
diff --git a/.idea/pytorch-classifier.iml b/.idea/pytorch-classifier.iml
deleted file mode 100644
index ddd9297..0000000
diff --git a/.idea/vcs.xml b/.idea/vcs.xml
deleted file mode 100644
index 94a25f7..0000000
diff --git a/README.md b/README.md
index 87a44e2..537a2e1 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,16 @@

 image classifier implemented in PyTorch.

+# Directory
+1. **[Introduction](#Introduction)**
+2. **[How to use](#Howtouse)**
+3. **[Argument Explanation](#ArgumentExplanation)**
+4. **[Model Zoo](#ModelZoo)**
+5. **[Some explanation](#Someexplanation)**
+6. **[TODO](#TODO)**
+7. **[Reference](#Reference)**
+
+
 ## Introduction

 Why should you use this code?
@@ -15,7 +25,7 @@ image classifier implemented in PyTorch.
   7. Overall-accuracy visualization. (kappa, precision, recall, f1, accuracy, mpa)

 - **Rich model zoo**
-  1. A model zoo curated by the author. Essentially all mainstream models are supported, more than 50 in total, and all of them come with ImageNet pretrained weights. [See Model Zoo for details. (Transformer series coming in a later update.)](#3)
+  1. A model zoo curated by the author. Essentially all mainstream models are supported, more than 50 in total, and all of them come with ImageNet pretrained weights. [See Model Zoo for details. (Transformer series coming in a later update.)](#ModelZoo)
   2. The supported models were integrated by the author from GitHub and torchvision, so they can be modified and improved for experiments; they are not created by simply calling a library.

 - **Rich training strategies**
@@ -33,6 +43,9 @@ image classifier implemented in PyTorch.
 - **Rich learning-rate scheduling strategies**
   This program supports learning-rate warm-up and custom schedules after warm-up. [See point 5 of Some explanation for details.](#1)

+- **Export to common inference frameworks**
+  Exporting torchscript, onnx and tensorrt inference models is currently supported.
+
 - **Simple installation**

@@ -44,11 +57,15 @@ image classifier implemented in PyTorch.
   1. Most visualization data (confusion matrix, t-SNE, per-class metrics) is saved locally in csv or log format, which makes polishing figures later easy.
   2. Most program output is formatted with PrettyTable, which greatly improves readability.

+
 ## How to use

 1. Install the required [environment](#6).
 2. Prepare the dataset according to [point 3 of Some explanation](#5).

+
 ## Argument Explanation

 - **main.py**
   The main program for training.
   Argument explanation:
@@ -66,6 +83,9 @@ image classifier implemented in PyTorch.
   - **config**
     type: string, default: config/config.py
     Path to the configuration file.
+  - **device**
+    type: string, default: ''
+    Device to use. (cuda device, i.e. 0 or 0,1,2,3 or cpu)
   - **train_path**
     type: string, default: dataset/train
     Path to the training set.
@@ -165,6 +185,9 @@ image classifier implemented in PyTorch.
   - **rdrop**
     default: False
     Whether to use R-Drop. (not supported together with knowledge distillation)
+  - **ema**
+    default: False
+    Whether to use EMA. (not supported together with knowledge distillation)

 - **metrice.py**
   The main program for computing metrics.
   Argument explanation:
@@ -179,7 +202,10 @@ image classifier implemented in PyTorch.
   - **test_path**
     type: string, default: dataset/test
     Path to the test set.
   - **label_path**
     type: string, default: dataset/label.txt
     Path to the label file.
+  - **device**
+    type: string, default: ''
+    Device to use. (cuda device, i.e. 0 or 0,1,2,3 or cpu)
   - **task**
     type: string, default: test, choices: ['train', 'val', 'test', 'fps']
     Task type. fps computes the FPS metric on its own; train, val and test compute the metrics on the corresponding split.
@@ -222,7 +248,9 @@ image classifier implemented in PyTorch.
   - **cam_type**
     type: string, default: GradCAMPlusPlus, choices: ['GradCAM', 'HiResCAM', 'ScoreCAM', 'GradCAMPlusPlus', 'AblationCAM', 'XGradCAM', 'EigenCAM', 'FullGrad']
     Type of heat-map (CAM) visualization.
+  - **device**
+    type: string, default: ''
+    Device to use. (cuda device, i.e. 0 or 0,1,2,3 or cpu)
 - **processing.py**
   The main program for preprocessing the dataset.
   Argument explanation:
@@ -238,7 +266,6 @@ image classifier implemented in PyTorch.
   - **test_size**
     type: float, default: 0.2
     Proportion of the test set.
-
 - **config/config.py**
   A configuration file for some extra parameters.
   Argument explanation:
   - **lr_scheduler**
     default: None
     Example: lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR
     Custom learning-rate scheduler.
   - **lr_scheduler_params**
     default: {'T_max': 10,'eta_min': 1e-6}
     Example: lr_scheduler_params = {'step_size': 1,'gamma': 0.95} (this default corresponds to lr_scheduler = torch.optim.lr_scheduler.StepLR)
     Parameters of the custom learning-rate scheduler; they must match lr_scheduler.
   - **random_seed**
     default: 0
     Random seed value.
   - **plot_train_batch_count**
     default: 5
     Number of training-batch visualizations to generate.
   - **custom_augment**
     default: transforms.Compose([])
     Example: transforms.Compose([transforms.RandomHorizontalFlip(p=0.5),transforms.RandomRotation(degrees=20),])
     Custom data augmentation.
+- **export.py**
+  The model export script. torchscript, onnx and tensorrt are currently supported; a minimal load-and-run sketch follows the argument list.
+  Argument explanation:
+  - **save_path**
+    type: string, default: runs/exp
+    Path of the saved model, which is also where the converted model is written.
+  - **image_size**
+    type: int, default: 224
+    Input image size of the model.
+  - **image_channel**
+    type: int, default: 3
+    Number of input image channels. (only three channels are supported at the moment)
+  - **batch_size**
+    type: int, default: 1
+    Number of samples per inference batch.
+  - **dynamic**
+    default: False
+    The dynamic parameter of onnx.
+  - **simplify**
+    default: False
+    The simplify parameter of onnx.
+  - **half**
+    default: False
+    Export an FP16 model. (only supported when exporting on GPU)
+  - **verbose**
+    default: False
+    Whether to print the log when exporting tensorrt.
+  - **export**
+    type: string, default: torchscript, choices: ['onnx', 'torchscript', 'tensorrt']
+    Which format to export.
+  - **device**
+    type: string, default: 0
+    Device to use. (cuda device, i.e. 0 or 0,1,2,3 or cpu)
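Once export.py has written best.ts, the TorchScript model can be loaded without any of this repository's code. A minimal sketch (paths and shapes assume the defaults above):

    import torch

    model = torch.jit.load('runs/exp/best.ts')  # file produced by: python export.py --export torchscript
    model.eval()
    x = torch.rand(1, 3, 224, 224)  # must match --batch_size/--image_channel/--image_size used at export
    with torch.no_grad():
        out = model(x)  # classification logits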


## Model Zoo

@@ -288,6 +344,8 @@ image classifier implemented in PyTorch.
 | cspnet | cspresnet50,cspresnext50,cspdarknet53,cs3darknet_m,cs3darknet_l,cs3darknet_x,cs3darknet_focus_m,cs3darknet_focus_l
 cs3sedarknet_l,cs3sedarknet_x,cs3edgenet_x,cs3se_edgenet_x |
 | dpn | dpn68,dpn68b,dpn92,dpn98,dpn107,dpn131 |

 ## Some explanation

  1. About CPU and GPU usage.
@@ -526,13 +584,20 @@ image classifier implemented in PyTorch.

  17. How to use data augmentation from albumentations.
      You can find the name of the augmentation you need on the [albumentations GitHub](https://github.com/albumentations-team/albumentations) or in the [albumentations official documentation](https://albumentations.ai/docs/api_reference/augmentations/), for example [RandomGridShuffle](https://github.com/albumentations-team/albumentations#:~:text=%E2%9C%93-,RandomGridShuffle,-%E2%9C%93), and then create it in config/config.py:
          Create_Albumentations_From_Name('RandomGridShuffle')
      Some users may also need to change the default parameters. These can be found in the API documentation, and our function supports overriding them; for example, RandomGridShuffle has a grid parameter:
          Create_Albumentations_From_Name('RandomGridShuffle', grid=(3, 3))
      More than one parameter can simply be appended in the same way, but the parameter names must be given as keywords.

+ 18. Some notes on export.py.
+     1. tensorrt is recommended on Ubuntu, and tensorrt only supports export and inference on GPU.
+     2. FP16 only supports export and inference on GPU.
+     3. FP16 mode cannot be used together with dynamic mode.
+     4. For detailed GPU and CPU inference speed experiments, see the [v1.2 update log](v1.2-update_log.md).

 ## TODO

 - [x] Knowledge Distillation
 - [x] EMA
 - [x] R-Drop
 - [ ] SWA
 - [ ] DDP Mode
-- [ ] Export Model(onnx, tensorrt, torchscript)
+- [x] Export Model(onnx, torchscript, tensorrt)
 - [ ] C++ Inference Code
 - [ ] Accumulation Gradient
 - [ ] Model Ensembling
 - [ ] Freeze Training
+- [ ] Support Fuse Conv and Bn
 - [x] Early Stop

 ## Reference

 https://github.com/BIGBALLON/CIFAR-ZOO

diff --git a/config/__pycache__/config.cpython-38.pyc b/config/__pycache__/config.cpython-38.pyc
index 0f17312..bcc2bcf 100644
Binary files a/config/__pycache__/config.cpython-38.pyc and b/config/__pycache__/config.cpython-38.pyc differ
diff --git a/export.py b/export.py
new file mode 100644
index 0000000..38339be
--- /dev/null
+++ b/export.py
@@ -0,0 +1,119 @@
import os, argparse
import numpy as np
os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"
import torch
import torch.nn as nn
from utils.utils import select_device

def export_torchscript(opt, model, img, prefix='TorchScript'):
    print('Starting TorchScript export with pytorch %s...' % torch.__version__)
    f = os.path.join(opt.save_path, 'best.ts')
    ts = torch.jit.trace(model, img, strict=False)
    ts.save(f)
    print(f'Export TorchScript Model Successfully.\nSave as {f}')

def export_onnx(opt, model, img, prefix='ONNX'):
    import onnx
    f = os.path.join(opt.save_path, 'best.onnx')
    print('Starting ONNX export with onnx %s...' % onnx.__version__)
    if opt.dynamic:
        dynamic_axes = {'images': {0: 'batch', 2: 'height', 3: 'width'}, 'output': {0: 'batch'}}
    else:
        dynamic_axes = None

    torch.onnx.export(
        (model.to('cpu') if opt.dynamic else model),
        (img.to('cpu') if opt.dynamic else img),
        f, verbose=False, opset_version=13, input_names=['images'], output_names=['output'], dynamic_axes=dynamic_axes)

    onnx_model = onnx.load(f)  # load onnx model
    onnx.checker.check_model(onnx_model)  # check onnx model

    if opt.simplify:
        try:
            import onnxsim
            print('\nStarting to simplify ONNX...')
            onnx_model, check = onnxsim.simplify(onnx_model)
            assert check, 'assert check failed'
        except Exception as e:
            print(f'Simplifier failure: {e}')
    onnx.save(onnx_model, f)

    print(f'Export Onnx Model Successfully.\nSave as {f}')

def export_engine(opt, model, img, workspace=4, prefix='TensorRT'):
    export_onnx(opt, model, img)
    onnx_file = os.path.join(opt.save_path, 'best.onnx')
    assert img.device.type != 'cpu', 'export running on CPU but must be on GPU, i.e. `python export.py --device 0`'
    import tensorrt as trt
    print('Starting TensorRT export with TensorRT %s...' % trt.__version__)
    f = os.path.join(opt.save_path, 'best.engine')

    TRT_LOGGER = trt.Logger(trt.Logger.VERBOSE) if opt.verbose else trt.Logger()
    builder = trt.Builder(TRT_LOGGER)
    config = builder.create_builder_config()
    config.max_workspace_size = workspace * 1 << 30

    flag = (1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    network = builder.create_network(flag)
    parser = trt.OnnxParser(network, TRT_LOGGER)
    if not parser.parse_from_file(str(onnx_file)):
        raise RuntimeError(f'failed to load ONNX file: {onnx_file}')

    inputs = [network.get_input(i) for i in range(network.num_inputs)]
    outputs = [network.get_output(i) for i in range(network.num_outputs)]
    for inp in inputs:
        print(f'input {inp.name} with shape {inp.shape} and dtype {inp.dtype}')
    for out in outputs:
        print(f'output {out.name} with shape {out.shape} and dtype {out.dtype}')

    if opt.dynamic:
        if img.shape[0] <= 1:
            print(f'{prefix} WARNING: --dynamic model requires maximum --batch-size argument')
        profile = builder.create_optimization_profile()
        for inp in inputs:
            profile.set_shape(inp.name, (1, *img.shape[1:]), (max(1, img.shape[0] // 2), *img.shape[1:]), img.shape)
        config.add_optimization_profile(profile)

    print(f'{prefix} building FP{16 if builder.platform_has_fast_fp16 and opt.half else 32} engine in {f}')
    if builder.platform_has_fast_fp16 and opt.half:
        config.set_flag(trt.BuilderFlag.FP16)
    with builder.build_engine(network, config) as engine, open(f, 'wb') as t:
        t.write(engine.serialize())
    print(f'Export TensorRT Model Successfully.\nSave as {f}')

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--save_path', type=str, default=r'runs/exp', help='save path for model and log')
    parser.add_argument('--image_size', type=int, default=224, help='image size')
    parser.add_argument('--image_channel', type=int, default=3, help='image channel')
    parser.add_argument('--batch_size', type=int, default=1, help='batch size')
    parser.add_argument('--dynamic', action='store_true', help='dynamic ONNX batchsize')
    parser.add_argument('--simplify', action='store_true', help='simplify onnx model')
    parser.add_argument('--half', action='store_true', help='FP32 to FP16')
    parser.add_argument('--verbose', action='store_true', help='TensorRT: verbose export log')
    parser.add_argument('--export', default='torchscript', type=str, choices=['onnx', 'torchscript', 'tensorrt'], help='export type')
    parser.add_argument('--device', type=str, default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')

    opt = parser.parse_known_args()[0]
    if not os.path.exists(os.path.join(opt.save_path, 'best.pt')):
        raise Exception('best.pt not found. please check your --save_path folder')
    DEVICE = select_device(opt.device)
    if opt.half:
        assert DEVICE.type != 'cpu', '--half only supported with GPU export'
        assert not opt.dynamic, '--half not compatible with --dynamic'
    ckpt = torch.load(os.path.join(opt.save_path, 'best.pt'))
    model = ckpt['model'].float().to(DEVICE)
    img = torch.rand((opt.batch_size, opt.image_channel, opt.image_size, opt.image_size)).to(DEVICE)

    return opt, (model.half() if opt.half else model), (img.half() if opt.half else img), DEVICE

if __name__ == '__main__':
    opt, model, img, DEVICE = parse_opt()

    if opt.export == 'onnx':
        export_onnx(opt, model, img)
    elif opt.export == 'torchscript':
        export_torchscript(opt, model, img)
    elif opt.export == 'tensorrt':
        export_engine(opt, model, img)
\ No newline at end of file
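As a quick sanity check, the exported best.onnx can be run directly with onnxruntime, independently of metrice.py. A minimal sketch, assuming an FP32 static-shape export at the default save path:

    import numpy as np
    import onnxruntime

    session = onnxruntime.InferenceSession('runs/exp/best.onnx', providers=['CPUExecutionProvider'])
    x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # export-time batch/channel/size
    out = session.run([session.get_outputs()[0].name], {'images': x})[0]  # input name 'images' is set by export_onnx
    print(out.shape)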
diff --git a/main.py b/main.py
index d8bf934..27a01b0 100644
--- a/main.py
+++ b/main.py
@@ -12,7 +12,7 @@
 from utils.utils_model import select_model
 from utils import utils_aug
 from utils.utils import save_model, plot_train_batch, WarmUpLR, show_config, setting_optimizer, check_batch_size, \
-    plot_log, update_opt, load_weights, get_channels, dict_to_PrettyTable, ModelEMA
+    plot_log, update_opt, load_weights, get_channels, dict_to_PrettyTable, ModelEMA, select_device

 from utils.utils_distill import *
 from utils.utils_loss import *
@@ -29,6 +29,7 @@
     parser.add_argument('--pretrained', action="store_true", help='using pretrain weight')
     parser.add_argument('--weight', type=str, default='', help='loading weight path')
     parser.add_argument('--config', type=str, default='config/config.py', help='config path')
+    parser.add_argument('--device', type=str, default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
     parser.add_argument('--train_path', type=str, default=r'dataset/train', help='train data path')
     parser.add_argument('--val_path', type=str, default=r'dataset/val', help='val data path')
@@ -76,7 +77,7 @@
     # Tricks parameters
     parser.add_argument('--rdrop', action="store_true", help='using R-Drop')
-    parser.add_argument('--ema', action="store_true", help='using EMA(Exponential Moving Average)')
+    parser.add_argument('--ema', action="store_true", help='using EMA (Exponential Moving Average), reference: YOLOv5')

     opt = parser.parse_known_args()[0]
     if opt.resume:
@@ -100,7 +101,7 @@
     show_config(deepcopy(opt))

     CLASS_NUM = len(os.listdir(opt.train_path))
-    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    DEVICE = select_device(opt.device, opt.batch_size)

     train_transform, test_transform = utils_aug.get_dataprocessing(torchvision.datasets.ImageFolder(opt.train_path), opt)
@@ -126,9 +127,7 @@
     test_dataset = torch.utils.data.DataLoader(test_dataset, max(batch_size // (10 if opt.test_tta else 1), 1), shuffle=False, num_workers=(0 if opt.test_tta else opt.workers))
     scaler = torch.cuda.amp.GradScaler(enabled=(opt.amp if torch.cuda.is_available() else False))
-    ema = None
-    if opt.ema:
-        ema = ModelEMA(model)
+    ema = ModelEMA(model) if opt.ema else None
     optimizer = setting_optimizer(opt, model)
     lr_scheduler = WarmUpLR(optimizer, opt)
     if opt.resume:
@@ -181,7 +180,7 @@
     elif opt.kd_method == 'AT':
         kd_loss = AT().to(DEVICE)

-    print('{} begin train on {}!'.format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'), DEVICE))
+    print('{} begin train!'.format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
     for epoch in range(begin_epoch, opt.epoch):
         if epoch > (save_epoch + opt.patience) and opt.patience != 0:
             print('No Improve from {} to {}, EarlyStopping.'.format(save_epoch + 1, epoch))
diff --git a/metrice.py b/metrice.py
index 5314325..457091a 100644
--- a/metrice.py
+++ b/metrice.py
@@ -6,7 +6,7 @@
 os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"
 import numpy as np
 from utils import utils_aug
-from utils.utils import classification_metrice, Metrice_Dataset, visual_predictions, visual_tsne, dict_to_PrettyTable
+from utils.utils import classification_metrice, Metrice_Dataset, visual_predictions, visual_tsne, dict_to_PrettyTable, Model_Inference, select_device

 torch.backends.cudnn.deterministic = True

 def set_seed(seed):
@@ -21,26 +21,28 @@
     parser.add_argument('--val_path', type=str, default=r'dataset/val', help='val data path')
     parser.add_argument('--test_path', type=str, default=r'dataset/test', help='test data path')
     parser.add_argument('--label_path', type=str, default=r'dataset/label.txt', help='label path')
-    parser.add_argument('--task', type=str, choices=['train', 'val', 'test', 'fps'], default='val', help='train, val, test, fps')
+    parser.add_argument('--device', type=str, default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
+    parser.add_argument('--task', type=str, choices=['train', 'val', 'test', 'fps'], default='test', help='train, val, test, fps')
     parser.add_argument('--workers', type=int, default=4, help='dataloader workers')
     parser.add_argument('--batch_size', type=int, default=64, help='batch size')
-    parser.add_argument('--save_path', type=str, default=r'runs/mobilenetv2_ST', help='save path for model and log')
+    parser.add_argument('--save_path', type=str, default=r'runs/exp', help='save path for model and log')
     parser.add_argument('--test_tta', action="store_true", help='using TTA Tricks')
     parser.add_argument('--visual', action="store_true", help='visual dataset identification')
     parser.add_argument('--tsne', action="store_true", help='visual tsne')
     parser.add_argument('--half', action="store_true", help='use FP16 half-precision inference')
+    parser.add_argument('--model_type', type=str, choices=['torch', 'torchscript', 'onnx', 'tensorrt'], default='torch', help='model type (default: torch)')

     opt = parser.parse_known_args()[0]
+    DEVICE = select_device(opt.device, opt.batch_size)
+    if opt.half and DEVICE.type == 'cpu':
+        raise Exception('half inference is only supported on GPU.')
     if not os.path.exists(os.path.join(opt.save_path, 'best.pt')):
         raise Exception('best.pt not found. please check your --save_path folder')
     ckpt = torch.load(os.path.join(opt.save_path, 'best.pt'))
-    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
-    model = (ckpt['model'] if opt.half else ckpt['model'].float())
-    model.to(DEVICE)
-    model.eval()
     train_opt = ckpt['opt']
     set_seed(train_opt.random_seed)
+    model = Model_Inference(DEVICE, opt)

     print('found checkpoint from {}, model type:{}\n{}'.format(opt.save_path, ckpt['model'].name, dict_to_PrettyTable(ckpt['best_metrice'], 'Best Metrice')))
@@ -48,7 +50,7 @@
     if opt.task == 'fps':
         inputs = torch.rand((opt.batch_size, train_opt.image_channel, train_opt.image_size, train_opt.image_size)).to(DEVICE)
-        if opt.half:
+        if opt.half and torch.cuda.is_available():
             inputs = inputs.half()
         warm_up, test_time = 100, 300
         fps_arr = []
@@ -83,7 +85,6 @@
 if __name__ == '__main__':
     opt, model, test_dataset, DEVICE, CLASS_NUM, label, save_path = parse_opt()
     y_true, y_pred, y_score, y_feature, img_path = [], [], [], [], []
-    model.eval()
     with torch.no_grad():
         for x, y, path in tqdm.tqdm(test_dataset, desc='Test Stage'):
             x = (x.half().to(DEVICE) if opt.half else x.to(DEVICE))
@@ -100,7 +101,11 @@
             if opt.tsne:
                 pred_feature = model.forward_features(x)

-            pred = torch.softmax(pred, 1)
+            try:
+                pred = torch.softmax(pred, 1)
+            except:
+                pred = torch.softmax(torch.from_numpy(pred), 1)  # pred may be numpy (onnx backend); torch.softmax is faster than numpy
+
             y_true.extend(list(y.cpu().detach().numpy()))
             y_pred.extend(list(pred.argmax(-1).cpu().detach().numpy()))
             y_score.extend(list(pred.max(-1)[0].cpu().detach().numpy()))
diff --git a/model/__pycache__/__init__.cpython-38.pyc b/model/__pycache__/__init__.cpython-38.pyc
index abd0181..242e622 100644
Binary files a/model/__pycache__/__init__.cpython-38.pyc and b/model/__pycache__/__init__.cpython-38.pyc differ
diff --git a/model/__pycache__/convnext.cpython-38.pyc b/model/__pycache__/convnext.cpython-38.pyc
index df5cc41..ffc943e 100644
Binary files a/model/__pycache__/convnext.cpython-38.pyc and b/model/__pycache__/convnext.cpython-38.pyc differ
diff --git a/model/__pycache__/cspnet.cpython-38.pyc b/model/__pycache__/cspnet.cpython-38.pyc
index 734cc88..88d3bcc 100644
Binary files a/model/__pycache__/cspnet.cpython-38.pyc and b/model/__pycache__/cspnet.cpython-38.pyc differ
diff --git a/model/__pycache__/densenet.cpython-38.pyc b/model/__pycache__/densenet.cpython-38.pyc
index 908a023..ee1bd52 100644
Binary files a/model/__pycache__/densenet.cpython-38.pyc and b/model/__pycache__/densenet.cpython-38.pyc differ
diff --git a/model/__pycache__/dpn.cpython-38.pyc b/model/__pycache__/dpn.cpython-38.pyc
index d8ad239..eb068ec 100644
Binary files a/model/__pycache__/dpn.cpython-38.pyc and b/model/__pycache__/dpn.cpython-38.pyc differ
diff --git a/model/__pycache__/efficientnetv2.cpython-38.pyc b/model/__pycache__/efficientnetv2.cpython-38.pyc
index 90a15f1..eabe70a 100644
Binary files a/model/__pycache__/efficientnetv2.cpython-38.pyc and b/model/__pycache__/efficientnetv2.cpython-38.pyc differ
diff --git a/model/__pycache__/ghostnet.cpython-38.pyc b/model/__pycache__/ghostnet.cpython-38.pyc
index f6a888e..9d5452d 100644
Binary files a/model/__pycache__/ghostnet.cpython-38.pyc and b/model/__pycache__/ghostnet.cpython-38.pyc differ
diff --git a/model/__pycache__/mnasnet.cpython-38.pyc b/model/__pycache__/mnasnet.cpython-38.pyc
index 62bba1b..4919f15 100644
Binary files a/model/__pycache__/mnasnet.cpython-38.pyc and b/model/__pycache__/mnasnet.cpython-38.pyc differ
diff --git a/model/__pycache__/mobilenetv2.cpython-38.pyc b/model/__pycache__/mobilenetv2.cpython-38.pyc
index d01776f..1ae2b42 100644
Binary files a/model/__pycache__/mobilenetv2.cpython-38.pyc and b/model/__pycache__/mobilenetv2.cpython-38.pyc differ
diff --git a/model/__pycache__/mobilenetv3.cpython-38.pyc b/model/__pycache__/mobilenetv3.cpython-38.pyc
index e45f541..8eb4b5e 100644
Binary files a/model/__pycache__/mobilenetv3.cpython-38.pyc and b/model/__pycache__/mobilenetv3.cpython-38.pyc differ
diff --git a/model/__pycache__/repvgg.cpython-38.pyc b/model/__pycache__/repvgg.cpython-38.pyc
index 603bcad..69592bb 100644
Binary files a/model/__pycache__/repvgg.cpython-38.pyc and b/model/__pycache__/repvgg.cpython-38.pyc differ
diff --git a/model/__pycache__/resnest.cpython-38.pyc b/model/__pycache__/resnest.cpython-38.pyc
index 24bc0a1..cc468eb 100644
Binary files a/model/__pycache__/resnest.cpython-38.pyc and b/model/__pycache__/resnest.cpython-38.pyc differ
diff --git a/model/__pycache__/resnet.cpython-38.pyc b/model/__pycache__/resnet.cpython-38.pyc
index 786b87c..20bdaea 100644
Binary files a/model/__pycache__/resnet.cpython-38.pyc and b/model/__pycache__/resnet.cpython-38.pyc differ
diff --git a/model/__pycache__/sequencer.cpython-38.pyc b/model/__pycache__/sequencer.cpython-38.pyc
index b2b713e..7f37a30 100644
Binary files a/model/__pycache__/sequencer.cpython-38.pyc and b/model/__pycache__/sequencer.cpython-38.pyc differ
diff --git a/model/__pycache__/shufflenetv2.cpython-38.pyc b/model/__pycache__/shufflenetv2.cpython-38.pyc
index c9fe966..9033cb9 100644
Binary files a/model/__pycache__/shufflenetv2.cpython-38.pyc and b/model/__pycache__/shufflenetv2.cpython-38.pyc differ
diff --git a/model/__pycache__/vgg.cpython-38.pyc b/model/__pycache__/vgg.cpython-38.pyc
index a9de89a..e40147a 100644
Binary files a/model/__pycache__/vgg.cpython-38.pyc and b/model/__pycache__/vgg.cpython-38.pyc differ
diff --git a/model/__pycache__/vovnet.cpython-38.pyc b/model/__pycache__/vovnet.cpython-38.pyc
index 2093e86..e788fe7 100644
Binary files a/model/__pycache__/vovnet.cpython-38.pyc and b/model/__pycache__/vovnet.cpython-38.pyc differ
diff --git a/model/cspnet.py b/model/cspnet.py
index dcd9709..084ae5f 100644
--- a/model/cspnet.py
+++ b/model/cspnet.py
@@ -847,7 +847,7 @@
     def forward(self, x, need_fea=False):
         if need_fea:
             features, features_fc = self.forward_features(x, need_fea=need_fea)
             x = self.forward_head(features_fc)
-            return features, features_fc, x 
+            return features, features_fc, x
         else:
             x = self.forward_features(x)
             x = self.forward_head(x)
diff --git a/predict.py b/predict.py
index f39120a..920c34f 100644
--- a/predict.py
+++ b/predict.py
@@ -7,7 +7,7 @@
 import matplotlib.pyplot as plt
 import numpy as np
 from utils import utils_aug
-from utils.utils import predict_single_image, cam_visual, dict_to_PrettyTable
+from utils.utils import predict_single_image, cam_visual, dict_to_PrettyTable, select_device

 def set_seed(seed):
     random.seed(seed)
@@ -24,13 +24,17 @@
     parser.add_argument('--cam_visual', action="store_true", help='visual cam')
     parser.add_argument('--cam_type', type=str, choices=['GradCAM', 'HiResCAM', 'ScoreCAM', 'GradCAMPlusPlus', 'AblationCAM', 'XGradCAM', 'EigenCAM', 'FullGrad'], default='FullGrad', help='cam type')
     parser.add_argument('--half', action="store_true", help='use FP16 half-precision inference')
+    parser.add_argument('--device', type=str, default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
     opt = parser.parse_known_args()[0]
-
     if not os.path.exists(os.path.join(opt.save_path, 'best.pt')):
         raise Exception('best.pt not found. please check your --save_path folder')
     ckpt = torch.load(os.path.join(opt.save_path, 'best.pt'))
-    DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+    DEVICE = select_device(opt.device)
+    if opt.half and DEVICE.type == 'cpu':
+        raise Exception('half inference is only supported on GPU.')
+    if opt.half and opt.cam_visual:
+        raise Exception('cam visualization only supports FP32.')
     model = (ckpt['model'] if opt.half else ckpt['model'].float())
     model.to(DEVICE)
     model.eval()
diff --git a/requirements.txt b/requirements.txt
index 53a9071..4b64ad1 100644
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,3 +1,7 @@
+# Pytorch-Classifier requirements
+# Usage: pip install -r requirements.txt
+
+# Base ------------------------------------------------------------------------
 opencv-python
 grad-cam
 timm
@@ -8,4 +12,14 @@
 pillow
 thop
 rfconv
 albumentations
-pycm
\ No newline at end of file
+pycm
+
+# Export ----------------------------------------------------------------------
+# onnx  # ONNX export
+# onnx-simplifier  # ONNX simplifier
+# nvidia-pyindex  # TensorRT export
+# nvidia-tensorrt  # TensorRT export
+
+# Export Inference --------------------------------------------------------------
+# onnxruntime  # ONNX CPU Inference
+# onnxruntime-gpu  # ONNX GPU Inference
\ No newline at end of file
diff --git a/utils/__pycache__/utils.cpython-38.pyc b/utils/__pycache__/utils.cpython-38.pyc
index d6793a3..96ced2e 100644
Binary files a/utils/__pycache__/utils.cpython-38.pyc and b/utils/__pycache__/utils.cpython-38.pyc differ
diff --git a/utils/__pycache__/utils_aug.cpython-38.pyc b/utils/__pycache__/utils_aug.cpython-38.pyc
index 466a95f..49b4b3f 100644
Binary files a/utils/__pycache__/utils_aug.cpython-38.pyc and b/utils/__pycache__/utils_aug.cpython-38.pyc differ
diff --git a/utils/__pycache__/utils_distill.cpython-38.pyc b/utils/__pycache__/utils_distill.cpython-38.pyc
index e205253..e48df5b 100644
Binary files a/utils/__pycache__/utils_distill.cpython-38.pyc and b/utils/__pycache__/utils_distill.cpython-38.pyc differ
diff --git a/utils/__pycache__/utils_fit.cpython-38.pyc b/utils/__pycache__/utils_fit.cpython-38.pyc
index c00582e..e811dc8 100644
Binary files a/utils/__pycache__/utils_fit.cpython-38.pyc and b/utils/__pycache__/utils_fit.cpython-38.pyc differ
diff --git a/utils/__pycache__/utils_loss.cpython-38.pyc b/utils/__pycache__/utils_loss.cpython-38.pyc
index ffa1b61..4184354 100644
Binary files a/utils/__pycache__/utils_loss.cpython-38.pyc and b/utils/__pycache__/utils_loss.cpython-38.pyc differ
diff --git a/utils/__pycache__/utils_model.cpython-38.pyc b/utils/__pycache__/utils_model.cpython-38.pyc
index 3110170..078909a 100644
Binary files a/utils/__pycache__/utils_model.cpython-38.pyc and b/utils/__pycache__/utils_model.cpython-38.pyc differ
diff --git a/utils/utils.py b/utils/utils.py
index 5ed5ee0..eb0d71b 100644
--- a/utils/utils.py
+++ b/utils/utils.py
@@ -1,5 +1,5 @@
 from sklearn import utils
-import torch, itertools, os, time, thop, json, cv2, math
+import torch, itertools, os, time, thop, json, cv2, math, platform, yaml
 import torch.nn as nn
 import torchvision.transforms as transforms
 import numpy as np
@@ -20,6 +20,7 @@
 from collections import OrderedDict
 from .utils_aug import rand_bbox
 from pycm import ConfusionMatrix
+from collections import namedtuple

 cnames = {
     'aliceblue': '#F0F8FF',
@@ -315,8 +316,9 @@ def show_config(opt):
         else:
             opt[keys] = opt[keys].replace('\n', '')

-    with open(os.path.join(opt['save_path'], 'param.json'), 'w+') as f:
-        f.write(json.dumps(opt, indent=4, separators={':', ','}))
+    with open(os.path.join(opt['save_path'], 'param.yaml'), 'w+') as f:
+        # f.write(json.dumps(opt, indent=4, separators={':', ','}))
+        yaml.dump(opt, f)

 def plot_confusion_matrix(cm, classes, save_path, normalize=True, title='Confusion matrix', cmap=plt.cm.Blues, name='test'):
     plt.figure(figsize=(min(len(classes), 30), min(len(classes), 30)))
@@ -636,20 +638,26 @@ def visual_tsne(feature, y_true, path, labels, save_path):
 def predict_single_image(path, model, test_transform, DEVICE, half=False):
     pil_img = Image.open(path)
     tensor_img = test_transform(pil_img).unsqueeze(0).to(DEVICE)
-    tensor_img = (tensor_img.half() if half else tensor_img)
+    tensor_img = (tensor_img.half() if (half and torch.cuda.is_available()) else tensor_img)
     if len(tensor_img.shape) == 5:
         tensor_img = tensor_img.reshape((tensor_img.size(0) * tensor_img.size(1), tensor_img.size(2), tensor_img.size(3), tensor_img.size(4)))
-        pred_result = torch.softmax(model(tensor_img).mean(0), 0)
+        output = model(tensor_img).mean(0)
     else:
-        pred_result = torch.softmax(model(tensor_img)[0], 0)
+        output = model(tensor_img)[0]
+
+    try:
+        pred_result = torch.softmax(output, 0)
+    except:
+        pred_result = torch.softmax(torch.from_numpy(output), 0)  # output may be numpy (onnx backend); torch.softmax is faster than numpy
     return int(pred_result.argmax()), pred_result

 class cam_visual:
     def __init__(self, model, test_transform, DEVICE, target_layers, opt):
         self.test_transform = test_transform
         self.DEVICE = DEVICE
+        self.opt = opt
-        self.cam_model = eval(opt.cam_type)(model=deepcopy(model).float(), target_layers=[target_layers], use_cuda=torch.cuda.is_available())
+        self.cam_model = eval(opt.cam_type)(model=deepcopy(model), target_layers=[target_layers], use_cuda=torch.cuda.is_available())

     def __call__(self, path, label):
         pil_img = Image.open(path)
@@ -749,4 +757,117 @@ def update(self, model):
             if v.dtype.is_floating_point:  # true for FP16 and FP32
                 v *= d
                 v += (1 - d) * msd[k].detach()
-        # assert v.dtype == msd[k].dtype == torch.float32, f'{k}: EMA {v.dtype} and model {msd[k].dtype} must be FP32'
\ No newline at end of file
+        # assert v.dtype == msd[k].dtype == torch.float32, f'{k}: EMA {v.dtype} and model {msd[k].dtype} must be FP32'
+
+class Model_Inference:
+    def __init__(self, device, opt):
+        self.opt = opt
+        self.device = device
+
+        if self.opt.model_type == 'torch':
+            ckpt = torch.load(os.path.join(opt.save_path, 'best.pt'))
+            self.model = (ckpt['model'] if opt.half else ckpt['model'].float())
+            self.model.to(self.device)
+            self.model.eval()
+        elif self.opt.model_type == 'onnx':
+            import onnx, onnxruntime
+            providers = ['CUDAExecutionProvider'] if torch.cuda.is_available() else ['CPUExecutionProvider']
+            self.model = onnxruntime.InferenceSession(os.path.join(opt.save_path, 'best.onnx'), providers=providers)
+        elif self.opt.model_type == 'torchscript':
+            self.model = torch.jit.load(os.path.join(opt.save_path, 'best.ts'))
+            self.model = (self.model.half() if opt.half else self.model)
+            self.model.to(self.device)
+            self.model.eval()
+        elif self.opt.model_type == 'tensorrt':
+            import tensorrt as trt
+            if device.type == 'cpu':
+                raise RuntimeError('TensorRT does not support CPU inference.')
+            Binding = namedtuple('Binding', ('name', 'dtype', 'shape', 'data', 'ptr'))
+            logger = trt.Logger()
+            with open(os.path.join(opt.save_path, 'best.engine'), 'rb') as f, trt.Runtime(logger) as runtime:
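+                # Deserialize the serialized engine once; the loop below then walks
+                # every binding (inputs and outputs), records its name/dtype/shape,
+                # flags FP16 and dynamic-shape engines, and pre-allocates a device
+                # tensor per binding, so __call__ only swaps pointers into execute_v2.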
+                model = runtime.deserialize_cuda_engine(f.read())
+            context = model.create_execution_context()
+            bindings = OrderedDict()
+            fp16 = False  # default updated below
+            dynamic = False
+            for index in range(model.num_bindings):
+                name = model.get_binding_name(index)
+                dtype = trt.nptype(model.get_binding_dtype(index))
+                if model.binding_is_input(index):
+                    if -1 in tuple(model.get_binding_shape(index)):  # dynamic
+                        dynamic = True
+                        context.set_binding_shape(index, tuple(model.get_profile_shape(0, index)[2]))
+                    if dtype == np.float16:
+                        fp16 = True
+                shape = tuple(context.get_binding_shape(index))
+                im = torch.from_numpy(np.empty(shape, dtype=dtype)).to(device)
+                bindings[name] = Binding(name, dtype, shape, im, int(im.data_ptr()))
+            self.bindings = bindings
+            self.binding_addrs = OrderedDict((n, d.ptr) for n, d in bindings.items())
+            self.batch_size = bindings['images'].shape[0]  # if dynamic, this is instead max batch size
+            self.model = model
+            self.fp16 = fp16
+            self.dynamic = dynamic
+            self.context = context
+
+    def __call__(self, inputs):
+        if self.opt.model_type == 'torch':
+            return self.model(inputs)
+        elif self.opt.model_type == 'onnx':
+            inputs = inputs.cpu().numpy().astype(np.float16 if '16' in self.model.get_inputs()[0].type else np.float32)
+            return self.model.run([self.model.get_outputs()[0].name], {self.model.get_inputs()[0].name: inputs})[0]
+        elif self.opt.model_type == 'torchscript':
+            return self.model(inputs)
+        elif self.opt.model_type == 'tensorrt':
+            if self.fp16:
+                inputs = inputs.half()
+            if self.dynamic and inputs.shape != self.bindings['images'].shape:
+                i_in, i_out = (self.model.get_binding_index(x) for x in ('images', 'output'))
+                self.context.set_binding_shape(i_in, inputs.shape)  # reshape if dynamic
+                self.bindings['images'] = self.bindings['images']._replace(shape=inputs.shape)
+                self.bindings['output'].data.resize_(tuple(self.context.get_binding_shape(i_out)))
+            s = self.bindings['images'].shape
+            assert inputs.shape == s, f"input size {inputs.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
+            self.binding_addrs['images'] = int(inputs.data_ptr())
+            self.context.execute_v2(list(self.binding_addrs.values()))
+            y = self.bindings['output'].data
+            return y
+
+    def forward_features(self, inputs):
+        try:
+            return self.model.forward_features(inputs)
+        except:
+            raise TypeError('this model is not a torch model.')
+
+    def cam_layer(self):
+        try:
+            return self.model.cam_layer()
+        except:
+            raise TypeError('this model is not a torch model.')
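+# Usage as in metrice.py: build the wrapper once from the parsed options, then call
+# it like a regular torch model regardless of the backend:
+#     model = Model_Inference(DEVICE, opt)  # opt.model_type: torch/torchscript/onnx/tensorrt
+#     pred = model(x)                       # note: the onnx backend returns a numpy array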
+
+def select_device(device='', batch_size=0):
+    device = str(device).strip().lower().replace('cuda:', '').replace('none', '')  # to string, 'cuda:0' to '0'
+    cpu = device == 'cpu'
+    if cpu:
+        os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
+    elif device:
+        os.environ['CUDA_VISIBLE_DEVICES'] = device
+        assert torch.cuda.is_available() and torch.cuda.device_count() >= len(device.replace(',', '')), \
+            f"Invalid CUDA '--device {device}' requested, use '--device cpu' or pass valid CUDA device(s)"
+
+    print_str = f'Image-Classifier Python-{platform.python_version()} Torch-{torch.__version__} '
+    if not cpu and torch.cuda.is_available():
+        devices = device.split(',') if device else '0'
+        n = len(devices)  # device count
+        if n > 1 and batch_size > 0:  # check batch_size is divisible by device_count
+            assert batch_size % n == 0, f'batch-size {batch_size} not multiple of GPU count {n}'
+        space = ' ' * len(print_str)
+        for i, d in enumerate(devices):
+            p = torch.cuda.get_device_properties(i)
+            print_str += f"{'' if i == 0 else space}CUDA:{d} ({p.name}, {p.total_memory / (1 << 20):.0f}MiB)\n"
+        arg = 'cuda:0'
+    else:
+        print_str += 'CPU'
+        arg = 'cpu'
+    print(print_str)
+    return torch.device(arg)
diff --git a/v1.2-update_log.md b/v1.2-update_log.md
new file mode 100644
index 0000000..2375652
--- /dev/null
+++ b/v1.2-update_log.md
@@ -0,0 +1,82 @@
# pytorch-classifier v1.2 update log

1. Added export.py, which supports exporting (onnx, torchscript, tensorrt) models.
2. metrice.py now supports onnx, torchscript and tensorrt inference.

   predict.py does not support onnx, torchscript or tensorrt inference for now, because its heat-map visualization cannot be implemented on those backends; a separate inference script will be written later.
   In metrice.py, onnx, torchscript and tensorrt inference likewise does not support the t-SNE visualization; the point of adding these backends to metrice.py is to measure FPS and accuracy (see the timing sketch after this list).
   In short, with metrice.py it is still best to use the torch model directly; a dedicated inference script for torchscript, onnx and tensorrt models will follow.
3. Added a --device argument to main.py, metrice.py, predict.py and export.py so the device can be specified.
4. General optimizations and bug fixes.
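For reference, the FPS numbers below are produced by metrice.py's fps task, which runs 100 warm-up batches and then times 300 further batches (the constants visible in the metrice.py diff above). A minimal sketch of that measurement loop, assuming model and inputs have already been built as in metrice.py; the exact bookkeeping in metrice.py may differ slightly:

    import time
    import numpy as np
    import torch

    warm_up, test_time = 100, 300  # same constants as metrice.py
    fps_arr = []
    with torch.no_grad():
        for i in range(warm_up + test_time):
            since = time.time()
            model(inputs)
            if i >= warm_up:  # discard warm-up iterations
                fps_arr.append(time.time() - since)
    print(f'{1 / np.mean(fps_arr):.2f} FPS')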

---
#### Training command:
    python main.py --model_name efficientnet_v2_s --config config/config.py --batch_size 128 --Augment AutoAugment --save_path runs/efficientnet_v2_s --device 0 \
    --pretrained --amp --warmup --ema --imagenet_meanstd

#### GPU inference speed test (sh script):
    batch_size=1 # 1 2 4 8 16 32 64
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --half --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --model_type torchscript --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --half --model_type torchscript --batch_size $batch_size
    python export.py --save_path runs/efficientnet_v2_s --export onnx --simplify --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --model_type onnx --batch_size $batch_size
    python export.py --save_path runs/efficientnet_v2_s --export onnx --simplify --half --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --model_type onnx --batch_size $batch_size
    python export.py --save_path runs/efficientnet_v2_s --export tensorrt --simplify --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --model_type tensorrt --batch_size $batch_size
    python export.py --save_path runs/efficientnet_v2_s --export tensorrt --simplify --half --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --model_type tensorrt --half --batch_size $batch_size

#### CPU inference speed test (sh script):
    python export.py --save_path runs/efficientnet_v2_s --export onnx --simplify --dynamic --device cpu
    batch_size=1
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type torchscript --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type onnx --batch_size $batch_size
    batch_size=2
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type torchscript --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type onnx --batch_size $batch_size
    batch_size=4
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type torchscript --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type onnx --batch_size $batch_size
    batch_size=8
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type torchscript --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type onnx --batch_size $batch_size
    batch_size=16
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type torchscript --batch_size $batch_size
    python metrice.py --task fps --save_path runs/efficientnet_v2_s --device cpu --model_type onnx --batch_size $batch_size

### FPS experiments for the exported models on CPU and GPU:

Experimental environment:

| System | CPU | GPU | RAM | Model |
| :----: | :----: | :----: | :----: | :----: |
| Ubuntu 20.04 | i7-12700KF | RTX-3090 | 32G DDR5 6400 | efficientnet_v2_s |

#### GPU
| batch size | Torch FP32 FPS | Torch FP16 FPS | TorchScript FP32 FPS | TorchScript FP16 FPS | ONNX FP32 FPS | ONNX FP16 FPS | TensorRT FP32 FPS | TensorRT FP16 FPS |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| batch-size 1 | 93.77 | 105.65 | 233.21 | 260.07 | 177.41 | 308.52 | 311.60 | 789.19 |
| batch-size 2 | 94.32 | 108.35 | 208.53 | 253.83 | 166.23 | 258.98 | 275.93 | 713.71 |
| batch-size 4 | 95.98 | 108.31 | 171.99 | 255.05 | 130.43 | 190.03 | 212.75 | 573.88 |
| batch-size 8 | 94.03 | 85.76 | 118.79 | 210.58 | 87.65 | 122.31 | 147.36 | 416.71 |
| batch-size 16 | 61.93 | 76.25 | 75.45 | 125.05 | 50.33 | 69.01 | 87.25 | 260.94 |
| batch-size 32 | 34.56 | 58.11 | 41.93 | 72.29 | 26.91 | 34.46 | 48.54 | 151.35 |
| batch-size 64 | 18.64 | 31.57 | 23.15 | 38.90 | 12.67 | 15.90 | 26.19 | 85.47 |

#### CPU
| batch size | Torch FP32 FPS | Torch FP16 FPS | TorchScript FP32 FPS | TorchScript FP16 FPS | ONNX FP32 FPS | ONNX FP16 FPS | TensorRT FP32 FPS | TensorRT FP16 FPS |
| :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: | :----: |
| batch-size 1 | 27.91 | Not Supported | 46.10 | Not Supported | 79.27 | Not Supported | Not Supported | Not Supported |
| batch-size 2 | 25.26 | Not Supported | 24.98 | Not Supported | 45.62 | Not Supported | Not Supported | Not Supported |
| batch-size 4 | 14.02 | Not Supported | 13.84 | Not Supported | 23.90 | Not Supported | Not Supported | Not Supported |
| batch-size 8 | 7.53 | Not Supported | 7.35 | Not Supported | 12.01 | Not Supported | Not Supported | Not Supported |
| batch-size 16 | 3.07 | Not Supported | 3.64 | Not Supported | 5.72 | Not Supported | Not Supported | Not Supported |