Implement Net-to-CoreML Conversion Script #222

Closed

ChinChangYang

This commit introduces the net_to_coreml.py script in the tf/ directory. The script facilitates the conversion of a neural network file into a TensorFlow model, followed by its transformation into a CoreML model. This process mirrors the TensorFlow model conversion methodology used in net_to_model.py.

Key features of the CoreML conversion include:

  • Setting the input shape to (1, 112, 8, 8).
  • Defining input_planes as the input name.
  • Specifying output names as output_policy, output_value, and output_moves_left.
  • Assigning a concise description to the model, formatted as Lc0 converted from {net name}.

The script concludes by saving the CoreML model as {net name}.mlpackage. This enhancement enables the conversion of neural networks into CoreML models, which can be executed using Apple's Neural Engine. Future development of the CoreML backend is planned within the lc0 repository.
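
The settings listed above can be sketched roughly as follows. This is an illustrative outline, not the actual contents of net_to_coreml.py: the helper names `describe` and `convert_to_coreml` are hypothetical, the Keras model is assumed to already expose an input named input_planes, and the output renaming step is omitted. The `mlprogram` target is taken from the conversion log.

```python
import os

# Values taken from the pull request description.
INPUT_NAME = "input_planes"
INPUT_SHAPE = (1, 112, 8, 8)
OUTPUT_NAMES = ["output_policy", "output_value", "output_moves_left"]


def describe(net_path, out_dir="."):
    """Return (short_description, save_path) for a net file (hypothetical helper)."""
    net_name = os.path.basename(net_path)
    description = f"Lc0 converted from {net_name}"
    save_path = os.path.join(out_dir, f"{net_name}.mlpackage")
    return description, save_path


def convert_to_coreml(keras_model, net_path, out_dir="."):
    """Convert a TensorFlow/Keras model to a Core ML package (sketch only)."""
    import coremltools as ct  # imported lazily; needs coremltools installed

    description, save_path = describe(net_path, out_dir)
    mlmodel = ct.convert(
        keras_model,
        convert_to="mlprogram",
        # Assumes the model's input tensor is already named INPUT_NAME.
        inputs=[ct.TensorType(name=INPUT_NAME, shape=INPUT_SHAPE)],
    )
    mlmodel.short_description = description
    mlmodel.save(save_path)
    return save_path
```

Keeping the naming logic in a small pure function makes it easy to check the description and output path without running a full conversion.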

Test 1: 128x10 (PASSED)

% python net_to_coreml.py --cfg 128x10.yaml-20210723-1032 weights_run2_744706.lc0          
TensorFlow version 2.15.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.12.0 is the most recent version that has been tested.
dataset:
  allow_less_chunks: true
  input_test: dev2/test/
  input_train: dev2/train/
  input_validation: dev2/validate/
  num_chunks: 1000000
  train_ratio: 0.9
gpu: 0
model:
  filters: 128
  residual_blocks: 10
  se_ratio: 4
name: 128x10-t74
training:
  batch_size: 1024
  lr_boundaries:
  - 120
  lr_values:
  - 4.0e-05
  - 4.0e-05
  mask_legal_moves: true
  max_grad_norm: 5.4
  moves_left_loss_weight: 1.0
  num_batch_splits: 1
  num_test_positions: 40000
  path: dev2/networks
  policy_loss_weight: 1.0
  q_ratio: 0
  renorm: true
  renorm_max_d: 0.0
  renorm_max_r: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_output: true
  swa_steps: 100
  test_steps: 500
  total_steps: 2000
  train_avg_report_steps: 200
  validation_steps: 500
  value_focus_min: 1.0
  value_focus_slope: 0.0
  value_loss_weight: 2.0
  warmup_steps: 1000

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
Wrote model to dev2/networks/128x10-t74/128x10-t74-0
Running TensorFlow Graph Passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 20.80 passes/s]
Converting TF Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████| 413/413 [00:00<00:00, 11943.65 ops/s]
Running MIL frontend_tensorflow2 pipeline: 100%|█████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 1628.68 passes/s]
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:00<00:00, 86.30 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 1404.81 passes/s]
Input names: ['input_planes']
Output names: ['output_policy', 'output_value', 'output_moves_left']
Rebuilding model with updated spec ...
Saving model ...
CoreML model saved at dev2/networks/128x10-t74/weights_run2_744706.lc0.mlpackage

Test 2: 512x19 (FAILED)

% python net_to_coreml.py --cfg 512x19-t80.yaml-20230507-0216 512x19-t81-swa-10061000.pb.gz
TensorFlow version 2.15.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.12.0 is the most recent version that has been tested.
dataset:
  allow_less_chunks: true
  input_test:
  - dev1/test/
  input_train:
  - dev1/train/
  input_validation: dev1/validate/
  num_chunks: 3000000
  test_workers: 8
  train_ratio: 0.9
  train_workers: 32
gpu: 0
model:
  default_activation: mish
  filters: 512
  pol_encoder_layers: 0
  policy: attention
  residual_blocks: 19
  se_ratio: 16
name: 512x19-t80
training:
  batch_size: 1024
  checkpoint_steps: 4000
  diff_focus_min: 0.025
  diff_focus_slope: 3.0
  lookahead_optimizer: true
  lr_boundaries:
  - 100
  lr_values:
  - 0.0004
  - 0.0004
  mask_legal_moves: true
  max_grad_norm: 4.0
  moves_left_loss_weight: 1.0
  num_batch_splits: 2
  num_test_positions: 40000
  path: dev1/networks
  policy_loss_weight: 1.0
  q_ratio: 0.0
  reg_term_weight: 0.05
  renorm: true
  renorm_max_d: 0.0
  renorm_max_r: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_output: true
  swa_steps: 100
  test_steps: 500
  total_steps: 500
  train_avg_report_steps: 200
  validation_steps: 500
  value_loss_weight: 1.0
  warmup_steps: 1000

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Wrote model to dev1/networks/512x19-t80/512x19-t80-0
Running TensorFlow Graph Passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  7.15 passes/s]
Converting TF Frontend ==> MIL Ops:  97%|███████████████████████████████████████████████████████████████████████████████████▍  | 960/989 [00:00<00:00, 11277.54 ops/s]
Traceback (most recent call last):
  File "/Users/chinchangyang/Code/lczero-training-ccy/tf/net_to_coreml.py", line 50, in <module>
    coreml_model = ct.convert(
                   ^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/_converters_entry.py", line 574, in convert
    mlmodel = mil_convert(
              ^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 188, in mil_convert
    return _mil_convert(model, convert_from, convert_to, ConverterRegistry, MLModel, compute_units, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 212, in _mil_convert
    proto, mil_program = mil_convert_to_proto(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 286, in mil_convert_to_proto
    prog = frontend_converter(model, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/converter.py", line 98, in __call__
    return tf2_loader.load()
           ^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/load.py", line 82, in load
    program = self._program_from_tf_ssa()
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow2/load.py", line 210, in _program_from_tf_ssa
    return converter.convert()
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py", line 522, in convert
    self.convert_main_graph(prog, graph)
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/converter.py", line 421, in convert_main_graph
    outputs = convert_graph(self.context, graph, self.output_names)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/convert_utils.py", line 191, in convert_graph
    add_op(context, node)
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/frontend/tensorflow/ops.py", line 1332, in RealDiv
    y = mb.cast(x=context[node.inputs[1]], dtype="fp32")
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/registry.py", line 182, in add_op
    return cls._add_op(op_cls_to_add, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/builder.py", line 184, in _add_op
    new_op.type_value_inference()
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/operation.py", line 260, in type_value_inference
    output_vals = self._auto_val(output_types)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/operation.py", line 377, in _auto_val
    vals = self.value_inference()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/operation.py", line 111, in wrapper
    return func(self)
           ^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py", line 868, in value_inference
    return self.get_cast_value(self.x, self.dtype.val)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py", line 894, in get_cast_value
    return input_var.val.astype(dtype=string_to_nptype(dtype_val))
           ^^^^^^^^^^^^^^^^^^^^
AttributeError: 'float' object has no attribute 'astype'

The error message is similar to the one reported in apple/coremltools#1768.

@ChinChangYang
Author

The AttributeError can be fixed by applying the following diff to the coremltools source code:

% git diff --cached coremltools
diff --git a/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py b/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py
index c5ebc40..fb6902f 100644
--- a/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py
+++ b/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py
@@ -890,7 +890,7 @@ class cast(Operation):
                 return np.array(result)
             return None
 
-        if not types.is_tensor(input_var.sym_type):
-            return input_var.val.astype(dtype=string_to_nptype(dtype_val))
-        else:
+        if isinstance(input_var.val, float) or types.is_tensor(input_var.sym_type):
             return np.array(input_var.val).astype(dtype=string_to_nptype(dtype_val))
+        else:
+            return input_var.val.astype(dtype=string_to_nptype(dtype_val))

I am running the coremltools test suite and will open a pull request in the coremltools GitHub repository. If the pull request is accepted, the fix should be included in a future coremltools release.
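
To illustrate why the original code path fails, here is a minimal NumPy-only sketch, independent of coremltools. The helper name `cast_value` is hypothetical. A plain Python float has no `astype` method, whereas wrapping the value in a NumPy array first (as the patched branch does) always provides one.

```python
import numpy as np


def cast_value(val, np_dtype=np.float32):
    """Cast a Python scalar or NumPy array to np_dtype (hypothetical helper)."""
    # np.asarray turns a plain float into a 0-d array, so .astype is available.
    return np.asarray(val).astype(np_dtype)


# A built-in float lacks .astype, which is what raises the AttributeError.
print(hasattr(0.5, "astype"))
# Wrapping the value first makes the cast work for scalars and arrays alike.
print(cast_value(0.5).dtype)
print(cast_value(np.array([1, 2])).dtype)
```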

@ChinChangYang
Author

Test 2: 512x19 (PASSED)

% python net_to_coreml.py --cfg 512x19-t80.yaml-20230507-0216 512x19-t81-swa-10061000.pb.gz
TensorFlow version 2.15.0 has not been tested with coremltools. You may run into unexpected errors. TensorFlow 2.12.0 is the most recent version that has been tested.
dataset:
  allow_less_chunks: true
  input_test:
  - dev1/test/
  input_train:
  - dev1/train/
  input_validation: dev1/validate/
  num_chunks: 3000000
  test_workers: 8
  train_ratio: 0.9
  train_workers: 32
gpu: 0
model:
  default_activation: mish
  filters: 512
  pol_encoder_layers: 0
  policy: attention
  residual_blocks: 19
  se_ratio: 16
name: 512x19-t80
training:
  batch_size: 1024
  checkpoint_steps: 4000
  diff_focus_min: 0.025
  diff_focus_slope: 3.0
  lookahead_optimizer: true
  lr_boundaries:
  - 100
  lr_values:
  - 0.0004
  - 0.0004
  mask_legal_moves: true
  max_grad_norm: 4.0
  moves_left_loss_weight: 1.0
  num_batch_splits: 2
  num_test_positions: 40000
  path: dev1/networks
  policy_loss_weight: 1.0
  q_ratio: 0.0
  reg_term_weight: 0.05
  renorm: true
  renorm_max_d: 0.0
  renorm_max_r: 1.0
  shuffle_size: 500000
  swa: true
  swa_max_n: 10
  swa_output: true
  swa_steps: 100
  test_steps: 500
  total_steps: 500
  train_avg_report_steps: 200
  validation_steps: 500
  value_loss_weight: 1.0
  warmup_steps: 1000

[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
/Users/chinchangyang/miniconda3/envs/lczero-training-py3.11/lib/python3.11/site-packages/tensorflow_addons/utils/tfa_eol_msg.py:23: UserWarning: 

TensorFlow Addons (TFA) has ended development and introduction of new features.
TFA has entered a minimal maintenance and release mode until a planned end of life in May 2024.
Please modify downstream libraries to take dependencies from other repositories in our TensorFlow community (e.g. Keras, Keras-CV, and Keras-NLP). 

For more information see: https://github.com/tensorflow/addons/issues/2807 

  warnings.warn(
Wrote model to dev1/networks/512x19-t80/512x19-t80-0
Running TensorFlow Graph Passes: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00,  9.77 passes/s]
Converting TF Frontend ==> MIL Ops: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 987/987 [00:00<00:00, 13041.75 ops/s]
Running MIL frontend_tensorflow2 pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 788.02 passes/s]
Running MIL default pipeline: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 71/71 [00:03<00:00, 20.14 passes/s]
Running MIL backend_mlprogram pipeline: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 805.74 passes/s]
Input names: ['input_planes']
Output names: ['output_policy', 'output_value', 'output_moves_left']
Rebuilding model with updated spec ...
Saving model ...
CoreML model saved at dev1/networks/512x19-t80/512x19-t81-swa-10061000.pb.gz.mlpackage

The AttributeError has been resolved in apple/coremltools#2087.

Modify TFProcess init_net method to accept an argument for including
attention weights in the output, and update construct_net method to
conditionally include attention weights in the outputs list. This
enables flexibility in specifying whether to include attention weights
in the model outputs.

Consolidate net-to-model conversion in a separate function to enhance
modularity and reduce repeated code.

Modify conversion routines to enable rescaling option for Rule50 inputs, ensuring compatibility with client expectations. This paves the way for improved model adaptation.

Enable specifying compute precision (e.g., FLOAT16) for CoreML model conversion, offering increased flexibility in choosing the precision, thus potentially optimizing model performance.

@ChinChangYang
Author

net_to_model.py is unable to convert the 11248.pb.gz net into a model. The issue is described in #224.

@ChinChangYang
Author

After months of exploration, I’ve discovered a more effective solution for converting a net into a Core ML model, rendering this pull request unnecessary.

The Improved Workflow:

  1. Convert the net into an ONNX model using the following command:
    lc0 leela2onnx --onnx2pytorch --input=/path/to/net --output=/path/to/onnx
  2. Convert the ONNX model into a PyTorch model.
  3. Convert the PyTorch model into a Core ML model.

For steps 2 and 3, I’ve created a script to merge the process, which can be found here: https://gist.github.com/ChinChangYang/5d3a9206032842056f2a6a597bc0ea04
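
As a rough idea of what steps 2 and 3 involve, here is a sketch. It is not the content of the linked gist. It assumes the onnx, onnx2pytorch, torch, and coremltools packages are installed, that the ONNX model takes a single input of shape (1, 112, 8, 8) as in the TensorFlow path above, and the helper names `mlpackage_path` and `onnx_to_coreml` are hypothetical.

```python
from pathlib import Path

# Assumed input shape, carried over from the TensorFlow-based conversion.
INPUT_SHAPE = (1, 112, 8, 8)


def mlpackage_path(onnx_path):
    """Derive the output package path from the ONNX path (hypothetical helper)."""
    return str(Path(onnx_path).with_suffix(".mlpackage"))


def onnx_to_coreml(onnx_path):
    """ONNX -> PyTorch -> Core ML (sketch only; third-party packages required)."""
    import onnx
    import torch
    import coremltools as ct
    from onnx2pytorch import ConvertModel

    # Step 2: rebuild the ONNX graph as a PyTorch module.
    torch_model = ConvertModel(onnx.load(onnx_path)).eval()

    # Step 3: trace the module with an example input, then convert the trace.
    example = torch.rand(*INPUT_SHAPE)
    traced = torch.jit.trace(torch_model, example)
    mlmodel = ct.convert(
        traced,
        convert_to="mlprogram",
        inputs=[ct.TensorType(shape=INPUT_SHAPE)],
    )

    out = mlpackage_path(onnx_path)
    mlmodel.save(out)
    return out
```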

I’m closing this pull request in favor of this superior approach.
