Hi, I am trying to convert the network powering KataGo (https://github.com/lightvector/KataGo), a computer Go implementation, to C code, so that I can have an extremely lightweight Go engine for embedded platforms. I am using an older, smaller network, and I have successfully modified https://github.com/isty2e/KataGoONNX to produce an ONNX file for one of the older and smaller models. The final model size is about 4.3 MB (attached). When trying to run this network through onnx2c, I run into the issue that the "ReduceSum" node is not implemented. While I am an experienced programmer, I have very little experience with ML frameworks in general.
Thanks a lot!
BTW, I am also looking into quantization or even binarization (i.e., multiplies -> xors), but that is still a bit further away. From what I've gathered so far, it seems like a plausible approach would be to do the quantization directly in torch or ONNX, and then use onnx2c on the quantized version.
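For what it's worth, the core idea behind post-training weight quantization is small enough to sketch in plain numpy. This is a generic illustration, not the scheme of any particular framework; the function names and the symmetric per-tensor scheme are my own choices for the sketch:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q."""
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Map int8 values back to float for comparison against the original."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.01, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# per-element reconstruction error is bounded by scale / 2
print(np.max(np.abs(w - w_hat)))
```

Frameworks layer a lot on top of this (per-channel scales, zero points, calibration of activations), but the weights-to-int8 step is essentially the above.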
Hi fishcu,

First, the quantization in onnx2c was done at a time when that general idea was just transitioning from academic papers to available tools. At the time, it was simpler to write my own quantization than to try to get other tools working. I haven't looked into quantization since I wrote that MNIST-AVR demo, but I am under the impression that quantization is much more user-friendly in the high-level ML frameworks today. So I would guess the best way of doing quantization with onnx2c today is to compile an already quantized .onnx file.

And to your other questions :)

1. I never looked into the onnxruntime implementation for a reference. There were two reasons for this: I wanted to build an understanding of what is happening in the nodes, but also because different implementations are very much tied to the frameworks they are implemented in, i.e. porting is as much effort as writing from scratch. All of the onnx2c implementation is written against the ONNX documentation (https://github.com/onnx/onnx/blob/main/docs/Operators.md) and the backend unit tests (https://github.com/onnx/onnx/tree/main/onnx/backend/test/data, especially the small ones under the …). This approach is not always the best: the ONNX backend tests do not cover 100% of the specification, and the specification itself is many times a bit too "high level" (case in point: the …).

2. The attributes that need to be parsed are defined in the Operators document: https://github.com/onnx/onnx/blob/main/docs/Operators.md#ReduceSum

3. onnx2c has several layers of testing. Benchmarking was added fairly recently and is still a bit of a work in progress. The main testing uses the ONNX backend node tests as unit tests. Each of these tests consists of a network as an .onnx file, plus all inputs and outputs stored as protobuf files. To use these backend tests, it suffices to add the ONNX tests for ReduceSum as … If your ReduceSum implementation passes all the ONNX backend tests for this operand, that is good enough (for me) to merge in.
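As an aside on reading the spec: the ReduceSum semantics in the Operators document map almost directly onto numpy, which can serve as a mental reference before writing the C loops. This is my own sketch of the opset 13 behavior (where `axes` is an input and `keepdims`/`noop_with_empty_axes` are attributes), not code from onnx2c:

```python
import numpy as np

def reduce_sum(data, axes=None, keepdims=1, noop_with_empty_axes=0):
    """Reference ReduceSum following the ONNX operator spec (opset >= 13)."""
    if axes is None or len(axes) == 0:
        if noop_with_empty_axes:
            return data  # empty axes + noop flag: identity
        axes = tuple(range(data.ndim))  # default: reduce over all axes
    # negative axis values count from the end, as in the ONNX spec
    axes = tuple(a % data.ndim for a in axes)
    return np.sum(data, axis=axes, keepdims=bool(keepdims))

x = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
print(reduce_sum(x, axes=[1]).shape)               # (2, 1, 4)
print(reduce_sum(x, axes=[-1], keepdims=0).shape)  # (2, 3)
```

The `keepdims` handling (reduced axes kept as size-1 dimensions vs. dropped) is what shapes the output tensor dimensions in the generated C.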
If the ONNX backend unit tests are very few or lack some particularly useful corner cases, there are some local (to onnx2c) unit tests that I've generated using scailableonnx (the Python files for this can be found somewhere under the …). I'm leaving it up to you to decide if extra tests are needed for ReduceSum. Writing my own was needed with the LSTM and Conv operands, but those are the most complex operands anyway.

**End notes**

This answer became a bit long... Hopefully it answers your questions :) One thing to be aware of is that onnx2c stops instantly when it discovers an unimplemented operand, i.e. ReduceSum might not be the only one missing... But looking forward to your contributions :)
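If you do end up writing extra tests, the check they perform is simple: run the explicit-loop version (the kind of code onnx2c emits) and compare its output against a trusted reference within a tolerance. A hypothetical sketch, with the function name and tolerance values chosen for illustration rather than taken from the onnx2c test harness:

```python
import numpy as np

def reduce_sum_axis0_loops(data):
    """ReduceSum over axis 0 with keepdims=1, written with explicit loops
    in the style of generated C code (hypothetical sketch)."""
    d0, d1 = data.shape
    out = np.zeros((1, d1), dtype=data.dtype)
    for j in range(d1):
        acc = 0.0
        for i in range(d0):
            acc += data[i, j]
        out[0, j] = acc
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 7)).astype(np.float32)
got = reduce_sum_axis0_loops(x)
want = np.sum(x, axis=0, keepdims=True)
# backend-test-style comparison; the tolerances here are illustrative
assert np.allclose(got, want, rtol=1e-3, atol=1e-5)
print("match")
```

The ONNX backend tests do the same thing, except the reference inputs and expected outputs come from the stored protobuf files rather than being computed on the fly.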