Loading Caffe models #52

flukeskywalker · 2015-10-08T11:02:23Z

Since some papers have made available pre-trained Caffe convnets, it'd be nice to be able to use them in Brainstorm.

flukeskywalker · 2015-10-25T19:14:17Z

This requires conversion from models for NCHW format (Caffe) to those for NHWC (Brainstorm), so it's not straightforward, but should still be possible.

pranv · 2015-10-29T19:07:05Z

I have some experience with this. I started with the same goal for Keras, which took many turns and resulted in things like the Graph model, but that work hasn't been merged yet due to an issue. I'll try this over the weekend, along with the Keras part. As you've said, it's slightly tricky. I've now understood that you need to rotate the kernels by 90 degrees TWICE.
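To illustrate the kernel rotation mentioned above: rotating by 90 degrees twice is the same as flipping each kernel along both spatial axes. As far as I understand, this is needed when the source framework computes a cross-correlation and the target computes a true convolution (or the other way round). The sketch below is only an assumption about the weight layout, namely (out_channels, in_channels, height, width); `flip_kernels` is a hypothetical helper name, not part of any library.

```python
import numpy as np

def flip_kernels(weights):
    """Rotate every 2-D kernel by 180 degrees (two 90-degree rotations).

    Assumes `weights` has shape (out_channels, in_channels, height, width).
    """
    # Reversing both spatial axes is equivalent to rotating twice by 90 degrees.
    return weights[:, :, ::-1, ::-1]

w = np.arange(2 * 3 * 3 * 3, dtype=np.float32).reshape(2, 3, 3, 3)
flipped = flip_kernels(w)

# Cross-check against two explicit 90-degree rotations over the spatial axes.
assert np.array_equal(flipped, np.rot90(w, k=2, axes=(2, 3)))
```

Flipping twice returns the original weights, so the same helper works for converting in either direction.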

Meanwhile, if I could hijack this issue, is there any design document that explains some of the design choices you made? Just to get a better understanding of your goals.

flukeskywalker · 2015-10-29T20:24:53Z

Cool, looking forward to it! NHWC layout makes things like this a bit trickier, but we think it's the better format for the long run. Plus, cuDNN v4 will fully support it soon :)

We will indeed provide details about the design choices in brainstorm soon (beginning next week). If you get curious in the meantime, you may ask us questions on the mailing list.

pranv · 2015-11-03T17:45:07Z

I have completed the code for the Keras/Theano conversion [PR: #921 on keras repo]. Most of it can be reused here, though I would like to know which approach would be best. The code I wrote there is really generic: it takes in any Caffe network, converts it to an equivalent DAG, and then loads the weights. There are a lot of tiny things that have to be taken care of for this to happen, which makes the process really complicated and cumbersome to follow. Unlike my approach, the Chainer devs decided to support only the available BVLC models (it could work for OxfordNet though), which are simple sequential models. This reduces the size of the code by half and makes it a lot easier and quicker for someone who wants to understand what is going on.
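For anyone wondering what "converts it to an equivalent DAG" involves, here is a rough sketch, not the code from the PR. Caffe layers name their input blobs (`bottom`) and output blobs (`top`); the plain-dict format, the `build_dag` name, and the assumption that layers are listed in topological order are all mine, for illustration only. One of the "tiny things" is in-place layers such as ReLU, whose `top` equals their `bottom`.

```python
from collections import defaultdict

def build_dag(layers):
    """Return {layer_name: [names of layers that consume its output]}.

    `layers` is a list of dicts with 'name', 'bottom' and 'top' keys
    (a hypothetical format loosely mirroring a Caffe layer definition),
    assumed to be listed in topological order.
    """
    producer = {}              # blob name -> layer that most recently wrote it
    edges = defaultdict(list)  # layer name -> downstream layer names
    for layer in layers:
        for blob in layer["bottom"]:
            if blob in producer:
                edges[producer[blob]].append(layer["name"])
        # An in-place layer (top == bottom) takes over the blob, so later
        # consumers connect to it rather than to the original producer.
        for blob in layer["top"]:
            producer[blob] = layer["name"]
        edges.setdefault(layer["name"], [])
    return dict(edges)

layers = [
    {"name": "data",  "bottom": [],        "top": ["data"]},
    {"name": "conv1", "bottom": ["data"],  "top": ["conv1"]},
    {"name": "relu1", "bottom": ["conv1"], "top": ["conv1"]},  # in-place
    {"name": "pool1", "bottom": ["conv1"], "top": ["pool1"]},
]
dag = build_dag(layers)
```

A purely sequential model would let you skip the blob bookkeeping entirely, which is roughly why the sequential-only approach is so much shorter.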

What would you guys prefer to have?

flukeskywalker · 2015-11-03T18:54:24Z

Does Keras also use NHWC?

We'd like to have a more general approach (full DAG). It's fine to start with handling simpler cases, with extensibility in mind.

Brainstorm also works with DAGs. The difference in connecting layers (compared to Caffe) is that every layer in Brainstorm uses inputs and outputs with fixed names (except the Input layer).

Side note: We are working on explaining the design in the docs branch. See the section Internals.

pranv · 2015-11-04T01:51:11Z

Theano uses NCHW (bc01) layout.
np.swapaxes() should help in conversion I think :)
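A small sketch of what that could look like for a batch of activations. Note that a single `np.swapaxes` call exchanges only two axes, so going from NCHW to NHWC takes either two swaps or one `np.transpose`. The array shape below is made up for illustration.

```python
import numpy as np

# Dummy batch in NCHW (bc01) layout: 8 images, 3 channels, 32 rows, 24 columns.
x_nchw = np.arange(8 * 3 * 32 * 24, dtype=np.float32).reshape(8, 3, 32, 24)

# One transpose moves the channel axis to the end: (N, C, H, W) -> (N, H, W, C).
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))

# The same permutation built from two swapaxes calls:
# (N, C, H, W) -> (N, H, C, W) -> (N, H, W, C).
x_swapped = np.swapaxes(np.swapaxes(x_nchw, 1, 2), 2, 3)

assert x_nhwc.shape == (8, 32, 24, 3)
assert np.array_equal(x_nhwc, x_swapped)
```

Both calls return views, so wrap the result in `np.ascontiguousarray` if the consumer needs the data laid out contiguously in the new order.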

Thanks for the docs, things are making more sense now.
