Loading Caffe models #52

Open
flukeskywalker opened this issue Oct 8, 2015 · 6 comments

Comments

@flukeskywalker
Collaborator

Since some papers have made available pre-trained Caffe convnets, it'd be nice to be able to use them in Brainstorm.

@flukeskywalker
Collaborator Author

This requires conversion from models for NCHW format (Caffe) to those for NHWC (Brainstorm), so it's not straightforward, but should still be possible.
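To make the "not straightforward" part concrete, here is a minimal NumPy sketch of what the weight conversion might involve. It assumes Caffe convolution weights are stored as (out_channels, in_channels, kH, kW); the channels-last target layout (out_channels, kH, kW, in_channels) and both function names are assumptions for illustration, not confirmed Brainstorm API. The less obvious step is the first fully connected layer after the convolutions, whose input columns must be reordered to match the new flattening order.

```python
import numpy as np


def conv_weights_to_channels_last(w):
    """(out_ch, in_ch, kh, kw) -> (out_ch, kh, kw, in_ch).

    The target layout is an assumption made for this sketch.
    """
    return np.transpose(w, (0, 2, 3, 1))


def fc_weights_to_channels_last(w, channels, height, width):
    """Reorder the input columns of a fully connected weight matrix.

    `w` has shape (out_features, channels * height * width) with columns
    ordered like a flattened (C, H, W) volume. The result has columns
    ordered like a flattened (H, W, C) volume.
    """
    out_features = w.shape[0]
    w = w.reshape(out_features, channels, height, width)
    return np.transpose(w, (0, 2, 3, 1)).reshape(out_features, -1)


rng = np.random.RandomState(0)

# Convolution weights: only the axis order changes.
conv_w = rng.randn(6, 3, 5, 5)
conv_w_last = conv_weights_to_channels_last(conv_w)

# Fully connected layer fed by a flattened feature map.
x_nchw = rng.randn(2, 3, 4, 5)
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))
fc_w = rng.randn(7, 3 * 4 * 5)

out_nchw = x_nchw.reshape(2, -1).dot(fc_w.T)
out_nhwc = x_nhwc.reshape(2, -1).dot(
    fc_weights_to_channels_last(fc_w, 3, 4, 5).T
)
```

If the column reordering is right, `out_nchw` and `out_nhwc` agree up to floating-point error, which is a cheap sanity check to run on a converted model.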

@pranv

pranv commented Oct 29, 2015

I have some experience with this. I started with the same goal for Keras, which took many turns and resulted in things like the Graph model, but this hasn't been merged yet due to an issue. I'll try this over the weekend along with the Keras part. As you've said, it's slightly tricky. I've now understood that you need to rotate the kernels by 90 degrees twice (that is, by 180 degrees).
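For context on the rotation: Caffe's convolution layer computes a cross-correlation, while Theano's `conv2d` flips the filters and computes a true convolution, so kernels have to be rotated by 180 degrees when moving between the two. Whether Brainstorm needs the same flip is not stated in this thread. A minimal NumPy sketch, with all function names hypothetical:

```python
import numpy as np


def flip_kernels(w):
    """Rotate every 2-D kernel by 180 degrees.

    `w` is assumed to have shape (out_ch, in_ch, kh, kw).
    """
    return w[:, :, ::-1, ::-1]


def correlate2d_valid(image, kernel):
    """Naive 'valid' cross-correlation: the kernel slides unflipped."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


def convolve2d_valid(image, kernel):
    """Naive 'valid' convolution, written out from the definition."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for r in range(oh):
        for c in range(ow):
            for i in range(kh):
                for j in range(kw):
                    out[r, c] += (
                        image[r + kh - 1 - i, c + kw - 1 - j] * kernel[i, j]
                    )
    return out


weights = np.arange(2 * 1 * 3 * 3, dtype=float).reshape(2, 1, 3, 3)
flipped = flip_kernels(weights)
image = np.arange(36, dtype=float).reshape(6, 6)

# Convolving with the flipped kernel reproduces the cross-correlation
# computed with the original kernel.
as_convolution = convolve2d_valid(image, flipped[0, 0])
as_correlation = correlate2d_valid(image, weights[0, 0])
```

Flipping both spatial axes is the same as applying `np.rot90` twice to each kernel.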

Meanwhile, if I could hijack this issue, is there any design document that explains some of the design choices you made? Just to get a better understanding of your goals.

@flukeskywalker
Collaborator Author

Cool, looking forward to it! NHWC layout makes things like this a bit trickier, but we think it's the better format for the long run. Plus, cuDNN v4 will fully support it soon :)

We will indeed provide details about the design choices in Brainstorm soon (beginning next week). If you get curious in the meantime, feel free to ask us questions on the mailing list.

@pranv

pranv commented Nov 3, 2015

I have completed the code for the Keras/Theano conversion [PR: #921 on keras repo]. Most of it can be reused here, though I would like to know which approach would be best. The code I wrote there is really generic: it takes in any Caffe network, converts it to an equivalent DAG, and then loads the weights. There are a lot of small details that have to be taken care of for this to work, which makes the process complicated and cumbersome to follow. Unlike my approach, the Chainer devs decided to support only the available BVLC models (it could work for OxfordNet though), which are simple sequential models. This cuts the size of the code in half and makes it a lot easier and quicker to follow for someone who wants to understand what is going on.

What would you guys prefer to have?
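To illustrate one of the small details a generic converter has to handle, here is a hedged sketch of recovering layer-to-layer edges from Caffe-style `bottom`/`top` blob names. It is not the code from the Keras PR; the dict-based layer description and the `build_dag` helper are hypothetical. It assumes layers are listed in definition order, and it tracks the most recent writer of each blob so that in-place layers (where `top` equals `bottom`, as is common for ReLU) are chained correctly.

```python
def build_dag(layers):
    """Return (producer, consumer) edges between layer names.

    `layers` is a list of dicts with 'name', 'bottom' and 'top' keys,
    given in definition order (hypothetical representation).
    """
    last_writer = {}  # blob name -> layer that most recently wrote it
    edges = []
    for layer in layers:
        for blob in layer.get("bottom", []):
            if blob in last_writer:
                edges.append((last_writer[blob], layer["name"]))
        for blob in layer.get("top", []):
            last_writer[blob] = layer["name"]
    return edges


layers = [
    {"name": "data", "bottom": [], "top": ["data"]},
    {"name": "conv1", "bottom": ["data"], "top": ["conv1"]},
    # In-place layer: reads and overwrites the blob "conv1".
    {"name": "relu1", "bottom": ["conv1"], "top": ["conv1"]},
    {"name": "pool1", "bottom": ["conv1"], "top": ["pool1"]},
]

edges = build_dag(layers)
```

Here `pool1` ends up connected to `relu1` rather than directly to `conv1`, even though both read a blob named `conv1`.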

@flukeskywalker
Collaborator Author

Does Keras also use NHWC?

We'd like to have a more general approach (full DAG). It's fine to start with handling simpler cases, with extensibility in mind.

Brainstorm also works with DAGs. The difference in connecting layers (compared to Caffe) is that every layer in Brainstorm uses inputs and outputs with fixed names (except the Input layer).

Side note: We are working on explaining the design in the docs branch. See the section Internals.

@pranv

pranv commented Nov 4, 2015

Theano uses NCHW (bc01) layout.
np.swapaxes() should help in conversion I think :)
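A small note on the axis handling, as a sketch: a single `np.swapaxes` call exchanges only two axes, so going from NCHW to NHWC takes either two swaps or one `np.transpose` (or `np.moveaxis` on NumPy 1.11 and later). The array shape below is arbitrary.

```python
import numpy as np

# A batch in NCHW (bc01) layout: N=8, C=3, H=32, W=16.
x = np.arange(8 * 3 * 32 * 16).reshape(8, 3, 32, 16)

# One swap of axes 1 and 3 gives N, W, H, C: channels are last,
# but height and width have traded places.
one_swap = np.swapaxes(x, 1, 3)

# A second swap of axes 1 and 2 restores the H, W order.
two_swaps = np.swapaxes(one_swap, 1, 2)

# The same result in a single call.
transposed = np.transpose(x, (0, 2, 3, 1))
moved = np.moveaxis(x, 1, -1)
```

All three of `two_swaps`, `transposed`, and `moved` have shape (8, 32, 16, 3) and return views, so an explicit copy (for example with `np.ascontiguousarray`) may be needed before handing the data to code that expects contiguous memory.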

Thanks for the docs, things are making more sense now.


2 participants