
ModuleNotFoundError: No module named '__main__.complex_utils'; '__main__' is not a package #14

Open
Dhrubauk opened this issue Jan 3, 2020 · 10 comments



Dhrubauk commented Jan 3, 2020

Hello,
I am getting this error when I run the "butterfly_multiply" module:

ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-6-8e8311ccf4c5> in <module>
      4 from torch import nn
      5
----> 6 from .complex_utils import complex_mul
      7
      8 use_extension = True

ModuleNotFoundError: No module named '__main__.complex_utils'; '__main__' is not a package

Can you please help me solve this problem?
Thanks,
Dhruba


tridao commented Jan 3, 2020

Are you running this from an interactive session (e.g. ipython)? If so, relative imports (from .blah import blah) don't work. This is an issue with Python imports in general (https://realpython.com/absolute-vs-relative-python-imports/).
If you just need to run this file from an interactive session, you can remove the dot (so from complex_utils import complex_mul). But in general we mostly run the training scripts, which then call butterfly_multiply.
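A minimal sketch of that workaround (not the repo's own instructions, just an illustration; the path below is hypothetical and assumes complex_utils.py lives in the butterfly/ directory):

```python
# In a notebook / ipython session, __main__ is not a package, so the
# package-relative import fails:
# from .complex_utils import complex_mul

# Dropping the dot makes it an absolute import, which resolves as long as
# the directory containing complex_utils.py is on sys.path:
import sys
sys.path.append("path/to/learning-circuits/butterfly")  # hypothetical path
from complex_utils import complex_mul
```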


Dhrubauk commented Jan 3, 2020

Hi tridao,
Thank you for the reply. A quick question: may I know in which part of the code the butterfly matrix is generated? Or do you have any flowchart or slides related to this? Your help with this is highly appreciated.

Regards,
Dhruba


tridao commented Jan 3, 2020

The butterfly matrix isn't explicitly generated (as that would use O(n^2) space and take O(n^2) time to multiply by a vector). Instead, we provide a Butterfly class that stores O(n log n) parameters and implements an O(n log n) multiplication algorithm (https://github.com/HazyResearch/learning-circuits/blob/master/butterfly/butterfly.py#L76).
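For concreteness, a hedged usage sketch (not from the repo's docs; the import path and the (batch, n) input convention are assumptions, so check butterfly/butterfly.py for the exact interface):

```python
import torch
from butterfly.butterfly import Butterfly  # import path assumed for this sketch

# Stores O(n log n) trainable parameters instead of an explicit n x n matrix.
layer = Butterfly(1024, 1024, tied_weight=False)

x = torch.randn(8, 1024)  # assumed (batch, n) input, like nn.Linear
y = layer(x)              # O(n log n) multiply; the dense matrix is never formed
```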


Dhrubauk commented Jan 6, 2020

In a butterfly connection each node is connected to two other nodes, so why is it not 2n log n? Also, can you please tell me where in the code you multiply the weight vector with the input features, and where you update the vector of weights?
Thanks,
Dhruba


tridao commented Jan 6, 2020

You're right, it's 2n log n parameters. Remember to pass in tied_weight=False, like Butterfly(1024, 1024, tied_weight=False).
The version with tied weights was used in an earlier paper, where we only had 4n parameters.
The O(n log n) multiplication algorithm is implemented 3 times, in 3 different languages:

  1. In Python (pure Pytorch), for ease of understanding: https://github.com/HazyResearch/learning-circuits/blob/cnn/butterfly/butterfly_multiply.py#L126
    The actual implementation is only 4-5 lines of Python (see the sketch after this list).
  2. In C++, for efficient inference: https://github.com/HazyResearch/learning-circuits/blob/cnn/butterfly/factor_multiply/factor_multiply.cpp#L463
    with Python interface here: https://github.com/HazyResearch/learning-circuits/blob/cnn/butterfly/butterfly_multiply.py#L165
  3. In CUDA, for efficient training: https://github.com/HazyResearch/learning-circuits/blob/cnn/butterfly/factor_multiply/factor_multiply_cuda.cu#L1241
    With the same Python interface as 2.
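To make item 1 concrete, here is a hedged sketch of the kind of few-line pure Pytorch loop involved; the twiddle layout (log2(n), n/2, 2, 2) and the pairing order are assumptions for illustration and may differ from the repo's butterfly_multiply:

```python
import math
import torch

def butterfly_multiply_sketch(twiddle, x):
    """Illustrative O(n log n) butterfly multiply (not the repo's exact code).

    twiddle: (log2(n), n // 2, 2, 2) tensor, one 2x2 block per level and pair.
    x:       (batch, n) input with n a power of 2.
    """
    batch, n = x.shape
    out = x
    for level in range(int(math.log2(n))):
        stride = 1 << level
        # Pair up elements that are `stride` apart within blocks of size 2*stride.
        out = out.reshape(batch, n // (2 * stride), 2, stride)
        w = twiddle[level].reshape(n // (2 * stride), stride, 2, 2)
        # Apply each 2x2 block to its pair: out[b,g,q,s] = sum_p w[g,s,q,p] * x[b,g,p,s]
        out = torch.einsum('bgps,gsqp->bgqs', out, w)
        out = out.reshape(batch, n)
    return out

# Usage: log2(8) = 3 levels; 3 * (8/2) * 4 = 48 twiddle entries = 2n log n parameters.
n = 8
twiddle = torch.randn(3, n // 2, 2, 2, requires_grad=True)
x = torch.randn(4, n)
y = butterfly_multiply_sketch(twiddle, x)
```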

The weights are stored in the twiddle attribute of the Butterfly class, which is just a Pytorch tensor. To update it, you pass it to a Pytorch optimizer just like any other trainable weight, as in the sketch below.
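A hedged sketch of that training step, assuming the same import path as in the earlier sketch and that twiddle is registered as an nn.Parameter so it shows up in model.parameters():

```python
import torch
from butterfly.butterfly import Butterfly  # import path assumed

model = Butterfly(1024, 1024, tied_weight=False)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # includes model.twiddle

x = torch.randn(32, 1024)
target = torch.randn(32, 1024)

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), target)  # any differentiable loss works
loss.backward()   # gradients flow into the twiddle tensor
optimizer.step()  # twiddle is updated like any other trainable weight
```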


tridao commented Jan 8, 2020

If you're asking about how to compute the gradient with respect to the weights:

  1. In the pure Pytorch implementation, we rely on Pytorch's autodifferentiation to compute gradients.
  2. In the C++ and CUDA implementations, we compute the gradient explicitly and provide a Python interface (see the sketch after this list). If you're just using the Butterfly class, you do not need to know how backward is implemented.
    In any case, the algorithm is simple: treat the butterfly matrix as a product of log n matrices, which you can think of as log n layers, then run back-propagation/autodiff on it. This amounts to a few for loops. The core implementation is about 20 lines of C++ (https://github.com/HazyResearch/learning-circuits/blob/cnn/butterfly/factor_multiply/factor_multiply.cpp#L733).
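A hedged sketch of that idea for a single butterfly factor, written as a custom Pytorch autograd Function with an explicit backward; it mirrors the structure of the C++/CUDA code rather than reproducing it, and the shapes and names are made up for illustration:

```python
import torch

class ButterflyFactorMult(torch.autograd.Function):
    """One butterfly factor: apply a 2x2 block to each pair of inputs."""

    @staticmethod
    def forward(ctx, twiddle, x):
        # twiddle: (pairs, 2, 2); x: (batch, pairs, 2)
        ctx.save_for_backward(twiddle, x)
        return torch.einsum('pqr,bpr->bpq', twiddle, x)

    @staticmethod
    def backward(ctx, grad_output):
        twiddle, x = ctx.saved_tensors
        # dL/dtwiddle[p,q,r] = sum_b grad_output[b,p,q] * x[b,p,r]
        grad_twiddle = torch.einsum('bpq,bpr->pqr', grad_output, x)
        # dL/dx[b,p,r] = sum_q twiddle[p,q,r] * grad_output[b,p,q]
        grad_x = torch.einsum('pqr,bpq->bpr', twiddle, grad_output)
        return grad_twiddle, grad_x

# Usage: chaining log n such factors and looping the backward over the levels
# is, conceptually, what the ~20 lines of C++ referenced above do.
twiddle = torch.randn(4, 2, 2, requires_grad=True)
x = torch.randn(8, 4, 2, requires_grad=True)
y = ButterflyFactorMult.apply(twiddle, x)
y.sum().backward()  # exercises the explicit backward
```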


Dhrubauk commented Jan 9, 2020

Okay, great. I am a bit confused: how did you calculate the loss, or which loss function did you actually use?


tridao commented Jan 9, 2020

It can be used with any differentiable loss function.
In back-propagation/reverse-mode autodiff, for each layer we are given the incoming error/gradient from the layers above and use it to compute the gradient with respect to the parameters of the current layer.
That's exactly the interface of a custom Pytorch layer (https://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html): in the backward, given the gradient with respect to the output of the layer, compute the gradient with respect to the parameters of the layer.

Dhrubauk commented

Hi tridao,
I use Windows and Jupyter notebook, and I am having trouble with ray. Can I run your code with some modification, or do I have to change my OS?


tridao commented Jan 14, 2020

We only use ray to run the experiments (parallel runs and hyperparameter optimization).
The butterfly class (in the butterfly directory) doesn't require ray.
