per_sample gradient is None but grad is populated #578

anirban-nath · 2023-03-29T06:32:57Z

I have a particular LayerNorm function in my code because of which I am not able to successfully run Opacus in my code. This LayerNorm function function is defined just like 3 - 4 others in my code and is used in 2 places. When I execute loss.backward(), the grad of the layer function is populated but per_sample grad isn't, which leads Opacus to throw the error "Per sample gradient is not initialized. Not updated in backward pass?"

Under what circumstances is this possible?

PS: This is how the norm is defined

decoder_norm = nn.LayerNorm(d_model) self.decoder = TransformerDecoder(decoder_layer, num_decoder_layers, decoder_norm, return_intermediate=return_intermediate_dec)

This is how it is used. The usages are shown with comments beside them

`class TransformerDecoder(nn.Module):

def __init__(self, decoder_layer, num_layers, norm=None, return_intermediate=False):
    super().__init__()
    self.layers = _get_clones(decoder_layer, num_layers)
    self.num_layers = num_layers
    self.norm = norm // HERE
    self.return_intermediate = return_intermediate

def forward(self, tgt, memory,
            tgt_mask: Optional[Tensor] = None,
            memory_mask: Optional[Tensor] = None,
            tgt_key_padding_mask: Optional[Tensor] = None,
            memory_key_padding_mask: Optional[Tensor] = None,
            pos: Optional[Tensor] = None,
            query_pos: Optional[Tensor] = None):
    output = tgt

    intermediate = []

    for layer in self.layers:
        output = layer(output, memory, tgt_mask=tgt_mask,
                       memory_mask=memory_mask,
                       tgt_key_padding_mask=tgt_key_padding_mask,
                       memory_key_padding_mask=memory_key_padding_mask,
                       pos=pos, query_pos=query_pos)
        # print(output.shape)
        if self.return_intermediate:
            intermediate.append(self.norm(output)) // HERE

    if self.norm is not None:
        output = self.norm(output // HERE
        if self.return_intermediate:
            intermediate.pop()
            intermediate.append(output)`

The text was updated successfully, but these errors were encountered:

alexandresablayrolles · 2023-03-29T09:38:18Z

Thanks for raising this issue. The reason is that Opacus computes grad_samples using "hooks", so it only works for standard layers. You can pass grad_sample_mode="functorch" to make_private(), which will make Opacus use functorch to automatically compute grad_samples for new layers (it is not guaranteed to work but most of the time it does the job).

anirban-nath · 2023-03-29T10:08:30Z

Thanks for raising this issue. The reason is that Opacus computes grad_samples using "hooks", so it only works for standard layers. You can pass grad_sample_mode="functorch" to make_private(), which will make Opacus use functorch to automatically compute grad_samples for new layers (it is not guaranteed to work but most of the time it does the job).

Hi. I was using the make_private_with_epsilon function and I tried "functorch" but it did not work.

alexandresablayrolles · 2023-03-29T11:38:47Z

It should also work with make_private_with_epsilon. Do you still have the same error message?

anirban-nath · 2023-03-29T12:04:22Z

It should also work with make_private_with_epsilon. Do you still have the same error message?

Exact same error message. No difference. I tried with both make_private and make_private_with_epsilon. I even tried replacing that LayerNorm with a GroupNorm but none of these have made any difference.

RobRomijnders · 2024-09-26T07:45:56Z

Hi, I have a similar error. Was this issue resolved @anirban-nath ?

HuanyuZhang · 2024-09-29T03:11:02Z

@RobRomijnders feel free to share your code here for us to better help you.

alexandresablayrolles self-assigned this Mar 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

per_sample gradient is None but grad is populated #578

per_sample gradient is None but grad is populated #578

anirban-nath commented Mar 29, 2023 •

edited

Loading

alexandresablayrolles commented Mar 29, 2023

anirban-nath commented Mar 29, 2023

alexandresablayrolles commented Mar 29, 2023

anirban-nath commented Mar 29, 2023

RobRomijnders commented Sep 26, 2024

HuanyuZhang commented Sep 29, 2024 •

edited

Loading

per_sample gradient is None but grad is populated #578

per_sample gradient is None but grad is populated #578

Comments

anirban-nath commented Mar 29, 2023 • edited Loading

alexandresablayrolles commented Mar 29, 2023

anirban-nath commented Mar 29, 2023

alexandresablayrolles commented Mar 29, 2023

anirban-nath commented Mar 29, 2023

RobRomijnders commented Sep 26, 2024

HuanyuZhang commented Sep 29, 2024 • edited Loading

anirban-nath commented Mar 29, 2023 •

edited

Loading

HuanyuZhang commented Sep 29, 2024 •

edited

Loading