The loss-aggregation for the attacker should be sum not mean #115

Open
rsokl opened this issue May 2, 2022 · 3 comments

Comments


rsokl commented May 2, 2022

loss = ch.mean(losses)

Assuming that you are solving for per-datum perturbations, and not a broadcast (or uniform) perturbation, the loss aggregation performed prior to backprop should be sum, not mean. With mean, the gradient of each perturbation in the batch is scaled by the inverse batch size, whereas each perturbation's gradient should be independent of the batch size. Obviously, this does not affect methods where the gradient is normalized.
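
A minimal sketch of the scaling, assuming a PyTorch-style setup with one perturbation per example (the toy loss and variable names below are illustrative, not the library's code):

```python
import torch

torch.manual_seed(0)
N = 8                                          # batch size
x = torch.randn(N, 3)                          # clean inputs
delta = torch.zeros(N, 3, requires_grad=True)  # one perturbation per example


def per_example_losses():
    # Toy per-example attacker objective; stands in for the real model's loss.
    return ((x + delta) ** 2).sum(dim=1)


# sum-aggregation: each perturbation's gradient is independent of the batch size
per_example_losses().sum().backward()
grad_sum = delta.grad.clone()
delta.grad = None

# mean-aggregation: every per-example gradient picks up a 1/N factor
per_example_losses().mean().backward()
grad_mean = delta.grad.clone()

print(torch.allclose(grad_mean, grad_sum / N))  # True
```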

@cdluminate

mean = sum / N, and thus ∂mean/∂input = (1/N) ∂sum/∂input.
Since PGD uses the sign of the gradient, we have sign(∂mean/∂input) = sign((1/N) ∂sum/∂input) = sign(∂sum/∂input). So mean leads to the same result as sum.
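
A quick numerical check of that equivalence (a standalone sketch with made-up gradient values): because 1/N is positive, taking the elementwise sign erases the scale difference:

```python
import torch

torch.manual_seed(0)
N = 8
grad_sum = torch.randn(N, 3)   # stand-in for the sum-aggregated gradient
grad_mean = grad_sum / N       # mean-aggregation only rescales it by 1/N

# The signed update used by FGSM / L-inf PGD is identical either way.
print(torch.equal(grad_mean.sign(), grad_sum.sign()))  # True
```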

@rsokl

rsokl commented May 8, 2022

Right, as I stated, "obviously, this does not affect methods where the gradient is normalized." The point is that this happens not to affect methods like FGSM because of the signed gradient, but other methods that use the raw gradient would yield incorrect behavior.
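
For a concrete case where the aggregation does change behavior, here is a sketch of a single unnormalized gradient-ascent step (illustrative, not the library's implementation); under mean-aggregation the effective step size silently shrinks to step_size / N:

```python
import torch

torch.manual_seed(0)
N, step_size = 8, 0.5
x = torch.randn(N, 3)


def attack_step(aggregate):
    # One raw-gradient ascent step on a toy per-example loss (no sign, no normalization).
    delta = torch.zeros_like(x, requires_grad=True)
    aggregate(((x + delta) ** 2).sum(dim=1)).backward()
    return step_size * delta.grad


update_sum = attack_step(torch.sum)
update_mean = attack_step(torch.mean)

print(torch.allclose(update_mean, update_sum / N))  # True: the step shrank by 1/N
```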

@cdluminate

Indeed. Whenever sign(grad) is not part of the update equation, the mean aggregation will trigger weird bugs for people who want to implement custom algorithms.
