Assuming that you are solving for per-datum perturbations, and not a broadcast (uniform) perturbation, the loss aggregation performed prior to backprop should be `sum`, not `mean`. With `mean`, the gradient of each perturbation in the batch is scaled by the inverse batch size, whereas each perturbation's gradient should be independent of batch size. Obviously, this does not affect methods where the gradient is normalized.
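To illustrate the scaling (a minimal PyTorch sketch, not from the repo; the toy linear model and variable names are mine), with `mean` aggregation each datum's gradient comes out scaled by 1/N relative to `sum`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)                     # toy model standing in for the attacked network
x = torch.randn(4, 10, requires_grad=True)   # batch of N=4 per-datum perturbable inputs
y = torch.randn(4, 1)

per_example_loss = (model(x) - y).pow(2).squeeze(1)   # shape (N,)

# 'sum' aggregation: each datum's gradient is independent of the batch size
grad_sum, = torch.autograd.grad(per_example_loss.sum(), x, retain_graph=True)

# 'mean' aggregation: each datum's gradient is scaled by 1/N
grad_mean, = torch.autograd.grad(per_example_loss.mean(), x)

N = x.shape[0]
print(torch.allclose(grad_mean * N, grad_sum))  # True: mean-gradients are sum-gradients / N
```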
`mean = sum / N`, and thus ∂(mean)/∂(input) = (1/N) · ∂(sum)/∂(input).
Since PGD uses the sign of the gradient, sign(∂(mean)/∂(input)) = sign((1/N) · ∂(sum)/∂(input)) = sign(∂(sum)/∂(input)), so `mean` leads to the same result as `sum`.
Right, as I stated: "obviously, this does not affect methods where the gradient is normalized." The point is that this happens not to affect methods like FGSM because of the signed gradient, but other methods would exhibit incorrect behavior.
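For example (again a hedged, self-contained sketch with a toy model, not code from the repo): a signed step is identical under either aggregation, but an unnormalized gradient step shrinks by a factor of N under `mean`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 1)
x = torch.randn(4, 10, requires_grad=True)
y = torch.randn(4, 1)
N, step = x.shape[0], 0.01

per_example_loss = (model(x) - y).pow(2).squeeze(1)
grad_sum,  = torch.autograd.grad(per_example_loss.sum(),  x, retain_graph=True)
grad_mean, = torch.autograd.grad(per_example_loss.mean(), x)

# Sign-based updates (FGSM / L-inf PGD style) are unaffected by the 1/N factor
print(torch.equal(grad_sum.sign(), grad_mean.sign()))            # True

# An unnormalized gradient step is not: the 'mean' step is N times smaller
print(torch.allclose(step * grad_sum, N * (step * grad_mean)))   # True
```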
robustness/robustness/attacker.py, line 195 (commit a954124)