I have a question regarding the code and pseudocode in your paper "SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot".
Specifically, in line 96 of `sparsegpt.py`, the expression `tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1)) ** 2)` squares the diagonal entries of `Hinv1` (the inverse-Hessian block) in the denominator. Similarly, in line 8 of the pseudocode (Algorithm 1) in the paper, the corresponding inverse-Hessian diagonal entries are also squared.
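For concreteness, here is a minimal, self-contained sketch of how I read that line. `W1`, `H`, and `Hinv1` below are toy stand-ins I made up for illustration; the shapes and the way `Hinv1` is built are my assumptions about the surrounding code, not values taken from the repo:

```python
import torch

# Toy stand-ins for the blocked quantities in sparsegpt.py; shapes are illustrative only.
rows, blocksize = 4, 8
W1 = torch.randn(rows, blocksize)                 # a (rows x blocksize) slice of the layer weights
H = torch.randn(blocksize, blocksize)
H = H @ H.T + blocksize * torch.eye(blocksize)    # a random symmetric positive-definite "Hessian"

# My understanding is that Hinv1 holds (a block of) the upper Cholesky factor of H^{-1};
# this is an assumption for the sketch, not a quote of the repo's setup code.
Hinv1 = torch.linalg.cholesky(torch.linalg.inv(H), upper=True)

# Line 96 as quoted above: the diagonal of Hinv1 enters the denominator *squared*.
tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1)) ** 2)
print(tmp.shape)  # torch.Size([4, 8]): one pruning score per weight in the block
```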
However, in the original formula (e.g., Equation 3), the inverse-Hessian diagonal appears to the first power, not squared. Could you clarify why squaring these diagonal entries is necessary in the implementation and the pseudocode?
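For reference, the per-weight error metric as I read it from the paper (paraphrasing from memory, so apologies if I have garbled the notation) is

$$
\varepsilon_m \;=\; \frac{w_m^2}{\left[\mathbf{H}^{-1}\right]_{mm}},
$$

i.e., the inverse-Hessian diagonal enters unsquared, which is where my confusion comes from.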
Thank you for your time, and I appreciate the important work you've done with SparseGPT.