I have a question regarding the code and pseudocode in your paper "SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot".
Specifically, in line 96 of `sparsegpt.py`, the expression `tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1)) ** 2)` squares the diagonal entries of `Hinv1` (the inverse-Hessian block) in the denominator. Similarly, in line 8 of the pseudocode (Algorithm 1) in the paper, the corresponding inverse-Hessian diagonal entries are also squared.
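For concreteness, here is a minimal, self-contained sketch of how I read that line. `W1`, `H`, and `Hinv1` below are toy stand-ins I made up for illustration; the shapes and the way `Hinv1` is built are my assumptions about the surrounding code, not values taken from the repo:

```python
import torch

# Toy stand-ins for the blocked quantities in sparsegpt.py; shapes are illustrative only.
rows, blocksize = 4, 8
W1 = torch.randn(rows, blocksize)                 # a (rows x blocksize) slice of the layer weights
H = torch.randn(blocksize, blocksize)
H = H @ H.T + blocksize * torch.eye(blocksize)    # a random symmetric positive-definite "Hessian"

# My understanding is that Hinv1 holds (a block of) the upper Cholesky factor of H^{-1};
# this is an assumption for the sketch, not a quote of the repo's setup code.
Hinv1 = torch.linalg.cholesky(torch.linalg.inv(H), upper=True)

# Line 96 as quoted above: the diagonal of Hinv1 enters the denominator *squared*.
tmp = W1 ** 2 / (torch.diag(Hinv1).reshape((1, -1)) ** 2)
print(tmp.shape)  # torch.Size([4, 8]): one pruning score per weight in the block
```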
However, in the original formula (e.g., Equation 3), the inverse-Hessian diagonal appears to the first power, not squared. Could you clarify why squaring these diagonal entries is necessary in the implementation and the pseudocode?
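For reference, the per-weight error metric as I read it from the paper (paraphrasing from memory, so apologies if I have garbled the notation) is

$$
\varepsilon_m \;=\; \frac{w_m^2}{\left[\mathbf{H}^{-1}\right]_{mm}},
$$

i.e., the inverse-Hessian diagonal enters unsquared, which is where my confusion comes from.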
Thank you for your time, and I appreciate the important work you've done with SparseGPT.