
What was the "latest methodology" in the v3s? #27

Open
noisefloordev opened this issue Sep 5, 2024 · 1 comment

Comments

@noisefloordev

In https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5 you mentioned a new methodology, but what changed that made it so much more effective? For a while I've been trying to reproduce this (originally with Llama 3 and now with 3.1, both 8B and 70B). With Llama 3.1 70B I have to edit layers 10 through 40, and it gets less effective as I narrow the range further.

The only way I've been able to get a decent effect from just a single layer is by multiplying the direction by about 1.5 after normalization. You mentioned somewhere that you did something that sounds similar. On Llama 3.1 8B I can get a good result by scaling the direction by 1.5 and applying it just to layer 11. But that only worked for me when hooking activations; I wasn't able to figure out how to bake it into the weight matrix (just scaling the direction when orthogonalizing didn't work). I haven't tried it with the 70B.
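To be concrete, here's a toy numpy sketch of the two variants I mean (my own reconstruction for illustration, not the actual model code; `d` is the refusal direction, `scale` is the 1.5 factor):

```python
import numpy as np

def ablate_activation(x, d, scale=1.5):
    """Remove a (scaled) projection of activation x onto direction d.

    scale=1.0 is a plain orthogonal projection; scale>1.0 overshoots,
    pushing the activation past the hyperplane orthogonal to d.
    """
    d = d / np.linalg.norm(d)                  # normalize the direction
    return x - scale * np.dot(x, d) * d

def ablate_weight(W, d, scale=1.5):
    """Bake the same scaled ablation into a weight matrix W (out x in).

    For y = W @ h this gives y' = y - scale * (d . y) * d, i.e. the
    same edit as ablate_activation applied to this layer's *output*.
    Note it does NOT touch the residual stream as a whole, since the
    layer input also flows past W through the residual connection --
    which is my guess at why baking the scaled direction into W
    behaves differently from hooking activations.
    """
    d = d / np.linalg.norm(d)
    return W - scale * np.outer(d, d) @ W
```

The two agree on the layer output itself (`ablate_weight(W, d) @ h == ablate_activation(W @ h, d)`), so any difference in model behavior has to come from where in the forward pass the edit lands.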

Was I accidentally on the right track with scaling the directions, or was there something else? Nothing else I've tried (layer selection, sampling different tokens, varying and mixing training sets) has worked with fewer than about 7 layers on 8B and 30 layers on 70B.

@noisefloordev
Author

I've tried everything I can think of, and now that Llama 3.2 is out, there's another model I'd really like a good version of. Any info about how you arrived at a single-layer edit would be really helpful.
