In https://huggingface.co/failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5 you mentioned a new methodology, but what changed that made it so much more effective? For a while I've been trying to reproduce this (originally with Llama 3 and now with 3.1, both 8B and 70B). With Llama 3.1 70B I have to edit layers 10 through 40, and it gets less effective as I narrow the range further.
The only way I've been able to get a decent effect from just a single layer is by multiplying the direction by about 1.5 after normalization. You mentioned somewhere that you did something that sounds similar. On Llama 3.1 8B I can get a good result by scaling the direction by 1.5 and applying it to layer 11 alone. But that only worked for me when hooking activations; I wasn't able to figure out how to bake it into the weight matrices (just scaling the direction during orthogonalization didn't work). I haven't tried it with the 70B.
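For concreteness, here's roughly what I mean by the two approaches, with the scaling applied in both cases (just a sketch with illustrative names and shapes, not my actual code):

```python
# `refusal_dir` is assumed to be a unit-norm direction in the residual stream,
# and `scale` is the ~1.5 multiplier mentioned above.
import torch

SCALE = 1.5

def make_ablation_hook(refusal_dir: torch.Tensor, scale: float = SCALE):
    """Runtime version: subtract the scaled projection onto the refusal
    direction from a layer's residual-stream output via a forward hook."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        proj = (hidden @ refusal_dir).unsqueeze(-1) * refusal_dir
        hidden = hidden - scale * proj
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

def orthogonalize(W: torch.Tensor, refusal_dir: torch.Tensor, scale: float = SCALE):
    """Weight-space version: for a matrix W of shape (d_model, d_in) that
    writes into the residual stream, remove `scale` times the refusal
    direction from its output, i.e. W <- (I - scale * r r^T) W."""
    r = refusal_dir / refusal_dir.norm()
    return W - scale * torch.outer(r, r @ W)

# e.g. model.model.layers[11].register_forward_hook(make_ablation_hook(refusal_dir))
```

The hook version with scale=1.5 works well for me; the weight-space version with the same scale is the part I couldn't get to behave equivalently.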
Was I accidentally on the right track with scaling the directions, or was there something else? Nothing else I've tried (layer selection, sampling different tokens, varying and mixing training sets) has worked with fewer than about 7 layers on 8B and 30 layers on 70B.
I've tried everything I can think of, and now that Llama 3.2 is out there's yet another model I'd really like a good version of. Any information about how you arrived at a single-layer edit would be really helpful.