
Multiple abliteration / steering presets? #20

Open
Skorchekd opened this issue May 30, 2024 · 2 comments

Comments

@Skorchekd

Perhaps you could add configs that steer the model towards certain things, for example different personalities, different emotions, etc., preset into the code? Just an idea I had... very cool though!

@tretomaszewski
Contributor

You can find a notebook for a non-refusal use-case here:
https://huggingface.co/failspy/Llama-3-8B-Instruct-MopeyMule/blob/main/MopeyMule-Induce-Melancholy.ipynb

Of course, you'll need to adjust it to your needs.

The "refusal" / "harmful" / "harmless" terminology in this library can be seen as whatever behaviors you want to ablate. That is, you want to achieve non-"refusal" responses to the whatever you decide is a "harmful" prompt, but "refusal" is simply what you don't want to see given a prompt. This would require two datasets of polarized/opposite prompts.

Alternatively, as shown in the notebook above, you can also use a special system prompt.
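For the system-prompt route, the sketch below mirrors the linked notebook's idea rather than its exact code (again an assumption, reusing `tok` and `model` from the sketch above): induce the behavior with a system prompt and compare activations against the same prompt without it.

```python
# Continues the sketch above (reuses `tok` and `model`); hypothetical example.
# Instead of polarized datasets, induce the behavior with a system prompt and
# compare activations against a run of the same user prompt without it.
messages = [
    {"role": "system", "content": "You are deeply melancholic in every reply."},
    {"role": "user", "content": "How is the weather today?"},
]
ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    out = model(input_ids=ids, output_hidden_states=True)
steered_act = out.hidden_states[-1][0, -1]  # vs. the no-system-prompt baseline
```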

Eventually we hope to shift the terminology toward the general behavioral-ablation use case.

Most of this is still very exploratory and, at best, experimental.
If you find anything of interest, let us know!

@Skorchekd
Author


Doesn't work... does it need a GPU?
