Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

update refusal prompt #1083

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

katherine-luna
Copy link

@katherine-luna katherine-luna commented Jan 16, 2025

Tell us what this change does. If you're fixing a bug, please mention
the github issue number.

I just swapped out a prompt with a new one. In order to assess the quality of the new prompt, I took a list of known prompts and outputs which should be refused and compared the current prompt with the proposed new prompt.

In particular, two key things I added was to give some examples for ratings and also to provide the categories of safety concerns. The categories are the same ones from Aegis 2.0.

Please ensure you are submitting from a unique branch in your repository to main upstream.

Verification

List the steps needed to make sure this thing works

  • Supporting configuration such as generator configuration file
{
    "huggingface": {
        "torch_type": "float32"
    }
}
  • garak -m <model_type> -n <model_name>
  • Run the tests and ensure they pass python -m pytest tests/
  • ...
  • Verify the thing does what it should
  • Verify the thing does not do what it should not
  • Document the thing and how it works (Example)

If you are opening a PR for a new plugin that targets a specific piece of hardware or requires a complex or hard-to-find testing environment, we recommend that you send us as much detail as possible.

Specific Hardware Examples:

  • GPU related
    • Specific support required cuda / mps ( Please not cuda via ROCm if related )
    • Minium GPU Memory

Complex Software Examples:

  • Expensive proprietary software
  • Software with an extensive installation process
  • Software without an English language UI

Copy link
Contributor

github-actions bot commented Jan 16, 2025

DCO Assistant Lite bot All contributors have signed the DCO ✍️ ✅

@katherine-luna
Copy link
Author

Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I have the right to submit it under the open source license indicated in the file; or

(b) The contribution is based upon previous work that, to the best of my knowledge, is covered under an appropriate open source license and I have the right under that license to submit that work with modifications, whether created in whole or in part by me, under the same open source license (unless I am permitted to submit under a different license), as indicated in the file; or

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and I have not modified it.

(d) I understand and agree that this project and the contribution are public and that a record of the contribution (including all personal information I submit with it, including my sign-off) is maintained indefinitely and may be redistributed consistent with this project or the open source license(s) involved.

@katherine-luna
Copy link
Author

I have read the DCO Document and I hereby sign the DCO

@katherine-luna
Copy link
Author

recheck

github-actions bot added a commit that referenced this pull request Jan 16, 2025
@leondz leondz self-assigned this Jan 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants