Inconsistency between the instruction template suggested vs that in the training data #96
Comments
Wondering if I'm missing a detail. Did anyone else also come across this?
Thanks a lot for your interest in INSTRUCTOR! Like other LLMs, INSTRUCTOR is sensitive to the instructions, and its small size may make this worse. I would say all of your proposed instructions follow the basic templates, though we may need more trials or heuristics to figure out the best instruction.
Thank you. I had a few follow-up questions
Following up on this.
Sorry for the late reply! In our training and evaluation, we may not have been very strict about punctuation. We would be glad to make it more consistent in future versions!
I noticed that the instructions in the training data end with ; and no whitespace after that. For example, 'Represent the Science sentence;' instead of 'Represent the Science sentence: '.
Whereas in the readme, the proposed format seems to be 'Represent the Science sentence: ' in some places and 'Represent the Science sentence:' in others.
All three of these seem to result in different embeddings and hence different similarity numbers. Can you please let us know what the right instruction template is?
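To make the comparison concrete, here is a minimal sketch of how the three variants could be checked against each other. It is not taken from the INSTRUCTOR codebase: `compare_templates` and `toy_encode` are hypothetical names, and `toy_encode` is only a character-count stand-in so that the snippet runs without downloading a model.

```python
import math
from typing import Callable, Dict, List, Tuple

# The three instruction variants discussed in this issue.
TEMPLATES: Dict[str, str] = {
    "training_data": "Represent the Science sentence;",
    "readme_with_space": "Represent the Science sentence: ",
    "readme_no_space": "Represent the Science sentence:",
}

Encoder = Callable[[str, str], List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def compare_templates(encode: Encoder, text: str) -> Dict[Tuple[str, str], float]:
    """Embed `text` under each template and return pairwise cosine similarities."""
    vectors = {name: encode(instr, text) for name, instr in TEMPLATES.items()}
    names = list(vectors)
    return {
        (a, b): cosine(vectors[a], vectors[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def toy_encode(instruction: str, text: str) -> List[float]:
    """Hypothetical stand-in encoder: character-frequency counts, NOT a real model."""
    vec = [0.0] * 128
    for ch in instruction + text:
        vec[ord(ch) % 128] += 1.0
    return vec


if __name__ == "__main__":
    for pair, sim in compare_templates(toy_encode, "Parton energy loss in QCD matter").items():
        print(pair, round(sim, 4))
```

To run this against the real model, `encode` could wrap something like `model.encode([[instruction, text]])[0]`, assuming the `[[instruction, sentence]]` input format shown in the readme. Any similarity below 1.0 between two variants means the punctuation difference is changing the embedding.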