-
Notifications
You must be signed in to change notification settings - Fork 62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Possible Error Leading to Inaccurate Evaluation Outcomes #54
Comments
The question is included in the user prompt. Please let us know if you’re able to get better results with other prompts. |
I'm aware, I just thought your intention was to include part of the assistant's response with the question repeated, if that is possible, and for the LLM to continue prediction from there. At least, that is how I understood it based on the code. Is my understanding incorrect? |
That's a fair hypothesis :) In practice, we found that the Assistant prompt matters very little and is generally ignored by OpenAI in ChatCompletion endpoint. We included it to maintain comparability with the other prompts. If we add the line Having said that – we should just fix it in the code anyway. Have opened a PR here, and we'll merge it on Monday. Thanks for reporting! |
Thank you for taking time to explain! |
Hey there, I'm no expert so maybe I'm mistaken (So please verify twice :P) but I think the way you evaluate openai is not the way you intended. I tried to replicate the evaluation and I came across this issue:
This is the prompt
https://github.com/defog-ai/sql-eval/blob/main/prompts/prompt_openai.md
But
https://github.com/defog-ai/sql-eval/blob/024ff013d02d5fac248fb56b99279cdb16d70aa0/query_generators/openai.py#L123C4-L123C4
When changing the prompt for each question, you call .format on the text of the user prompt only leaving the assistant prompt part untouched. And then you call the generate function but your assistant_prompt part is:
Without any user_question parameter. Meaning that this^ is exactly what the model gets as input, which changes the input, which can change the results, which can change the outcome and insights.
@rishsriv @wongjingping
The text was updated successfully, but these errors were encountered: