Fix in src/autotrain/trainers/clm/utils.py in function #836
base: main
This is my first-ever open source contribution, and I’m really excited to be part of this project! 🎉 I’d love to hear your feedback or suggestions for improvement on this PR. Please let me know if there’s anything I can refine or do differently. Looking forward to learning from your insights! Thank you!
I think your idea works, but the function no longer strictly requires the valid_data argument.
Is that actually needed?
Hi @Ruhaan838, thank you for your feedback! The reason for this fix is that if config.valid_split is not None but valid_data is None, the current implementation raises an error when attempting to call .map on valid_data. Making valid_data optional ensures the function handles such cases gracefully without breaking.
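For illustration, here is a minimal sketch of the failure mode described above. It does not reproduce the project's code: TinyDataset and process_without_guard are hypothetical stand-ins, and the only assumption carried over from the discussion is that .map is called on valid_data whenever config.valid_split is set.

```python
from types import SimpleNamespace


class TinyDataset:
    """Hypothetical stand-in for a dataset object that exposes .map()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn):
        # Apply fn to every row and return a new dataset.
        return TinyDataset(fn(row) for row in self.rows)


def process_without_guard(train_data, valid_data, config):
    """Hypothetical function that checks only config.valid_split."""
    train_data = train_data.map(str.upper)
    if config.valid_split is not None:
        # No check on valid_data itself: None has no .map attribute.
        valid_data = valid_data.map(str.upper)
    return train_data, valid_data


config = SimpleNamespace(valid_split="validation")
try:
    process_without_guard(TinyDataset(["a", "b"]), None, config)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```

Running the sketch ends in the except branch, because the guard looks at the configuration and never at the data that was actually passed in.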
#836 is ready for merge
I think he is right.
Also, it allows valid_data to be skipped by making it optional.
Sorry for my late response on this. As mentioned in several past issues, we don't want to have an option for validation in LLM finetuning. And if we do, this PR doesn't cover it. I suggest you take a look at the codebase and provide changes for all places where valid data can be used in order to proceed with the PR. You should also provide example test runs with different LLM tasks :)
Oh, I understand you're asking whether we need to cover all the existing functions or classes to accomplish this specific task and maintain the codebase effectively. Therefore, please make the valid_data argument optional everywhere.
Refactor process_data_with_chat_template to Improve Validation Data Handling

Description:
This pull request refactors the process_data_with_chat_template function to improve clarity and handling of the optional valid_data argument. Specifically, it ensures that valid_data can be passed as None without requiring manual initialization within the function.

Key Changes:

Optional Argument Handling:
- [...] valid_data argument to be optional (valid_data=None) in the function definition.
- [...] valid_data to None within the function body.

Improved Readability:
- [...] valid_data.
- [...] valid_data is not provided.

Unchanged Logic:
- [...] train_data and valid_data if specified in the configuration.
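
The change described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's actual diff: process_splits, TinyDataset, and the apply_fn parameter are hypothetical names, and the real function's signature and chat-template logic are not reproduced here.

```python
from types import SimpleNamespace


class TinyDataset:
    """Hypothetical stand-in for a dataset object that exposes .map()."""

    def __init__(self, rows):
        self.rows = list(rows)

    def map(self, fn):
        # Apply fn to every row and return a new dataset.
        return TinyDataset(fn(row) for row in self.rows)


def process_splits(config, train_data, valid_data=None, apply_fn=str.strip):
    """Hypothetical sketch: valid_data defaults to None and is guarded."""
    train_data = train_data.map(apply_fn)
    # Map the validation split only when it is both configured and provided.
    if config.valid_split is not None and valid_data is not None:
        valid_data = valid_data.map(apply_fn)
    return train_data, valid_data


config = SimpleNamespace(valid_split="validation")

# Caller omits valid_data entirely; nothing is mapped and None is returned.
train, valid = process_splits(config, TinyDataset([" hello "]))

# Caller supplies both splits; both are processed.
train, valid = process_splits(
    config, TinyDataset([" hello "]), TinyDataset([" world "])
)
```

Placing valid_data after the required parameters with a default of None keeps existing two-split callers working while letting train-only callers skip the argument.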