Feedback #1
from sklearn.feature_extraction.text import TfidfVectorizer
from datasets import Dataset

def load_and_process_data(data_path):
- It would be good to add a short docstring explaining what this function does.
- Type hints showing what kind of data comes in as input would improve readability.
It is worth looking up conventions such as PEP 8.
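As a rough illustration of the two points above, here is a minimal sketch of a documented, type-hinted loader. The function body and the tab-separated "id<TAB>text" file format are assumptions made up for this example, not the actual logic of `load_and_process_data` in the PR.

```python
from typing import Dict, List


def load_and_process_data(data_path: str) -> List[Dict[str, str]]:
    """Load a tab-separated text file and return one record per line.

    Args:
        data_path: Path to a UTF-8 file with "id<TAB>text" on each line
            (a hypothetical format chosen only for this example).

    Returns:
        A list of {"id": ..., "text": ...} dictionaries.
    """
    records = []
    with open(data_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            _id, text = line.split("\t", 1)  # split on the first tab only
            records.append({"id": _id, "text": text})
    return records
```

With the signature and docstring in place, a reader can tell what goes in and what comes out without reading the body.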
# Fix the seed
set_seed(42)
Wrapping this in a main function would improve readability and extensibility.
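A minimal sketch of that structure, assuming a simplified `set_seed` that only seeds Python's built-in RNG (the project's own version presumably seeds more libraries) and a placeholder in place of the real pipeline:

```python
import random


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG (stand-in for the project's own set_seed)."""
    random.seed(seed)


def main(seed: int = 42) -> float:
    """Entry point: fix the seed, then run the rest of the pipeline."""
    set_seed(seed)
    # Placeholder for the real work (loading data, training, and so on).
    return random.random()


if __name__ == "__main__":
    main()
```

With the `if __name__ == "__main__":` guard, importing the module from another script no longer triggers the top-level calls.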
# Fix the random seed
def set_seed(random_seed):
There seems to be some code duplicated with train.py! With good use of branching, the two files could probably be merged into one.
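One possible way to do this is a single entry point that branches on a command-line flag. The `--mode` flag and the `train` / `infer` placeholders below are hypothetical names for this sketch, not code from the PR.

```python
import argparse
import random
from typing import List, Optional


def set_seed(seed: int) -> None:
    """Shared setup that would otherwise be copied into both scripts."""
    random.seed(seed)


def train() -> str:
    return "training"  # placeholder for the real training loop


def infer() -> str:
    return "inference"  # placeholder for the real inference loop


def main(argv: Optional[List[str]] = None) -> str:
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["train", "infer"], default="train")
    parser.add_argument("--seed", type=int, default=42)
    args = parser.parse_args(argv)

    set_seed(args.seed)  # common code runs once, whatever the mode
    return train() if args.mode == "train" else infer()


# Example call with explicit arguments instead of sys.argv.
print(main(["--mode", "infer"]))
```

Only the parts that really differ live in `train()` and `infer()`, so the shared setup is written once.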
@@ -0,0 +1,22 @@
import json

from peft import LoraConfig
There don't seem to be any utility functions here that would justify splitting this out into a separate utils.py!
# Metric setup
def preprocess_logits_for_metrics(logits, labels):
Function definitions scattered through the middle of the script can hurt readability.
# Callback to store metrics after each epoch
class SaveMetricsCallback(TrainerCallback):
For classes, it is better to put them in a separate script of their own rather than defining them in the middle of this one.
code/make_train_dataset.ipynb
Outdated
Jupyter notebooks are hard to review because the code does not display properly.
@@ -0,0 +1,13 @@
ipykernel
The requirements file should also specify the versions that are actually in use.
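As one way to get there, the sketch below reads installed versions with the standard-library `importlib.metadata` and emits `name==version` lines. The helper name `pinned_requirements` is made up for this example, and the output reflects whatever environment it runs in, so it should be run inside the project's own environment.

```python
from importlib import metadata
from typing import Iterable, List


def pinned_requirements(packages: Iterable[str]) -> List[str]:
    """Return "name==version" lines for packages installed in this environment."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed here: leave it unpinned rather than guess a version.
            lines.append(name)
    return lines


print("\n".join(pinned_requirements(["ipykernel"])))
```

Running `pip freeze` in the project environment gives similar pinned lines for every installed package.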
# Seed-fixing function
def set_seed(seed):
Fixing the seed is fine, but
if you need randomness, for example sampling data randomly for the batches of each epoch,
fixing the seed can actually block that, so some care is needed.
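One reading of this caution (an assumption, not something stated in the review) is that it applies when the seed is reset inside the training loop. The sketch below uses only Python's `random` module and a toy list of indices to compare re-seeding every epoch with seeding once up front. It does not model any particular data loader.

```python
import random
from typing import List

data = list(range(8))  # toy stand-in for dataset indices


def epoch_orders_reseeded(num_epochs: int, seed: int = 42) -> List[List[int]]:
    """Re-seed at the start of every epoch: every epoch gets the same order."""
    orders = []
    for _ in range(num_epochs):
        rng = random.Random(seed)  # same seed again each epoch
        order = data[:]
        rng.shuffle(order)
        orders.append(order)
    return orders


def epoch_orders_seeded_once(num_epochs: int, seed: int = 42) -> List[List[int]]:
    """Seed once up front: orders typically vary by epoch, yet repeat across runs."""
    rng = random.Random(seed)
    orders = []
    for _ in range(num_epochs):
        order = data[:]
        rng.shuffle(order)  # RNG state advances between epochs
        orders.append(order)
    return orders
```

In the first variant the per-epoch randomness is lost. In the second, the shuffles still change from epoch to epoch while the whole run stays reproducible.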
infer_results.append({"id": _id, "answer": predict_value})

# Save the results to a CSV file
output_file_path = os.path.join(config["output_dir"], "output.csv")
The outputs are managed per directory, but with a fixed name like output.csv, the file seems to get overwritten by accident quite often.
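One simple guard against this is to put a timestamp in the file name and fall back to a counter if the name is still taken. The helper `unique_output_path` below is a hypothetical sketch, not part of the PR.

```python
import os
from datetime import datetime
from typing import Optional


def unique_output_path(output_dir: str, stem: str = "output", ext: str = ".csv",
                       now: Optional[datetime] = None) -> str:
    """Build a timestamped path that does not collide with an existing file."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    path = os.path.join(output_dir, f"{stem}_{stamp}{ext}")
    counter = 1
    # If a file with this timestamp already exists, append a counter.
    while os.path.exists(path):
        path = os.path.join(output_dir, f"{stem}_{stamp}_{counter}{ext}")
        counter += 1
    return path
```

In the snippet under review, this could stand in for the fixed name, for example `output_file_path = unique_output_path(config["output_dir"])`.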
👋! GitHub Classroom created this pull request as a place for your teacher to leave feedback on your work. It will update automatically. Don’t close or merge this pull request, unless you’re instructed to do so by your teacher.
In this pull request, your teacher can leave comments and feedback on your code. Click the Subscribe button to be notified if that happens.
Click the Files changed or Commits tab to see all of the changes pushed to the default branch since the assignment started. Your teacher can see this too.
Notes for teachers
Use this PR to leave feedback. Here are some tips:
For more information about this pull request, read “Leaving assignment feedback in GitHub”.
Subscribed: @chris40461 @dhl0929 @Kwon-Jisu @beaver-zip @peter520416