Feedback #1

Open
wants to merge 9 commits into
base: feedback
Conversation

github-classroom[bot] commented Nov 11, 2024

👋! GitHub Classroom created this pull request as a place for your teacher to leave feedback on your work. It will update automatically. Don’t close or merge this pull request, unless you’re instructed to do so by your teacher.
In this pull request, your teacher can leave comments and feedback on your code. Click the Subscribe button to be notified if that happens.
Click the Files changed or Commits tab to see all of the changes pushed to the default branch since the assignment started. Your teacher can see this too.

Notes for teachers

Use this PR to leave feedback. Here are some tips:

  • Click the Files changed tab to see all of the changes pushed to the default branch since the assignment started. To leave comments on specific lines of code, put your cursor over a line of code and click the blue + (plus sign). To learn more about comments, read “Commenting on a pull request”.
  • Click the Commits tab to see the commits pushed to the default branch. Click a commit to see specific changes.
  • If you turned on autograding, then click the Checks tab to see the results.
  • This page is an overview. It shows commits, line comments, and general comments. You can leave a general comment below.
    For more information about this pull request, read “Leaving assignment feedback in GitHub”.

Subscribed: @chris40461 @dhl0929 @Kwon-Jisu @beaver-zip @peter520416

from sklearn.feature_extraction.text import TfidfVectorizer
from datasets import Dataset

def load_and_process_data(data_path):

  1. A short docstring explaining what this function does would be helpful.
  2. Type hints showing what kind of data comes in as input would improve readability.

It is also worth looking up conventions such as PEP 8.
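
To illustrate both points, here is a minimal sketch of how the signature could look with a docstring and type hints. The function body is a hypothetical placeholder that assumes a CSV input with a header row; the actual loading logic in the PR is not shown here and may differ.

```python
import csv
from typing import Dict, List


def load_and_process_data(data_path: str) -> List[Dict[str, str]]:
    """Load the raw dataset and return it as a list of records.

    Args:
        data_path: Path to the input file. Assumed here to be a CSV file
            with a header row; the real format may differ.

    Returns:
        One dict per row, mapping column name to the cell value.
    """
    # Placeholder body: read every row of the CSV into a plain dict.
    with open(data_path, newline="", encoding="utf-8") as f:
        return [dict(row) for row in csv.DictReader(f)]
```

With the hints in place, a reader can tell from the signature alone that a path goes in and a list of records comes out.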



# Fix the seed
set_seed(42)

Wrapping this code in a main function would improve both readability and extensibility.
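
As a rough sketch of that structure: the `set_seed` below is a simplified stand-in that only seeds Python's built-in `random` module, not the project's actual implementation, and the steps inside `main` are left as a comment.

```python
import random


def set_seed(random_seed: int) -> None:
    """Stand-in for the project's set_seed; seeds only Python's RNG."""
    random.seed(random_seed)


def main() -> None:
    # Fix the seed first so everything after it is reproducible.
    set_seed(42)
    # ... data loading, training, and evaluation would be called here ...


if __name__ == "__main__":
    # Runs only when the file is executed directly, not when it is imported.
    main()
```

Keeping top-level statements inside `main` means other scripts can import helpers from this file without triggering a training run.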



# Fix the random seed
def set_seed(random_seed):

There seems to be some code duplicated from train.py! With well-placed branching, the two files could probably be merged into one.
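
One possible shape for the merged script, assuming the second file is the inference script. The `--mode` and `--config` flags, the `run_train` and `run_inference` helpers, and the default `config.json` are all hypothetical names chosen for illustration.

```python
import argparse
import random
from typing import List, Optional


def set_seed(random_seed: int) -> None:
    """Shared helper that previously lived in both files (simplified)."""
    random.seed(random_seed)


def run_train(config_path: str) -> str:
    # Hypothetical placeholder for the training logic.
    return f"train:{config_path}"


def run_inference(config_path: str) -> str:
    # Hypothetical placeholder for the inference logic.
    return f"inference:{config_path}"


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Train or run inference.")
    parser.add_argument("--mode", choices=["train", "inference"], default="train")
    parser.add_argument("--config", default="config.json")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None) -> str:
    args = parse_args(argv)
    set_seed(42)  # common setup is written once
    if args.mode == "train":
        return run_train(args.config)
    return run_inference(args.config)


if __name__ == "__main__":
    print(main())
```

Shared pieces such as `set_seed` and config loading then exist in one place, and only the mode-specific logic branches.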

@@ -0,0 +1,22 @@
import json

from peft import LoraConfig

There don't seem to be any actual utility functions here to justify splitting this out into utils.py!



# Metric setup
def preprocess_logits_for_metrics(logits, labels):

Function definitions scattered through the middle of the script can hurt readability.



# Callback to store metrics after each epoch
class SaveMetricsCallback(TrainerCallback):

Rather than defining a class in the middle of a script, it is better to give it a separate script of its own.

Jupyter notebooks are hard to review because the code does not display properly.

@@ -0,0 +1,13 @@
ipykernel

The requirements file should also specify the versions actually in use.
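
A small sketch of one way to produce pinned entries from the current environment using the standard library's `importlib.metadata`. The helper name `pinned_requirements` is hypothetical; `pip freeze` is a common alternative that lists every installed package.

```python
from importlib import metadata
from typing import Iterable, List


def pinned_requirements(packages: Iterable[str]) -> List[str]:
    """Return 'name==version' lines for the given package names."""
    lines = []
    for name in packages:
        try:
            # Look up the version installed in the current environment.
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Keep a visible marker instead of guessing a version.
            lines.append(f"# {name}: not installed in this environment")
    return lines


if __name__ == "__main__":
    print("\n".join(pinned_requirements(["ipykernel"])))
```

Each resulting line has the form `package==version`, which lets anyone recreate the same environment later.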



# Seed-fixing function
def set_seed(seed):

Fixing the seed is good, but be careful:
if you need randomness, such as randomly sampling data for the batches in each epoch,
a fixed seed can end up suppressing exactly that.
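
If both reproducibility and per-epoch variation are wanted, one option is to derive a separate generator for each epoch from the base seed. This is a sketch under the assumption that the sample order is shuffled by hand; `epoch_order` is a hypothetical helper, and a framework data loader or sampler may already handle this differently.

```python
import random
from typing import List


def epoch_order(n_samples: int, base_seed: int, epoch: int) -> List[int]:
    """Return a shuffled index order that differs per epoch but is reproducible."""
    # A dedicated generator per epoch: same (seed, epoch) gives the same order,
    # while a different epoch gives a different order.
    rng = random.Random(base_seed + epoch)
    order = list(range(n_samples))
    rng.shuffle(order)
    return order


if __name__ == "__main__":
    for epoch in range(3):
        print(epoch, epoch_order(8, base_seed=42, epoch=epoch))
```

Rerunning the script gives the same sequence of orders, yet no two epochs need to see the data in the same order.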

infer_results.append({"id": _id, "answer": predict_value})

# Save the results to a CSV file
output_file_path = os.path.join(config["output_dir"], "output.csv")

Even though outputs are managed per directory, a fixed name like output.csv often seems to get overwritten by accident.
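
One simple guard against that, sketched here with a hypothetical helper `build_output_path`, is to put a timestamp in the file name so each run writes to its own file. The `output_` prefix and the timestamp format are illustrative choices.

```python
import os
from datetime import datetime
from typing import Optional


def build_output_path(output_dir: str, now: Optional[datetime] = None) -> str:
    """Return a path like <output_dir>/output_YYYYMMDD_HHMMSS.csv."""
    # `now` can be passed in explicitly, which makes the function easy to test.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return os.path.join(output_dir, f"output_{stamp}.csv")


if __name__ == "__main__":
    print(build_output_path("outputs"))
```

In the script above, the result of `build_output_path(config["output_dir"])` could take the place of the fixed `output.csv` path.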
