Edmond/llsc 27 Data Seeding (Matching) #9

Draft: wants to merge 9 commits into base: rohan-edmond-mayank/matching-algo

@EdmondLi1 (Collaborator) commented Nov 12, 2024

Notion ticket link

LLSC-27 Create DB Seeds

Implementation description

Data seeder for the matching algorithm. Made classes originally for the form (to make it more scalable if needed), and a CLI to get the data into the target data format.

NOTE: upserting the data to the DB is not complete yet; the remaining work is labeled with TODOs
NOTE: this isn't being merged to main; instead it's being merged to the main matching branch (i.e. this isn't a main-breaking PR)
NOTE: need to make sure PDM recognizes the added packages and FastAPI can boot up

Steps to test

Finalized and tested the seeder. The commands to run at the moment are:
cd to the backend dir and then run

python -m backend.matching.data.data_generator volunteer 10 json --file_path ~/Downloads/outputt.test

There is a CLI with appropriate help commands. With a file path specified:

python -m backend.matching.data.data_generator [option] [num_records] --file_path [file_path]

If no file path is needed, we can run:

python -m backend.matching.data.data_generator [option] [num_records]
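
For reviewers who want a picture of the argument handling without opening the diff, here is a minimal sketch of how a CLI with this shape could be built on argparse. It is not the code in this PR. The constant names OUTPUT_FORMAT_CHOICES, OPTIONS_FOR_DATA, and FILE_PATH_REQUIRED_FORMATS come from the diff, but their values below are assumptions, as are the helper names build_parser and parse_args and the choice to make the output format an optional positional argument.

```python
import argparse

# Hypothetical values: the diff imports these names but does not show their contents.
OPTIONS_FOR_DATA = ["volunteer", "patient"]
OUTPUT_FORMAT_CHOICES = ["json", "csv"]
FILE_PATH_REQUIRED_FORMATS = ["json", "csv"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the commands shown in the steps to test."""
    parser = argparse.ArgumentParser(
        prog="data_generator",
        description="Generate seed data for the matching algorithm.",
    )
    parser.add_argument("option", choices=OPTIONS_FOR_DATA,
                        help="kind of record to generate")
    parser.add_argument("num_records", type=int,
                        help="number of records to generate")
    # Assumption: the output format may be left out, as in the shortest command.
    parser.add_argument("output_format", nargs="?", default=None,
                        choices=OUTPUT_FORMAT_CHOICES,
                        help="format to write the records in")
    parser.add_argument("--file_path", default=None,
                        help="where to write the generated records")
    return parser


def parse_args(argv):
    """Parse argv and reject formats that need a file path but lack one."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.output_format in FILE_PATH_REQUIRED_FORMATS and args.file_path is None:
        # parser.error prints the usage message and exits with status 2.
        parser.error(f"--file_path is required for format '{args.output_format}'")
    return args


if __name__ == "__main__":
    import sys

    print(parse_args(sys.argv[1:]))
```

Under these assumptions, passing volunteer 10 json --file_path out.json parses cleanly, while passing volunteer 10 json with no file path exits with a usage error.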

What should reviewers focus on?

  • coding style; formatting
  • any bad practices
  • anything from the Notion ticket that wasn't addressed

Checklist

  • My PR name is descriptive and in imperative tense
  • My commit messages are descriptive and in imperative tense. My commits are atomic and trivial commits are squashed or fixup'd into non-trivial commits
  • I have run the appropriate linter(s)
  • I have requested a review from the PL, as well as other devs who have background knowledge on this PR or who will be building on top of this PR

@EdmondLi1 EdmondLi1 changed the title Edmond/llsc 27 data seeding (FIX THE DESCRIPTION FOR THE PR) Edmond/llsc 27 Data Seeding (Matching) Nov 21, 2024

@mslwang (Collaborator) left a comment

Added some comments for testing.

@@ -1,7 +1,11 @@
import argparse
import sys
from llsc.backend.matching.data.seeder.data_seeder import Seeder
from config import OUTPUT_FORMAT_CHOICES, FILE_PATH_REQUIRED_FORMATS
from backend.matching.data.seeder.data_seeder import Seeder

I'd try to make this import backend-folder-centric, or you can use a relative import path (.seeder.data_seeder). I can't access this from backend, iirc.

OUTPUT_FORMAT_CHOICES,
OPTIONS_FOR_DATA,
FILE_PATH_REQUIRED_FORMATS,
)


class CLI:

Make this a PDM script command so we can access the faker and pandas packages. You can add this in backend/pyproject.toml under [tool.pdm.scripts] as matching: python3 matching/data/data_generator
