restrict JSON fields templates
Signed-off-by: Sukriti-Sharma4 <[email protected]>
Ssukriti committed May 29, 2024
1 parent 5d8e643 commit 3f5cc6b
Showing 2 changed files with 6 additions and 1 deletion.
1 change: 1 addition & 0 deletions README.md
@@ -63,6 +63,7 @@ Once the JSON is converted using the formatting function, pass the `dataset_text

2. #### Format JSON/JSONL on the fly
Pass a JSON/JSONL and a `data_formatter_template` to use the formatting function on the fly while tuning. The template should specify fields of JSON with `{{field}}`. While tuning, the data will be converted to a single sequence using the template.
+JSON fields can contain alpha-numeric characters, spaces and the following special symbols - "." , "_", "-".

Example: Train.json
`[{ "input" : <text>,
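The README change above describes which field names a template may reference. A minimal sketch of that behavior on a plain Python dict follows; `render` and the sample record are hypothetical illustrations, not part of the repository, and only the pattern itself is taken from this commit.

```python
import re

# Pattern copied from this commit's change to tuning/utils/data_utils.py.
FIELD_PATTERN = r"{{([\s0-9a-zA-Z_\-\.]+)}}"


def render(template, record):
    """Hypothetical helper: fill {{field}} placeholders from one JSON record."""

    def replace_text(match_obj):
        # The captured name is used as-is; this sketch does not strip whitespace.
        key = match_obj.group(1)
        if key not in record:
            raise KeyError(f"Template field {key!r} is not a key in the record")
        return record[key]

    return re.sub(FIELD_PATTERN, replace_text, template)


# Field names with a space, a dot, and a hyphen are all accepted.
record = {"input text": "2 + 2", "model.output-v1": "4"}
template = "Question: {{input text}}\nAnswer: {{model.output-v1}}"
rendered = render(template, record)
print(rendered)
```

In this sketch, a placeholder whose name contains any other character (for example `{{a/b}}`) does not match the pattern and is left in the output unchanged.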
6 changes: 5 additions & 1 deletion tuning/utils/data_utils.py
@@ -31,6 +31,10 @@ def replace_text(match_obj):

             return element[index_object]

-        return {formatted_dataset_field: re.sub(r"{{(.+)}}", replace_text, template)}
+        return {
+            formatted_dataset_field: re.sub(
+                r"{{([\s0-9a-zA-Z_\-\.]+)}}", replace_text, template
+            )
+        }

     return dataset.map(formatter), formatted_dataset_field
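
One likely motivation for the change, not stated in the commit message, is that the greedy `(.+)` group spans from the first `{{` to the last `}}` when a template contains more than one placeholder. The sketch below compares the two patterns on a hypothetical two-field template.

```python
import re

OLD_PATTERN = r"{{(.+)}}"                   # pattern removed by this commit
NEW_PATTERN = r"{{([\s0-9a-zA-Z_\-\.]+)}}"  # pattern added by this commit

# Hypothetical template with two placeholders.
template = "### Input: {{input}} ### Output: {{output}}"

# Greedy ".+" swallows everything between the first "{{" and the last "}}".
old_fields = re.findall(OLD_PATTERN, template)

# The character class excludes "{" and "}", so each placeholder matches alone.
new_fields = re.findall(NEW_PATTERN, template)

print(old_fields)  # ['input}} ### Output: {{output']
print(new_fields)  # ['input', 'output']
```

With the old pattern, `replace_text` would receive `input}} ### Output: {{output` as the key and fail the dictionary lookup.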
