failed in datasets.load_dataset #14
Comments
I suspect this is related to the format of certain JSONL files. It seems that input_image is expected to be a list of strings, but in some annotation files it is a single string. Here is some code to check the annotation files:

# Download the dataset first with
# huggingface_hub.snapshot_download('BleachNick/MIC_full', repo_type='dataset')
import json
import os.path as osp
from glob import glob
from pathlib import Path

data_jsonl_root = Path(r'datasets--BleachNick--MIC_full/snapshots/499162c4f0a3f919f0a417918d71aab51280db84/data_jsonl')
for file in glob(str(data_jsonl_root / '**/*.jsonl'), recursive=True):
    with open(file, 'r') as f:
        for line in f:
            obj = json.loads(line)
            if not isinstance(obj['input_image'], list):
                # Report the first offending record in each file, then move on.
                print(f"{osp.relpath(file, data_jsonl_root)} input_image: {obj['input_image']}")
                break

and I got this:
Thank you for your excellent work! I'd like to express my gratitude for your efforts in contributing to open-source data and models. I encountered a minor issue when loading a dataset from Hugging Face, and when I used the following code
I received the following error message:
Is there any way to resolve this issue or get more information on how to handle it? Your assistance would be greatly appreciated. Thank you!
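
If the diagnosis in the comments is right (input_image is sometimes a bare string instead of a list of strings), one possible workaround is to normalize the JSONL files before loading them. The following is only a minimal sketch under that assumption, not a confirmed fix from the maintainers; the helper names normalize_input_image and normalize_jsonl are hypothetical, and records without an input_image key are passed through unchanged.

```python
import json
from pathlib import Path


def normalize_input_image(record: dict) -> dict:
    """Return a record whose 'input_image' value is a list (hypothetical helper)."""
    value = record.get("input_image")
    if isinstance(value, str):
        # Assumption: a bare string stands for a single image path.
        return {**record, "input_image": [value]}
    return record


def normalize_jsonl(src: Path, dst: Path) -> int:
    """Copy src to dst, wrapping string 'input_image' values in a list.

    Returns the number of records that were changed.
    """
    changed = 0
    with open(src, "r", encoding="utf-8") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            fixed = normalize_input_image(record)
            if fixed is not record:
                changed += 1
            fout.write(json.dumps(fixed, ensure_ascii=False) + "\n")
    return changed
```

Writing to a separate destination file keeps the downloaded snapshot untouched. Once every record has the same type for input_image, the normalized files could be passed to datasets.load_dataset("json", data_files=...), though whether that resolves the original error has not been verified here.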