Remove extra label column #3014

Open
severo opened this issue Aug 2, 2024 · 0 comments
Labels
blocked-by-upstream The issue must be fixed in a dependency bug Something isn't working P2 Nice to have


severo commented Aug 2, 2024

In example dataset https://huggingface.co/datasets/datasets-examples/doc-audio-4, we have an "unexpected" label column with only null values.

Screenshot 2024-08-02 at 12 33 10

I think it's caused by a "collision" between the heuristics that infer splits and the ones that infer classes from directory names. The `datasets` library has a `drop_labels=True` option, if that helps.

Ideally, in this case, we should have two splits (train and test), and no additional label column.
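
To make the expected outcome concrete, here is a rough sketch of the behavior I have in mind. It is not the `datasets` implementation: `infer_layout`, `SPLIT_NAMES`, and the file paths are hypothetical names made up for illustration, and the layout (one directory per split, no class directories) is an assumption about how such a dataset could be organized.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Assumption: directory names that should be read as splits, not classes.
SPLIT_NAMES = {"train", "test", "validation"}


def infer_layout(paths):
    """Group files by split and decide whether a label column is warranted.

    Hypothetical helper for illustration, not the `datasets` logic.
    """
    splits = defaultdict(list)
    labels = set()
    for path in paths:
        parts = PurePosixPath(path).parts
        # The first directory is the split when it is a known split name.
        if len(parts) > 1 and parts[0] in SPLIT_NAMES:
            split, rest = parts[0], parts[1:]
        else:
            split, rest = "train", parts
        # A directory left between the split and the file name is a class.
        label = rest[-2] if len(rest) > 1 else None
        if label is not None:
            labels.add(label)
        splits[split].append((path, label))
    # Expose a label column only if at least one file sits in a class directory.
    return dict(splits), bool(labels)


# Hypothetical file listing with split directories only.
files = ["train/1.wav", "train/2.wav", "test/1.wav", "test/2.wav"]
splits, has_label_column = infer_layout(files)
print(sorted(splits), has_label_column)
```

With this kind of rule, a repository that only has split directories would yield the two splits and no label column, while a layout such as `train/cat/1.wav` would still get one.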

I think the issue also exists with image datasets.

@severo severo changed the title Extra 'label' column Remove extra 'label' column Aug 2, 2024
@severo severo added bug Something isn't working blocked-by-upstream The issue must be fixed in a dependency P2 Nice to have labels Aug 2, 2024
@severo severo changed the title Remove extra 'label' column Remove extra label column Aug 2, 2024