Fix Datasets Currently Mapped as Tasks.TEXTUAL_ENTAILMENT #682

Closed
jason-fries opened this issue Jun 4, 2022 · 5 comments
Labels
bug Something isn't working

Comments

@jason-fries
Member

Several datasets are incorrectly tagged as supporting Tasks.TEXTUAL_ENTAILMENT. Perhaps I'm misunderstanding the tasks, but these largely seem like text classification/labeling problems, not entailment.

  • medical_data is a sentiment analysis task, not textual entailment. It is better suited to the pairs or text classification schema.
  • evidence_inference labels a relationship between text snippets, i.e., whether the "intervention of interest either significantly increased, significantly decreased or had no significant effect on the outcome, relative to the comparator".

The fix is to migrate these to the correct schema.

jason-fries added the bug label Jun 4, 2022
@shamikbose
Contributor

@jason-fries The issue with medical_data is addressed in the conversation in #613

Every text comes with a drug mention and the sentiment the text expresses toward that specific drug. Putting this in the classification format loses some of that information.

@jason-fries
Member Author

@shamikbose thanks for the comment. If a simple classification schema isn't suitable, then this should be a text pairs task, provided it involves reasoning over two units of text. The most important issue is that this is not an entailment task.
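
To illustrate the text pairs suggestion, here is a minimal sketch of how a row with a text, a drug mention, and a sentiment label could be kept intact as a pair. It assumes a pairs-style record with the fields id, document_id, text_1, text_2, and label. The class PairsRecord, the helper to_pairs_record, and the sample row are hypothetical and are not taken from the repository's code or from the dataset.

```python
from dataclasses import dataclass, asdict


@dataclass
class PairsRecord:
    """Hypothetical stand-in for a text-pairs example (field names are assumptions)."""
    id: str
    document_id: str
    text_1: str  # the full text
    text_2: str  # the drug mention the label refers to
    label: str   # sentiment expressed toward that drug


def to_pairs_record(idx, text, drug_mention, sentiment):
    """Keep the drug mention as the second text so it is not dropped."""
    return PairsRecord(
        id=str(idx),
        document_id=str(idx),
        text_1=text,
        text_2=drug_mention,
        label=sentiment,
    )


# Made-up row, for illustration only.
record = to_pairs_record(0, "DrugA cleared my headache within an hour.", "DrugA", "positive")
print(asdict(record))
```

A plain classification record would carry only the text and the label, so the drug that the sentiment refers to would have to be dropped or folded into the label. The pair form keeps both units of text.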

@shamikbose
Contributor

I can tackle medical_data tomorrow.

@shamikbose
Contributor

@jason-fries medical_data is updated to bigbio_pairs in #684

@phlobo phlobo mentioned this issue Oct 25, 2024
8 tasks
phlobo added a commit that referenced this issue Oct 26, 2024
* Update medical_data.py

Updated to `bigbio_pairs` schema
Passes all tests

* Update medical_data.py

* refactor: Refactor SAMD dataset implementation to hub-based schema

* fix: Change task for SAMD dataset to TEXT_PAIRS_CLASSIFICATION

* Fixed license

---------

Co-authored-by: Mario Sänger
Co-authored-by: Florian Borchert
@phlobo
Collaborator

phlobo commented Oct 26, 2024

Both datasets mentioned here now have other tasks assigned, therefore I will close the issue.

@phlobo phlobo closed this as completed Oct 26, 2024