Closes #119 - Add loctext #515
Loads the data from the URL and parses its JSON format into the bigbio_kb schema.
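For readers unfamiliar with the target layout, here is a minimal sketch of mapping one parsed JSON document onto the bigbio_kb fields (passages, entities, events, coreferences, relations). It is not the loader from this PR: the raw record, its keys (start, end, db, db_id, arg1, arg2), the helper to_bigbio_kb and the passage type "abstract" are all hypothetical, and the download from the URL is skipped so the sketch runs offline.

```python
import json
from typing import Any, Dict

# Hypothetical raw record: the real LocText JSON layout is not shown in this PR,
# and the download from the URL is skipped so the sketch runs offline.
RAW = """
{
  "id": "doc1",
  "text": "COP1 is nuclear and cytoplasmic.",
  "entities": [
    {"id": "T1", "type": "uniprot", "start": 0, "end": 4, "db": "uniprot", "db_id": "P43254"},
    {"id": "T2", "type": "go", "start": 8, "end": 15, "db": "go", "db_id": "GO:0005634"},
    {"id": "T3", "type": "go", "start": 20, "end": 31, "db": "go", "db_id": "GO:0005737"}
  ],
  "relations": [
    {"id": "R1", "type": "localizeTo", "arg1": "T1", "arg2": "T2"},
    {"id": "R2", "type": "localizeTo", "arg1": "T1", "arg2": "T3"}
  ]
}
"""


def to_bigbio_kb(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map one hypothetical raw document onto the bigbio_kb field layout."""
    doc_id = doc["id"]
    text = doc["text"]

    def uid(local_id: str) -> str:
        # Prefix local IDs with the document ID so they stay unique across documents.
        return f"{doc_id}-{local_id}"

    entities = [
        {
            "id": uid(e["id"]),
            "type": e["type"],
            "text": [text[e["start"]:e["end"]]],  # surface text taken from the offsets
            "offsets": [[e["start"], e["end"]]],
            "normalized": [{"db_name": e["db"], "db_id": e["db_id"]}],
        }
        for e in doc["entities"]
    ]
    relations = [
        {
            "id": uid(r["id"]),
            "type": r["type"],
            "arg1_id": uid(r["arg1"]),  # arguments point at entity IDs, not entity text
            "arg2_id": uid(r["arg2"]),
            "normalized": [],
        }
        for r in doc["relations"]
    ]
    return {
        "id": doc_id,
        "document_id": doc_id,
        # Assumption: the whole text is treated as a single "abstract" passage.
        "passages": [
            {"id": uid("P0"), "type": "abstract", "text": [text], "offsets": [[0, len(text)]]}
        ],
        "entities": entities,
        "events": [],
        "coreferences": [],
        "relations": relations,
    }


example = to_bigbio_kb(json.loads(RAW))
print(example["entities"][1]["text"])  # ['nuclear']
print(example["relations"][0]["arg2_id"])  # doc1-T2
```

Prefixing every local ID with the document ID is one way to obtain the globally unique IDs (e.g. '10072396-T1') that appear in the output later in this thread.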
@napsternxg This is almost ready to merge. Would you mind converting the relation references to the entity names in the relation view (e.g., T18 -> cell wall)? Otherwise the mapping will not be trivial to construct.
Hi @hakunanatasha, thanks. I will finish this and send it by early next week.
Hi @hakunanatasha, I have now made the relation arguments map to the entity IDs so that we can uniquely resolve them. The argument IDs use the same format as the entity IDs. Running
data["train"]["entities"][0][:5]
data["train"]["relations"][0][:5]
will show the following entities:
[{'id': '10072396-T1',
'type': 'go',
'text': ['nuclear'],
'offsets': [[46, 53]],
'normalized': [{'db_name': 'go', 'db_id': 'GO:0005634'}]},
{'id': '10072396-T2',
'type': 'go',
'text': ['cytoplasmic'],
'offsets': [[58, 69]],
'normalized': [{'db_name': 'go', 'db_id': 'GO:0005737'}]},
{'id': '10072396-T3',
'type': 'taxonomy',
'text': ['Arabidopsis'],
'offsets': [[86, 97]],
'normalized': [{'db_name': 'taxonomy', 'db_id': '3702'}]},
{'id': '10072396-T4',
'type': 'uniprot',
'text': ['COP1'],
'offsets': [[98, 102]],
'normalized': [{'db_name': 'uniprot', 'db_id': 'P43254'}]},
{'id': '10072396-T5',
'type': 'taxonomy',
'text': ['Arabidopsis'],
'offsets': [[108, 119]],
'normalized': [{'db_name': 'taxonomy', 'db_id': '3702'}]}]
and the following relations:
[{'id': '10072396-R1',
'type': 'localizeTo',
'arg1_id': '10072396-T4',
'arg2_id': '10072396-T2',
'normalized': []},
{'id': '10072396-R10',
'type': 'localizeTo',
'arg1_id': '10072396-T29',
'arg2_id': '10072396-T28',
'normalized': []},
{'id': '10072396-R2',
'type': 'localizeTo',
'arg1_id': '10072396-T4',
'arg2_id': '10072396-T1',
'normalized': []},
{'id': '10072396-R3',
'type': 'localizeTo',
'arg1_id': '10072396-T9',
'arg2_id': '10072396-T11',
'normalized': []},
{'id': '10072396-R4',
'type': 'localizeTo',
'arg1_id': '10072396-T9',
'arg2_id': '10072396-T10',
'normalized': []}]
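To illustrate why ID-based arguments resolve uniquely while entity text would not, here is a minimal sketch using a trimmed copy of the entities and relations printed above. The helper resolve is hypothetical and not part of the PR. R10 is included on purpose: its arguments (T29, T28) are not among the first five entities shown, so they resolve to None here.

```python
from typing import Any, Dict, List, Optional, Tuple

# Trimmed copy of the first five entities printed above (document 10072396).
entities: List[Dict[str, Any]] = [
    {"id": "10072396-T1", "type": "go", "text": ["nuclear"]},
    {"id": "10072396-T2", "type": "go", "text": ["cytoplasmic"]},
    {"id": "10072396-T3", "type": "taxonomy", "text": ["Arabidopsis"]},
    {"id": "10072396-T4", "type": "uniprot", "text": ["COP1"]},
    {"id": "10072396-T5", "type": "taxonomy", "text": ["Arabidopsis"]},
]
relations: List[Dict[str, str]] = [
    {"id": "10072396-R1", "type": "localizeTo", "arg1_id": "10072396-T4", "arg2_id": "10072396-T2"},
    {"id": "10072396-R2", "type": "localizeTo", "arg1_id": "10072396-T4", "arg2_id": "10072396-T1"},
    {"id": "10072396-R10", "type": "localizeTo", "arg1_id": "10072396-T29", "arg2_id": "10072396-T28"},
]


def resolve(
    relations: List[Dict[str, str]], entities: List[Dict[str, Any]]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Replace relation argument IDs with entity text via an ID -> text lookup."""
    by_id = {e["id"]: " ".join(e["text"]) for e in entities}
    # .get() yields None for arguments whose entity is not in the given list.
    return [(r["type"], by_id.get(r["arg1_id"]), by_id.get(r["arg2_id"])) for r in relations]


# Text is not a safe key: two distinct entities (T3, T5) share the surface form "Arabidopsis".
texts = [" ".join(e["text"]) for e in entities]
print(len(texts), len(set(texts)))  # 5 4

for row in resolve(relations, entities):
    print(row)
# ('localizeTo', 'COP1', 'cytoplasmic')
# ('localizeTo', 'COP1', 'nuclear')
# ('localizeTo', None, None)
```

The name-based view the reviewer asked for can therefore be derived on demand from the IDs, whereas going from names back to entities would be ambiguous.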
@hakunanatasha could you approve the PR? I have already addressed the requested changes.
The dataset seems to be no longer available :-(
Fixes #119
If the following information is NOT present in the issue, please populate:
Checkbox
- Create the dataloader script biodatasets/my_dataset/my_dataset.py (please use only lowercase and underscore for dataset naming).
- Provide values for the _CITATION, _DATASETNAME, _DESCRIPTION, _HOMEPAGE, _LICENSE, _URLs, _SUPPORTED_TASKS, _SOURCE_VERSION, and _BIGBIO_VERSION variables.
- Implement _info(), _split_generators() and _generate_examples() in the dataloader script.
- Make sure that the BUILDER_CONFIGS class attribute is a list with at least one BigBioConfig for the source schema and one for a bigbio schema.
- Confirm that the dataloader script works with the datasets.load_dataset function.
- Confirm that the dataloader script passes the test suite run with python -m tests.test_bigbio biodatasets/my_dataset/my_dataset.py.
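The BUILDER_CONFIGS requirement in the checklist can be checked mechanically. A minimal sketch, assuming a hypothetical Config dataclass as a stand-in for BigBioConfig (only name and schema are modelled) and assumed config names loctext_source / loctext_bigbio_kb that follow the usual BigBIO naming pattern; none of this is taken from the PR's code, and the real test suite may check more.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Config:
    """Hypothetical stand-in for BigBioConfig with just the fields this check needs."""
    name: str
    schema: str


def has_required_configs(configs: List[Config]) -> bool:
    """True if there is at least one source config and at least one bigbio_* config."""
    schemas = {c.schema for c in configs}
    has_source = "source" in schemas
    has_bigbio = any(s.startswith("bigbio_") for s in schemas)
    return has_source and has_bigbio


# Assumed config names, following the <dataset>_source / <dataset>_bigbio_<schema> pattern.
BUILDER_CONFIGS = [
    Config(name="loctext_source", schema="source"),
    Config(name="loctext_bigbio_kb", schema="bigbio_kb"),
]

print(has_required_configs(BUILDER_CONFIGS))  # True
print(has_required_configs(BUILDER_CONFIGS[:1]))  # False (source config only)
```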