Writing "Good" Labeling functions #8

Open
danich1 opened this issue Jan 24, 2017 · 2 comments

danich1 commented Jan 24, 2017

Our aim is to generate useful labeling functions from a given set of candidate sentences provided below:

danich1 changed the title from 'Writing "' to 'Writing "Good" Labeling functions' on Jan 24, 2017

dhimmel commented Jan 24, 2017

We'll know more about this issue once we start writing labeling functions.

Some background: for each relationship type, we'll start with a knowledge base (gold standard) of known relationships. This will generally be a relationship type from Hetionet. So the first two labeling functions will be:

  1. return 1 if the relationship is in the gold standard
  2. return -1 if the relationship is not in the gold standard
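
A minimal sketch of the two gold-standard labeling functions listed above, assuming a candidate is a (disease, gene) pair and the gold standard is a set of such pairs. The names `GOLD_STANDARD`, `lf_in_gold_standard`, and `lf_not_in_gold_standard` are hypothetical, the pairs are placeholders rather than real Hetionet entries, and returning 0 to mean "abstain" is an assumption not stated in this thread.

```python
# Hypothetical placeholder pairs; a real gold standard would be loaded from Hetionet.
GOLD_STANDARD = {("disease_A", "gene_1"), ("disease_B", "gene_2")}

ABSTAIN = 0  # assumed convention: 0 means the function has no opinion


def lf_in_gold_standard(candidate):
    """Return 1 if the (disease, gene) pair is a known relationship."""
    return 1 if candidate in GOLD_STANDARD else ABSTAIN


def lf_not_in_gold_standard(candidate):
    """Return -1 if the (disease, gene) pair is not a known relationship."""
    return -1 if candidate not in GOLD_STANDARD else ABSTAIN


# Apply both functions to one known and one unknown pair.
for pair in [("disease_A", "gene_1"), ("disease_A", "gene_2")]:
    print(pair, lf_in_gold_standard(pair), lf_not_in_gold_standard(pair))
```

Splitting the check into two functions mirrors the list above and lets each one be scored separately later.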

Then we will have to make additional labeling functions to refine the classifier. We're hoping to parallelize this task to some degree, i.e. everyone involved can submit additional labeling functions. So we'll have to develop a framework that allows anyone to submit labeling functions.
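
One possible shape for such a submission framework, as a rough sketch only: a decorator that collects labeling functions into a shared registry so they can all be applied to a candidate. `LABELING_FUNCTIONS`, `labeling_function`, and `apply_all` are hypothetical names, not part of Snorkel or this project.

```python
LABELING_FUNCTIONS = {}  # name -> function, filled in by the decorator below


def labeling_function(func):
    """Register func so it runs alongside everyone else's submissions."""
    if func.__name__ in LABELING_FUNCTIONS:
        raise ValueError(f"duplicate labeling function: {func.__name__}")
    LABELING_FUNCTIONS[func.__name__] = func
    return func


@labeling_function
def lf_example_always_abstain(candidate):
    # Placeholder submission: 0 is assumed to mean "abstain".
    return 0


def apply_all(candidate):
    """Run every registered function on one candidate."""
    return {name: lf(candidate) for name, lf in LABELING_FUNCTIONS.items()}


print(apply_all(("disease_A", "gene_1")))
```

Rejecting duplicate names keeps two contributors from silently overwriting each other's functions.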

Our impression is that Snorkel will be able to evaluate the quality of each labeling function, so it's not the end of the world if some of our labeling functions are imperfect.
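
To make "quality" concrete, here is a hand-rolled sketch that scores one labeling function against a small hand-labeled set. This is not Snorkel's API or its internal method; `summarize` is a hypothetical helper, the votes and labels are made up, and 0 is assumed to mean "abstain".

```python
def summarize(votes, truth):
    """Score one labeling function.

    votes: its outputs per candidate (1, -1, or 0 for abstain).
    truth: hand-assigned labels per candidate (1 or -1).
    Returns (coverage, accuracy); accuracy is None if it never voted.
    """
    fired = [(v, t) for v, t in zip(votes, truth) if v != 0]
    coverage = len(fired) / len(votes)
    accuracy = sum(v == t for v, t in fired) / len(fired) if fired else None
    return coverage, accuracy


# Made-up votes and labels, for illustration only.
truth = [1, -1, 1, -1]
votes = [1, 0, -1, -1]
print(summarize(votes, truth))
```

Coverage is the fraction of candidates the function voted on, and accuracy is measured only over those votes, so a function that rarely fires is not penalized for abstaining.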

danich1 commented Jan 28, 2017

Below are examples of the desired and undesired Disease-Gene candidate relationships we will be working with.

Good Example:

PATIENT: We describe a male infant with early infantile epileptic encephalopathy with suppression-burst (Ohtahara syndrome) who carried a de novo 2.0-Mb microdeletion in chromosome 9q33q34, including STXBP1.

In the quote above, the Disease-Gene candidate relation is in bold. This is a good example because the relationship is in our gold standard list, so it would receive a +1.

Bad Example:

Xq28, which includes MECP2 is the major locus for submicroscopic X-chromosome duplications, whereas duplications in Xq25 and Xq26 have been reported in only a few cases.

This is a bad example because the disease in this context has nothing to do with epilepsy, so this relationship would receive a -1.
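
The reasoning in the bad example could be turned into a context-based labeling function. The following is only a sketch: `lf_no_epilepsy_context` and its keyword pattern are assumptions, not something the project has settled on, and 0 is assumed to mean "abstain".

```python
import re

# Assumed keyword stems; not taken from the project.
EPILEPSY_PATTERN = re.compile(r"epilep|seizure", re.IGNORECASE)


def lf_no_epilepsy_context(sentence):
    """Return -1 if the sentence never mentions epilepsy, else abstain (0)."""
    return 0 if EPILEPSY_PATTERN.search(sentence) else -1


good = (
    "We describe a male infant with early infantile epileptic encephalopathy "
    "with suppression-burst (Ohtahara syndrome) who carried a de novo 2.0-Mb "
    "microdeletion in chromosome 9q33q34, including STXBP1."
)
bad = (
    "Xq28, which includes MECP2 is the major locus for submicroscopic "
    "X-chromosome duplications, whereas duplications in Xq25 and Xq26 have "
    "been reported in only a few cases."
)

print(lf_no_epilepsy_context(good), lf_no_epilepsy_context(bad))
```

The function abstains rather than voting +1 when a keyword is present, since mentioning epilepsy does not by itself establish a Disease-Gene relationship.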
