Miscellaneous notes from the great snorkeling of 2018 #39

Open
dhimmel opened this issue Apr 30, 2018 · 3 comments

dhimmel commented Apr 30, 2018

In Palo Alto.

dhimmel commented Apr 30, 2018

Monday

  • When using labeling functions to suppress mistagged genes, never return positive evidence: return only 0 (abstain) or -1 (negative). source
  • Make LFS a dictionary mapping each labeling function's name to the function
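
A minimal sketch of both notes above, assuming candidates are plain dictionaries with a `gene_text` key (real Snorkel candidates are objects, so this is only illustrative). The function names, the `AMBIGUOUS_SYMBOLS` list, and the `apply_lfs` helper are hypothetical and not taken from the repository.

```python
# Hypothetical list of gene symbols that are also common English words.
AMBIGUOUS_SYMBOLS = {"was", "set", "rest", "impact"}


def lf_ambiguous_gene_symbol(candidate):
    """Vote -1 when the tagged gene text is a common English word, else abstain (0)."""
    return -1 if candidate["gene_text"].lower() in AMBIGUOUS_SYMBOLS else 0


def lf_gene_too_short(candidate):
    """Vote -1 for single-character gene mentions, else abstain (0)."""
    return -1 if len(candidate["gene_text"]) < 2 else 0


# LFS maps each labeling function's name to the function itself.
# Neither function ever returns 1: they only suppress mistagged genes.
LFS = {
    "lf_ambiguous_gene_symbol": lf_ambiguous_gene_symbol,
    "lf_gene_too_short": lf_gene_too_short,
}


def apply_lfs(candidate, lfs=LFS):
    """Return each labeling function's vote for one candidate, keyed by name."""
    return {name: lf(candidate) for name, lf in lfs.items()}
```

Keying by name makes it easy to report votes per labeling function or to run only a subset of them.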

Issue

The Hetionet labeling function is mostly voting 1 rather than -1: almost all sentences seem to contain a gene and disease whose relationship is in Hetionet, regardless of whether the sentence attests to that relationship.
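
One way to quantify this kind of imbalance is to tally the fraction of each vote a labeling function casts. The sketch below assumes the votes are available as a plain list of -1, 0, and 1; `vote_distribution` is a hypothetical helper, not part of Snorkel or this repository.

```python
from collections import Counter


def vote_distribution(votes):
    """Return the fraction of -1, 0, and 1 votes in a list of labeling function outputs."""
    counts = Counter(votes)
    total = len(votes)
    if total == 0:
        # No votes at all: report zero for every label instead of dividing by zero.
        return {label: 0.0 for label in (-1, 0, 1)}
    return {label: counts.get(label, 0) / total for label in (-1, 0, 1)}
```

A distribution dominated by 1 would match the behavior described above.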

dhimmel pushed a commit that referenced this issue May 1, 2018
Progress from day 1 of snorkel week #39

Output additional information about gene-disease pairs.
dhimmel pushed a commit that referenced this issue May 1, 2018

dhimmel commented May 1, 2018

Tuesday

  • Scale up to 50k labeled sentences
  • Consider labeling dev set
  • Determine how we want label probabilities to be scaled

dhimmel pushed a commit that referenced this issue May 1, 2018
Merges #45
Part of Snorkel Week day 2 #39
dhimmel pushed a commit that referenced this issue May 2, 2018
Merges #46

With 10,000 development sentences, ordered randomly for manual curation.

Part of Snorkel Week day 3 #39
dhimmel pushed a commit that referenced this issue May 2, 2018

dhimmel commented May 4, 2018

Human calls for 100 development sentences

I've manually labeled 100 development sentences, which will be useful for assessing our generative model (consensus/training labels). They are good examples for seeing why this is a very hard problem. See sentence-labels-dev.xlsx. CC @danich1
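
A rough sketch of how such human calls could be compared against generative model output, assuming the human labels are coded as 1 or -1 and the model yields one probability per sentence. The `evaluate` function and the 0.5 threshold are assumptions for illustration; the actual layout of the spreadsheet and the scaling of the label probabilities are not specified here.

```python
def evaluate(human_labels, marginals, threshold=0.5):
    """Compare human calls (1 or -1) with model probabilities cut at a threshold."""
    if len(human_labels) != len(marginals):
        raise ValueError("human_labels and marginals must have the same length")
    # Convert probabilities to hard calls; the threshold is an assumption.
    predicted = [1 if p > threshold else -1 for p in marginals]
    pairs = list(zip(human_labels, predicted))
    tp = sum(1 for h, p in pairs if h == 1 and p == 1)
    fp = sum(1 for h, p in pairs if h == -1 and p == 1)
    fn = sum(1 for h, p in pairs if h == 1 and p == -1)
    correct = sum(1 for h, p in pairs if h == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }
```

With only 100 labeled sentences, these estimates will be noisy, so they are better read as a sanity check than as a precise benchmark.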
