Oracle entity in Table 2 VS. Oracle keywords in Table 7 #10

Open
lifelongeek opened this issue Jun 21, 2021 · 5 comments
Labels
question Further information is requested

Comments

@lifelongeek

I am trying to reproduce the ROUGE scores on CNNDM for the 'oracle keyword' setting in Table 7. The 'oracle entity' setting in Table 2 sounds similar to it, but the ROUGE scores are very different. Could you explain how the two settings differ?

image

@jxhe
Collaborator

jxhe commented Jun 21, 2021

Hi,

"Oracle entity" in Table 2 uses only the entity words in the ground-truth target, while "oracle keywords" also includes non-entity words, as described in the paper.

@lifelongeek
Author

Thanks for the clarification.
I have some follow-up questions.

  1. Does example_dataset/test.oraclewordns contain the "oracle keywords"?
  2. Are the "longest sub-sequences" used for training the automatic keyword extractor the same as the "oracle keywords"?
image

@jxhe
Collaborator

jxhe commented Jul 28, 2021

  1. Yes, example_dataset/test.oraclewordns contains the "oracle keywords".
  2. The keywords used for training the automatic keyword extractor are "oracle keywords". Strictly speaking, though, "oracle keywords" are not exactly the "longest sub-sequences": as described in your screenshot, "we remove duplicate words and stop words and keep the remaining tokens as keywords".
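The filtering step quoted above can be illustrated with a minimal sketch. This is not the repository's code: the function name `keywords_from_matches`, the tiny `STOP_WORDS` set, the case-insensitive de-duplication, and the first-occurrence ordering are all assumptions made for illustration. The input is taken to be the tokens of the already-matched sub-sequences.

```python
# Hypothetical stop-word list for illustration only; the list actually
# used in the repository is not shown in this thread.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "was"}


def keywords_from_matches(matched_tokens, stop_words=STOP_WORDS):
    """Drop stop words and duplicate words, keeping first-occurrence order.

    Comparison is case-insensitive (an assumption), but the original
    casing of the first occurrence is preserved in the output.
    """
    seen = set()
    keywords = []
    for token in matched_tokens:
        key = token.lower()
        if key in stop_words or key in seen:
            continue
        seen.add(key)
        keywords.append(token)
    return keywords


matched = ["the", "president", "of", "France", "visited", "the", "president", "France"]
print(keywords_from_matches(matched))  # ['president', 'France', 'visited']
```

Because non-entity words such as "visited" survive this filter, the result can differ from an entity-only list for the same reference summary.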

@Wendy-Xiao

Hi,

I have a quick follow-up question on this point. For 'oracle entities', which NER tool did you use for extracting the oracle entities from the reference summary?

Thanks a lot!!

@jxhe
Collaborator

jxhe commented Jul 5, 2022

Hi, we use stanza for NER. You may refer to some examples here:

def entity_random(split, src, datadir, nsample=100, human_study=False):
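For readers who only need the general pattern, here is a minimal sketch of extracting entities from a reference summary with stanza. It is not the body of `entity_random` above, which is not shown in this thread. The helper names `build_ner_pipeline` and `oracle_entities` are hypothetical, and de-duplicating by exact entity text is an assumption.

```python
def build_ner_pipeline():
    """Build an English stanza pipeline with tokenization and NER.

    stanza is imported lazily so that oracle_entities can be exercised
    without it. The English models must be available first, for example
    via stanza.download("en").
    """
    import stanza

    return stanza.Pipeline(lang="en", processors="tokenize,ner")


def oracle_entities(reference_summary, nlp):
    """Return unique entity strings in order of first appearance.

    `nlp` is any callable that returns an object with an `ents` list
    whose items have a `text` attribute, as a stanza Document does.
    """
    doc = nlp(reference_summary)
    seen = set()
    entities = []
    for ent in doc.ents:
        if ent.text not in seen:
            seen.add(ent.text)
            entities.append(ent.text)
    return entities
```

Typical usage would be `nlp = build_ner_pipeline()` once, followed by `oracle_entities(summary, nlp)` for each reference summary.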
