
🙅 Inaccurate model coref predictions master thread #215

Closed
svlandeg opened this issue Oct 16, 2019 · 19 comments
@svlandeg
Collaborator

Master thread for collecting incorrect and/or problematic coreference predictions with the pretrained models. These can be interesting test cases when training the next version of the model.

@svlandeg svlandeg pinned this issue Oct 16, 2019
@petulla

petulla commented Nov 21, 2019

(Updated to include the article URL.)

For this article, the model struggles with NASA's James Webb Space Telescope.

This is the mentions array:

```
Mauna Kea: [Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, Mauna Kea, it, Mauna Kea, Mauna Kea]
Hawaii: [Hawaii, Hawaii, Hawaii, Hawaii, Hawaii, its, Hawaii, Hawaii, Hawaii, Hawaii]
Spain: [Spain, Spain, Spain, Spain, Spain, Spain]
Earth: [Earth, Earth, Earth, Earth]
astronomers: [astronomers, their]
the world's largest telescope in Hawaii: [the world's largest telescope in Hawaii, the telescope, the telescope, it, the Webb telescope, the telescope, it, Thirty Meter Telescope, Its, the telescope on La Palma, the telescope in Spain, this telescope]
the islands': [the islands', their]
Meter Telescope officials: [Meter Telescope officials, their]
their backup site atop a peak on the Spanish Canary island of La Palma: [their backup site atop a peak on the Spanish Canary island of La Palma, it, it, the site]
La Palma: [La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma, La Palma]
Mauna Kea's: [Mauna Kea's, Mauna Kea's]
Bolte, who has used existing Mauna Kea telescopes: [Bolte, who has used existing Mauna Kea telescopes, he]
Bolte: [Bolte, Bolte, Bolte, The telescope group's Bolte]
Webb: [Webb, Webb]
Mather: [Mather, He, Mather, he]
bright stars: [bright stars, them]
Loeb: [astrophysicist Avi Loeb, who chairs Harvard University's astronomy department, Loeb, Loeb, he]
The Native Hawaiian opponents: [The Native Hawaiian opponents, themselves, their, They]
the telescope group: [the telescope group, The telescope group]
protest leader Kealoha Pisciotta: [protest leader Kealoha Pisciotta, Pisciotta]
Thirty Meter Telescope officials: [Thirty Meter Telescope officials, they]
the Canary Islands: [the Canary Islands, the Canary Islands, the Canary Islands]
Others: [Others, their, their]
Jos Manuel Vilchez, an astronomer with Spain's Higher Council of Scientific Research and a former member of the scientific committee of the Astrophysics Institute of the Canary Islands: [Jos Manuel Vilchez, an astronomer with Spain's Higher Council of Scientific Research and a former member of the scientific committee of the Astrophysics Institute of the Canary Islands, We, We]
Vilchez: [Vilchez, Vilchez, Vilchez, Vilchez]
Native Hawaiians: [Native Hawaiians, their, they, Native Hawaiians]
```

Webb is broken out as if it were a last name, when it is part of the telescope's name. In general, the model struggles to tell the difference between the two telescopes mentioned in the article.

I'm wondering if a BERT span-based model might be an option for the next release? I tried the above text with it, and the results are slightly better (though still imperfect): https://github.com/mandarjoshi90/coref
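
For reference, a minimal sketch of how a clusters dump like the one above can be produced (assuming neuralcoref 4.0 on spaCy 2.x with the `en_core_web_lg` model; `article_text` is a placeholder for the article body):

```python
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_lg")
neuralcoref.add_to_pipe(nlp)  # registers the neuralcoref pipeline component

article_text = "..."  # placeholder: paste the article text here
doc = nlp(article_text)

# Each cluster pairs a "main" mention with every span resolved to it.
for cluster in doc._.coref_clusters:
    print(cluster.main, ":", cluster.mentions)
```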

@Atul-Anand-Jha

Hey @svlandeg,
I have observed that the live demo at https://huggingface.co/coref/ is surprisingly accurate in many cases where my local implementation of the same model fails.
To confirm this, I resolved coreferences with both models, the latest neuralcoref 4.0 and neuralcoref-lg-3.0.0, and our results are far poorer than your live demo's. I am attaching screenshots so you can see the situation. Please have a look and respond.
Does the live demo implement a different model than the two mentioned above? If so, how can we use it in our project?

[screenshot] Fig: our implementation results (neuralcoref v3 + neuralcoref v4)

[screenshot] Fig: your live demo result

@EvanFabry

EvanFabry commented Jan 28, 2020

> I have observed that the live demo at https://huggingface.co/coref/ is surprisingly accurate in many cases where my local implementation of the same model fails. [...] Does the live demo implement a different model than the two mentioned above? If so, how can we use it in our project?

+1. I've noticed discrepancies between performance locally and in the dev environment. @svlandeg @thomwolf, can you comment on what exactly is currently served by the demo environment?

@svlandeg
Collaborator Author

svlandeg commented Feb 9, 2020

I wasn't involved with this project when the demo environment was created. However, note that it's not just the version of the trained model that makes a difference, but also the specific hyperparameters used when making predictions. So those are definitely something you can "play" with too.
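
To make this concrete, here is a minimal sketch of the prediction-time knobs (these are neuralcoref's documented `add_to_pipe` parameters; the values shown are the library defaults, for illustration only):

```python
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_lg")
neuralcoref.add_to_pipe(
    nlp,
    greedyness=0.5,      # higher values merge mentions more aggressively
    max_dist=50,         # how many mentions back to search for an antecedent
    max_dist_match=500,  # extended window when an exact string match exists
    blacklist=True,      # skip a small fixed list of pronouns ("i", "me", "my", "you", "your")
)
doc = nlp("My sister has a dog. She loves him.")
print(doc._.coref_resolved)
```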

@Atul-Anand-Jha

> However, note that it's not just the version of the trained model that makes a difference, but also the specific hyperparameters used when making predictions. So those are definitely something you can "play" with too.

Thanks. I actually tried different options for these hyperparameters, but none of the model releases uploaded here could match the demo.

@aereobert

aereobert commented Mar 15, 2020

Same here.

With exactly the same sentence provided on the demo site, I tried all kinds of hyperparameter options, but I am still unable to reproduce the result. I installed spaCy and neuralcoref from source in a brand-new Docker container, so it should not be an environment problem.

On the demo page, the scores are usually around 3 to 15, whereas in my environment they are always around -2 to 2.

I am wondering how exactly to reproduce the results on the demo page.

Thank you very much!

@svlandeg @thomwolf


Edit: resolved by compiling spaCy 2.1.0 and neuralcoref from source.
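
For anyone comparing scores against the demo, a minimal sketch (assuming neuralcoref 4.0) of how to inspect the raw pairwise scores locally:

```python
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_lg")
neuralcoref.add_to_pipe(nlp)

doc = nlp("My sister has a dog. She loves him.")
# doc._.coref_scores maps each mention to a dict of candidate
# antecedents and their scores, so the value ranges can be compared
# directly against what the demo page displays.
for mention, scores in doc._.coref_scores.items():
    print(mention, scores)
```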

@aamin3

aamin3 commented Apr 11, 2020

> Edit: resolved by compiling spaCy 2.1.0 and neuralcoref from source.

Hello, can you confirm that the demo results can be reproduced by compiling spaCy 2.1.0 and neuralcoref from source?

@aereobert

aereobert commented Apr 15, 2020

> Can you confirm that the demo results can be reproduced by compiling spaCy 2.1.0 and neuralcoref from source?

Not exactly. I am just saying that this increased the accuracy on my side from unusable to usable.

@cfoster0

Surprised to see the following.

On the example sentence in the README, neuralcoref predicts accurately:

[screenshot: correct predictions on the README example]

But on a slight modification, where we switch "sister" to "brother" and swap the pronouns, we get an incorrect prediction on the second sentence:

[screenshot: incorrect prediction on the modified example]
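
A short sketch for reproducing this comparison (assuming the README example is "My sister has a dog. She loves him." and the modification described above):

```python
import spacy
import neuralcoref

nlp = spacy.load("en_core_web_lg")
neuralcoref.add_to_pipe(nlp)

for text in ("My sister has a dog. She loves him.",
             "My brother has a dog. He loves her."):
    doc = nlp(text)
    print(text, "->", doc._.coref_clusters)
```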

@noelslice
Contributor

It would be very helpful if someone could shed some light on which model and combination of package versions are used in the demo environment. Like others here, I'm not able to reproduce what I see in the demo in my local setup, even when rolling spaCy back to 2.1.3 and building neuralcoref from source. It feels like the model served in the demo environment is a different model, or one trained with different word embeddings. Are the pretrained neuralcoref models tied to a specific spaCy language model version? I've played with the parameters like others here, with some improvement, but I'm still seeing systematic differences from the live demo.
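
When reporting discrepancies like this, it may help to attach a quick environment report (standard spaCy APIs; the package-version lookup assumes neuralcoref was installed as a package):

```python
import pkg_resources
import spacy

print("spaCy:", spacy.__version__)
print("neuralcoref:", pkg_resources.get_distribution("neuralcoref").version)

nlp = spacy.load("en_core_web_lg")
print("model:", nlp.meta["name"], nlp.meta["version"])
print("vectors:", nlp.vocab.vectors.shape)
```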

@pborysov

> It would be very helpful if someone could shed some light on which model and combination of package versions are used in the demo environment. [...]

Totally agree! The online demo is an ideal starting point, but only if it is reproducible :(

@Keating950

Keating950 commented Jul 27, 2020

I'm not able to share much of the text for confidentiality reasons, but I'm noticing that the pretrained model gravitates toward resolving "us" to "We", as in the diffs below. It might be useful to be able to blacklist certain words (e.g. "We") as never being satisfactory coreferents; a post-processing sketch follows the version list below.

```diff
< It is not up to us to rectify things
---
> It is not up to We to rectify things

< It is absolutely an issue, but not only to us
---
> It is absolutely an issue, but not only to We
```
  • neuralcoref 4.0
  • spacy 2.3.2
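
Until something like this exists, one workaround is to rebuild the resolved text manually and skip any cluster whose head is in a custom blacklist. This is only a sketch, not neuralcoref's API (the built-in `blacklist` option covers a fixed pronoun list only), and `NEVER_RESOLVE_TO` is a hypothetical name:

```python
import spacy
import neuralcoref

NEVER_RESOLVE_TO = {"we", "they", "it", "these", "who"}  # custom blacklist

def resolve_with_blacklist(doc):
    """Like doc._.coref_resolved, but skip clusters whose main mention
    is in the custom blacklist. Assumes the first mention in each
    cluster is the main one, which is how neuralcoref orders them."""
    tokens = [tok.text_with_ws for tok in doc]
    for cluster in doc._.coref_clusters:
        if cluster.main.text.lower() in NEVER_RESOLVE_TO:
            continue  # never rewrite anything to a blacklisted head
        for mention in cluster.mentions[1:]:
            # Replace the whole mention span with the main mention's text.
            tokens[mention.start] = cluster.main.text + mention[-1].whitespace_
            for i in range(mention.start + 1, mention.end):
                tokens[i] = ""
    return "".join(tokens)

nlp = spacy.load("en_core_web_lg")
neuralcoref.add_to_pipe(nlp)
print(resolve_with_blacklist(nlp("It is not up to us to rectify things.")))
```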

@aamin3

aamin3 commented Jul 28, 2020 via email

@Keating950

Keating950 commented Aug 9, 2020

> I agree; a more customizable blacklist (including "they", "it", "these", "who") would be wonderful. This is great tech as it is, but just a suggestion.

If you're interested in this feature, I've added it in my fork of this project. I'm still making sure it works, so I'm all ears for any feedback and review.

@aamin3

aamin3 commented Aug 9, 2020 via email

@Keating950

Yup, that's exactly right. I've updated the README. Feel free to open an issue on that repo if you have any other questions.

@aamin3

aamin3 commented Aug 11, 2020 via email

@lauwauw

lauwauw commented May 28, 2021

Thanks for your work @Keating950! Very helpful!

@Keating950

@lauwauw Thanks! I've merged in the latest changes from this repo in light of the renewed interest.

@svlandeg svlandeg closed this as completed Jul 9, 2024