Another type of failure I see looks like this:
10.1086/591526+10.1088/0004-637X/706/1/L203
I'm not sure how we'd be able to tell that a "+" is not part of the DOI.
When I searched for this exact string, I found this listing: http://arxiv.org/abs/0805.4758
It seems that both DOIs are associated with the same paper. One is for the paper itself and the other is for an erratum to the paper!
I'm thinking that we might get high fitness by having a special rule in the parser for splitting characters like "+&?". If we see them right before some whitespace or a new DOI_START, then stop reading the DOI.
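To make the proposed rule concrete, here is a rough sketch in Python. It is not the project's actual parser: the `DOI_START` pattern (a "10." prefix, a 4 to 9 digit registrant code, then a slash) and the `read_doi` / `extract_dois` helpers are assumptions made up for illustration. The idea is simply that a splitting character ends the DOI only when it is followed by whitespace, the end of the input, or something that looks like the start of another DOI.

```python
import re

# Assumption: a DOI starts with "10.", a 4-9 digit registrant code, then "/".
DOI_START = re.compile(r"10\.\d{4,9}/")
# Splitting characters proposed above.
SPLIT_CHARS = "+&?"


def read_doi(text, start=0, split_chars=SPLIT_CHARS):
    """Read one DOI beginning at text[start]; return (doi, end_index)."""
    i = start
    while i < len(text) and not text[i].isspace():
        if text[i] in split_chars:
            nxt = i + 1
            # Stop before the split character if it is followed by the end
            # of the input, whitespace, or the start of another DOI.
            if (nxt == len(text) or text[nxt].isspace()
                    or DOI_START.match(text, nxt)):
                break
        i += 1
    return text[start:i], i


def extract_dois(text, split_chars=SPLIT_CHARS):
    """Collect every DOI in text, applying the splitting rule to each."""
    dois = []
    pos = 0
    while True:
        m = DOI_START.search(text, pos)
        if m is None:
            return dois
        doi, pos = read_doi(text, m.start(), split_chars)
        dois.append(doi)


print(extract_dois("10.1086/591526+10.1088/0004-637X/706/1/L203"))
```

With this rule, a "+" that sits in the middle of a DOI and is followed by ordinary characters is still kept as part of the DOI; only a "+" that directly precedes whitespace or a new DOI start acts as a separator.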
I'm also seeing DOIs like this: "10.1002/ajp.22007/abstract;jsessionid=397B42DDD36E4F654BAB381E3104ABB3.d02t04". So we should include the semicolon in the list of splitting characters.