<doi>+<doi> is a common pattern #4

halfak · 2015-10-29T14:55:37Z

Another type of failure I see looks like this: 10.1086/591526+10.1088/0004-637X/706/1/L203

I'm not sure how we'd be able to tell that a "+" is not part of the DOI.

When I searched for this exact string, I found this listing: http://arxiv.org/abs/0805.4758 It seems that both DOIs are associated with the same paper: one is for the paper itself and the other is for an erratum to the paper!

I'm thinking we could get good results by adding a special rule to the parser for splitting characters like "+", "&", and "?". If we see one of them right before whitespace or a new DOI_START, we stop reading the DOI.
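A minimal sketch of that rule, not the project's actual parser. It assumes a hypothetical `DOI_START` pattern of "10." followed by a 4 to 9 digit registrant code and "/", and a hypothetical `extract_dois` helper. A "+", "&" or "?" only ends a DOI when the next character is whitespace, the end of the text, or the start of another DOI; otherwise it is kept as part of the DOI.

```python
import re

# Hypothetical DOI_START: "10." + registrant code + "/" (an assumption, not the parser's real pattern).
DOI_START = re.compile(r"10\.\d{4,9}/")

# Characters that may end a DOI, but only in the right context.
SPLITTING_CHARS = set("+&?")


def extract_dois(text):
    """Return DOIs found in `text`, splitting on "+", "&" or "?" when the
    next character is whitespace, the end of the text, or a new DOI_START."""
    dois = []
    pos = 0
    while True:
        match = DOI_START.search(text, pos)
        if match is None:
            break
        start, i = match.start(), match.end()
        # Read until whitespace, or until a splitting char in a splitting context.
        while i < len(text) and not text[i].isspace():
            if text[i] in SPLITTING_CHARS:
                nxt = i + 1
                if nxt == len(text) or text[nxt].isspace() or DOI_START.match(text, nxt):
                    break
            i += 1
        dois.append(text[start:i])
        pos = i
    return dois


print(extract_dois("10.1086/591526+10.1088/0004-637X/706/1/L203"))
```

With this rule the "+" in the example above splits the string into two DOIs, while a "+" followed by ordinary characters (e.g. "10.1234/a+b") stays inside the DOI, which is exactly the ambiguous case the rule cannot resolve on its own.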


halfak · 2015-10-29T14:59:16Z

I'm also seeing DOIs like this: "10.1002/ajp.22007/abstract;jsessionid=397B42DDD36E4F654BAB381E3104ABB3.d02t04". So we should include the semicolon in the list of splitting characters.
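In this example the ";" is followed by a session ID rather than whitespace or a new DOI, so a context-sensitive rule would not fire. A minimal sketch, assuming the semicolon is instead treated as an unconditional terminator (`trim_at_semicolon` is a hypothetical helper name):

```python
def trim_at_semicolon(candidate):
    """Cut a DOI candidate at the first ";" (assumed to always end a DOI),
    dropping trailing session data such as ";jsessionid=..."."""
    return candidate.partition(";")[0]


print(trim_at_semicolon(
    "10.1002/ajp.22007/abstract;jsessionid=397B42DDD36E4F654BAB381E3104ABB3.d02t04"
))
```

Note that this rule alone leaves the "/abstract" path segment in place; handling that would need a separate rule.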
