Turtle parser fails for unknown reason (SPARQL 1.1 Syntax Update 1 manifest.ttl) #37
Comments
Just out of curiosity - what's the subject created by the ARC2 parser for the
The problem is the
I guess the expectation is that when a remote RDF dataset is being read, the reader sets the base IRI to the remote document location, but as the TriGParser does not know the source is a remote dataset, it's on you to include the |
Still, we should expect the
Funnily, both solutions seem to be standards-conformant. For the definition of the base IRI, RDF Concepts refers us to RFC 3986, which says there are four possible sources for the base URI (and the first available should be used):
|
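The precedence that RFC 3986 (section 5.1) describes can be sketched roughly as below. This is only an illustration: `pickBaseUri` and its parameter names are hypothetical and are not part of hardf.

```php
<?php
/**
 * Hypothetical helper (not hardf API): returns the first available base URI,
 * following the precedence order of RFC 3986, section 5.1.
 */
function pickBaseUri(
    ?string $embeddedInContent,   // 1. base declared inside the document itself
    ?string $encapsulatingEntity, // 2. base of an enclosing entity, if any
    ?string $retrievalUri,        // 3. URI used to retrieve the document
    ?string $applicationDefault   // 4. application-dependent default
): ?string {
    $candidates = [$embeddedInContent, $encapsulatingEntity, $retrievalUri, $applicationDefault];
    foreach ($candidates as $candidate) {
        // Strict checks: only null and the empty string count as "not available".
        if ($candidate !== null && $candidate !== '') {
            return $candidate;
        }
    }
    return null; // no base URI known at all
}
```

With this ordering, a document read from a string with no embedded base and no retrieval URI only gets a base if the application supplies a default; otherwise the result is `null` and the caller has to decide what to do.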
Discussed it with @k00ni on the side and he's in favour of using a default base URI as a fallback. A pull request will follow shortly. |
The base URI MUST be configurable in the parser and should be the last URI after redirection of the HTTP fetch that took place. If this file just comes from disk, we should expect an
I would not use a default base URI hardcoded in the library. |
We should probably add this as a test-case too! |
We never considered fixing it to a single value. The only option considered was the last point of RFC 3986 cited below - a fallback when no better option is available. That being said on the
Anyway, after some discussion with @k00ni I convinced him to throw an error in such a case, with a message explaining that the data can't be parsed because the base URI is unknown and that it should be set with the |
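A minimal sketch of the "throw an error when the base is unknown" idea, limited to the empty relative reference `<>`. The function name `resolveEmptyReference` is hypothetical and this is not hardf's actual implementation; it only assumes the RFC 3986 (section 5.2.2) rule that an empty reference resolves to the base without its fragment.

```php
<?php
/**
 * Hypothetical helper (not hardf API): resolves the empty relative IRI <>
 * against a document base IRI, or fails loudly when the base is unknown.
 */
function resolveEmptyReference(?string $documentBaseIri): string
{
    if ($documentBaseIri === null || $documentBaseIri === '') {
        throw new InvalidArgumentException(
            'Cannot resolve the empty relative IRI: the document base IRI is unknown.'
        );
    }
    // An empty reference keeps the base's scheme, authority, path and query
    // and drops the base's fragment.
    return explode('#', $documentBaseIri, 2)[0];
}
```

The point of the explicit exception is that the caller gets a message naming the missing base IRI instead of silently receiving no triples.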
One more solution to consider. The current problem is caused by the non-strict type comparisons of the
I personally think throwing an error explaining the lack of the base URI is still the best solution. |
In fact the
Funnily (or not) it works there but a document like
Should I investigate it or can we assume the test should be adjusted to expect the error to be thrown there? (btw |
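For readers less familiar with PHP, the difference between loose and strict comparison for an empty string can be illustrated as follows. The two function names are hypothetical and the snippet is not taken from the hardf code base; it only shows the general language behaviour under discussion.

```php
<?php
// Hypothetical helpers for illustration only.

// Loose comparison: an empty string is treated as equal to null.
function looksUnsetLoose($value): bool
{
    return $value == null;
}

// Strict comparison: only a real null matches.
function isUnsetStrict($value): bool
{
    return $value === null;
}

$emptyIri = '';   // an entity that was read but is empty
$noEntity = null; // no entity was read at all

var_dump(looksUnsetLoose($emptyIri)); // bool(true)
var_dump(isUnsetStrict($emptyIri));   // bool(false)
var_dump(isUnsetStrict($noEntity));   // bool(true)
```

With the loose check, "an empty entity was read" and "nothing was read" are indistinguishable; the strict check keeps them apart.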
* TriGParser: distinguish empty entities from no-entity being read. See #37 (closes #37)
* TriGParserTest::testBlankNodes() adjusted
* removed the prefixed-only IRIs input line from the first test scenario as this does not belong to the testBlankNodes() tests and is tested already in testIssue37()
* turned the empty prefixed IRIs test scenario into two - first, where an error is expected due to unknown document base IRI and second, where parsing succeeds thanks to `documentIRI` parser option being set
* Update test/TriGParserTest.php
* Update test/TriGParserTest.php

Co-authored-by: Konrad Abicht
Because of various PHP-magic, I would be in favor of using strict-comparisons (
If our current tests don't reflect the specified behavior, they have to be adapted accordingly. I am not sure what you mean with "turns parser into undefined state". We should aim for a reliable solution which we understand and control. I have no problem if the parser acts weird (at least for a while) in some edge cases which no one uses. However, I really appreciate you taking the time here @zozlak. |
The
    $curState = $this->parseTopLevel;
    foreach ($lexems as $i) {
        $curState = $curState($i);
    }
Our issue was that for an empty relative IRI with no document base IRI, the callable handling an entity parsing was returning |
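The "each handler returns the next handler" pattern can be sketched as a self-contained toy. Everything here (`runTokens`, the two states, the guard) is hypothetical and much simpler than the real parser; the guard that throws on a non-callable result is an addition for illustration, not a description of hardf's behaviour.

```php
<?php
/**
 * Toy state machine: every state is a closure that consumes one token
 * and returns the closure that should handle the next token.
 */
function runTokens(array $tokens): array
{
    $log = [];
    $expectSubject = null;
    $expectObject = null;

    // The closures capture each other by reference, so the later
    // assignment of $expectObject is visible inside $expectSubject.
    $expectSubject = function (string $token) use (&$log, &$expectObject) {
        $log[] = 'subject:' . $token;
        return $expectObject;
    };
    $expectObject = function (string $token) use (&$log, &$expectSubject) {
        $log[] = 'object:' . $token;
        return $expectSubject;
    };

    $curState = $expectSubject;
    foreach ($tokens as $token) {
        $curState = $curState($token);
        // If a handler returns something that is not callable, the loop
        // cannot continue; failing here makes the broken state visible.
        if (!is_callable($curState)) {
            throw new RuntimeException('Handler did not return the next state.');
        }
    }
    return $log;
}

print_r(runTokens(['a', 'b', 'c']));
```

The design relies entirely on each handler's return value, which is why a handler returning the wrong kind of value leaves the loop without a usable next state.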
I agree with this as a general statement but I don't see reviewing the hardf code base for that in the foreseeable future. |
Could someone with enough rights (@pietercolpaert or @k00ni I think) make a release including the #42 merge, please? |
Done. |
@pietercolpaert When parsing the Turtle file https://www.w3.org/2009/sparql/docs/tests/data-sparql11/syntax-update-1/manifest the parser returns 0 triples.
ARC2's Turtle parser has no problems and returns all triples, so the file should be correct.
I made a test which shows the problem: https://github.com/pietercolpaert/hardf/blob/error/turtle-parser-fails-unknown/test/TriGParserTest.php#L2061-L2072 (branch error/turtle-parser-fails-unknown)
Any idea why?