Store Token position in the produces quads #377

BenjaminHofstetter · 2024-02-16T13:50:04Z

Why do I need that:
After parsing a Turtle file, I lose all information about the source file. For better tooling support, I propose implementing some kind of "source maps" to trace back from quads to positions in the Turtle file.

For instance, in tools like https://shacl-playground.zazuko.com/, when encountering errors in SHACL validation reports, locating the error-causing triple requires human intervention. With source map information, editors could pinpoint the exact location in the Turtle file, aiding in error resolution. Implementing source maps would bridge the gap between parsed files and their source, enhancing tooling support. The tokenizer already generates tokens with line, start, and end information, laying the groundwork for this feature.

RubenVerborgh · 2024-02-16T14:05:14Z

This would be possible indeed, if the parser emits the context from the tokenizer in the quads.

We have no plans to take this up, but a pull request that puts this functionality behind a flag would be welcome, provided it has no performance impact when switched off.

faubulous · 2024-06-24T20:45:25Z

This is exactly what I need too. I am currently developing an RDF editing extension for Visual Studio Code named Mentor. For this use case I frequently need to resolve URIs and blank nodes to parsed Tokens, and this feature would be extremely helpful.

I found a workaround for URIs which requires parsing the document again after loading and interpreting the Triples, but that only works for URIs and not for blank nodes. This currently blocks me from implementing SHACL support where blank node definitions of (property) shapes are quite common.
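The workaround itself is not shown here, so the following is only a rough sketch of what a second pass over the tokens could look like. `indexIriTokens` is a hypothetical helper, and the token objects are hand-written stand-ins that mimic the `type`, `value`, `line`, `start`, and `end` fields of lexer tokens rather than real lexer output.

```javascript
// Hypothetical helper: group the positions of IRI tokens by their value.
function indexIriTokens(tokens) {
  const index = new Map();
  for (const t of tokens) {
    if (t.type !== 'IRI') continue; // only IRIs can be matched by value
    if (!index.has(t.value)) index.set(t.value, []);
    index.get(t.value).push({ line: t.line, start: t.start, end: t.end });
  }
  return index;
}

// Hand-written sample tokens (assumed shape, not produced by a real lexer).
const sampleTokens = [
  { type: 'IRI', value: 'http://example.org/alice', line: 1, start: 0, end: 26 },
  { type: 'IRI', value: 'http://example.org/knows', line: 1, start: 27, end: 53 },
  { type: '[', value: '', line: 1, start: 54, end: 55 },
  { type: 'IRI', value: 'http://example.org/alice', line: 4, start: 2, end: 28 },
];

const iriIndex = indexIriTokens(sampleTokens);
console.log(iriIndex.get('http://example.org/alice'));
```

The lookup works for IRIs because their value appears literally in the source. An anonymous blank node written as `[ ... ]` has no label in the source to look up, which is why this approach stops short of blank nodes.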

Any idea how such source maps could be implemented?

jeswr · 2024-06-24T22:46:25Z

> Any idea how such source maps could be implemented?

Luckily, the tokens emitted by the Lexer already contain information about their line and position. In the Parser, you could add this information as a property of Terms every time a new _subject, _predicate, _object or _graph is assigned. For instance, the code here would become:

this._subject = this._blankNode();
// Proposed addition: record the token position only when the option is enabled.
if (this._recordPosition) {
  this._subject[POS] = { line: token.line, start: token.start };
}
this._saveContext('blank', this._graph, this._subject, null, null);

I would recommend making POS a Symbol that is exported by N3.js, however it could also just be a property name like _internal_position.

The caveat of this approach is that it might cause a non-negligible performance hit even when the feature is disabled, but I suspect this is something you can performance-test and optimise once the feature is implemented.
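To make the Symbol idea concrete, here is a minimal self-contained sketch. It is not N3.js code: `POS`, `attachPosition`, and the `recordPosition` flag are hypothetical names, and the term and token objects are plain stand-ins that only mimic the fields mentioned above.

```javascript
// Hypothetical symbol; the suggestion is that the library could export one like it.
const POS = Symbol('position');

// Attach the token position to a term only when recording is enabled.
function attachPosition(term, token, recordPosition) {
  if (recordPosition) {
    term[POS] = { line: token.line, start: token.start };
  }
  return term;
}

// Stand-ins for a blank node term and the lexer token it was created from.
const token = { type: '[', line: 3, start: 12 };
const withPos = attachPosition({ termType: 'BlankNode', value: 'b0' }, token, true);
const withoutPos = attachPosition({ termType: 'BlankNode', value: 'b1' }, token, false);

console.log(withPos[POS]);          // the recorded position object
console.log(POS in withoutPos);     // false: nothing is attached when the flag is off
console.log(Object.keys(withPos));  // symbol keys are not listed among the string keys
```

One reason to prefer a Symbol over a string property such as _internal_position is that symbol-keyed properties are skipped by `Object.keys` and `JSON.stringify`, so code that enumerates or serializes terms would not see the extra field.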

BenjaminHofstetter · 2024-06-25T11:48:40Z

I did a PoC some time ago and added it as a use case in the RDF-star working group. Maybe in the future we can use RDF-star to define such source maps "externally" from the source Turtle file.
w3c/rdf-star#285 (comment)

My PoC uses the N3 parser and exposes the tokens in the quads (not RDF-star).

faubulous · 2024-06-25T14:49:51Z

@BenjaminHofstetter Did you create a patch for N3 and publish the code of the PoC somewhere?

TallTed · 2024-06-26T16:39:34Z

Perhaps change the issue title from —
Store Token position in the produces quads
— to —
Store original positions of Tokens in quads produced by conversion from Turtle
?

(At least, change produces to produced.)

RubenVerborgh added feature-request future semver.minor labels Feb 16, 2024
