Extending the writer with the possibility to write a comment to the outputStream #439

Closed
pietercolpaert opened this issue Sep 6, 2024 · 7 comments


pietercolpaert commented Sep 6, 2024

Recently we added support for optionally parsing comments. We should probably also have a public function in the writer that allows one to write a comment. I.e.:

addComment(comment: string) {
   this._outputStream.push('# ' + comment + '\n');
}

In the StreamWriter, it could then be listening in on comment events from the input.

Does this sound reasonable? If so, I can open a PR.

EDIT: we need to be careful about newlines in the comment string
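To illustrate the newline caveat, here is a minimal sketch. It assumes a hypothetical helper `formatComment` (not part of N3.js) that gives every line of the comment its own `# ` prefix, so an embedded line break cannot end the comment early and leave the rest to be parsed as Turtle.

```typescript
// Hypothetical helper, not part of N3.js: turns arbitrary text into
// Turtle comment lines so that embedded newlines stay inside the comment.
function formatComment(comment: string): string {
  return comment
    .split(/\r\n|\r|\n/)          // a Turtle comment ends at the first line break
    .map(line => `# ${line}\n`)   // so every line gets its own '# ' prefix
    .join('');
}
```

An `addComment` method along the lines proposed above could then push `formatComment(comment)` instead of concatenating the raw string.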


RubenVerborgh commented Sep 6, 2024

If added, it would definitely need to be in Writer, because only that component has the ability to terminate. (For example, the . after a triple will only be written once the next subject arrives, because otherwise it might be a ; instead.)

The piping I'm not sure. I think it starts getting hacky/unstable.

For your purposes, the intended effect can already be achieved just by writing to _outputStream yourself (after some kind of flush to trigger the .). Or maybe the way to go is to add a flush method to Writer that indeed terminates triples when called.

But then there are also questions surrounding TriG, for instance (do we also terminate graphs? Maybe yes, maybe no).
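The deferred terminator and the flush idea can be sketched with a toy writer. This is an illustration only, not N3.Writer: the class `TinyWriter` and its methods are hypothetical, terms are written verbatim, comments are assumed to be single-line, and graphs (the TriG question) are left out entirely.

```typescript
// Toy illustration only; this is NOT N3.Writer.
class TinyWriter {
  private out: string[] = [];
  private lastSubject: string | null = null;

  addTriple(s: string, p: string, o: string): void {
    if (s === this.lastSubject) {
      // Same subject: the pending statement continues with ';'
      this.out.push(` ;\n    ${p} ${o}`);
    } else {
      // New subject: only now is it known that the previous statement ends with '.'
      this.flush();
      this.out.push(`${s} ${p} ${o}`);
      this.lastSubject = s;
    }
  }

  // Terminates the pending statement, if any, so raw text can follow safely.
  flush(): void {
    if (this.lastSubject !== null) {
      this.out.push(' .\n');
      this.lastSubject = null;
    }
  }

  // Assumes a single-line comment; flushes first so it lands between statements.
  addComment(comment: string): void {
    this.flush();
    this.out.push(`# ${comment}\n`);
  }

  result(): string {
    return this.out.join('');
  }
}
```

In this sketch, calling `addComment` between two triples of the same subject forces the subject to be repeated afterwards, which is the price of terminating early.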

TL;DR: There are options, this issue is triggering my thoughts on the general concept of the current https://github.com/pietercolpaert/Wurtle.js/ syntax. I'm very much starting to think

_:w wurtle:begin 0.
ex:Collection1 a tree:Collection;
            rdfs:label "A Collection of 2 subjects"@en;
            tree:member ex:Subject1, ex:Subject2 .
_:w wurtle:end 0.
_:w wurtle:begin 1.
ex:Subject1 a ex:Subject ;
            rdfs:label "Subject 1" ;
            ex:linkedTo [ a ex:Subject ] .
_:w wurtle:end 1.

Or alternatives like

wurtle: wurtle:begin 1.

or simply

wurtle:separator wurtle:separator wurtle:separator.

or even

:_____ wurtle:separator :_____.

and one for the weekend

<🥕> wurtle:separator <🥕>.

because then all you need is order stability (which, streaming, you have), not comment stability (hacky at best).

And bonus points for also working in JSON-LD, which doesn't have comments. And any current or future RDF format or streaming parser, really.
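A minimal sketch of how a consumer could rely on order stability alone. The `Triple` shape, the `SEPARATOR` value, and `groupBySeparator` are assumptions made for illustration; the thread does not fix an actual IRI for the separator predicate.

```typescript
// Minimal stand-in for an RDF triple; terms are kept as plain strings.
interface Triple { subject: string; predicate: string; object: string; }

// Assumed marker predicate; the actual IRI is not specified in the thread.
const SEPARATOR = 'wurtle:separator';

// Splits an ordered list of triples into groups at every separator triple.
// Relies only on the order in which triples arrive; separators are dropped.
function groupBySeparator(triples: Triple[]): Triple[][] {
  const groups: Triple[][] = [];
  let current: Triple[] = [];
  for (const triple of triples) {
    if (triple.predicate === SEPARATOR) {
      if (current.length > 0) groups.push(current);
      current = [];
    } else {
      current.push(triple);
    }
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

Because the function only looks at parsed triples, the same logic would apply to whatever serialization the triples came from. A streaming variant would emit each group as soon as its separator arrives instead of collecting them in an array.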

@pietercolpaert

If added, it would definitely need to be in Writer, because only that component has the ability to terminate. (For example, the . after a triple will only be written once the next subject arrives, because otherwise it might be a ; instead.)

I don’t believe termination is needed. I.e.:

# @group begin
ex:Collection1 a tree:Collection;
            rdfs:label "A Collection of 2 subjects"@en;
# @group end
# @group begin
            tree:member ex:Subject1 .
ex:Subject1 a ex:Subject ;
            rdfs:label "Subject 1" ;
            ex:linkedTo [ a ex:Subject ] .
# @group end

Is perfectly valid and will return the desired result.

<🥕> <🥕> <🥕> <🥕>

Wrt. Wurtle syntax: while I certainly feel like I could appreciate <🥕> <🥕> <🥕> <🥕>. as a separator of groups, I find mixing a simple parser hint with an actual RDF triple that passes and that might get uselessly imported over and over again in triple stores worldwide a more controversial proposal than comments 😅

And bonus points for also working in JSON-LD, which doesn't have comments. And any current or future RDF format or streaming parser, really.

You already have syntactic options for JSON-LD profiles there, such as:


RubenVerborgh commented Sep 6, 2024

I don’t believe termination is needed.

You're right; you don't need triple completion syntactically. But, your example would then be

# @group begin
ex:Collection1 a tree:Collection;
            rdfs:label "A Collection of 2 subjects"@en# @group end
# @group begin
;
            tree:member ex:Subject1 .
ex:Subject1 a ex:Subject ;
            rdfs:label "Subject 1" ;
            ex:linkedTo [ a ex:Subject# @group end
 ].

Ugly but works—and you'd probably want push(`\n# ${comment}\n`), i.e., newline before as well.

an actual RDF triple that passes and that might get uselessly imported

I do think that your case specifically prevents this, because you are 100% (I think) certain that there is conneg. Namely, Wurtle streams (if I understand correctly) will always be generated by an HTTP server, which means there's always negotiation, which means that clients can specifically ask for Wurtle, which means that the client-side parser can always be configured as Wurtle.

I.e., I don't see—but might be mistaken—scenarios where a non-Wurtle client would ever end up eating Wurtles.


pietercolpaert commented Sep 6, 2024

Ugly but works

Beauty is in the eye of the beholder of course ;-) I don’t dislike it that way

I do think that your case specifically prevents this, because you are 100% (I think) certain that there is conneg.

I don’t think so: you might want to respond with a text/tree+turtle response (cf. https://github.com/pietercolpaert/TREE-N3-profiles) on all requests, even if they just ask for text/turtle. In the latter case, the comments will be ignored and the parsing will be slower, but everything remains functional.


RubenVerborgh commented Sep 6, 2024

you might want to respond

But why though? You're generating this on the fly—or not? (If not, fair enough. Then maybe I don't understand the purpose of the grouping well.)

text/tree+turtle

Oh god no, that should most definitely be a profile, not a content-type: pietercolpaert/TREE-profiles#1

even if they just ask for text/turtle

They wouldn't be able to parse it (unless served with text/turtle, as it should).


RubenVerborgh commented Sep 6, 2024

But in a larger sense, I think the trade-off here is: do we want to change entire parsing infrastructures/pipelines to make comments have meaning (and thus place the onus on the parser), or rather change the interpreter of the RDF?

I didn't question this before the creation of this issue; it shows how fiddly it is to pipe comments out-of-band into another stream. Whole range of issues like back-pressure etc. It will absolutely be out of order and not work.

🚨 Actually, yes. I only realize now: what you propose is not possible. The timing of the events will be off, because of buffering. The comments will end up in the wrong places if you chain the events.
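The timing problem can be reproduced with a deliberately simplified simulation. It does not use Node streams or N3.js; the `Item` type, `deliverOnTwoChannels`, and the fixed `batchSize` are assumptions standing in for "a data channel that buffers" and "an event channel that fires immediately".

```typescript
type Item =
  | { kind: 'quad'; value: string }
  | { kind: 'comment'; value: string };

// Simulates two unsynchronized channels: quads are buffered and delivered in
// batches (a stand-in for internal stream buffering), comments fire at once.
// Returns the order in which a consumer listening to both would see things.
function deliverOnTwoChannels(source: Item[], batchSize: number): string[] {
  const arrival: string[] = [];
  let buffer: string[] = [];
  const flushBuffer = (): void => {
    for (const quad of buffer) arrival.push(quad);
    buffer = [];
  };
  for (const item of source) {
    if (item.kind === 'comment') {
      arrival.push(`# ${item.value}`);   // event channel: delivered immediately
    } else {
      buffer.push(item.value);           // data channel: held until batch is full
      if (buffer.length >= batchSize) flushBuffer();
    }
  }
  flushBuffer();
  return arrival;
}
```

With the source order `# begin, q1, q2, # end, q3` and a batch size of 3, this simulation delivers both comments before any quad, so the group markers no longer enclose the data they were written around.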


RubenVerborgh commented Sep 6, 2024

Closing this feature suggestion as it definitely will not work as intended on StreamWriter. Would be open to consider on Writer, but that might not solve the issue.

That's the inherent problem with the current approach: to stream Wurtle properly, it must be one stream of triples and comments, which is what takes care of the synchronization. Currently, it's two separate streams, one an actual Node Stream and the other a makeshift EventEmitter-based stream, without synchronization between them.

The "separators as triples" approach would have built-in sync, at the cost indeed of offering non-data triples (which may or may not be an issue). The alternative is to change from Stream<Quad> to Stream<Quad|Comment>, which would also offer built-in sync, at the cost of type variability. The former, N3.js can support; the latter probably would result in an unacceptable performance (V8 compiler polymorphism) and developer experience trade-off. External libraries could build classes on top of N3.Parser and N3.Writer that do provide Stream<Quad|Comment>. However, they cannot be built on top of N3.StreamParser because there is no guarantee about the synchronization of the comments and data (#440).
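A rough sketch of what a single `Stream<Quad|Comment>` would mean for consumers. The types `QuadItem`, `CommentItem`, and `StreamItem` and the function `serialize` are hypothetical; N3.js does not expose them, and an array stands in for the stream.

```typescript
// Hypothetical shapes for illustration; N3.js does not expose these types.
interface QuadItem { kind: 'quad'; subject: string; predicate: string; object: string; }
interface CommentItem { kind: 'comment'; text: string; }
type StreamItem = QuadItem | CommentItem;

// Serializes one ordered sequence of quads and comments, one statement per line.
// Position in the sequence is the only synchronization needed.
function serialize(items: StreamItem[]): string {
  let output = '';
  for (const item of items) {
    // Every consumer has to branch on the item type: the type-variability cost.
    if (item.kind === 'comment') {
      output += `# ${item.text}\n`;
    } else {
      output += `${item.subject} ${item.predicate} ${item.object} .\n`;
    }
  }
  return output;
}
```

Ordering comes for free here, but every stage of the pipeline now handles two shapes of item, which is the performance and developer-experience trade-off mentioned above.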
