-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #377: Implemented support for adding the tokens to parsed nodes. #388
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @faubulous, off to a good start.
Gist of my comments:
- Let's make the second argument a
context
object{ token }
for extensibility. - Let's expose this factory as-is, with the second optional argument.
- The base
Term
constructor should do the work of assigningcontext
/token
. (Preferablycontext
actually;Term
should not look insidecontext
.) - Let's not use explicit
undefined
; onlynull
or''
.
Open questions:
- Reading mechanism for context/token?
namedNode: iri => namedNode(iri), | ||
blankNode: name => blankNode(name), | ||
variable: name => variable(name), | ||
literal: (name, datatype) => literal(name, datatype), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy leaving these as they are actually. It doesn't hurt to have the second optional argument, and that way, we can expose it to other parsers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand you correctly, you want to pass the default constructor functions to the DataFactory as they are. This would mean, however, that the default behavior of the DataFactory changes to always creating Terms with an initialized { token } context. Which in turn will break some tests and is probably not needed in standard situations.
I was hesitant to put a condition into the Parser to check wether to pass the token/context argument to the factory or not because this would have an impact on the performance. The solution to simply ignore the second argument in the DataFactory seemed like a good compromise and allows the API users to define their own behavior regarding the context variable of a node, such as offsetting positions or adding additional data if needed.
So should I revert this to the default functions and have the context always initialized with a token?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the default behavior of the DataFactory changes to always creating Terms with an initialized { token } context.
That's fine; objects can always have optional fields.
Which in turn will break some tests
Those tests might be too strict; we can loosen them.
I was hesitant to put a condition into the Parser to check wether to pass the token/context argument to the factory
Indeed; let's definitely not.
The solution to simply ignore the second argument in the DataFactory
Aah okay. This actually shows me that we need 2 DataFactory classes then. One that discards the context, and one that keeps it. Let's make both first-class citizens.
src/N3DataFactory.js
Outdated
constructor(id, token) { | ||
super(id); | ||
|
||
this.token = token; | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
constructor(id, token) { | |
super(id); | |
this.token = token; | |
} | |
constructor(id, token = null) { | |
super(id); | |
this.token = token; | |
} |
I'm nitting here, but I want things to be explicit. No undefined
.
However, even better would be if the Term
constructor simply took token
; then we don't need any of this.
And let's not make it token, let's make it context
, with let { token } = context
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's OK. So will the default be context = { token = null } or context = null?
I am personally hesitant to initialize token
with an empty string as this is a different datataype as the Token
object and it might indicate that one should expect a non-empty string when it is initalized which is misleading.
But I'll do as you wish. Just let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Saw the answer in the comment below. Will go with DEFAULT_CONTEXT.
src/N3DataFactory.js
Outdated
@@ -92,7 +105,7 @@ export class Literal extends Term { | |||
|
|||
// ### The datatype IRI of this literal | |||
get datatype() { | |||
return new NamedNode(this.datatypeString); | |||
return new NamedNode(this.datatypeString, this.token); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm hesitant to pass the token along here. I don't think we should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK.
src/N3DataFactory.js
Outdated
@@ -273,7 +291,8 @@ export function termToId(term, nested) { | |||
// ## Quad constructor | |||
export class Quad extends Term { | |||
constructor(subject, predicate, object, graph) { | |||
super(''); | |||
super(undefined); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never. Either null
or ''
, but never explicitly undefined
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, no problem.
However, I'm just curious to learn why. Am I right that you want the variable to be initialized with something so that it is visible on an API level? I have to admit that I am not that deep into JavaScript to know all the consequences. It was my reasoning that when passing undefined
the variable might not occupy any memory as interpreters might just delete it and thus having the API behave as before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was my reasoning that when passing
undefined
the variable might not occupy any memory
That's not how JavaScript engines work today; the will still explicitly assign the field.
However, I'm just curious to learn why.
Mainly for explicitness. undefined
is what happens when we don't pass anything, forget stuff, uninitialized, etc. null
signifies a choice. Choices are good.
src/N3Parser.js
Outdated
@@ -194,7 +194,7 @@ export default class N3Parser { | |||
case '[': | |||
// Start a new quad with a new blank node as subject | |||
this._saveContext('blank', this._graph, | |||
this._subject = this._blankNode(), null, null); | |||
this._subject = this._blankNode(undefined, token), null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not undefined
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should I initialize it then? Please advise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
null
, or allow a constructor that makes the first argument optional, like this._blankNode({ token })
. Then we can do a string test on the first argument.
src/N3Parser.js
Outdated
@@ -206,7 +206,7 @@ export default class N3Parser { | |||
if (!this._n3Mode) | |||
return this._error('Unexpected graph', token); | |||
this._saveContext('formula', this._graph, | |||
this._graph = this._blankNode(), null, null); | |||
this._graph = this._blankNode(undefined, token), null, null); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idem (for all below)
The more I think about it, it should be:
with |
Thank you for your quick review @RubenVerborgh! I implemented most of your suggested changes except exporting the DataFactory as is because this will change the default behavior of the API and break some unit tests. Please advise. |
I only had a brief look as I am on vacation and traveling, so forgive me if I missed something. It looks like you are adding arguments to the factory methods, which are defined in the RDF/JS spec. That may cause problems if someone else comes up with the same idea but with different properties. If we want to move forward in that direction, we must extend the specification to avoid any conflicts. It would be good if the spec would cover that topic, but it may take a bit longer and it's understandable if one want to move forward faster. I would recommend another approach that doesn't require extending the spec. Just pass a flag to the constructor of the parser, and if it's set, set the properties after creating the Term instance with the factory like this: const term = factory.namedNode(iri)
term.source = { url, line } Also, please have a look at the additional properties page in the wiki of the spec. I requested creating that page with source maps or DOM element linking in mind. It would be great if we could define a reusable standard structure for it. I have something like this in mind with all properties optional: term.source = {
url, // file or http URL
line, // line number
position, // start position of the token
text, // text of the token
element // DOM element for RDFa
} I'm back in the week of 22nd of July. It would be great if we could keep the discussion open till the end of that week. |
Indeed, and thanks for looking at this. That's the main reason why I am proposing an object as an optional second argument, so
I could also agree with this; just not a big fan of external classes managing the internal state of another class. Plus, some factories could create frozen objects etc… I think the first approach, with an optional object, is actually the safest all things considered.
+1 to this
We'll wait for you, mate. Would be good to have an RDF/JS issue open to track this then. Cheers! |
I agree with that. If we extend the interface, we will almost certainly use an additional object argument. I dug a bit into the code. The debugger showed me that the My proposal:
|
Thanks @bergos! Based on your comment, may I suggest to @faubulous:
We should still retain backward compatibility with those who expect |
Dear @RubenVerborgh, could you please summarise what changes you expect me to make? From the comments I gather that I need to refactor the token argument into an object of conforming to this interface:
Is this correct? |
@faubulous That's the one! |
@faubulous @RubenVerborgh I expected
|
@bergos Doesn't fully make sense to me; some of those are not source attributes. Perhaps:
|
@RubenVerborgh All of them are source attributes, as a different serialization may generate different tokens, and prefixes are also defined as part of the serialization. Maybe a short overview of how I think additional attributes could be used in the future makes it easier to understand my view: Terms and Quads have a lifecycle. They are generated by a source, they can be shown somewhere and bound to UI components, and they can be written to a target, which can be another intermediate data model object or the final serialization. An example UI application could be a Turtle to Turtle mapper with a Quad table component that lists the complete content of the source document. A prefix table component can be used to define the prefixes of the target document. On a click on Quad of the table component, the source and target tokens are highlighted. Behind the scenes, the parser attaches a Using only the Term and Quad objects for this use case may not be optimal. Additional data structures like maps are required for performance reasons. But that may vary based on the use case. The important point here is: how could additional properties be used? My goal would be to get some kind of standard for the properties. It should be possible to use them with different libraries. Therefore, we should have a look at the whole lifecycle of data model objects. |
I'll leave the decision to others. |
@bergos I've implemented most of the requested changes. However, I want to make sure that I got it right. The context object should be structured like this:
|
I'm very unsure now why we have an object with a single property wrapping an object with all the other properties. |
Agree. I would personally prefer the original proposal of: Where the token object is provided as-is in a wrapped context. We could also rename token to source, but I would not like to map the token to another format as this is expensive. At least we would spare ourselves the transformation step for the original token object. However, this also requires the Parser to provide With this approach, every application that needs another information / structure can provide a custom DataFactory to implement all required transformations or mappings like this:
|
I extended the factory methods
namedNode
,blankNode
,literalNode
andvariable
to also accept an optional token as additional parameter and pass it to the created Term instance.Then I changed the Parser to provide the tokens as an additional argument when parsing a file.
The default exported
DataFactory
does, however, not pass the token to the factory methods and thus creates nodes without additional added tokens as a default behaviour. This means that the current behaviour is not changed and all existing tests pass.Using the exported factory methods for one can create a custom DataFactory which can be passed to the parser as an argument using the existing API. This way the exported tokens can either be a) passed as they are or b) modified and stored with the nodes as neded.