(tracking) need explicit notes that << and >> are not any of the visually similar characters #282
Comments
It's not much different than "<" and ">" for IRIREF, which is long standing. If you look at the grammar, quotedTriple uses specific Unicode characters for "<<" and ">>". We could add something when they are introduced in 2.2 Quoted Triples that clarifies this, but we never felt the need to do so for "<", and where would such over descriptiveness end? I could imagine that some editors might automatically replace "<<" with "«" when typing, much as '"' is often replaced with '‟' (DOUBLE HIGH-REVERSED-9 QUOTATION MARK) or '〞' (DOUBLE PRIME QUOTATION MARK), but that's a different problem.
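The auto-replacement risk described above can be checked mechanically. The following is a minimal Python sketch, not anything taken from the specs: the `LOOKALIKES` table and the `find_lookalikes` helper are hypothetical names, and the table is a deliberately non-exhaustive assumption about which characters are likely to be confused with "<<" and ">>".

```python
# Hypothetical, non-exhaustive table of single characters that can be
# mistaken for the two-character ASCII sequences "<<" and ">>".
LOOKALIKES = {
    "\u00ab": "<<",  # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u00bb": ">>",  # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
    "\u226a": "<<",  # MUCH LESS-THAN
    "\u226b": ">>",  # MUCH GREATER-THAN
    "\u300a": "<<",  # LEFT DOUBLE ANGLE BRACKET
    "\u300b": ">>",  # RIGHT DOUBLE ANGLE BRACKET
}


def find_lookalikes(text):
    """Return (offset, code point, intended ASCII) for each look-alike in text."""
    return [
        (i, f"U+{ord(ch):04X}", LOOKALIKES[ch])
        for i, ch in enumerate(text)
        if ch in LOOKALIKES
    ]


if __name__ == "__main__":
    # Correct input uses U+003C U+003C and U+003E U+003E, so nothing is reported.
    print(find_lookalikes("<< :s :p :o >>"))
    # Input mangled by an editor's auto-replacement is reported with offsets.
    print(find_lookalikes("\u00ab :s :p :o \u00bb"))
```

A check like this only helps authors catch mangled input before parsing; a conforming parser would simply reject the look-alike characters.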
Whether this is "over descriptive" depends on the reader and writer, in my opinion. Where "such over descriptiveness [would] end" seems logically to be the Unicode code point of such characters. Specifying the characters which reveal that Unicode code point would certainly feel like over-specification, akin to double-escaping of URL characters. Of course, participants in this WG ought not need this degree of specificity — but some number of us have confusingly referred (and I daresay will refer in future) to the double-{less|greater}-than as "chevrons". Auto-replacement by editors is another potential headache, and seems to me to be worthy of a note in one or more documents, as this would not have been an issue with the single-{less|greater}-than characters used in It might be sufficient to make this explicit in the EBNFs, where a number of characters are already explicitly identified by their Unicode code points ... and where these characters are not now explicitly identified, though the <tr id="grammar-production-quotedTriple">
<td>[7]</td>
<td><code>quotedTriple</code></td>
<td>::=</td>
<td>"<code class="grammar-literal">&lt;&lt;</code>" <a href="#grammar-production-subject">subject</a> <a href="#grammar-production-predicate">predicate</a> <a href="#grammar-production-object">object</a> "<code class="grammar-literal">&gt;&gt;</code>"</td>
</tr> The specific Unicode characters are not now made explicit, nor is the HTML markup now visible, where humans would reasonably be expected to consume these. I've created PR#36 on rdf-n-triples for this. If that works for all, it can be echoed on |
Unicode character names (the formal names) are just one way of naming characters. I'm surprised how many of these names are misaligned with common community usage and practice, so readers may not have heard of the Unicode character name. We can avoid further confusion by using the Unicode code point value, which is succinct, unambiguous to the reader, and has an established way to write it.
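To make the contrast between formal names and code point values concrete, here is a small Python sketch using the standard `unicodedata` module. The `describe` helper is a hypothetical name introduced only for illustration.

```python
import unicodedata


def describe(text):
    """Return 'U+XXXX NAME' for each character, pairing code point and formal name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


if __name__ == "__main__":
    # The quoted-triple delimiters are two ASCII characters each.
    for item in describe("<<") + describe(">>"):
        print(item)
    # A visually similar single character has a different code point and name.
    for item in describe("\u00ab"):
        print(item)
```

The code point part of each entry (for example U+003C) is the short, conventional notation the comment above argues for; the formal name is shown alongside it only for comparison.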
I think this is not far removed from the LANGTAG specification where I firmly believe that all of |
We should avoid the name, which is confusing as there are several alternatives for each. Just give the code point. EBNF is a formal format that has no way to put in commentary.
No, I believe I mean the same EBNF as you think I do, e.g., I don't mean to add commentary there, but simply to explicitly identify the Unicode characters we've been discussing in the relevant places. For instance --
-- might become --
I do suggest that we do this for all instances of these characters through the EBNFs, which I know some will consider overkill, but visual identification of characters like these is simply not reliable, unlike characters like I can live with leaving the character names out of the body text, though I think it would be better to include them (and to address your concern about multiple and/or similar names, we could be more explicit, e.g., instead of |
The outcome of using code points here, and all the other places you now propose, means that readers have a harder time relating examples they see on the web (material outside the control of this WG), and the examples in our specs, with the grammars. Being clear at the point of definition is enough.
I agree with @afs, this will make the readers' job more challenging, and it is unnecessary IMHO. The character sequence '<<' is unambiguous when processed for the purpose of parsing input, as the Unicode characters effectively represent themselves, which is the whole point of the EBNF. It is only when reading (really just reading a printed page) that it could be ambiguous whether '<<' is intended, or some alternative sequence of similar-looking character(s). Adding |
I have witnessed the confusion. Unfortunately, I don't recall whether it was in email, IRC, or otherwise, and I am not finding it easy to locate in my local logs. I will live with |
* Specify IRIREF and quoted-triple wrappers more accurately partially answers w3c/rdf-star#282 --------- Co-authored-by: Gregg Kellogg <[email protected]>
This is necessary for at least Turtle, N-Triples, and N-Quads. I have not created distinct issues.
The syntax discussions and EBNFs all use the right characters, but I have located nothing that explicitly states that << and >> are not any of the several visually similar characters. (Degree of similarity varies with font, among other things.)