I18n-Comments
Contents
- 1 Status
- 2 Analysis
- 3 178: Ack [E] Reference IETF BCP 47 for language declaration
- 4 179: Nack [SF] Scope of document language label
- 5 180: Ack [E] 180: Various random character references should be more explicit
- 6 181: Ack [E] Non-ASCII IRI example
- 7 182: Nack [E] cultural relevance of examples
- 8 183: Nack [E] Use of types questionable?
- 9 184: Ack Malformed escapes
- 10 185: Ack [E] Line terminator assumptions
- 11 187: Nack [SBF] escape syntax
- 12 188: Nack [S] special handling of % in IRI
- 13 189: Ack [S] reference obs-language-tag instead of defining your own
- 14 190: Nack [S] attempting to erase combining marks?
- 15 191: Ack [S] Various nits in Appendix B
- 16 192: [E] referencing Unicode
- 17 193: Nack [E] define when escapes are evaluated
Status
I (ericP) believe that Addison Phillips's response closes these issues.
Analysis
Key:
- [S] : also an issue in SPARQL
- [F] : forward compatibility (new documents rejected by old parsers)
- [B] : backward compatibility (old documents rejected by new parsers)
- [E] : editorial
178: Ack [E] Reference IETF BCP 47 for language declaration
Refer to BCP 47 for langtags.
ericP: +1
gavinc: we do indirectly; we link to language tags as defined in RDF Concepts, which directly references BCP 47. +0
179: Nack [SF] Scope of document language label
No LANG directive to set default document language.
ericP: undecided
gavinC: -0.9 way too big a feature, very hard to use for a serializer, impossible to use for a streaming serializer
180: Ack [E] 180: Various random character references should be more explicit
ericP: +1
gavinC: +1
181: Ack [E] Non-ASCII IRI example
ericP: +1
gavinc: +1
182: Nack [E] cultural relevance of examples
Good examples with relevant features are difficult to come up with.
183: Nack [E] Use of types questionable?
Asserts the year 2007 and the dollar amount 14074.2E9 should have datatypes.
Most scalar values have a datatype.
ericP: any proposals for unit-less numbers to replace the current example?
184: Ack Malformed escapes
The characters -, \uB7, \u300 to \u36F and \u203F to 2040 are permitted anywhere except the first character.
ericP: I don't understand these codepoints enough to understand why they'd be illegal. ... ahh, it's just that "\uB7" isn't a convention we've been using.
185: Ack [E] Line terminator assumptions
The reference to #xA; is making some assumptions about line terminators
ericP: Apparently "Assumes that line feeds in this document are #xA" described the following example. I've re-worded to clarify that this assumption is not Turtle-wide or even Turtle-spec-wide.
187: Nack [SBF] escape syntax
use \u{xxxxx} instead of \uxxxx or \Uxxxxxxxx
191 also specifically proposes \Ux{6} (six instead of eight)
ericP: -0.5 could be worth it only if the rest of the world really likes that format.
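To make the trade-off concrete, here is a minimal sketch of a decoder that accepts the existing \uxxxx and \Uxxxxxxxx forms alongside the proposed \u{...} form. The names ESCAPE and decode_escapes are hypothetical, and the 1-6 digit bound on the braces form is an assumption for illustration; the comment itself only writes \u{xxxxx}.

```python
import re

# Three escape shapes, tried in order:
#   \u{X...}    proposed form (1-6 hex digits is an assumed bound)
#   \uXXXX      existing form, exactly 4 hex digits
#   \UXXXXXXXX  existing form, exactly 8 hex digits
ESCAPE = re.compile(
    r"\\u\{([0-9A-Fa-f]{1,6})\}"
    r"|\\u([0-9A-Fa-f]{4})"
    r"|\\U([0-9A-Fa-f]{8})"
)


def decode_escapes(text):
    """Replace each recognized escape with the character it names."""
    def replace(match):
        # Exactly one of the three groups is set for any match.
        hex_digits = match.group(1) or match.group(2) or match.group(3)
        # chr() raises ValueError above U+10FFFF, e.g. for \UFFFFFFFF.
        return chr(int(hex_digits, 16))
    return ESCAPE.sub(replace, text)


if __name__ == "__main__":
    # Both spellings name the same code point, U+00E9.
    print(decode_escapes(r"\u00E9") == decode_escapes(r"\u{E9}"))
```

An old parser that only knows the first two forms would reject any document using the braces form, which is the forward-compatibility cost flagged by [F].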
188: Nack [S] special handling of % in IRI
Why aren't %dd's de-escaped?
<>s and the like in IRIs require escaping.
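A small sketch of why decoding %dd sequences is risky: a percent-encoded "<" or ">" is harmless inside an IRI, but once decoded it collides with the IRI delimiters. The example IRI and the helper name safe_to_write_raw are hypothetical, and the forbidden-character set is an assumption written from memory of the Turtle IRIREF production.

```python
import re
from urllib.parse import unquote

# Characters that cannot appear raw between '<' and '>' in an IRIREF.
# Assumed set: controls and space, < > " { } | ^ ` and backslash.
FORBIDDEN_IN_IRIREF = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def safe_to_write_raw(iri):
    """True if the IRI can be written between '<' and '>' unchanged."""
    return FORBIDDEN_IN_IRIREF.search(iri) is None


encoded = "http://example.org/a%3Cb%3E"  # hypothetical IRI
decoded = unquote(encoded)               # %3C and %3E become '<' and '>'

if __name__ == "__main__":
    print(safe_to_write_raw(encoded))
    print(safe_to_write_raw(decoded))
```

The encoded form passes the check and the decoded form does not, so a parser that de-escaped %dd would produce IRIs it could not write back out without re-escaping.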
189: Ack [S] reference obs-language-tag instead of defining your own
Use BCP47's "obs-language-tag" production in grammar. obs-language-tag looks like:
obs-language-tag = primary-subtag *( "-" subtag )
primary-subtag = 1*8ALPHA
subtag = 1*8(ALPHA / DIGIT)
We could extend the XML expressivity to include {min,max} and incorporate obs-language-tag into the current grammar in the form:
-[144s] LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
+[144s] LANGTAG ::= '@' [a-zA-Z]{1,8} ('-' [a-zA-Z0-9]{1,8})*
This would also allow us to improve the production for UCHAR:
-[27] UCHAR ::= '\u' HEX HEX HEX HEX | '\U' HEX HEX HEX HEX HEX HEX HEX HEX
+[27] UCHAR ::= '\u' HEX{4} | '\U' HEX{8}
Simply referencing obs-language-tag is a pain when folks try to synthesize a grammar, either for implementation or comprehension.
ericP: -.9 to reference, +1 to adopting {} notation and making Turtle more precise, as currently mocked up in the editor's draft notations for UCHAR and, most importantly, LANGTAG.
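The effect of the {min,max} bounds can be checked by transcribing the productions as regular expressions. This is only a sketch: the pattern names and the matches helper are hypothetical, and Python's re syntax stands in for the grammar notation.

```python
import re

# Current production: '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*
LANGTAG_CURRENT = re.compile(r"@[a-zA-Z]+(-[a-zA-Z0-9]+)*")

# Proposed production: '@' [a-zA-Z]{1,8} ('-' [a-zA-Z0-9]{1,8})*
LANGTAG_PROPOSED = re.compile(r"@[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

# Proposed production: '\u' HEX{4} | '\U' HEX{8}
UCHAR = re.compile(r"\\u[0-9A-Fa-f]{4}|\\U[0-9A-Fa-f]{8}")


def matches(pattern, token):
    """True if the whole token is derivable from the production."""
    return pattern.fullmatch(token) is not None


if __name__ == "__main__":
    for tag in ("@en", "@en-US", "@abcdefghij"):
        print(tag, matches(LANGTAG_CURRENT, tag), matches(LANGTAG_PROPOSED, tag))
    for esc in (r"\u00B7", r"\uB7"):
        print(esc, matches(UCHAR, esc))
```

A ten-letter primary subtag such as @abcdefghij is accepted by the current production but rejected by the bounded one, and \uB7 (the shape flagged in 184) fails UCHAR because it does not have exactly four hex digits.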
190: Nack [S] attempting to erase combining marks?
Using XML4 productions leads us to prohibit combining marks and surrogates.
Shouldn't we prohibit surrogates?
191: Ack [S] Various nits in Appendix B
Comments on media type registration form:
- covers docs and not in-memory representations -- RDF covers the abstract syntax.
- The reference to U+0 should read U+0000 -- okidoke.
- We recommend a different escape syntax altogether -- costs discussed in 187
- We recommend six-digit rather than eight-digit \U representation -- that would probably break no deployed data.
192: [E] referencing Unicode
Obsolete reference to Unicode
ericP: anyone have the keys to update respec / bibref / biblio.js?