W3C

- DRAFT -

Direction Metadata in RDF Literals

05 Jun 2019

Agenda

Attendees

Present
addison, chaals, ivan, r12a, manu, pchampin
Regrets
Chair
Addison Phillips
Scribe
addison, chaals

Contents


<agendabot> clear agenda

<addison> Scribe: addison

https://chime.aws/5323695336

Minutes and Agenda (5 minutes)

ivan: did update the document

<ivan> https://w3c.github.io/rdf-dir-literal/

ivan: has now all the options we discussed and more

Requirements for direction information at all (15 minutes)

<r12a> manu, you may need to look also at this: https://github.com/w3c/rdf-dir-literal/wiki/Draft-ideas-related-to-string-metadata-storage-options

<r12a> and this, manu: https://github.com/w3c/rdf-dir-literal/issues/7#issuecomment-498268872

<chaals> scribe: chaals

Manu: Why do we need this? Can't we get it from language information?

Addison: There is low-quality data all over the place, sometimes tagged properly, sometimes not.
... There are parallels to what we have internally all over the Web. There are places where I might have direction information, where language information is at best a guess.
... I am currently relying on low-quality heuristics, but when I have accurate data, I don't want them to apply.

<addison> chaals: so there are a bunch of use cases

<addison> ... not a standard requirement for all content

<addison> ... e.g. comments on youtube

<addison> ... think about how json handles strings

<addison> ... json doesn't break strings down well

<addison> ... and devs don't like doing that

<addison> ... plain strings that don't do that

<addison> zakim ivan

<addison> ivan: went through use cases you (addison) posted

ivan: read through the use cases and wiki page. I conclude that we won't have a solution that makes everyone equally happy.

<addison> ... won't have a solution that makes everyone happy

<addison> ... going back to completely ignoring?

<addison> ... at point where we cannot produce an optimal solution

<addison> ... have to make suboptimal

<addison> ... which one is a good middle way

<addison> ... and what is reasonable

<addison> ... the data that we have like for verifiable claims

<addison> ... comes from various databases

<addison> ... or human authoring

<addison> ... data is at beginning of process

<addison> ... not like we have base direction already and don't want to lose

<addison> ... for overwhelming use cases for RDF this issue doesn't arise

<addison> r12a: which issues?

<addison> ivan: "I extract data from form"

<addison> ... that's really a problem, but what's the percentage of rdf data that comes this way

Addison: We see a lot of data produced in this way… content coming from multiple places.

<addison> addison: is that true? not mostly dead datasources?

<Zakim> manu, you wanted to note that the direction feels clear to him, then.

<addison> manu: feel that we can knock some items off

<addison> ... -d and -x not a good option, not right

<addison> ... thing that convinced me was addison saying lang info is accurate and sometimes dir is

<addison> ... ensure that we solve long term

<addison> ... provide tooling for right i18n decisions

<addison> ... atomizing info so not co-mingled in a later-damaging way

<addison> manu: way done before was to separate

<addison> ... defer to the folks in i18n

<addison> ... don't fix with LocalizableString

<addison> ... fix LangString

<addison> ... fix implementations first, okay if takes years

<addison> ... fix json/json-ld/etc.

<addison> ... express direction separate from language and uniform

<addison> ... so that would mean in json-ld, e.g. verifiable credentials

<addison> ... use value/language/direction, 3 things you put together

<addison> ... put together in the same way

<addison> ... benefit further to align with HTML, using 'lang' and 'dir'

<addison> ... aliasing might be done in json-ld

<addison> ... simple use cases yes, more complete use cases need markup

<addison> ... workable, years to update rdf

<addison> ... but do impl in short time
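Manu's proposal keeps value, language and direction as three separate sibling fields instead of folding direction into the language tag. The following is a minimal Python sketch of that shape. It assumes JSON-LD-style keys `@value` and `@language` plus a `@direction` keyword; the keyword name and the helper `make_localized_string` are illustrative assumptions, not something the group agreed on in this meeting.

```python
import json
from typing import Optional

ALLOWED_DIRECTIONS = {"ltr", "rtl"}


def make_localized_string(value: str,
                          language: Optional[str] = None,
                          direction: Optional[str] = None) -> dict:
    """Build a value object that carries language and direction as
    separate, independent fields (hypothetical helper)."""
    obj = {"@value": value}
    if language is not None:
        obj["@language"] = language
    if direction is not None:
        if direction not in ALLOWED_DIRECTIONS:
            raise ValueError(f"unsupported direction: {direction!r}")
        # Assumed keyword name; direction is NOT encoded in the language tag.
        obj["@direction"] = direction
    return obj


# Arabic title that starts with Latin text, so its direction is stated explicitly.
title = make_localized_string("HTML و CSS: تصميم و إنشاء مواقع الويب", "ar", "rtl")
print(json.dumps(title, ensure_ascii=False))
```

Because the three fields stay independent, a consumer that only understands `@value` and `@language` can ignore the direction without having to parse the language tag.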

[I think Manu is jumping the agenda, but I agree, with a proposal for some more intermediate guidance…]

<addison> r12a: there has been a strong tendency to keep the info in the language tag

<addison> ivan: there are cases where we disagree

<addison> ... risk for the coming several years is that the json/json-ld world would be separated from rdf

<addison> ... many not care, but I do

<addison> ... syntactically speaking, we can put in json-ld

<addison> ... when put into rdf, ignored

<addison> ... ideal is to fix rdf

<addison> ... in the meantime this is a hack I don't like

<addison> ... rdf dataset not same example

<addison> ... and there is no rdf wg currently

<addison> ... could last several years

<addison> ... not a good thing

<addison> ivan: influence community, in meantime rely on bcp47

<addison> r12a: looking at 2 possibilities

<addison> ... 1 is change lang string

<addison> ... 2 use bcp47 language info

<r12a> https://github.com/w3c/rdf-dir-literal/issues/7#issuecomment-498755489

<addison> ... more recent discussion linked here

<addison> r12a: not sure bcp47 language tags are that easy to use

<addison> ... unless every time they have a script tag

<addison> addison: (interjecting) grumpy if we put script subtags by fiat

<addison> r12a: putting -d or -x putting script info into tag

<addison> ... problem is, my original concern when writing string-meta

<addison> ... use the script subtag to key off

<addison> ... produce a formalism to use script

<addison> ... addison, you changed that

<addison> ... can detect from normal bcp47 tag, but don't think that's so easy

<addison> ... many many languages you need to check

<addison> ... it's complicated

<addison> ... if you had a rule, you'd need to inspect the language every time to determine dir

<addison> ... not as straightforward as first strong

<addison> ... only need direction if needed

<addison> r12a: 1. not as straightforward to use bcp47

<addison> ... 2. if we add scripts (and that's what mark is suggesting)

<addison> ... have to have cutoff point for languages in/out
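r12a's point is that deriving direction from a BCP 47 tag is easy only when a script subtag is present; otherwise every language has to be looked up. A minimal Python sketch of that lookup follows. The RTL script set and the `DEFAULT_SCRIPT` table are hypothetical and deliberately incomplete, and the function names are illustrative only.

```python
from typing import Optional

# Hypothetical, deliberately incomplete set of right-to-left script subtags.
RTL_SCRIPTS = {"Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"}

# Hypothetical partial table of usual scripts for languages normally tagged
# without a script subtag. A real table would have to cover every language,
# and decide which ones are in or out: the "cutoff point" r12a describes.
DEFAULT_SCRIPT = {"en": "Latn", "fr": "Latn", "ar": "Arab", "fa": "Arab",
                  "ur": "Arab", "he": "Hebr", "yi": "Hebr"}


def script_subtag(tag: str) -> Optional[str]:
    """Return the script subtag of a BCP 47 tag, if it has one."""
    for sub in tag.split("-")[1:]:
        if len(sub) == 1:  # a singleton starts extensions or private use
            break
        if len(sub) == 4 and sub.isalpha():
            return sub.title()  # script subtags are conventionally title-cased
    return None


def direction_from_tag(tag: str) -> Optional[str]:
    """Guess base direction from a language tag; None means 'unknown'."""
    script = script_subtag(tag)
    if script is None:
        script = DEFAULT_SCRIPT.get(tag.split("-")[0].lower())
    if script is None:
        return None
    return "rtl" if script in RTL_SCRIPTS else "ltr"


print(direction_from_tag("ar"))       # rtl, via the default-script table
print(direction_from_tag("az-Arab"))  # rtl, via the explicit script subtag
print(direction_from_tag("az"))       # None: language not in the table
```

The `az` case shows the problem raised above: without a script subtag, a language written in more than one script gives no reliable answer.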

<addison> chaals: so want to test propositions

<addison> ... not clear that problem statement is described

<addison> ... vast majority of content direction is obvious

<addison> ... first character tells you

<addison> ... there are exceptions within vast majority

<addison> ... there are cases where script and sometimes language

<addison> ... are mixed

<addison> ... the first thing that appears might not be the semantically dominant direction

<addison> ... expect majority of these are mixed script

<addison> ... and numbers

<addison> ... would like to support manu's proposal

<addison> ... push rdf hard to fix this

<addison> ... none of us control turning up an rdf wg

<addison> ... don't know how fast we can fix this problem

<addison> ... my sense from experience on AB and W3M, would get sympathetic hearing

<addison> ... for getting concrete proposal to change spec, hard part is working with implementers

<addison> ... think we should be taking that path

<addison> ... believe that for mixed script cases

<addison> ... is correct and adds nothing new if you add a language tag

<addison> ... or language+script

<addison> ... to assert a direction

<addison> ... not infallible but something that can be done

<addison> ... shouldn't rely on everywhere, but makes things easier in imperfect world

<Zakim> manu, you wanted to note that the JSON-LD world deviated from RDF for a while before (RDF Datasets...)

<addison> chaals: magick extensions to BCP47 for direction not a good idea

<addison> manu: +1 to chaals

<addison> ... modified json-ld before rdf to push rdf

<r12a> chaals, you can't mix FS heuristics with language information if lang information has precedence, but you do want language information on all strings, so you'd never do FS heuristics

<addison> ... at that time it was controversial

<addison> ... did happen after vigorous debate

<addison> ... needed something to start snowball down mountain

<addison> ... it's a different use case, but same general idea

<addison> ... two choices; one is a hack and second is good long term solution

<addison> ... the hardest part is getting implementation up to date

<addison> ... then rdf spec won't match reality

[if you don't *have* language information - even though you want it - you might well keep an FS-heuristic routine around for the cases you come up against]

<addison> ... hopefully an easy way to get w3c process to generate those docs

<addison> ... use cases and requirements should be our goal

<addison> ... need to switch reality to solving the right way

<addison> ... and let specs catch up

<addison> ... from practical standpoint

<r12a> [chaals, that's not what i meant - maybe i should join the queue]

<addison> ... if we went down this direction/path

<addison> ... one VC spec would defer to string-meta

<addison> ... draft language in string-meta

<addison> ... this design pattern works

<addison> ... json-ld would need a direction keyword

<addison> ... and luckily there is a WG

<addison> ... that's the second hard thing.

<addison> ... pointing to string-meta is easy

<addison> ... json-ld achievable in a month or two

<addison> ... third thing is dataset normalization

<addison> pchampin: start with question for manu

<addison> ... when mentioning dataset signature

<addison> ... based on nquads

<addison> ... considering extending to put direction metadata

<addison> ... so that wouldn't be nquad per spec

<addison> manu: special kind only used for canonicalizing for digital signature

<addison> ... could pull into rdf if they decide to do this way

<addison> ... would be a special thing separate in rdf space

<addison> pchampin: think it is a good idea to put good practice in json-ld now

<addison> ... agree here that having separate direction attribute

<addison> ... have two options for rdf conversion

<addison> ... either lose information

<addison> ... or try to encode this information somehow in rdf

<addison> ... what I was going to propose; I think greg kellogg proposed

<addison> ... a bcp47 extension subtag as a temporary way

<addison> ... being very explicit that we'd deprecate it as soon as RDF is updated

<addison> ... think that would be a smooth path

<addison> ... cannot rely on temporary implementation to sign things

<addison> ... that is a valid concern

<addison> ... but would push on using temporary solution

<addison> ... the private -x solution

addison: Direction-setting? Are we homing in on something

<addison> ivan: think we have an agreement

<addison> ... not only is extending langString in rdf the ideal solution, but we should be working toward it

<addison> ... not sure how it will happen

<addison> ... will talk to ralph tomorrow

<addison> ... to see what seems to be quickest way of getting there

<addison> ... where we disagree

<addison> ... 1. more pessimistic than chaals about the time it will take to get done

<addison> ... rdf concepts doc; update turtle, nquad, sparql

<addison> ... get rdf wg through AC when AC doesn't want to work on rdf

<addison> ... next question I think we disagree is what to do in meantime

<addison> ... think there is some disagreement

<addison> ... putting something in json-ld and then mapping in private use

<addison> ... once genie is out of bottle...

<pchampin> I agree this is a risk

<addison> ... and just losing in rdf is not attractive

<addison> ... we say that we are working on final solution; hope to get charter out there

<addison> ... help us do work, members

<addison> ivan: agree on what to do in the meantime

<addison> r12a: chaals, everything you said was what I said; agree

<addison> ... if we don't have direction, use first-strong

<addison> ... point I wanted to make was metadata, if you have it, always trumps heuristics

<addison> ... heuristics not always accurate; metadata exists to provide
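chaals and r12a both describe the same fallback: for most content the first strong character gives the right direction, mixed-script strings (such as an Arabic title that starts with "HTML") defeat it, and declared metadata, when present, always trumps the heuristic. A minimal Python sketch using the standard `unicodedata` module follows; the function names are hypothetical, and the heuristic is simplified relative to the Unicode Bidirectional Algorithm (it does not skip text inside isolates).

```python
import unicodedata
from typing import Optional


def first_strong(text: str) -> Optional[str]:
    """Simplified first-strong heuristic: return the direction of the first
    strongly directional character (bidi class L, R or AL), or None."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None  # no strong characters, e.g. only digits and punctuation


def base_direction(text: str, declared: Optional[str] = None) -> Optional[str]:
    """Declared direction metadata always wins; the heuristic is only a
    fallback for strings that carry no metadata."""
    if declared is not None:
        return declared
    return first_strong(text)


title = "HTML و CSS: تصميم و إنشاء مواقع الويب"
print(base_direction(title))         # ltr: the first strong character is "H"
print(base_direction(title, "rtl"))  # rtl: metadata overrides the heuristic
```

The first call is exactly the failure case under discussion: the string is Arabic, but the heuristic reports left-to-right.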

<addison> r12a: think there are problems with using standard bcp string

<addison> ... if use private use, that says "this is a hack"

<addison> ... could use in a way where only use when needed

<addison> ... however still not great

<addison> ... better to have separate metadata

<addison> ... otherwise have to parse language strings

[where there is insufficient metadata, you expect to fail. If you have some heuristics that reduce your failure rate, go for it, but you still expect failures]

<Zakim> manu, you wanted to ask if losing directionality when converting to RDF / canonicalizing is a catastrophic thing? and to provide concrete path forward

<addison> manu: want to get to concrete list of things

<addison> ... verifiable credentials will point to string-meta for what right thing is

<addison> ... can provide language

<addison> ... that is going to presume that there will be a direction tag in json-ld at some point

<addison> ... can continue in json-ld

<addison> ... then canonicalization; that's like impl details

<addison> ... can resolve in json-ld

<addison> ... make decision on what we'll do, such as -x-dir

<addison> ... convinced that's wrong

<addison> ... or talk about other methods, nquads etc

<addison> ... come to solution that will preserve info

<addison> ... mechanism only thing up in the air?

<addison> ... would i18n be happy with VC?

<Zakim> chaals, you wanted to suggest we propose an RDF WG charter specifically scoped to solving this problem (in practice, if there are other obvious errata, they should be added)

<addison> chaals: same approach as manu

<addison> ... should put opinionated statements in specs, starting with VC

<addison> ... anticipate dir in json-ld

<addison> ... have to mark at risk, depends on getting langstring in rdf core done

<addison> ... clear about the status

<addison> ... might be experimental-yet-recommended

<addison> ... more valuable to start at bottom before propagating

<addison> ... if broken, get info quickly

<addison> ivan: don't like the "hard pushing" style

<addison> ... that is, anticipating things will happen in json-ld

[also, I am less sceptical than Ivan about getting AC approval for a well-scoped RDF WG to do a concrete repair task…]

<addison> ... main point manu didn't say, before we do anything else we need to figure out if we can get a proper wg in rdf

<addison> ... happy to try

<addison> ... don't underestimate problems with getting rdf work approved

<addison> ... talk to ralph to gauge reaction

<addison> ... might need a draft charter

<addison> pchampin: when I proposed to use private extension

<addison> ... understand you are not favorable to that solution

<addison> ... understand why

<addison> ... still think we might carefully craft a transition path until we have a proper one

<addison> ... one way to contain, whenever you see '-x-dir' you have to parse as direction metadata

<addison> ... those -x extensions should only appear in rdf and should be converted to metadata

<addison> ... not propagated to html for example
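pchampin's containment rule is that a private-use direction marker is parsed back out into separate metadata and never passed through in the language tag. The minutes do not fix the exact syntax, so the following minimal Python sketch assumes a hypothetical `-x-dir-ltr` / `-x-dir-rtl` marker; `split_private_dir` is an illustrative name only.

```python
from typing import Optional, Tuple

VALID_DIRECTIONS = {"ltr", "rtl"}


def split_private_dir(tag: str) -> Tuple[str, Optional[str]]:
    """Remove a hypothetical '-x-dir-<ltr|rtl>' private-use marker from a
    language tag and return (clean_tag, direction)."""
    subtags = tag.split("-")
    lowered = [s.lower() for s in subtags]
    if "x" not in lowered:
        return tag, None
    x = lowered.index("x")  # everything after the 'x' singleton is private use
    private, priv_lower = subtags[x + 1:], lowered[x + 1:]
    for i in range(len(private) - 1):
        if priv_lower[i] == "dir" and priv_lower[i + 1] in VALID_DIRECTIONS:
            remaining = private[:i] + private[i + 2:]
            # Keep any other private-use subtags; drop a now-empty 'x'.
            clean = subtags[:x] + (["x"] + remaining if remaining else [])
            return "-".join(clean), priv_lower[i + 1]
    return tag, None


# Converting out of RDF: the marker becomes separate metadata, e.g. HTML dir.
lang, direction = split_private_dir("ar-x-dir-rtl")
print(f'<span lang="{lang}" dir="{direction}">...</span>')
```

Keeping the parse in one place is what would allow the marker to be dropped later, although, as the discussion that follows notes, the marker may survive in the wild regardless.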

[I am *very* skeptical of genies going back into bottles - it seems to be much harder than anyone ever thinks it will, and often seems not to happen after all]

<manu> I am also incredibly skeptical of that

<manu> Once you have a tool, people use it

<manu> Yes, exactly, let's do the right thing

<pchampin> until we provide them with a better tool :)

<manu> yes, but then we can never deprecate the old tool :)

<addison> ivan: I will talk to ralph

[I can start an RDF-WG charter proposal…]

<addison> ... have to go down usual process

[Think the normative text needs to be in Rec-track specs, and string-meta is a copy that shows what to do there]

<addison> manu: will put PR into string-meta

<addison> no follow up meeting

<addison> ivan: put discussion in one place

<addison> ... have a separate mailing list?

[how about we just agree to make a concerted effort as individuals to make sure that this is documented and people know where conversations are happening?]

<addison> addison/richard to update string-meta as the explainer for this

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/06/05 14:18:13 $

Agenda: https://lists.w3.org/Archives/Member/member-i18n-core/2019May/0040.html
