Annotation WG F2F, Sapporo, 1st day -- 25 Oct 2015

ivan: one of two staff contacts in the group. Also leading DPUB. One of the reasons I'm in this group is because the digital publishing community has major use for annotations. Not at W3C, but IETF, already using first draft of annotation model, big use case.
... Also part of DPUB IG
... Other interest, Semantic Web Activity lead for 7 years.

clapierre: ... With Benetech, also in DPUB and co-chair of accessibility task force of that group. Interest in how to use annotations, for disabled community

azaroth: at Stanford University, one of the two chairs. In 2009 there were two projects, one on web annotations and humanities and one focussed on science annotation, which found out about each other and merged goals to start the CG. In 2013 we completed the CG work, and last year with Ivan and Doug's assistance we started the WG. My interest is from that history, but also my academic background is in humanities, PhD in medieval French. Imagine a non-scholar trying to read a French manuscript; having annotations to describe what's going on is important

Jeff Xu: from Rakuten/Kobo. Annotations are important to us for sharing with users, and between user and publisher

shepazu: We need to put in our use cases the idea of sharing annotations between users and publishers
... Doesn't immediately occur to people that annotations can be shared with publishers

ivan: Isn't it in DPUB use cases?

shepazu: two kinds of publishers. Of a blog/website. But also publisher of site where you're reading an ebook
... Need to make sure people understand two kinds of use cases

csarven: visiting student at MIT, may join W3C as well. PhD student at Bonn. Why I'm here relates to my research on scholarly publications and how to keep annotations around both from authors, reviewers, and any commenter on the web
... Try not to make major distinctions between them other than roles. Been keeping an eye on this WG on mailing list and github. Overlapping interests with DPUB that I'm also trying to follow. Also in SocialWG. All these things overlapping.

Eric Mannens: From iMinds. Semantics group there. Been with W3C since 2006
... Primarily involved in semantics and media fragments
... Afterwards in Prov WG, and now in annotations and publishing because we have two big projects on that
... One Flemish one with publishers, there, and a European one called (?)
... with Felix who will join later
... Guys from my team in this group. Open source framework to publish ebooks, completely based on EPUB3 and HTML5
... Definitely going to implement your spec as one of the reference implementations
... I always have to look for new stuff for my team, so will be in and out today

rhiaro_: Amy Guy, University of Edinburgh PhD student and visiting student at MIT, in SocialWG

miyazaki: from Japan Broadcasting Corporation. First time at TPAC, observer in this meeting. Research Engineer, in charge of constructing RDF database of TV programmes
... Very interested in semantic web technology and social media
... Interested in how to handle people's review about TV programmes
... How to structure and so on

takeshi: From Sony, Japan. Sony has released device named digital paper, so you can take notes like you are writing on the paper on the device
... It is based on PDF. My motivation is to replace that with web
... I have implemented a prototype into the device
... Can't bring the device
... Also spend many years in ebook industry. Also contributed to epub format. And printing industry background.

shepazu: Rob failed to mention that he's the editor of two of the specs from this WG
... Web Annotation data model spec, and also Web Annotation protocol spec, which is based on LDP
... Basically the notions of how to publish annotations to different servers, write API for the web
... Might be useful for us to go through individual items in our charter.
... I'm Doug Schepers, staff contact for this group. Instigator of larger idea of web annotations beyond data model stuff in CG. Bring together group that solves lots of different parts of the problem.
... Also staff contact for SVG WG and Accessibility. Also Web Audio API WG. Touch Events WG. Web Payments WG.
... Appreciate everyone showing up, we should have more people later.

azaroth: Thanks everyone. Until we are more familiar, just say who you are and/or your handle on IRC

ivan: we are expecting one more person in half an hour

shepazu: I meant to mention, I'm also the editor of one of the specs - Find Text API that was just published as FPWD
... Any questions or comments, feel free to ask

azaroth: One note about the WG - we are public, all communication is done publicly
... Try to be somewhat less formal than other WGs and just roll with it and see how things go
... Try to make most appropriate use of time and conversations
... Might be slower, but we expect deliverables will be better

<azaroth> Agenda: https://www.w3.org/annotation/wiki/Meetings#Monday_26_October

Agenda review

azaroth: Today the majority of the agenda is focussed around client APIs. First FindText work that Doug has been working on
... We have a joint meeting with Web Platform about that at 1130
... Before that discussion particularly about i18n
... After lunch, a meeting with Felix around translation
... Then after that, we have to have tests in place for all of our work. Testing an abstract data model is somewhat complex. We have an IE, Chris Berg, who is going to be leading testing.
... But if we could discuss how we want to go about testing for all of the different APIs and models and so on, we could make some good progress
... A note from the programme, the break is actually between 3 and 4, but that's when we're meeting with DPUB
... So not sure of exact time, but we have joint meeting with DPUB, particularly around use cases
... I'm editor of DPUB note on annotation use cases, which can feed into this group
... Some blank time, probably that will get taken up with discussions
... Towards the end of the day we want to work on next steps for the client APIs
... The charter has a broad pool of client side API deliverables
... Essentially says create some specifications that help browsers to create and deliver annotations
... The FindText API is one of those, but there may be others that can be worked on within that scope
... We do have the beginnings of a second one called DOM Annotations, but after some initial work things stalled a little bit
... Would be good to discuss what would be useful, and who we might be able to reach out to help us
... Wrap up at the end of the day. Any questions or thoughts about today's agenda?

ivan: Want to talk about URIs sometime?

shepazu: While we're talking about items in charter
... Also could during FindText

ivan: We should put it on the table as something we plan to do, important for DPUB

azaroth: at least discuss it before lunch

ivan: the agenda now says as if the FindText goes into i18n, but maybe it's worth for people who are not familiar to have 10 minutes intro to FindText

bigbluehat: Benjamin Young with Hypothes.is
... Hypothes.is is a nonprofit working to bring web annotation back to the web. Offer a browser extension and bookmarklet, and embed for publishers. BSD licence, Python, AngularJS
... I'm coeditor of data model spec now

azaroth: just discussing what else needs to be on the agenda for today
... Agenda for tomorrow. Focussed around data model and protocol
... Starting with protocol because that's more important to make progress on. CG gave us a headstart with the model, and because we have a bunch of people aligned with SocialWG
... 3 parts to the protocol. REST (CRUD) is built on top of LDP. Also for the SocialWG we want to use AS2 Collections and Pages to be able to break up the response
... Two areas where we have made less progress: notifications from one system to another that an annotation has been created, modified or deleted
... We hope work in SocialWG will help us there
... last TPAC we had a good conversation around using AS2

shepazu: Seems like a natural mechanism

azaroth: After lunch, third part of protocol is search
... If you have millions of annotations across all resources on the web, how do you find those you're interested in
... I have some ideas around that which I was writing up on the plane
... The model, alignment with SocialWG
... Less around here are some new features that are annotations, more here's what is settling on social side, and what we're settling on, and how they can work together
... After the break, continuing to work on further features if we still have energy
... Next steps, how far along with deliverables we are, who we need help from to get there

ivan: Also decide whether we need another f2f, let's not leave that to the last minute

azaroth: unscheduled meeting that we should try to schedule is to talk with TAG about protocol issues
... Erik Wilde brought up some concerns around how protocol works, most of which are derived from LDP
... So given that LDP is a full recommendation, it's a little bit problematic to say we find issues with it
... But we want to be as valuable as possible

ivan: I wouldn't think you want to go there, if Erik has problems with LDP we shouldn't be the ones playing for Erik, it's not our role

shepazu: He had one thing that was not specific to LDP, which was we are saying that something is an annotation server, as opposed to a generic server, and he thought that was not a good design choice

bigbluehat: also our use of server singular instead of servers, as LDP can be spread across multiple machines
... Editorial tweak on our part, to say URLs can be all over the web, as long as your credentialing will let you move across machines.
... Initially saying annotation client and annotation server makes it sound like a two-piece thing. Just needs some clarification

shepazu: You can say that about any LDP application

bigbluehat: we can clarify what distinguishes an annotations one
... Just Link Headers

shepazu: Ralph is domain lead of domain under which this group operates, information and knowledge domain

Overview of work to date

azaroth: WG has six deliverables
... first being a data model for annotations, which we now have a second draft of
... Derived from CG and then has been discussed thoroughly within WG
... Some of the areas that have changed around how it interacts with other specs, eg. some of the specs that the CG used were not full recs, so we can't refer to them normatively from rec, so we needed to remove it
... Over the last few months we've talked about how to have multiple roles, one for each resource used within the annotation
... Tied to the data model is a vocabulary for describing the data model
... Vocabulary is in RDF
... Important to note that we rely heavily on JSON-LD as a way to have the RDF graph model be something that is understandable and implementable by people who do not have a full RDF stack
... One of our main driving principles is that the results should be useable without relying on RDF specific technology. You should be able to write JS in browser and work with JSON that comes back from the server. Time will tell how successful we are with that
... Something we're trying to keep in mind
... The model is a graph based model, using RDF, but the way we expect most people interact with it is via a specific JSON-LD serialization
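
Illustration (not from the meeting): a minimal Python sketch of the principle azaroth describes, where a client handles the annotation as ordinary JSON and never touches an RDF library. Every key and URL in the sample document is an assumption made for illustration; the minutes do not show the serialization.

```python
import json

# Hypothetical annotation as a server might return it. All keys and URLs
# here are illustrative assumptions, not quoted from the data model spec.
response_text = """
{
  "@context": "http://example.org/annotation-context.jsonld",
  "id": "http://example.org/anno1",
  "type": "Annotation",
  "body": "http://example.org/comment1",
  "target": "http://example.com/page1"
}
"""


def summarize(raw: str) -> str:
    """Read the annotation with a plain JSON parser; no RDF tooling involved."""
    anno = json.loads(raw)
    return f'{anno["body"]} annotates {anno["target"]}'


summary = summarize(response_text)
print(summary)
```

The "@context" entry is simply ignored here, which is the point: a consumer that only wants the JSON shape can skip the graph interpretation entirely.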

ivan: is it the intention that the vocabulary will be published as a separate document?

azaroth: at the moment the vocabulary, the serialization and the data model are all rolled together into the data model spec, Annotation-Model
... There has been some limited discussion about having multiple documents, one for model, vocab, and serialization
... Tradeoffs have been that having multiple documents means you need to read multiple documents, with lots of references between them that gets complex
... But if it's all in one, it's more complicated for people who just want to see certain examples
... Bit of a pedagogical issue, rather than a technical one

shepazu: My intention is that the serialization would not be a single spec, but rather a set of specs
... Eg. as HTML, or as exif data in an image. Different ways of portraying the same data that would map back to the same terminology

azaroth: at the moment we've been focussing on JSON-LD, but there likely will be other serializations
... The first three still need work, but are in reasonably good shape
... Fourth is protocol, how do you transfer annotations from client to server or server to server
... based around LDP, also hopefully Collections from AS2
... Five and six are closely related. Client side API and robust linking anchoring
... Client side API helps browser or user agent create and consume annotations once they have them via the protocol or some event
... So the current work is around the FindText API (previously rangefinder) which allows you to do find in page with a bunch of additional cool features

shepazu: fuzzy matching
... defines a set of parameters around which you can do fuzzy matching.
... Robust Link Anchoring is a more complex topic. The first spec as Rob said is the FindText API and that deals simply with text
... But if you were annotating, e.g., an image, there should be a way of getting at a particular part of an image. FindText does not deal with that but robust link anchoring does
... Once you have the FindText API, that opens up the door to having a URL scheme where, using fragment ids, you can say...
... Say that you have a selection of text that you want to search for
... Say it's repetitive, song lyrics for example
... So you want to say, even though this particular text appears three times, I want the third instance specifically
... So in addition to saying this specific string, you also say these are the 32 characters before and after, the prefix and the suffix
... Given those three things, prefix, suffix and selection, you can have a URL that says # something
... haven't decided how to do it yet. Browser takes parameters and finds the instance you're looking for
... If you wanted to select a passage and send a link to a friend, you can send a URL and your friend's browser takes them to the exact place
... It's not the most elegant but we can't think of a more elegant way
... Obviously once you have those things, you can use those for annotations
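
Illustration (not from the meeting): a minimal Python sketch of the prefix/selection/suffix idea shepazu describes for repeated text. It uses exact string matching only; the function name `find_with_context` and the sample lyrics are made up for this example and are not part of FindText.

```python
def find_with_context(text, selection, prefix="", suffix=""):
    """Return the start offset of the first occurrence of `selection` that is
    immediately preceded by `prefix` and followed by `suffix`, or -1."""
    start = text.find(selection)
    while start != -1:
        end = start + len(selection)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start
        start = text.find(selection, start + 1)
    return -1


# "sing it" appears three times; the context picks out the third one.
lyrics = "sing it loud, sing it proud, sing it now"
first = find_with_context(lyrics, "sing it")
third = find_with_context(lyrics, "sing it", prefix="proud, ", suffix=" now")
print(first, third)
```

Without context the search stops at the first occurrence; with the surrounding characters supplied, the same selection string resolves to one specific instance.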

azaroth: at a slightly higher level, the robust link anchoring topic is, given a resource
... how do I get the representation that I want
... and how do I get the bit that I'm talking about
... Issue around dynamic pages. Eg. js app, makes dynamic changes to page. You annotate something, what information does the client need to reconstruct the state of the page to make the annotation make sense

ivan: at some point we should look at these six, and plan what is realistic and what is not in the coming year
... FindText is great, but personally I don't believe that we will have the time and energy to do anything else under the 6th point

shepazu: not sure I agree, but okay

ivan: that's my opinion. Same for serializations. I don't see us doing everything needed for rec - spec, testing, etc - within less than 1 year
... so we have to be realistic about what we can achieve
... maybe we should say that for certain entries here, we propose an extension or a new WG, but we have to be realistic. We should try to find some time to discuss that.

shepazu: one last thing about robust anchoring
... We talked about it largely in terms of text and images, but this also applies to media resources. Using media fragments for example to get a particular point at a video
... You can also include a particular location in a video
... All things that can and should be annotatable using the data model
... How the robust anchoring links with the data model, it stores all the individual things as parameters
... For example for text, the selection, prefix and suffix, maybe some other bits, those can be recomposed into a URL, but in the annotation they're stored as individual pieces
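
Illustration (not from the meeting): a minimal Python sketch of storing the pieces individually and recomposing them into a URL. The group had not decided on a URL syntax, so the fragment shape used here ("#key=value&...") and the key names are invented placeholders.

```python
from urllib.parse import parse_qs, urlencode

# Individual pieces as they might be stored in an annotation.
# The key names are hypothetical.
selector = {"text": "sing it", "prefix": "proud, ", "suffix": " now"}


def to_fragment_url(page_url, parts):
    """Recompose stored pieces into a URL with an invented fragment syntax."""
    return page_url + "#" + urlencode(parts)


def from_fragment_url(url):
    """Recover the individual pieces from a URL built by to_fragment_url."""
    fragment = url.split("#", 1)[1]
    return {key: values[0] for key, values in parse_qs(fragment).items()}


url = to_fragment_url("http://example.com/lyrics", selector)
round_trip = from_fragment_url(url)
print(url)
```

The round trip shows that the URL and the stored pieces carry the same information, which is why the annotation can keep them separate and build a link only when one is needed.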

ivan: may require some adjustment between the two
... the current selectors we have in the document may not cover all the things the FindText API can do
... we may need to push additional terms into the data model

bigbluehat: is robust anchoring the ability to re-anchor across media types?

ivan: I think the idea is that if you get an annotation with a target uri, and somebody changes the text, you could still find the text. Robust against change of the media.

azaroth: the exact change is not well defined. For example, if you have a resource that does conneg for plain text, HTML, PDF, the URI would be the same and the text is there. Should one annotation be able to be re-anchored across all of those content-negotiable representations? Or is it for specific representations?

bigbluehat: that definitely needs clarifying
... The scenario that hypothes.is have, is publishing as html, epub, pdf
... want annotations across all of them
... Textually the ranges are the same, but scenarios of anchoring them are pretty different

Resolve minutes of last meeting

<azaroth> proposed RESOLUTION: Minutes of last call are approved: http://www.w3.org/2015/10/21-annotation-minutes.html

azaroth: any objections?

RESOLUTION: Minutes of last call are approved: http://www.w3.org/2015/10/21-annotation-minutes.html

FindText API

<shepazu> http://w3c.github.io/findtext/

shepazu: Editor's Draft ^
... Has latest changes since publication
... Published as FPWD last week
... A little unusual in that we have a liaison in our charter with the WebApps WG to publish this document together
... In the time I was working on it, there were plans to merge WebApps with HTML WG to form Web Platform WG
... We put out a cfc for the FPWD of FindText
... And from the time the cfc started to the time it ended the new WG launched, so through some quirk of fate this is now published by WebAnnotations and is the first spec published by Web Platform WG
... Web Platform is working on all of the big clientside APIs, plus HTML
... So it's good that we have their attention. I talked informally to somebody from apple who works on safari, and today I bumped into somebody from MS who works on Edge (replacement for IE)
... Both of them said that so far as they could tell without having looked at it too deeply they thought that FindText seemed like a good idea and they're interested in implementing it
... That would be fabulous, and get the WG the attention of the use case that we're trying to do
... While not diminishing the other things, the anchoring that is enabled by FindText, along with the data model, those parts are the core of annotations
... The publishing stuff is all useful, but those two pieces are the core, if we can get attention for those two pieces we are in very good shape
... We also got the attention of the i18n WG
... Any time you're working with text you need to make sure it's internationalized
... about a year ago the i18n WG started working on a spec called charmodnorm
... character model for the web normalization
... worked on by Addison Philips(?) Amazon
... solves so many of the problems we should have run into, we don't have to translate unicode stuff, already a spec for this, timing really fortunate
... and the fact we were working on FindText got them interested

ivan: that document is a note or rec to be? Timing?

shepazu: Rec-to-be. Don't know about timing. Probably hand in hand with FindText

ivan: otherwise we run into stupid administrative issues

shepazu: they raised several issues on github
... I've started resolving those issues; some are easy, some more tricky, all of them are about my own ignorance about i18n
... Just educating myself about the right way to approach a problem

<azaroth> Github Issues link: https://github.com/w3c/findtext/issues

shepazu: There will be a process of negotiation between us and i18n about which parts of defining text search go in FindText and which in CharModNorm
... CharModNorm applies search to broader set of resources
... FindText specifically developer API, and beyond i18n because there are things around edit distance
... That's the background of this thing. Seems like it's going to get some momentum. Might change dramatically, but the barebones are here.
... Is anybody interested in hearing how this api works?
... I'll briefly tell you
... Three ways you can provide feedback on the spec
... Either send an email to the mailing list
... public-annotation
... file a bug on github
... Or leave an annotation directly on the spec
... Make an account, select some text and leave annotation
... They're sent to mailing list
... API has several parameters. Pass them in as a JSON object
... Example. Here's a poem, selected because it has the words 'rage rage' several times. So how would you find the fourth instance of 'rage rage'
... EXAMPLE 1 ... pass in string to FindText
... call searchAll()
... find third match if you're looking for third instance

<scribe> New arrivals: Richard Ishida (r12a), Dave Clarke, Felix Sasaki (fsasaki)

shepazu: Here's another example of a search that will find that string
... Initialize FindText object with this JSON object, with text and prefix
... This is the specific selection that we're looking for
... So, the kind of parameters you can have
... text and textDistance
... Edit distance is an algorithmic way of saying how two words are related mathematically
... eg. dog -> fog, edit distance is one, have to change one letter
... fog -> frog, have to add a character, so edit distance is one
... edit distance dog -> frog, change and add, = 2
... when you're talking about typos.. on a string this small this is significant. When you're talking about longer strings it becomes more useful
... when you're talking about typos and they miss one letter
... still robust
... you can still match, especially when you have prefix and suffix
... turns out to be a very efficient way of searching for differences
... edit distance is absolute number of changes
... if I say I want an edit distance of one, that means I will allow one change, doesn't matter length of string
... Quite likely if you didn't find match on first pass with FindText API you might increase edit distance until you find a match
... once you get to an edit distance of 15-20% it's very likely this thing doesn't exist in this document any more
... but first you try to have robust anchoring
... selection is the target text you're looking for
... textDistance is edit distance
... prefix and suffix, both of which have edit distance
... scope is an element that says the content I'm looking for must be within this element
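
Illustration (not from the meeting): a minimal Python sketch of edit distance (Levenshtein distance) and of the "increase the allowed distance until you find a match" strategy shepazu mentions. The helper names are made up, and `find_widening` compares fixed-length windows only, which is a simplification and not how FindText is specified.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions or substitutions that turn `a` into `b`."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete char_a
                               current[j - 1] + 1,       # insert char_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def find_widening(text, target, limit):
    """Try allowed distances 0, 1, ... up to `limit`. Return (offset, allowed)
    for the first window of len(target) within tolerance, or None."""
    width = len(target)
    for allowed in range(limit + 1):
        for start in range(len(text) - width + 1):
            if edit_distance(text[start:start + width], target) <= allowed:
                return start, allowed
    return None


print(edit_distance("dog", "fog"), edit_distance("fog", "frog"),
      edit_distance("dog", "frog"))
hit = find_widening("the quick brawn fox", "brown", 2)
```

The distance is an absolute count of changes, so a tolerance of one means the same thing for a three-letter word as for a whole sentence; that is why short strings are much more sensitive to it.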

ivan: DOM element?

shepazu: yes
... DOM API, to operate on webpages
... So let's say that I make a webapp and my webapp is a text editing app
... So I have an editing area and a bunch of words in there. I have the word 'file'
... and in the UI for my app I have a bunch of menu options, and one of them is 'file'
... so if I want to search this document, if I want my users to be able to search this document, I don't want them to find the UI instance of the word file
... the thing within the content area
... eg. google docs gives you its own find dialog
... another use case is that you might say I know it's in this chapter, which is represented by this element
... can be used to make more efficient searches

takeshi: multiple elements?

shepazu: no, you should set parent
... range says where should I start this search
... similar to scope but different use case
... caseFolding, unicodeNormalization, set of choices
... wrap, do you want to wrap around the document
... so if you start from a start position do you want to go all the way around. Maybe not necessary.
... The other stuff is all related to the search operation itself
... The way it works, turn the entire document into a string, normalize it, collapse the white space, search on this long string that is the text of the document
... Once you find a candidate match, you return it as a range, where is it in the DOM
... Not simply where is it in the text
... Allows you to treat element boundaries.. you ignore element boundaries when you're doing a FindText API search
... DOM API that operates on text, and returns DOM range
... A range may span multiple elements
... That's basically how the API works
... an algorithm of how it operates, not for implementation, just explanation of results
... Finally, the notion that you would have this URL syntax, each of these parameters is something you would set in this URL syntax
... Each URL is effectively a findText operation

Jeff_Xu: based on text structure?

shepazu: yes

Jeff_Xu: in html structure, if there's some element in front of keyword, but moved to somewhere else with CSS..?

shepazu: doesn't account for that

ivan: if you generate content by CSS, you don't find it

shepazu: there is discussion now that generated content in CSS should also be accessible
... generating content should be treated as some part of the object model, whether DOM or some higher level, should be serialized as part of it
... however we solve it for accessibility, we should solve it the same way for this case

azaroth: will make github issue to track that

shepazu: need to transfer issues from spec to github
... Some of the issues on spec are me thinking out loud

ivan: css stuff is a good question

shepazu: occurred to me before, but not resolved yet
... there's generated content, but also the css has moved the text to appear to be in another part of the document. Nothing to be done for that.
... the nice thing, even if the text is not there, the rendering of the text is different than the DOM order of the text, it will always be that way
... so when you come back to the document, it will still consider that to be part of the document in that order

ivan: one thing to make note of, we may want to talk to CSS people
... (?) project that has started, to try to open what rendering engine does
... may be that we have another version of findText that works on the CSS object model that they produce
... which takes care of these rearrangements

shepazu: needs to be dealt with. Not the only thing we need to talk about with CSS
... also, once you have that range, how can you style it
... once you find the result, how you highlight it once you have the result
... one thing you'd do today is take the range, surround with span with class

ivan: doesn't always work
... the range spans over two paragraphs

shepazu: have to chunk it. It's ugly. If you have a hundred annotations you don't want to do that
... need to be able to style ranges arbitrarily
... outside scope of this, inside scope of robust anchoring discussion
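
Illustration (not from the meeting): a minimal Python sketch of why highlighting needs "chunking" when a range crosses element boundaries. Strings stand in for the text of two paragraphs, and `split_range` is a made-up helper; real code would wrap each piece in its own element.

```python
def split_range(nodes, start, end):
    """Split a range into one (node_index, from, to) piece per text node it
    touches. Each piece would need its own wrapper element to be styled."""
    (start_node, start_offset), (end_node, end_offset) = start, end
    pieces = []
    for index in range(start_node, end_node + 1):
        begin = start_offset if index == start_node else 0
        stop = end_offset if index == end_node else len(nodes[index])
        if begin < stop:
            pieces.append((index, begin, stop))
    return pieces


paragraphs = ["First paragraph ends here.", "Second paragraph starts here."]
pieces = split_range(paragraphs, (0, 16), (1, 6))
highlighted = [paragraphs[i][a:b] for i, a, b in pieces]
print(highlighted)
```

One logical selection becomes two wrappers here; with many overlapping annotations the number of inserted elements grows quickly, which is the ugliness being described.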

clapierre: do we have to worry about aria hidden role?

shepazu: could ask same question about visibility
... same class of question

clapierre: css visibility is not visible in the DOM

shepazu: Display none is not
... we need to deal with all of that
... there was an issue that was raised that said.. when I talk about that I say when you serialize HTML DOM into text, I suggest using the serialization in the DOM4 spec, somebody gave feedback that said no you should use one of these other serialization methods
... maybe one of those others will deal with it. All good points.

ivan: we discussed with webform guys, small issue about promises

Internationalization

azaroth: welcome to folks in i18n WG
... how would you like to go through issues?

shepazu: before we get to individual issues, meta question to mailing list
... You guys decided to file github issues, which is fine. In addition to that, PRs also welcome
... You can just fix my spec and send an email or describe it in PR
... chances are unless there's a fundamental disagreement I'll simply take a PR

Richard Ishida: we prefer github, much easier to handle conversations
... what we'd also like to do is get a tag that says i18n that we can attach to issues so we can track and get notified

ivan: adding a label to the issues? easy to do

shepazu: issue 4, one of the more complex ones
... didn't know how to handle it in the spec
... the issue is, in the character counts of ranges should they be unicode code points or graphemes or whatever
... I think that you guys were suggesting unicode code points as your preference?

r12a: In javascript, if you have a.. you know supplementary characters?
... unicode can encode around a million code points, there are 65536 slots in the basic multilingual plane
... utf16 you only need 2 bytes for each of those
... if you go above that for some of the newer characters then you need 4 bytes to encode them in utf16
... that is two code units
... JavaScript doesn't know how to handle the higher level characters very well
... so you end up with two things that are not actually code points
... it shouldn't be like that, it should be a single code point unit
... that leads to this question about whether we should do code units or code points
... I think that you should do code points
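
Illustration (not from the meeting): a minimal Python sketch of the difference between UTF-16 code units and code points. Python strings are indexed by code point, so the code-unit count is derived by encoding to UTF-16; the helper names are made up.

```python
def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units, which is what JavaScript's
    String.length reports for the same text."""
    return len(text.encode("utf-16-le")) // 2


def code_points(text: str) -> int:
    """Number of Unicode code points (Python's len on a str)."""
    return len(text)


bmp_char = "\u3042"            # HIRAGANA LETTER A, inside the BMP
supplementary = "\U00020BB7"   # a CJK ideograph outside the BMP
mixed = "a" + supplementary + "b"

print(utf16_code_units(bmp_char), code_points(bmp_char))
print(utf16_code_units(supplementary), code_points(supplementary))

# The position of "b" depends on which unit the offsets are counted in.
offset_in_code_points = mixed.index("b")
offset_in_code_units = utf16_code_units(mixed[:offset_in_code_points])
```

Any offset that follows a supplementary character differs between the two schemes, so a spec that stores character counts has to say which one it means.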

ivan: is it the same in ECMAscript6?

shepazu: same question
... I believe they have a way of dealing with this in ECMAscript6

r12a: I believe, but not up to date

ivan: we may want to say we rely on ECMAscript6

takeshi: current model

shepazu: I think they add capability to deal with code points

takeshi: add capability for additional characters (..?..)

shepazu: I think you can deal with it in other ways not just regex
... I think this is an i18n issue, so we should do unicode code points - that's your recommendation?

r12a: that's mine, not necessarily...

shepazu: you guys should tell us how to do it

r12a: the other question was about grapheme clusters
... in unicode you can encode e with acute accent as a single character, or individually
... supposed to be equivalent, but perceived by user to be a single character
... what's perceived to be a single unit of text is potentially much more complicated
... could be two or three characters
... but there's this concept of grapheme cluster which is used for editing; in that process maybe a delete would delete 3 characters instead of one
... grapheme cluster boundaries instead of code point boundaries
... I don't think you should specify this in terms of grapheme clusters, partly because they don't solve all the problems at the moment. May be ambiguous to users what's happening
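
Illustration (not from the meeting): a minimal Python sketch of r12a's first point only, that "e with acute accent" can be encoded as one code point or as two while looking identical to a reader. Grapheme cluster segmentation is not in the Python standard library and is not attempted here.

```python
import unicodedata

composed = "\u00e9"       # é as a single code point
decomposed = "e\u0301"    # e followed by COMBINING ACUTE ACCENT

# Same perceived character, different code point counts and different strings.
print(len(composed), len(decomposed), composed == decomposed)

# Canonical normalization maps between the two encodings.
as_nfc = unicodedata.normalize("NFC", decomposed)
as_nfd = unicodedata.normalize("NFD", composed)
```

A character count taken on one form is off by one on the other, which is part of why the unit of counting and the normalization form both need to be pinned down.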

shepazu: serialization and normalization should take care of that?

r12a: no, a grapheme cluster is how human perceives clusters

ivan: UX.. when they select on screen, what do they select

r12a: it varies

ivan: if you select something in web browsers, what do they select?

r12a: i think it still varies
... I don't know the details

shepazu: we should test that

ivan: what we should do is whatever the web browsers do

bigbluehat: we should spec it based on the dominant case in browsers and say this is what we want going forwards

r12a: if all the browsers agree but are all doing it the wrong way...

shepazu: I'm sympathetic to both of those positions. We should do what's possible. We have the issue; we should test it and find out if they're doing the right thing, or if there is a right thing. We should move on.
... issue 5, avoid listing whitespace characters
... we can change definition of whitespace character
... issue 6, let people say case folding or not
... params none, ascii, unicode, language-sensitive
... they say we should say what language
... I think it should be the language of the document
... the api doesn't need to say that, the document already says that
... the algorithm should include that information, but not necessarily a parameter
... I know there can be mixed language documents
... I think that should be dealt with in the algorithm
... Not necessarily as a separate parameter
... I'm okay with the spec dealing with it, not the API
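
Illustration (not from the meeting): a minimal Python sketch of three of the case folding choices listed under issue 6 (none, ascii, unicode). The `fold` helper and its mode strings are made up for this example; the language-sensitive option is deliberately not implemented because Python's built-in folding takes no language argument.

```python
def fold(text: str, mode: str = "unicode") -> str:
    """Fold case according to `mode`: "none", "ascii" or "unicode"."""
    if mode == "none":
        return text
    if mode == "ascii":
        # Lowercase only A-Z; every other character is left untouched.
        return "".join(c.lower() if "A" <= c <= "Z" else c for c in text)
    if mode == "unicode":
        # Full Unicode case folding, independent of language.
        return text.casefold()
    raise ValueError(f"unsupported mode: {mode}")


print(fold("Straße", "ascii"), fold("Straße", "unicode"))
print(fold("ÉCOLE", "ascii"), fold("ÉCOLE", "unicode"))

# Language-independent folding maps "I" to "i". Turkish text would expect
# dotless "ı" (U+0131) instead, which is where a language tag would matter.
default_fold_of_capital_i = fold("I", "unicode")
```

The last line is the case that motivates asking which language applies: the folding itself is simple, but the correct answer changes with the language of the text being compared.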

r12a: the question is, if you want to search for a particular word in a document, you may be able to get the computed language of the document text, but how do you get the language of the search text?

shepazu: the document says what language it's in, and when it's serializing it knows when it should be doing case folding, so when it serializes into a string it knows how to perform the operation
... not just document based

r12a: that's for the text in the document, but if a user types their search text into a field...

shepazu: that's a UI decision

bigbluehat: the api needs to know what it's being given

r12a: are you looking for a turkish word?

shepazu: maybe we should be looking at it... afraid to make the api larger and more complex. Need to make sure we need it.

ivan: There's one.. not necessarily only capitalisation, but in some cases when I search French, I can type in the search term without the accent and it will still find the relevant French term
... so the found term might differ from the search term only by the missing accents
... which is not necessarily same as editing distance

r12a: that's in another issue

<annbass> (this is how I figure out where the appropriate accents go, on French words!)

shepazu: issue 10, option to have ascii case folding is not useful. I'm fine with that.
... issue 7 is you'd like us to use charmodnorm as normative reference for ... I'll ask for clarification in the issue
... I'm fine with referencing charmodnorm
... issue 8, four parameters, including canonical, compatibility and all
... I included all because of my ignorance... I think we need back and forth so I understand the issue and we'll resolve it
... This is for unicode normalization
... there are multiple kinds
... Before I didn't have anything about that, I just picked one. These guys think that I should.
... Still don't know if this is something browsers are willing to do
... I included it cos i18n said I should, we'll have back and forth
... issue 10, ascii case folding option is not useful
... that's fine
... when I was reading charmodnorm it seems like it would be useful
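
Illustration (not from the meeting): a minimal Python sketch of the "multiple kinds" of Unicode normalization mentioned under issue 8, using the standard `unicodedata` module. The helper name `equal_under` is an assumption for this example. Canonical forms (NFC, NFD) treat precomposed and decomposed accents as the same text; compatibility forms (NFKC, NFKD) additionally equate characters such as ligatures with their plain spellings.

```python
import unicodedata


def equal_under(form, a, b):
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00e9"   # é as a single code point
decomposed = "e\u0301"   # e followed by a combining acute accent
ligature = "\ufb01"      # the "fi" ligature character

for form in ("NFC", "NFD", "NFKC", "NFKD"):
    print(form,
          equal_under(form, precomposed, decomposed),
          equal_under(form, ligature, "fi"))
# NFC True False
# NFD True False
# NFKC True True
# NFKD True True
```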

r12a: charmodnorm addresses two different scenarios: one is element names and markup, and the other is natural language content, the stuff people actually read
... ascii matching is useful for things like css identifiers
... typically css doesn't concern itself with ascii case, but you can't extend that through other languages than english
... so it's only in the case where you're talking about syntactic content that ascii casefolding is really useful
... never useful in natural language processing

shepazu: I'm fine with that

ivan: we may want to search in html content that includes a pre with javascript content
... for those you fall back to the ascii
... you're not in natural language

shepazu: I don't think you will have a conflict

azaroth: a high level question, what would the default be? just unicode?
... what would be specified?
... if the text was ascii and you wanted to have casefolding, for example searching all lowercase, you would say casefolding is unicode rather than (?)
... it's not that all casefolding would go away, just that that option would go away
... unicode is a superset of ascii therefore we don't need ascii

shepazu: we can drill into issues later
... issue 11, unicode equivalent type not all clear
... I added all, you guys informed me there's already a way of doing the most promiscuous match

r12a: there may be additional issues

shepazu: raise as other issues
... issue 12, order of case fold and normalization in algorithm, should reverse, I'll change that
... the algorithm is going to change dramatically anyway
... the last one is issue 13, several different ones, I'd rather you broke these out into multiple issues

<fsasaki> [just for the minutes as background for the case folding discussion: unicode provides a case property http://unicode.org/faq/casemap_charprop.html and that is for all characters not just the ascii set]

shepazu: One of them is the oe for german transliteration
... We need to break down into several things I can understand

r12a: there are all sorts of things you can do to match text. Ignoring accents is a common one. There are other things depending on language
... Some might be syntactic stuff like fights vs fight
... Recognising grammatical language specific differences
... I thought we had that in charmodnorm already but we don't
... we don't know all the answers yet, but know that there are a lot more

clapierre: includes punctuation?

<fsasaki> http://www.w3.org/TR/xpath-full-text-10/#ftmatchoptions

shepazu: edit distance would handle a lot of things like that

fsasaki: maybe you're aware of xpath full text specification
... various options related to language, stemming, useful to look at as background

shepazu: I do have to say that as a design decision we want to keep this as simple as possible but no simpler
... Not sure how much the browser vendors will be willing to implement, two different normalization algorithms, we should talk to them to find out

r12a: what's important is some of these things like diacritic stripping
... important for users
... normalization stuff people don't normally know about
... some things people know that they want to do this type of search

shepazu: there should be an option to strip out all diacritics
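
Illustration (not from the meeting): a minimal Python sketch of accent-insensitive matching by stripping diacritics, as in the French search example raised earlier. The function names are assumptions for this example. The approach (decompose with NFD, then drop combining marks) is one common technique, not something the FindText draft specifies, and it does not cover language-specific rules such as German "oe" transliteration.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose the text, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def matches_ignoring_accents(document_text, search_text):
    """True if search_text occurs in document_text once accents and case are ignored."""
    needle = strip_diacritics(search_text).casefold()
    haystack = strip_diacritics(document_text).casefold()
    return needle in haystack


print(strip_diacritics("élève"))                                # eleve
print(matches_ignoring_accents("Un élève français", "eleve"))   # True
```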

azaroth: now we relocate
... 1f middle hall a

WebPlatform meeting, then lunch

ITS 2.0 - Internationalization Tag Set

<fsasaki> http://www.w3.org/2015/10/its2-and-web-annotation/?full#1

tag set often used to state something should be translated or not

Translate, Text Analysis, also includes system for annotating parts of text as language components

showing examples of HTML and XML usage

translate is from HTML itself

its-ta-* are from ITS

growing interest in using these types of annotations in other formats

RDF, JSON, etc

ITS 2.0 ontology exists

translate attribute becomes an RDF property (for example)

for JSON there is an `annotations` key which contains the ITS annotations

JSON ITS interest is from the localization community primarily

what role could the Web Annotation model have to fulfill these requirements?

what things can Web Annotation use from ITS?

could Web Annotation benefit from the data types in ITS?

it is not the purpose of today's discussion to file issues for Web Annotation, just to inform the group of the ITS efforts with regards to annotations

it is possible via ITS to have the annotations separate from the original content

via Natural Language Processing Interchange Format (NIF)

NIF developed by various EU projects

there are related APIs by these projects--for example FREME API

@types examples include `nif:String` `nif:Context`

methods for text selection (`anchorOf`, `beginIndex`, `endIndex`)

also has a "confidence" rating generated by the tool

@id is a URI which includes a #char=27,30 (for example) fragment selector
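
Illustration (not from the meeting): a minimal Python sketch of reading a `#char=27,30` style fragment and recovering the selected text. The function name, the example URI, and the sample sentence are assumptions for this example. It also assumes the two numbers are zero-based positions between characters (begin inclusive, end exclusive), which corresponds to the `beginIndex` and `endIndex` values mentioned above; other fragment forms are not handled.

```python
import re
from urllib.parse import urldefrag


def parse_char_fragment(uri):
    """Split a URI into (document URI, begin index, end index)."""
    base, fragment = urldefrag(uri)
    match = re.fullmatch(r"char=(\d+),(\d+)", fragment)
    if match is None:
        raise ValueError("not a char=begin,end fragment: %r" % fragment)
    return base, int(match.group(1)), int(match.group(2))


# Hypothetical document text and URI, for illustration only.
text = "Welcome to the home of the W3C site."
base, begin, end = parse_char_fragment("http://example.com/doc.txt#char=27,30")
print(base)             # http://example.com/doc.txt
print(text[begin:end])  # W3C
```

The selected substring plays the role of `anchorOf` in this sketch; a consumer could compare it with the stored value to detect that the underlying text has changed.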

perhaps there are general mechanisms that can be shared

interest in feedback on the ITS work from this WG

ivan: I would like to go back to the markup examples
... from the annotation model point of view these examples show one key discrepancy
... in the Web Annotation model there must be an identifier of the separate content
... in the examples, the annotations are inline, they do not have identifiers, they do not have targets
... that being said this markup approach has many usages
... so do we want to deal with scenarios where there is no target?
... should we have annotations with no target identifiers (i.e. the target is a blank node)
... there do seem to be scenarios where this would be useful
... in the CSV working group, the targeting is useful
... but I can be forced into a redundancy
... I would have to make an identifier that is circular
... that may be an aspect worth considering

fsasaki: the markup example can not be easily expressed in Web Annotation?

ivan: it can be anchored via our selectors
... but when you want to use JSON (etc), then you force the developer to use redundant data

azaroth: the higher level point would be what use cases are there where the information is being provided by someone other than the content provider?
... of course if you own the content, you can put in whatever sort of markup you want
... but that scenario is less about what we're building
... which is more about content the person annotating does not control
... does ITS have these use cases?

fsasaki: yes. that's also why exploring Web Annotation is interesting
... someone produces HTML content, and someone else wants to enrich it with separate language related content
... if I were to put all that content into the HTML, then it would be overload
... there can also be overlaps with what's created
... there's no demand for hierarchies
... it also works if you do not own the content, but only export it for translation
... scenarios where lots of people need to look into the content

azaroth: so in the translate case...
... is this the translation? or is this just related to the act of translation?
... here is something that could be translated?
... vs. something that points to the translation?

fsasaki: this could also relate to company policy and process---this should be translated, this should not be translated

azaroth: so the target would be the W3C in this case?

ivan: not necessarily, it's the specific found instance of the word
... which is where FindText API could come in

azaroth: the body of the annotation would state what is and is not annotatable

ivan: yes. that sounds right

fsasaki: it would be provided with ITS as additional RDF triples

azaroth: in the current working graph, it would need to be separate from the Annotation expression, but it is possible

ivan: minting a URI for the annotation itself seems unnecessary in the ITS use cases

azaroth: a URI for the annotation is a SHOULD
... the more interesting point here is the NIF URI pattern
... it tries to encode quite a bit of data

fsasaki: it's actually changed. it's also separate as `beginIndex` and `endIndex`
... there's also a `wasConvertedFrom` which explains where the content came from--HTML, etc.

azaroth: from my perspective, it looks like it can all be expressed in Web Annotation currently
... what we don't currently have is confidence score

fsasaki: this `taConfidence` is ITS specific
... there's also a place to put a URI for the actual tool that produced the ITS annotations

shepazu: there used to be a confidence score in an earlier version of the FindText API
... I took it out because it was application specific
... by which I mean use case specific

fsasaki: `taConfidence` is about "text analysis confidence"
... there's also provenance information which can be stored
... originally we'd wanted confidence statements everywhere, but they can prove pointless without provenance info

azaroth: it seems like a useful exercise to pick some selected examples
... and see what's missing in Web Annotation to express ITS content
... and then for us (WG & ITS folks) to see if that expression is useful outside of the NIF context
... if so, then we may want to change the model---otherwise, they could be presented as available extensions for those unique use cases

fsasaki: there's an upcoming meeting in November--a hackathon--where these would be useful

azaroth: yeah, that would be great
... canonical examples of all the features you'd like to discuss
... we can iterate those from there

fsasaki: adding these items to the body seems to make sense
... ITS can be used without NIF
... and it may be that those make more sense in Web Annotation--perhaps as an extension
... NIF is just one use case for ITS annotations

azaroth: these examples may also prove useful for analyzing our "motivations" / "roles" list

<fsasaki> http://www.w3.org/TR/its20/#basic-concepts-datacategories

azaroth: are there examples of ITS and NIF expressions that you could link us to?
... for example, we have an "identifying" motivation, but not a "translating" one
... some alignment there would be helpful
... from the DPUB side, we also have some areas where this would be useful
... if you had one example per category and what they'd be used for
... then we can see which make sense as motivations/roles

ivan: I'm a little skeptical that these are general enough
... but perhaps as extensions

azaroth: motivations are skos concepts
... if your data categories could be defined as those, then we'd have a very easy time mapping them

fsasaki: we're currently working on the vocabulary, so this is good timing
... some of these are text analysis specific
... but some may be broadly applicable
... we've also had a discussion about directionality
... here's a string, here's a substring within it that has a directionality from right-to-left
... it could be used to provide helpful information for solving these scenarios
... we're not sure what the mechanism could be, but it is something we are exploring
... and also JSON annotation aspect
... sometimes people put HTML into JSON to provide language information---which looks really bad

<azaroth> scribenick: azaroth

bigbluehat: We don't currently have a json selector. Some hacks, like json pointer, but not a finished spec.

ivan: who?

bigbluehat: IETF. Also in json schema, which has also expired. If we can't point into it, then we can't annotate json objects. Need to look at this

ivan: Make it clear in the model how it can be extended

bigbluehat: Not sure if it's in our charter to stir up the specs?

ivan: I think we can just point to them. Especially at IETF.

<scribe> scribenick: bigbluehat

azaroth: it's 2 pm. DPUB is showing up after the break--because coffee
... the topic next on the agenda is testing
... perhaps we stay with i18n topics instead since we have these folks here

shepazu: here's my reaction. let's storm the castle!
... I was somewhat encouraged, and somewhat discouraged
... we had affirmative, encouraging feedback from Travis at Microsoft

FindText API discussion with Web Platform group

shepazu: we got feedback from someone at Apple, which was less overtly encouraging
... but he didn't say we're not going to expose FindText
... more, I don't like the shape of the API
... he said he wanted fewer options, but then he also wanted to add a feature
... it seems encouraging that they gave us feedback at all
... we should hold fast on the most important part--which is the edit distance, imo

ivan: so, we had a small discussion at lunch, and I was wondering if it makes sense to have different API entries
... one that would say programmatically give us what you do already
... a find or search API which exposes what browsers already implement
... but which you cannot currently access from JS

shepazu: I think that would tactically be a mistake
... the current implementation does not solve our usecase
... and without exposing the edit distance to implementation, then we don't get what we need for our use case

azaroth: it solves giving the client api content about what is selected

shepazu: those aren't related in my view

azaroth: we could reasonably get a find api that exposes finding a range of text in the browser
... the edit distance is a robustness question

ivan: if I have an ebook, for example, the content is frozen

shepazu: you mean, the edition, on that ebook reader, but when you exchange it with someone else, then they won't re-anchor

ivan: sure, but there are also use cases where content is fixed, and for those, the lack of robustness is not a problem
... text books for example

bigbluehat: robustness can go on top of find api

ivan: the edit distance was the only thing a new or brady concept for browsers
... the rest, sure, but could probably be done by regex

shepazu: maybe I had a different take
... leaving out this edit distance thing--which is different, etc--the rest could be done with regex
... I believe his point was about regex, not about edit distance

ivan: separating the two things may make sense, it drives them to expose search--which is currently not done for developers
... maybe first they will do the simple thing without edit distance
... and then later do the robustness thing

shepazu: ok. if that's what we can get---the find text api--then....well....every browser does it differently
... I think we should not so easily give up robustness
... we need to push the issue of robust anchoring

David_clarke: I'm not sure the web apps audience fully understood the value of robust anchoring
... sell them the value proposition first, rather than selling the solution first

shepazu: Paul Cotton from Microsoft asked where our use cases were--which we didn't really present
... and while we provide some in the spec, the interface was mostly skimmed

ivan: there's a URI issue--the whole mechanism actually serves two purposes
... 1. finding the text
... 2. serializing the identification of that text
... the identifier aspect is also very important
... and it's not in the document

shepazu: we should annotate the spec to point to use cases
... I think Paul Cotton has a really good point about having use cases

bigbluehat: An opportunity to split the doc into sections to give info on use cases here, and outside. Such that browser vendors could see which parts are native, and can give developers access to
... we can start them off in that direction. There won't be consensus around that, even though there's similarity from the user perspective
... the robustness part shouldn't get caught up in that, and could have its own set of use cases on top of that

shepazu: I would like to not separate them so they can be considered as a whole, rather than what they do today.
... The solutions from the whole problem set will be different than if you start with the narrow set
... Would like the conversation to happen in the context of the specification so we have actionable things to do. No one else has volunteered to work on it, and we have their attention
... let's engage directly

bigbluehat: If we mix in robustness, which they see as out of scope, it'll slow down exposing the find stuff which would be a huge value add for any implementer
... the people to win over for robustness would be content editable people
... they deal with it all the time
... translation, internationalization, etc. Find text with static text is different

shepazu: Would rather we didn't throw the major use case out before we try to engage
... they haven't said no yet

ivan: Don't know all the use cases off the top of my head, we have two documents, one from DPUB one here. I'd like to see what % of use cases are based on dynamic content
... and what on static
... Something that would help to decide which way to go

Antonio: Can you explain why the dichotomy matters?

shepazu: FindText now serves static case

ivan: I can use it for annotations, but if I annotate content that can change, then we need the robustness
... that's where edit distance comes in
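
Illustration (not from the meeting): a minimal Python sketch of how an edit distance can make re-anchoring tolerant of small changes in the content. The function names and the `max_distance` parameter are assumptions for this example. The search is deliberately naive: it only compares windows of the same length as the stored quote, so it is a sketch of the idea rather than an efficient or complete approximate matcher.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_find(text, quote, max_distance):
    """Return (start, distance) of the closest same-length window, or None."""
    best = None
    size = len(quote)
    for start in range(max(len(text) - size, 0) + 1):
        distance = edit_distance(text[start:start + size], quote)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (start, distance)
    return best


# The stored quote no longer matches exactly after an edit to the page.
print(fuzzy_find("The quick brown fox", "quick brawn", 1))  # (4, 1)
print(fuzzy_find("The quick brown fox", "quick brawn", 0))  # None
```

With `max_distance` set to 0 the function behaves like an exact find, which is the static-content case; raising it is what buys robustness for content that can change.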

shepazu: If you're going to design something for a limited use case that just does find in page, you might not have the extra params, and then if the API isn't extensible, you'd never get edit distance

bigbluehat: Can design concurrently, to make sure that the extensibility is there
... core of what I'm saying is that vendors don't have a motivation
... find text to me seems obvious
... if we can ship that sooner, and it makes lives easier...

shepazu: How things are serialized could be a bigger, longer term problem
... I think there's disagreement on how you serialize out the bit ... convert DOM to text

Testing

ivan: what do we need to test?
... more than one answer ... model, protocol and findtext are all very different
... what do we test for the model?

ivan: theoretically, what I want to see, is that if two implementations create two graphs, what do they have to do in order to test?

azaroth: so in the LDP group, we provided a set of tests that you could run internally against your implementation
... there was no centralized validation service
... each test was implemented in Java
... it would check to see if each direct container had hasMemberOf relation (etc)
... and it would tell you if you had it properly

ivan: we went through the whole RDFa testing, that was an easy one, because...
... this is the way HTML looks like, this is the graph I have to produce
... we created the markup, and then checked the triples it produced
... the starting point involved a bunch of HTML files
... what is the starting point for Web Annotation?

azaroth: client or server?

ivan: I don't care. however the annotation is made, it produces a graph, and I have to see if it's correct
... I can't properly say what the process is.

azaroth: brainstorming. we could provide a human readable set of annotation scenarios
... create a comment on this URL, and then check their annotation with the testing tool--there would be at least one right answer
... however, there will be more than one right way to do that
... unless we are very very clear about which one we want
... annotate this URL with some text--does it have a language? does it have a string body? a remote body URI?
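
Illustration (not from the meeting): a minimal Python sketch of the kind of check a testing tool could run against an annotation produced for a scenario such as "create a comment on this URL". The function name and the keys `target`, `body`, `value` and `language` are assumptions chosen for this example; the actual property names are whatever the model specifies. The check reports which of several acceptable shapes was used instead of demanding a single right answer.

```python
def describe_annotation(anno):
    """Report whether there is a target, what kind of body is used, and its language."""
    body = anno.get("body")
    if body is None:
        kind = "none"
    elif isinstance(body, str):
        # A bare string is treated as a remote body if it looks like a URI.
        kind = "remote" if body.startswith(("http://", "https://")) else "string"
    elif isinstance(body, dict):
        kind = "string" if "value" in body else "remote"
    else:
        kind = "unknown"
    language = body.get("language") if isinstance(body, dict) else None
    return {
        "has_target": bool(anno.get("target")),
        "body_kind": kind,
        "language": language,
    }


# Hypothetical annotation produced by an implementation under test.
result = describe_annotation({
    "target": "http://example.org/page",
    "body": {"value": "Nice paragraph", "language": "en"},
})
print(result)
```

A human or a script could then compare the report with the set of shapes the scenario accepts, which keeps the test from being either over specific or only human checkable.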

bigbluehat: we're going to need lots of tests....
... very specific to the various scenarios

ivan: if we make it too detailed, we lose its value by being over specific
... but if it's too general, then only a human can validate it

azaroth: the distinction is perhaps the difference between syntactic and semantic

bigbluehat: using json-ld playground is a good testing ground.
... would be nice to produce along those lines
... may not count as good validation but help decide how much to implement that
... taking the json-ld playground and adding text graph output.
... only compacted or only in quads

ivan: combining automatic comparison of the graphs was not something they used because ... complicated to setup a processor / install.. practical issues
... in the RDFa case it worked really well b/c you gave the graph to the processor and at the end it produced an output
... playground leaves it back to the human in the end.

bigbluehat: I normally copy/paste around.. and add things to the @context if they don't exist ... in order to validate to check the basic cases at least

ivan: In the case of RDFa we had various versions (b/c of HTML5..) between 200-300 tests per category
... a full RDFa processor is probably too complex
... if we come up with a reasonable set of things e.g., 60-80 tests roughly.
... looking at tests by humans is not unreasonable in that case
... I think it is doable

bigbluehat: If I put the hypothesis tests.. 2/3 tests.. at least for a first test is helpful to understand what is close to an implementation
... there is also a visualizer to help create

ivan: If you need help, there are people.
... Erik, how do you test implementations?

erikmannens: I have to check with the guys.
... it is at the protocol level

ivan: we will come back to the protocol level :)

erikmannens: will come back to you

takeshi: it is not necessary to implement the selector for instance for us (Sony)
... rdfdiff helps us with this
... as of today, basic processor to test / how to validate the output... not sure if the output from the JS is correct or not, but at least what's stored in the DB is correct.

ivan: when you say check.. how is the check done?

azaroth: IEEE (?) ... they use a tool to give a URL.. essentially creating an internal object structure.. it has its own way of going at it to check
... here is an example of an annotation of this feature and that it. If all those working great, and here is what's misformed then ...

ivan: but you check as a human?

azaroth: No.
... tries to generate the structure internally.

ivan: conceptually speaking it is a dedicated ..(?)

azaroth: The take away from earlier work is .. you've got something missing from your implementation, and here is what is required

ivan: we would put an informal annex for framing. we then run the framing algo for the playground. can the playground actually compare that?

bigbluehat: ours just shows it to you.. matching at the graph level is not clear

ivan: if we use framing it is easy for the human

bigbluehat: .. we don't have to give it to the graph if we use keys.
... Doug we want to make sure the triples are right (i.e., prior discussion)

ivan: generated structures are relatively small for the tests, so humans can make the comparison fairly easily
... formally speaking we have two approaches.

bigbluehat: what we (hypothesis?) don't have is prebuilt code
... I'd like to see something like JSON playground.

shepazu: aren't quads in turtle?

bigbluehat: no

shepazu: what about json-ld (re: N3.js)?

erikmannens: will check

ivan: check if N3.js understands N-Quads and JSON-LD

shepazu: what is its output?

bigbluehat: It has an internal model

shepazu: you can write a parsing engine and... test against that.

bigbluehat: we don't want to write into other people's code necessarily

ivan: with N3 we can check the Turtle serialization..

Meeting with DPUB

ivan: Annotation, DPUB

clapierre: Accessibility of DPUB

Karen: also on DPUB

azaroth: Annotation

Jeff: DPUB IG

bigbluehat: hypothes.is project..
... read ebooks occasionally

rhiaro: U of Edinburgh, social Web WG

Ann: free agent / Social

erikmannens .. open source .. to publish ebooks

Antonio: W3C JP

Olivier: BBC

Ralph: W3C

shepazu W3C

takeshi Sony

azaroth: We brainstorm for Annotation/DPUB
... DPUB produced a set of use cases (sometime last year) which we looked at in the Annotation WG. Those use cases were somewhat transformed into deliverables.
... recently, some UC for accessibility

<tzviya> DPUB Annotations Use Cases: https://www.w3.org/dpub/IG/wiki/UseCase_Directory#Social_Reading_and_Annotations

azaroth: we could go into more detail on those

<Ralph> Digital Publishing Annotation Use Cases [W3C IG Note 2014-12-04]

azaroth: we don't have to walkthrough Annotation UC as published

Brady: DPUB
... ... since we are not familiar with what you published..
... we would like to talk about accessibility UC

azaroth: we have 3 publications. data model, protocol, client API
... protocol, we've been discussing but haven't produced a new draft.
... tomorrow we can go over
... the model changes
... we can go over what each sections want to do
... trying to produce normative requirements
... now we have each section with short intro and use-case
... rather than a prose
... with JSON-LD/Turtle/Diagram
... technical change to the model is the role to be associated with the "bodies"
... changes the way tags are produced
... here is a resource and it is "tagging" ... as opposed to a tag
... also for editing.
... the protocol is based around essentially REST and HTTP. Update and Delete annotations. We have notions for a Container
... and yet to be published, but hopefully after tomorrow.. an integrated way to say a list of annotations or a collection of annotations that can be broken into pages
... annotations or lists are of great interest to (IDPF)
... At the moment we are trying to align that UC with the protocol

shepazu: FindText API is basically exposing 'find in page' functionality as a DOM API
... the current idea is to include a 'edit distance', a fuzzy matching.
... there was some feedback to handle this with regex
... the basic idea is to let the API to pass in the params and so it ends up with a text result from the document and that returned to you as a range
... one of the features in a page API is a prefix and suffix.
... in case you have multiple instances of a word
... which instance is of the selection that you are trying to find
... and given that, you have a URL syntax, a fragment identifier, this string to be the set of criteria to search for in the page.
... so you could use it as a primary identifier
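
Illustration (not from the meeting): a minimal Python sketch of how a prefix and suffix can pick out one instance of a word that occurs several times. The function name and parameters are assumptions for this example and do not reflect the actual shape of the FindText API, which returns a DOM range and not string offsets.

```python
def find_with_context(text, exact, prefix="", suffix=""):
    """Return (start, end) of the first occurrence of `exact` that is
    immediately preceded by `prefix` and followed by `suffix`, or None."""
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return (start, end)
        start = text.find(exact, start + 1)
    return None


text = "the cat sat on the mat"
print(find_with_context(text, "the"))                 # (0, 3)
print(find_with_context(text, "the", prefix="on "))   # (15, 18)
print(find_with_context(text, "the", suffix=" mat"))  # (15, 18)
```

The same three strings (exact text, prefix, suffix) are what a fragment identifier would need to carry for the search to be repeatable on another copy of the page.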

ivan: the identifier represent the search

shepazu: terms of the search
... certainly a long string

ivan: well we have CFI

shepazu: CFI
... lets say we are working in a browser, I select something, and store the selection, and the prefix.. and some other things, and store them as annotation, and in this case you might not have a body..
... so, then you might share that or keep it for yourself. for the latter, the next time you go to the next version of the book, the annotation still stands

ivan: to be very precise, the reference to the selection, the identifier of the selection, is the spark of the annotation
... annotation means I give you the id to what I annotate and what it is annotating

<scribe> Unknown: how much of the source document is part of the selection?

Brady: I ask from digital publication and ebooks, there are significant restrictions as to how much you can copy
... at some point you have to stop what you can select and annotate

azaroth: ... There are two types: the first is an exact match, and the second is a char offset

Brady: at least for CFI, for char offsets, there are a number of implementation issues

ivan: one of the reasons why this is coming up

Brady: difficult problem
... still painful

shepazu: let's talk about it "off camera"
... or "in camera"
... This was tried. hathitrust case with storage and fair use
... that doesn't remove you from contractual cases, but copyright
... let's not go into that here

bigbluehat: we don't have the perfect solution but not telling you about it :P

Brady: I guess it just takes money :P

ivan: for DPUB people is important.. the model document, the Annotation POV... and epub world there should be a change reference.. (csarven: okay I lost all the references here)

shepazu: Rather than storing strings, we store hashes
... we have thought about the problem but no solution

Range finder was discussed.. which I assume is just text

^^ tzviya

shepazu: maybe range finder or.. to be this text or non-text to be part of content
... I might have a picture or a face to be highlighted. I think that's a larger thing. different things for different media types. as long as covered by media annotations
... we decided to start smaller, and solve the already smaller difficult task of text search

Brady: can I select 498 image?

azaroth: in current state or..?

Brady: or resource
... I have this collection of images and I want to make sure that I go to the right page

azaroth: Which scope?

bigbluehat: Within the scope of page 5

Brady: In my case, it will be page media
... and image might appear on different pages so i want to know which

bigbluehat: We call it scope currently.

shepazu: There could be sophisticated URI, selecting part of the image being stored so that it can be matched against other things on the page
... an app to compare this fragment on the page to compare with other images to see if it is included

Brady: I want to bookmark page 50, and create an image only for that
... i want to be able to find that on a device now

bigbluehat: We are talking about an XPath selector

scribe: any fragment selector
... also exploring CSS/Xpath for canonical selectors

ivan: the model uses the generic term "selector", and can be extended
... it is not there yet, but XPath can be such selector

bigbluehat: CFI already is

ivan: CSS can be one of the selectors

bigbluehat: you can encode a complicated selector and never put it in the URI

ivan: I think the CSS selector is under utilized.
... reselector is extremely powerful
... using that to locate the target of an annotation

azaroth: Continue on for 5 more minutes?

tzviya: Talking about.. our vision for what we'd like to see .. portable web pub. offline being not first-class

tzviya: a lot of work going on in CSS WG
... and ARIA
... issues that this raises.. identifiers are prominent.
... we really focused on heading in that direction
... a lot of the work is involved in international publication
... updating EPUB and what W3C is coming from

bigbluehat: From the W3C side, multiple things expressible in a single package.

ivan: That's a question of the packaging

bigbluehat: I mention it in relation to the protocol.

ivan: Don't open all the worms

tzviya: we explored a lot of packaging options

tzviya: definitely open
... talking about it for long time
... right now, to read publications online
... publication object model is a proposal
... welcome to join
... we need to have some sort of an object model, what it should be based on is open
... and conversation with service workers

ivan: to give an update on what's happening: there is a packaging format, i.e., EPUB, which is zip based.
... we know there is some work in W3C, mainly in TAG to create a packaging format.
... the info I got so far is still a bit distorted.
... unclear what the outcome of that work will be

<azaroth> web packaging: http://www.w3.org/TR/web-packaging/

ivan: from our POV, publishing community can say give us a packaging format..

Brady: from my perspective, we proposed multipart MIME.
... every time we proposed it people were confused and asked why didn't you use zip? So, we used zip.
... now we have toolchains, and all sorts of code around ZIP.
... if we could turn back time, multipart MIME.
... and given the reality on the web, I prefer service workers

Brady: at least for the service workers, and not having a package makes more sense
... especially for delivery, and in that case, we don't care about streamability

tzviya: I tried doing it with multipart MIME, but then when you try making it work, there is not much out there.. only some old stuff from stackexchange...

bigbluehat: when not doing W3C stuff, .. I do CouchDB stuff to stream JSON

bigbluehat: for CouchDB that's super useful b/c we don't have to read the whole disk

Brady: It sounds really interesting, but not interesting enough to do anything about it (so say some people)

azaroth: This has some impact on this WG, b/c it has offline reading modes
... we should be able to accommodate

<tzviya> DPUB TPAC Agenda https://www.w3.org/dpub/IG/wiki/Oct_2015_F2F_Logistics_and_Details#Schedule

azaroth: at the moment we are looking at ActivityStreams
... to be discussed tomorrow with Social WG people

ivan: In a sense, the way we ... where we are getting ... the usage of service workers, it can fool the main reading system into believing that everything is accessible through the Web (on/offline)
... b/c the service worker will catch the HTTP request and deal with it (cached or not)
... for the time being, the service worker is... - I don't want to say scifi - only one browser implements it
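A conceptual sketch, in Python, of the interception ivan describes: every request goes through one function that answers from a cache when it can, so the caller cannot tell whether the device is online. This is not the Service Worker API; `CachingInterceptor` and `FakeNetwork` are hypothetical names used only to model the behaviour.

```python
from typing import Callable, Dict


class FakeNetwork:
    """Hypothetical network that can be switched offline for the demo."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages
        self.online = True
        self.calls = 0

    def get(self, url: str) -> str:
        self.calls += 1
        if not self.online:
            raise ConnectionError("offline")
        return self.pages[url]


class CachingInterceptor:
    """Stands in for a service worker: every request passes through fetch()."""

    def __init__(self, network_get: Callable[[str], str]) -> None:
        self._network_get = network_get
        self._cache: Dict[str, str] = {}

    def fetch(self, url: str) -> str:
        # Answer from the cache when possible, so the reading system
        # sees the same response online and offline.
        if url in self._cache:
            return self._cache[url]
        body = self._network_get(url)  # raises ConnectionError when offline
        self._cache[url] = body
        return body


chapter_url = "http://example.org/book/ch1.html"
network = FakeNetwork({chapter_url: "<p>Chapter 1</p>"})
worker = CachingInterceptor(network.get)

first = worker.fetch(chapter_url)   # online: fetched and cached
network.online = False
second = worker.fetch(chapter_url)  # offline: served from the cache
```

An annotation client sitting behind such an interceptor could keep addressing targets by their ordinary URLs, which is why offline reading might not need special handling in the annotation work.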

<bigbluehat> Web Packaging format (based on MIME Multipart) http://www.w3.org/TR/web-packaging/

ivan: so, some fuzziness there
... it may be that the annotation work is relieved of this issue

Jeff: .. [csarven: sorry, couldn't track this]

ivan: each annotation can have a role
... We want to annotate @alt

tzviya: So it is not visible content

shepazu: I'm not sure how we can do that

scribe: I can't think of how a UX would expose that

bigbluehat: The browser can switch off [stuff]

shepazu: From a UX perspective it makes sense

bigbluehat: If you can turn the images off, it is possible

tzviya: This is beyond the scope of DPUB

shepazu: I'm skeptical

having assumed a web resource

bigbluehat: DOM expression to the user.

charles: maybe @alt is in that content stream
... the visual representation is not the only representation the browser can make, e.g., it can be audible

shepazu: I think tzviya was talking about.. what's exposed to the API

shepazu: not necessarily what's in a web page
... there are things that can be done for @alt e.g., screen reader, but hard for a browser to do it

Jeff: can a blind user create it and a visual user can find it?

bigbluehat: It really depends on what the visual text is given

shepazu: There is a visual aspect to text

shepazu: ultimately all text can be considered part of a range, so not sure if you can't [select] the text. It is possible, but probably has some issues which need to be figured out
... what you are selecting as a range is the main issue

ivan: The <details> in HTML5, ..

Ralph: Without talking about interfaces, ... which is then available to the user not using that system

Jeff: so we can annotate anything

shepazu: Anything. Even scene descriptions

ivan: We should have a way to incorporate this
... and not throw away

azaroth: also check the model so that we can.. make an example of @alt and how to represent that.
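One possible shape for the @alt example azaroth asks for, as a hedged sketch only. The group had not settled how to target attribute values; the `attribute` key on the selector is a hypothetical extension, and the resolver uses the limited XPath subset in Python's `xml.etree.ElementTree` on a well-formed sample page.

```python
import xml.etree.ElementTree as ET

PAGE = """<html><body>
<p>Folio 12 recto.</p>
<img id="fig1" src="folio12.jpg" alt="A scribe copying a manuscript"/>
</body></html>"""

annotation = {
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "The alt text omits the red initial."},
    "target": {
        "source": "http://example.org/folio12.html",
        "selector": {
            "type": "XPathSelector",           # assumed type name
            "value": ".//img[@id='fig1']",     # locates the element
            "attribute": "alt",                # hypothetical: which attribute is the target
        },
    },
}


def resolve_attribute_target(markup, selector):
    """Return the attribute text a selector points at, or None if absent."""
    root = ET.fromstring(markup)
    element = root.find(selector["value"])
    if element is None:
        return None
    return element.get(selector["attribute"])


alt_text = resolve_attribute_target(PAGE, annotation["target"]["selector"])
print(alt_text)
```

How a user agent would expose such a target to the reader (visually, audibly, or through an API) is exactly the open UX question raised in the discussion; the sketch only shows that the content can be addressed.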

azaroth: is there anything further from DPUB side to discuss?
... welcome to hang out with us.. next topic is.. what we can accomplish before next TPAC for client side

Karen: what would be the top implementations to expect?

shepazu: Realistic for us to say... a data model - finish that with multiple representations/serializations

shepazu: Some find text, polyfill thing? possibly protocol as well
... browser extensions
... supporting annotation data model, findtext as well
... we can hope for a browser supporting some parts of this
... that's not yet clear
... within a year we could at least get a findtext implemented in a browser - even if it is a stripped down version
... b/c webannotation is a bunch of moving parts, so I don't think we are going to have the full annotation system in a year
... if you want the implementation, hypothes.is will probably have it in a year.. but that's a browser extension

bigbluehat: the selectors we output match spec.

bigbluehat: if we get XPath in, and define how we do lists in..
... I have written some translation code with WebAnnotation, and gave it to the Hypothes.is group...
... the annotations are in public and public domain
... the hope is that, we get up to the protocol as well. least likely to get done, probably because not done as a draft yet either.
... that part is moving already internally due to other forces
... there are libs that we have ... if we do CSS/XPath, there is plenty of code already
... if we ship nothing in a year, we can confidently deliver the data model at least
... RadiumJS and (?) JS

ivan: essentially implementations may appear in readers

bigbluehat: these are usually last mile problems, but somebody has to tie them together

Karen: Academic publishers/Journals ...
... I will take what you have discussed and go over it.. headline would be: academic journals will soon be able to do such and such from W3C's stuff

shepazu: Are you afraid that they'll talk down to you? ;P

shepazu: The way to store and share annotations - that's the data model - the way to write annotations - that's the protocol - I'm not saying teach them these terms...

shepazu: the way to write and store.. in the cloud.
... open up annotations .. there are certainly ways to talk about these things the way real people talk

azaroth: Anything else? Overlapping with DPUB?

what should be done before the end of the charter?

azaroth: for the remainder of the day, we can talk about what to do with the remainder of the charter

<bigbluehat> DOM related functions by tilgovi:

<bigbluehat> https://github.com/tilgovi/dom-seek

azaroth: The description of client-side API in the charter.. is slightly different

<bigbluehat> https://github.com/tilgovi/dom-node-iterator

<bigbluehat> https://github.com/tilgovi/dom-anchor-text-quote

<bigbluehat> https://github.com/tilgovi/dom-anchor-fragment

<bigbluehat> https://github.com/tilgovi/dom-anchor-text-position
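The links above include libraries for anchoring by quoted text and by character position. A minimal sketch of the text quote idea in Python, assuming an annotation stores the exact quote plus a little surrounding context; the function name and parameters are illustrative and are not the API of those libraries or of the findtext proposal.

```python
def anchor_text_quote(text, exact, prefix="", suffix=""):
    """Find the first occurrence of `exact` whose surrounding text matches.

    Returns (start, end) character offsets into `text`, or None when no
    occurrence has the given prefix immediately before it and the given
    suffix immediately after it.
    """
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        # Context did not match; try the next occurrence.
        start = text.find(exact, start + 1)
    return None


document_text = "the cat sat on the mat near the cat flap"

# Without context the first "cat" wins; the suffix disambiguates the second.
first_hit = anchor_text_quote(document_text, "cat")
second_hit = anchor_text_quote(document_text, "cat", suffix=" flap")
print(first_hit, second_hit)
```

The offsets returned here correspond to what a text position selector would store, so one quote lookup can be used to re-derive positions after a document changes.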

scribe: So, do we think that .. a Python implementation of that would not be very useful.

shepazu: If we can get one browser implementation, and one polyfill

ivan: that would be ideal
... process wise

azaroth: A server-side findtext in any language...

shepazu: No, it is a client-side API

azaroth: Unless under robustness.

bigbluehat: Things being conflated in findtext

shepazu: If you are talking about the fragment identifier, but I was talking about the findtext API

bigbluehat: I agree, but I don't think that's how it was presented earlier. If we can separate that..

shepazu: Fragment identifier being completed in a year is not clear.

ivan: I have no doubt that we can do that in a year

bigbluehat: That's the easy part

shepazu: it took the media fragment group 3 years to do that.. so, sure we can do that in a year, but I don't think it is trivial

<azaroth> http://www.w3.org/annotation/charter/

shepazu: Can we get it widespread instead of the implementations?

bigbluehat: I don't think we can put it on our shipping b/c of that
... hopefully changing towards more implementable

shepazu: I think Charles brought up an interesting point. I don't know how to represent a column in HTML.

azaroth: How to annotate a column

bigbluehat: You can do that in a browser right.. clicking random cells

shepazu: Multiple discontinuous selections
... Actually you could use findtext
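A hedged sketch of annotating a table column as multiple discontinuous selections: one selector per cell, collected into a list. It assumes a simple table without `thead`/`tbody` splits, `rowspan`, or `colspan`, and reuses the assumed `CssSelector` type name; nothing here is taken from the draft model.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr><th>Title</th><th>Pages</th></tr>
<tr><td>Book A</td><td>10</td></tr>
<tr><td>Book B</td><td>20</td></tr>
</table>"""


def column_selectors(markup, column):
    """Return one CSS selector per row for the 1-based `column` of a table."""
    root = ET.fromstring(markup)
    selectors = []
    for row_number, row in enumerate(root.iter("tr"), start=1):
        cells = [cell for cell in row if cell.tag in ("td", "th")]
        if column <= len(cells):
            selectors.append({
                "type": "CssSelector",  # assumed type name
                "value": f"tr:nth-of-type({row_number}) > :nth-child({column})",
            })
    return selectors


# The second column becomes three separate cell selections.
pages_column = column_selectors(TABLE, 2)
for selector in pages_column:
    print(selector["value"])
```

For a table this regular, a single selector such as `td:nth-child(2), th:nth-child(2)` would cover the same cells; the per-cell list is shown because it also handles arbitrary, non-columnar picks like "clicking random cells".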

tzviya: ARIA and CSS are worlds apart
... Between CSS and ARIA roles you could solve that.. It is a very long conversation.
... So CSS selectors.. you can select whatever, and use ARIA roles to say this has roles...
... col doesn't exist, but colgroup exists
... the table models are most robustly defined, b/c it is really hard to navigate a table
... so sit down and talk to ARIA people.
... ARIA/Annotations could be really accessible. Doesn't say much now, but it can be extremely valuable for a11y

shepazu: Sure.. let's talk about how that world fits in
... the ability to serialize out what you selected, or being given a selection and recalling that in the document
... I don't think ARIA is in the right place. It is a way to express something.. but I could be wrong

azaroth: Back to high level topic
... we can finish the model
... there are some outstanding issues we can deal with tomorrow

ivan: I think we should wait for the others, not sending CR by Dec.

bigbluehat: Discussing before sending to the list

azaroth: For the serializations however, JSON-LD... but HTML something comes up

shepazu: I'll show you later, but I have a prototype.. HTML addresses some of the things brought up
... for any out of band content
... comment, footnote
... for example, if it were to be a footnote, a single fn can be referenced in multiple places in the doc
... if we had a native note element, I'll have a mapping between a note element.
... something I've been tinkering with
... Two way mapping

ivan: Either using RDFa or using even extra attributes like ITS did
... there is an existing mechanism to go through
... I am concerned by adding new elements
... we don't have an extension model in HTML
... web components are, for the time being, up in the air - only one company doing it?

shepazu: This WG won't necessarily accomplish everything in a year.. so let's concentrate our energy.
... I propose, in parallel, we can start on work that could be done.
... we have a list of areas, but not a list of specs.
... so a set of specs in a year or two would be great
... we might not re-charter, unless we have a concrete list of stuff (which can map to the spec)
... they might say that if something doesn't match a spec, it might get dropped
... at least the ground for the next charter
... Does anyone object to that approach?

bigbluehat: Don't mind if we have clear deliverables, but some things will fall through the cracks

shepazu: I want to make sure that... this work might take longer.

ivan: let's separate these discussions
... for tomorrow, what to do for the coming year
... provided that this will get done, how much energy will the other stuff take .. considering that most active are probably 10 people

shepazu: Do we have a general agreement on what we can deliver ..

bigbluehat: We need to repost the 6 points in the charter and map it to what we can deliver
... some stuff is super big

shepazu: So, the way that we struck upon is 'selection' is a pseudo-element in CSS. So, the range we get from the API is ... will provide as a range, once we have it, it will register in the document. and name it.
... a pseudo-element ... [csarven: I lost it]

azaroth: Data model, protocol, vocabulary (part of data model), JSON-LD (part of protocol), not yet decided, search or notifications as part of the protocol, plain text API

ivan: what about URL?

bigbluehat: if anyone wanted to do it go to IETF and then we can incorporate it

shepazu: We can certainly try, erikmannens you can test how long it took to do media fragments

<bigbluehat> seems DPUB really wants it...so...we should talk! :)

ivan: XPointer like framework...

shepazu: hash something

azaroth: The point is IETF media fragments, .. and existing defs how those specific...

tzviya: Can we have a meeting with DPUB? Sounds like a large deliverable.

shepazu: Probably small with a lot of fight :)

bigbluehat: the call time is important/limited.. and fast track that to get the specs out. That's primary ..

azaroth: Thanks all

Summary of Action Items

Annotation WG F2F, Sapporo, 1st day

26 Oct 2015
