See also: IRC log
ivan: one of two staff contacts in the group. Also leading
DPUB. One of the reasons I'm in this group is because digital publishing
community has major use for annotations. Not at w3c, but ietf, already
using first draft of annotation model, big use case.
... Also part of DPUB IG
... Other interest, Semantic Web Activity lead for 7 years.
clapierre: ... With Benetech, also in DPUB and co-chair of accessibility task force of that group. Interest in how to use annotations, for disabled community
azaroth: at Stanford University, one of the two chairs. In 2009 there were two projects: one on web annotations and humanities, and one focussed on science annotation. They found out about each other and merged goals to start the CG. In 2013 we completed the CG work, and last year with Ivan and Doug's assistance we started the WG. My interest is from that history, but also my academic background is in humanities, PhD in medieval French. Imagine a non-scholar trying to read a French
manuscript, having annotations to describe what's going on is important
Jeff Xu: from Rakuten/Kobo. Annotations important to share annotations with users, and between user and publisher
shepazu: We need to put in our use cases the idea of
sharing annotations between users and publishers
... Doesn't immediately occur to people that annotations can be shared
with publishers
ivan: Isn't it in DPUB use cases?
shepazu: two kinds of publishers. Of a blog/website. But
also publisher of site where you're reading an ebook
... Need to make sure people understand two kinds of use cases
csarven: visiting student at MIT, may join W3C as well. PhD
student at Bonn. Why I'm here relates to my research on scholarly
publications and how to keep annotations around both from authors,
reviewers, and any commenter on the web
... Try not to make major distinctions between them other than roles.
Been keeping an eye on this WG on mailing list and github. Overlapping
interests with DPUB that I'm also trying to follow. Also in SocialWG.
All these things overlapping.
Eric Mannens: From iMinds. Semantics group there. Been with W3C since 2006
... Primarily involved in semantics and media fragments
... Afterwards in Prov WG, and now in annotations and publishing because
we have two big projects on that
... One Flemish one with publishers, there, and a European one called
(?)
... with Felix who will join later
... Guys from my team in this group. Open source framework to publish
ebooks, completely based on EPUB3 and HTML5
... Definitely going to implement your spec as one of the reference
implementations
... I always have to look for new stuff for my team, so will be in and
out today
rhiaro_: Amy Guy, University of Edinburgh PhD student and visiting student at MIT, in SocialWG
miyazaki: from Japan Broadcasting Corporation. First time
at TPAC, observer in this meeting. Research Engineer, in charge of
constructing RDF database of TV programmes
... Very interested in semantic web technology and social media
... Interested in how to handle people's review about TV programmes
... How to structure and so on
takeshi: From Sony, Japan. Sony has released device named
digital paper, so you can take notes like you are writing on the paper
on the device
... It is based on PDF. My motivation is to replace that with web
... I have implemented a prototype into the device
... Can't bring the device
... Also spend many years in ebook industry. Also contributed to epub
format. And printing industry background.
shepazu: Rob failed to mention that he's the editor of two
of the specs from this WG
... Web Annotation data model spec, and also Web Annotation protocol
spec, which is based on LDP
... Basically the notions of how to publish annotations to different
servers, write API for the web
... Might be useful for us to go through individual items in our
charter.
... I'm Doug Schepers, staff contact for this group. Instigator of
larger idea of web annotations beyond data model stuff in CG. Bring
together group that solves lots of different parts of the problem.
... Also staff contact for SVG WG and Accessibility. Also Web Audio API
WG. Touch Events WG. Web Payments WG.
... Appreciate everyone showing up, we should have more people later.
azaroth: Thanks everyone. Until we are more familiar, just say who you are and/or your handle on IRC
ivan: we are expecting one more person in half an hour
shepazu: I meant to mention, I'm also the editor of one of
the specs - Find Text API that was just published as FPWD
... Any questions or comments, feel free to ask
azaroth: One note about the WG - we are public, all
communication is done publicly
... Try to be somewhat less formal than other WGs and just roll with it
and see how things go
... Try to make most appropriate use of time and conversations
... Might be slower, but we expect deliverables will be better
<azaroth> Agenda: https://www.w3.org/annotation/wiki/Meetings#Monday_26_October
azaroth: Today the majority of the agenda is focussed
around client APIs. First FindText work that Doug has been working on
... We have a joint meeting with Web Platform about that at 1130
... Before that discussion particularly about i18n
... After lunch, a meeting with Felix around translation
... Then after that, we have to have tests in place for all of our work.
Testing an abstract data model is somewhat complex. We have an IE, Chris
Berg, who is going to be leading testing.
... But if we could discuss how we want to go about testing for all of
the different APIs and models and so on, we could make some good
progress
... A note from the programme, the break is actually between 3 and 4,
but that's when we're meeting with DPUB
... So not sure of exact time, but we have joint meeting with DPUB,
particularly around use cases
... I'm editor of DPUB note on annotation use cases, which can feed into
this group
... Some blank time, probably that will get taken up with discussions
... Towards the end of the day we want to work on next steps for the
client APIs
... The charter has a broad pool of client side API deliverables
... Essentially says create some specifications that help browsers to
create and deliver annotations
... The FindText API is one of those, but there may be others that can
be worked on within that scope
... We do have the beginnings of a second one called DOM Annotations,
but after some initial work things stalled a little bit
... Would be good to discuss what would be useful, and who we might be
able to reach out to help us
... Wrap up at the end of the day. Any questions or thoughts about
today's agenda?
ivan: Want to talk about URIs sometime?
shepazu: While we're talking about items in charter
... Also could during FindText
ivan: We should put it on the table as something we plan to do, important for DPUB
azaroth: at least discuss it before lunch
ivan: the agenda now reads as if the FindText discussion goes straight into i18n, but maybe it's worth giving people who are not familiar a 10-minute intro to FindText
bigbluehat: Benjamin Young with Hypothes.is
... Hypothes.is is a nonprofit working to bring web annotation back to
the web. Offer a browser extension and bookmarklet, and embed for
publishers. BSD licence, Python, angularjs
... I'm coeditor of data model spec now
azaroth: just discussing what else needs to be on the
agenda for today
... Agenda for tomorrow. Focussed around data model and protocol
... Starting with protocol because that's more important to make
progress on. CG gave us a headstart with the model, and because we have
a bunch of people aligned with SocialWG
... 3 parts of the protocol. REST (CRUD) is built on top of LDP. Also for
the SocialWG we want to use AS2 Collections and Pages to be able to
break up the response
... Two areas where we have made less progress: notifications from one system
to another that an annotation has been created, modified or deleted
... We hope work in SocialWG will help us there
... last TPAC we had a good conversation around using AS2
shepazu: Seems like a natural mechanism
azaroth: After lunch, third part of protocol is search
... If you have millions of annotations across all resources on the web,
how do you find those you're interested in
... I have some ideas around that which I was writing up on the plane
... The model, alignment with SocialWG
... Less around here are some new features that are annotations, more
here's what is settling on social side, and what we're settling on, and
how they can work together
... After the break, continuing to work on further features if we still
have energy
... Next steps, how far along with deliverables we are, who we need help
from to get there
ivan: Also decide whether we need another f2f, let's not leave that to the last minute
azaroth: unscheduled meeting that we should try to schedule
is to talk with TAG about protocol issues
... Erik Wilde brought up some concerns around how protocol works, most
of which are derived from LDP
... So given that LDP is a full recommendation, it's a little bit
problematic to say we find issues with it
... But we want to be as valuable as possible
ivan: I wouldn't think you want to go there, if Erik has problems with LDP we shouldn't be the ones playing for Erik, it's not our role
shepazu: He had one thing that was not specific to LDP, which was we are saying that something is an annotation server, as opposed to a generic server, and he thought that was not a good design choice
bigbluehat: also our use of server singular instead of
servers, as LDP can be spread across multiple machines
... Editorial tweak on our part. To say URLs can be all over the web.
As long as your credentialing will let you move across machines.
... Initially saying annotation client and annotation server makes it
sound like a two-piece thing. Just some clarification
shepazu: You can say that about any LDP application
bigbluehat: we can clarify what distinguishes an
annotations one
... Just Link Headers
shepazu: Ralph is domain lead of domain under which this group operates, information and knowledge domain
azaroth: WG has six deliverables
... first being a data model for annotations, which we now have a second
draft of
... Derived from CG and then has been discussed thoroughly within WG
... Some of the areas that have changed around how it interacts with
other specs, eg. some of the specs that the CG used were not full recs,
so we can't refer to them normatively from rec, so we needed to remove
it
... Over the last few months we've talked about how to have multiple
roles, one for each resource used within the annotation
... Tied to the data model is a vocabulary for describing the data model
... Vocabulary is in RDF
... Important to note that we rely heavily on JSON-LD as a way to have
the RDF graph model be something that is understandable and
implementable by people who do not have a full RDF stack
... One of our main driving principles is that the results should be
useable without relying on RDF specific technology. You should be able
to write JS in browser and work with JSON that comes back from the
server. Time will tell how successful we are with that
... Something we're trying to keep in mind
... The model is a graph based model, using RDF, but the way we expect
most people interact with it is via a specific JSON-LD serialization
ivan: is it the intention that the vocabulary will be published as a separate document?
azaroth: at the moment the vocabulary, the serialization
and the data model are all rolled together into the data model spec,
Annotation-Model
... There has been some limited discussion about having multiple
documents, one for model, vocab, and serialization
... Tradeoffs have been that having multiple documents means you need to
read multiple documents, with lots of references between them that gets
complex
... But if it's all in one, it's more complicated for people who just
want to see certain examples
... Bit of a pedagogical issue, rather than a technical one
shepazu: My intention is that the serialization would not
be a single spec, but rather a set of specs
... Eg. as HTML, or as exif data in an image. Different ways of
portraying the same data that would map back to the same terminology
azaroth: at the moment we've been focussing on JSON-LD, but
there likely will be other serializations
... First three still need work, but are in reasonably good shape
... Fourth is protocol, how do you transfer annotations from client to
server or server to server
... based around LDP, also hopefully Collections from AS2
... Five and six are closely related. Client side API and robust link
anchoring
... Client side API helps browser or user agent create and consume
annotations once they have them via the protocol or some event
... So the current work is around the FindText API (previously
rangefinder) which allows you to do find in page with a bunch of
additional cool features
shepazu: fuzzy matching
... defines a set of parameters around which you can do fuzzy matching.
... Robust Link Anchoring is a more complex topic. The first spec as Rob
said is the FindText API and that deals simply with text
... But if you were annotating, e.g., an image, there should be a way of
getting at a particular part of an image, FindText does not deal with
that but robust link anchoring does
... Once you have the FindText API, that opens up the door to having a
URL scheme where, using fragment ids, you can say...
... Say that you have a selection of text that you want to search for
... Say it's repetitive, song lyrics for example
... So you want to say, even though this particular text appears three
times, I want the third instance specifically
... So in addition to saying this specific string, you also say these
are the 32 characters before and after, the prefix and the suffix
... Given those three things, prefix, suffix and selection, you can have
a URL that says # something
... haven't decided how to do it yet. Browser takes parameters and finds
the instance you're looking for
... If you wanted to select a passage and send a link to a friend, you
can send a URL and your friend's browser takes them to the exact place
... It's not the most elegant but we can't think of a more elegant way
... Obviously once you have those things, you can use those for
annotations
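To illustrate the selection-plus-context idea described above (this was not shown in the meeting), here is a minimal Python sketch. The function name `find_with_context` and the character-by-character scoring are assumptions made for the example; the FindText draft may resolve ambiguity differently.

```python
def find_with_context(text, selection, prefix="", suffix=""):
    """Return the start offset of the occurrence of `selection` whose
    surrounding text best matches `prefix` and `suffix`, or -1 if absent.
    Hypothetical helper, not the FindText API."""
    best_start, best_score = -1, -1
    start = text.find(selection)
    while start != -1:
        end = start + len(selection)
        before = text[max(0, start - len(prefix)):start]
        after = text[end:end + len(suffix)]
        # Count matching characters, aligned to the edges of the selection.
        score = sum(a == b for a, b in zip(reversed(before), reversed(prefix)))
        score += sum(a == b for a, b in zip(after, suffix))
        if score > best_score:
            best_start, best_score = start, score
        start = text.find(selection, start + 1)
    return best_start


lyrics = "la la la, first verse. la la la, second verse. la la la, the end."
third = find_with_context(
    lyrics, "la la la", prefix="second verse. ", suffix=", the end"
)
```

Here the string "la la la" appears three times, and the prefix and suffix pick out the last one without the caller having to count occurrences.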
azaroth: at a slightly higher level, the robust link
anchoring topic is, given a resource
... how do I get the representation that I want
... and how do I get the bit that I'm talking about
... Issue around dynamic pages. Eg. js app, makes dynamic changes to
page. You annotate something, what information does the client need to
reconstruct the state of the page to make the annotation make sense
ivan: at some point we should look at these six, and plan
what is realistic and what is not in the coming year
... FindText is great, but personally I don't believe that we will have
the time and energy to do anything else under the 6th point
shepazu: not sure I agree, but okay
ivan: that's my opinion. Same for serializations. I don't
see us doing everything needed for rec - spec, testing, etc - within
less than 1 year
... so we have to be realistic about what we can achieve
... maybe we should say that for certain entries here, we propose an
extension or a new WG, but we have to be realistic. We should try to
find some time to discuss that.
shepazu: one last thing about robust anchoring
... We talked about it largely in terms of text and images, but this
also applies to media resources. Using media fragments for example to
get a particular point at a video
... You can also include a particular location in a video
... All things that can and should be annotatable using the data model
... How the robust anchoring links with the data model, it stores all
the individual things as parameters
... For example for text, the selection, prefix and suffix, maybe some
other bits, those can be recomposed into a URL, but in the annotation
they're stored as individual pieces
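As a rough illustration of storing the pieces separately and recomposing them into a URL, consider the Python sketch below. The fragment syntax is purely hypothetical (the discussion notes that the URL form had not been decided), and the key names `text`, `prefix` and `suffix` are assumptions for the example.

```python
from urllib.parse import urlencode, parse_qs

# The individual pieces, as they might be stored in an annotation.
selector = {
    "text": "la la la",
    "prefix": "second verse. ",
    "suffix": ", the end",
}


def to_fragment_url(page_url, pieces):
    """Recompose stored pieces into a URL with a made-up key=value fragment."""
    return page_url + "#" + urlencode(pieces)


def from_fragment_url(url):
    """Split such a URL back into the page URL and the individual pieces."""
    page_url, _, fragment = url.partition("#")
    pieces = {key: values[0] for key, values in parse_qs(fragment).items()}
    return page_url, pieces


link = to_fragment_url("https://example.org/lyrics", selector)
page, recovered = from_fragment_url(link)
```

The round trip is lossless for non-empty values, which is the property that matters if the same pieces are to serve both as annotation data and as a shareable link.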
ivan: may require some adjustment between the two
... the current selectors we have in the document may not cover all the
things the FindText API can do
... we may need to push additional terms into the data model
bigbluehat: is robust anchoring the ability to re-anchor across media types?
ivan: I think the idea is that if you get an annotation with a target uri, and somebody changes the text, you could still find the text. Robust against change of the media.
azaroth: the exact change is not well defined. For example, if you have a resource that does conneg for plain text, HTML, PDF, the URI would be the same and the text is there in each of the content-negotiable representations. Should one annotation be able to be re-anchored across all of those representations, or is it for specific representations?
bigbluehat: that definitely needs clarifying
... The scenario that hypothes.is have, is publishing as html, epub, pdf
... want annotations across all of them
... Textually the ranges are the same, but scenarios of anchoring them
are pretty different
<azaroth> proposed RESOLUTION: Minutes of last call are approved: http://www.w3.org/2015/10/21-annotation-minutes.html
azaroth: any objections?
RESOLUTION: Minutes of last call are approved: http://www.w3.org/2015/10/21-annotation-minutes.html
<shepazu> http://w3c.github.io/findtext/
shepazu: Editor's Draft ^
... Has latest changes since publication
... Published as FPWD last week
... A little unusual in that we have a liaison in our charter with the
WebApps WG to publish this document together
... In the time I was working on it, there were plans to merge WebApps
with HTML WG to form Web Platform WG
... We put out a cfc for the FPWD of FindText
... And from the time the cfc started to the time it ended the new WG
launched, so through some quirk of fate this is now published by
WebAnnotations and is the first spec published by Web Platform WG
... Web Platform is working on all of the big clientside APIs, plus HTML
... So it's good that we have their attention. I talked informally to
somebody from apple who works on safari, and today I bumped into
somebody from MS who works on Edge (replacement for IE)
... Both of them said that so far as they could tell without having
looked at it too deeply they thought that FindText seemed like a good
idea and they're interested in implementing it
... That would be fabulous, and get the WG the attention of the use case
that we're trying to do
... While not diminishing the other things, the anchoring that is
enabled by FindText, along with the data model, those parts are the core
of annotations
... The publishing stuff is all useful, but those two pieces are the
core, if we can get attention for those two pieces we are in very good
shape
... We also got the attention of the i18n WG
... Any time you're working with text you need to make sure it's
internationalized
... about a year ago the i18n WG started working on a spec called
charmodnorm
... character model for the web normalization
... worked on by Addison Philips(?) Amazon
... solves so many of the problems we should have run into, we don't
have to translate unicode stuff, already a spec for this, timing really
fortunate
... and the fact we were working on FindText got them interested
ivan: that document is a note or rec to be? Timing?
shepazu: Rec-to-be. Don't know about timing. Probably hand in hand with FindText
ivan: otherwise we run into stupid administrative issues
shepazu: they raised several issues on github
... I've started resolving those issues; some are easy, some more
tricky, all of them are about my own ignorance about i18n
... Just educating myself about the right way to approach a problem
<azaroth> Github Issues link: https://github.com/w3c/findtext/issues
shepazu: There will be a process of negotiation between us
and i18n about which parts of defining text search go in FindText and which in
CharModNorm
... CharModNorm applies search to broader set of resources
... FindText specifically developer API, and beyond i18n because there
are things around edit distance
... That's the background of this thing. Seems like it's going to get
some momentum. Might change dramatically, but the barebones are here.
... Is anybody interested in hearing how this api works?
... I'll briefly tell you
... Three ways you can provide feedback on the spec
... Either send an email to the mailing list
... public-annotation
... file a bug on github
... Or leave an annotation directly on the spec
... Make an account, select some text and leave annotation
... They're sent to mailing list
... API has several parameters. Pass them in as a JSON object
... Example. Here's a poem, selected because it has the words 'rage
rage' several times. So how would you find the fourth instance of 'rage
rage'
... EXAMPLE 1 ... pass in string to FindText
... call searchAll()
... find third match if you're looking for third instance
<scribe> New arrivals: Richard Ishida (r12a), Dave Clarke, Felix Sasaki (fsasaki)
shepazu: Here's another example of a search that will find
that string
... Initialize FindText object with this JSON object, with text and
prefix
... This is the specific selection that we're looking for
... So, the kind of parameters you can have
... text and textDistance
... Edit distance is an algorithmic way of saying how two words are
related mathematically
... eg. dog -> fog, edit distance is one, have to change one letter
... fog -> frog, have to add a character, so edit distance is one
... edit distance dog -> frog, change and add, = 2
... when you're talking about typos.. on a string this small this is
significant. When you're talking about longer strings it becomes more
useful
... when you're talking about typos and they miss one letter
... still robust
... you can still match, especially when you have prefix and suffix
... turns out to be a very efficient way of searching for differences
... edit distance is absolute number of changes
... if I say I want an edit distance of one, that means I will allow one
change, doesn't matter length of string
... Quite likely if you didn't find match on first pass with FindText
API you might increase edit distance until you find a match
... once you get to a distance of 15-20% it's very likely this thing
doesn't exist in this document any more
... but first you try to have robust anchoring
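The edit distance examples and the "increase the distance until you find a match" strategy can be sketched in Python as follows. This is an illustration only: `edit_distance` is the standard Levenshtein distance, while `fuzzy_find`, its same-length sliding window, and the 20% cut-off (taken from the 15-20% remark above) are assumptions for the example, not the algorithm in the FindText draft.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # delete from a
                            curr[j - 1] + 1,            # insert into a
                            prev[j - 1] + (ca != cb)))  # substitute
        prev = curr
    return prev[-1]


def fuzzy_find(text, pattern, max_ratio=0.2):
    """Try distance 0, then 1, 2, ... up to max_ratio of the pattern length.
    Simplification: only windows of the same length as the pattern are tried.
    Returns (start_offset, distance_used) or None."""
    limit = int(len(pattern) * max_ratio)
    for allowed in range(limit + 1):
        for start in range(len(text) - len(pattern) + 1):
            window = text[start:start + len(pattern)]
            if edit_distance(window, pattern) <= allowed:
                return start, allowed
    return None


# The examples from the discussion:
# dog -> fog is 1, fog -> frog is 1, dog -> frog is 2.
distances = [edit_distance("dog", "fog"),
             edit_distance("fog", "frog"),
             edit_distance("dog", "frog")]

# A typo in the document ("gentel") is still matched once two edits are allowed.
hit = fuzzy_find("Do not go gentel into that good night", "gentle into")
```

Because the budget grows one edit at a time, an exact match is always preferred, and the search gives up once the allowed distance would exceed the chosen fraction of the pattern length.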
... selection is the target text you're looking for
... textDistance is edit distance
... prefix and suffix, both of which have edit distance
... scope is an element that says the content I'm looking for must be
within this element
ivan: DOM element?
shepazu: yes
... DOM API, to operate on webpages
... So let's say that I make a webapp and my webapp is a text editing
app
... So I have an editing area and a bunch of words in there. I have the
word 'file'
... and in the UI for my app I have a bunch of menu options, and one of
them is 'file'
... so if I want to search this document, if I want my users to be able
to search this document, I don't want them to find the UI instance of
the word file
... the thing within the content area
... eg. google docs gives you its own find dialog
... another use case is that you might say I know it's in this chapter,
which is represented by this element
... can be used to make more efficient searches
takeshi: multiple elements?
shepazu: no, you should set parent
... range says where should I start this search
... similar to scope but different use case
... caseFolding, unicodeNormalization, set of choices
... wrap, do you want to wrap around the document
... so if you start from a start position do you want to go all the way
around. Maybe not necessary.
... The other stuff is all related to the search operation itself
... The way it works, turn the entire document into a string, normalize
it, collapse the white space, search on this long string that is the
text of the document
... Once you find a candidate match, you return it as a range, where is
it in the DOM
... Not simply where is it in the text
... Allows you to treat element boundaries.. you ignore element
boundaries when you're doing a FindText API search
... DOM API that operates on text, and returns DOM range
... A range may span multiple elements
... That's basically how the API works
... an algorithm of how it operates, not for implementation, just
explanation of results
... Finally, the notion that you would have this URL syntax, each of
these parameters is something you would set in this URL syntax
... Each URL is effectively a findText operation
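The "flatten, collapse white space, search, then map back" flow described above can be sketched in Python on a plain string. This is only an analogy: offsets into the original string stand in for DOM range boundaries, DOM serialization and Unicode normalization are not modelled, and the function names are hypothetical.

```python
def collapse_whitespace(text):
    """Collapse each run of whitespace to one space. Also return, for every
    character of the collapsed string, its offset in the original text."""
    out, offsets = [], []
    in_space = False
    for i, ch in enumerate(text):
        if ch.isspace():
            if not in_space:
                out.append(" ")
                offsets.append(i)
            in_space = True
        else:
            out.append(ch)
            offsets.append(i)
            in_space = False
    return "".join(out), offsets


def find_in_original(text, query):
    """Search on the collapsed string, but report the match as a
    (start, end) pair of offsets in the original text."""
    flat, offsets = collapse_whitespace(text)
    flat_query, _ = collapse_whitespace(query)
    pos = flat.find(flat_query)
    if pos == -1 or not flat_query:
        return None
    start = offsets[pos]
    end = offsets[pos + len(flat_query) - 1] + 1
    return start, end


source = "Rage,\n   rage against\tthe dying"
match = find_in_original(source, "rage against the")
```

The returned pair points into the original, uncollapsed text, which mirrors the point made above that a match is reported as a position in the document rather than a position in the flattened search string.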
Jeff_Xu: based on text structure?
shepazu: yes
Jeff_Xu: in html structure, if there's some element in front of keyword, but moved to somewhere else with CSS..?
shepazu: doesn't account for that
ivan: if you generate content by CSS, you don't find it
shepazu: there is discussion now that generated content in
CSS should also be accessible
... generating content should be treated as some part of the object
model, whether DOM or some higher level, should be serialized as part of
it
... however we solve it for accessibility we should solve it as that
case
azaroth: will make github issue to track that
shepazu: need to transfer issues from spec to github
... Some of the issues on spec are me thinking out loud
ivan: css stuff is a good question
shepazu: occurred to me before, but not resolved yet
... there's generated content, but also the css has moved the text to
appear to be in another part of the document. Nothing to be done for
that.
... the nice thing, even if the text is not there, the rendering of the
text is different than the DOM order of the text, it will always be that
way
... so when you come back to the document, it will still consider that
to be part of the document in that order
ivan: one thing to make note of, we may want to talk to CSS
people
... (?) project that has started, to try to open what rendering engine
does
... may be that we have another version of findText that works on the
CSS object model that they produce
... which takes care of these rearrangements
shepazu: needs to be dealt with. Not the only thing we need
to talk about with CSS
... also, once you have that range, how can you style it
... once you find the result, how you highlight it once you have the
result
... one thing you'd do today is take the range, surround with span with
class
ivan: doesn't always work
... the range spans over two paragraphs
shepazu: have to chunk it. It's ugly. If you have a hundred
annotations you don't want to do that
... need to be able to style ranges arbitrarily
... outside scope of this, inside scope of robust anchoring discussion
clapierre: do we have to worry about aria hidden role?
shepazu: could ask same question about visibility
... same class of question
clapierre: css visibility is not visible in the DOM
shepazu: Display none is not
... we need to deal with all of that
... there was an issue that was raised that said.. when I talk about
that I say when you serialize HTML DOM into text, I suggest using the
serialization in the DOM4 spec, somebody gave feedback that said no you
should use one of these other serialization methods
... maybe one of those others will deal with it. All good points.
ivan: we discussed with webform guys, small issue about promises
azaroth: welcome to folks in i18n WG
... how would you like to go through issues?
shepazu: before we got to individual issues, meta question
to mailing list
... You guys decided to file github issues, which is fine. In addition
to that, PRs also welcome
... You can just fix my spec and send an email or describe it in PR
... chances are unless there's a fundamental disagreement I'll simply
take a PR
Richard Ishida: we prefer github, much easier to handle conversations
... what we'd also like to do is get a tag that says i18n that we can attach to issues so we can track and get notified
ivan: adding a label to the issues? easy to do
shepazu: issue 4, one of the more complex ones
... didn't know how to handle it in the spec
... the issue is, in the character counts of ranges should they be
unicode code points or graphemes or whatever
... I think that you guys were suggesting unicode code points as your
preference?
r12a: In javascript, if you have a.. you know supplementary
characters?
... unicode can encode around a million code points, there are 65536
slots in the basic multilingual plane
... utf16 you only need 2 bytes for each of those
... if you go above that for some of the newer characters then you need
4 bytes to encode them in utf16
... that is two code units
... javascript doesn't know how to handle the higher level characters
very well
... so you end up with two things that are not actually code points
... it shouldn't be like that, it should be a single code point unit
... that leads to this question about whether we should do code units
or code points
... I think that you should do code points
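The difference between code units and code points can be seen in a short Python sketch. Python strings are indexed by code point, so the UTF-16 code unit count is obtained here by encoding; the helper names are made up for the example.

```python
def utf16_code_units(s):
    """Number of UTF-16 code units (2 bytes each) needed to encode `s`."""
    return len(s.encode("utf-16-le")) // 2


def code_points(s):
    """Number of Unicode code points in `s` (Python's native string length)."""
    return len(s)


def code_point_offset_to_utf16(s, cp_offset):
    """Convert an offset counted in code points to one counted in code units."""
    return utf16_code_units(s[:cp_offset])


# U+1F600 lies outside the Basic Multilingual Plane, so UTF-16 needs a
# surrogate pair (two code units) for it.
sample = "a\U0001F600b"
units = utf16_code_units(sample)                   # 4
points = code_points(sample)                       # 3
b_in_units = code_point_offset_to_utf16(sample, 2)  # 3
```

An offset stored in one unit and interpreted in the other would land one position off after every supplementary character, which is why the choice needs to be stated explicitly.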
ivan: is it the same in ECMAscript6?
shepazu: same question
... I believe they have a way of dealing with this in ECMAscript6
r12a: I believe, but not up to date
ivan: we may want to say we rely on ECMAscript6
takeshi: current model
shepazu: I think they add capability to deal with code points
takeshi: add capability for additional characters (..?..)
shepazu: I think you can deal with it in other ways not
just regex
... I think this is an i18n issue, so we should do unicode code points -
that's your recommendation?
r12a: that's mine, not necessarily...
shepazu: you guys should tell us how to do it
r12a: the other question was about grapheme clusters
... in unicode you can encode e with acute accent as a single character,
or individually
... supposed to be equivalent, but perceived by user to be a single
character
... what's perceived to be a single unit of text is potentially much
more complicated
... could be two or three characters
... but there's this concept of grapheme cluster which is used for
editing process, maybe a delete would delete 3 characters instead of one
... grapheme cluster boundaries instead of code point boundaries
... I don't think you should specify this in terms of grapheme clusters,
partly because they don't solve all the problems at the moment. May be
ambiguous to users what's happening
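The "e with acute accent" example can be reproduced in Python with the standard `unicodedata` module. The helper `approx_user_chars` is a deliberately crude assumption for illustration: it only skips combining marks and is not a real grapheme cluster segmenter.

```python
import unicodedata

composed = "\u00e9"      # e with acute accent as a single code point
decomposed = "e\u0301"   # letter e followed by a combining acute accent

# The two are different code point sequences, but canonically equivalent.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == composed


def approx_user_chars(s):
    """Rough count of user-perceived characters: code points that are not
    combining marks. Real grapheme clusters are more involved than this."""
    return sum(1 for ch in s if unicodedata.combining(ch) == 0)


lengths = (len(composed), len(decomposed))        # (1, 2) code points
perceived = (approx_user_chars(composed),
             approx_user_chars(decomposed))       # (1, 1)
```

A user sees one character in both cases, while a count in code points gives one or two depending on how the text happens to be encoded.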
shepazu: serialization and normalization should take care of that?
r12a: no, a grapheme cluster is how human perceives clusters
ivan: UX.. when they select on screen, what do they select
r12a: it varies
ivan: if you select something in web browsers, what do they select?
r12a: i think it still varies
... I don't know the details
shepazu: we should test that
ivan: what we should do is whatever the web browsers do
bigbluehat: we should spec it based on the dominant case in browsers and say this is what we want going forwards
r12a: if all the browsers agree but are all doing it the wrong way...
shepazu: I'm sympathetic to both of those positions, we
should do what's possible, we have the issue, we should test it and find
out if they're doing the right thing or if there is a right thing. We
should move on.
... issue 5, avoid listing whitespace characters
... we can change definition of whitespace character
... issue 6, let people say case folding or not
... params none, ascii, unicode, language-sensitive
... they say we should say what language
... I think it should be the language of the document
... the api doesn't need to say that, the document already says that
... the algorithm should include that information, but not necessarily a
parameter
... I know there can be mixed language documents
... I think that should be dealt with in the algorithm
... Not necessarily as a separate parameter
... I'm okay with the spec dealing with it, not the API
r12a: the question is, if you want to search for a particular word in a document, you may be able to get the computed language of the document text, but how do you get the language of the search text?
shepazu: the document says what language it's in, and it
knows when it's serializing it it knows when it should be doing case
folding, so when it serializes into string it knows how to perform the
operation
... not just document based
r12a: that's for the text in the document, but if a user types their search text into a field..
shepazu: that's a UI decision
bigbluehat: the api needs to know what it's being given
r12a: are you looking for a turkish word?
shepazu: maybe we should be looking at it... afraid to make the api larger and more complex. Need to make sure we need it.
ivan: There's one.. not necessarily only capitalisation, but
in some cases when I search French, I can type in the search term
without the accent and it will find the relevant french term with the
accents
... so the found term might differ from the search term only by the
missing accents
... which is not necessarily same as edit distance
r12a: that's in another issue
<annbass> (this is how I figure out where the appropriate accents go, on French words!)
shepazu: issue 10, option to have ascii case folding is not
useful. I'm fine with that.
... issue 7 is you'd like us to use charmodnorm as normative reference
for ... I'll ask for clarification in the issue
... I'm fine with referencing charmodnorm
... issue 8, four parameters, including canonical, compatibility and
all
... I included all because of my ignorance... I think we need back and
forth so I understand the issue and we'll resolve it
... This is for unicode normalization
... there are multiple kinds
... Before I didn't have anything about that, I just picked one. These
guys think that I should.
... Still don't know if this is something browsers are willing to do
... I included it cos i18n said I should, we'll have back and forth
... issue 10, ascii case folding option is not useful
... that's fine
... when I was reading charmodnorm it seems like it would be useful
r12a: charmodnorm addresses two different scenarios, element
names and markup, and the other is natural language content, the stuff
people actually read
... ascii matching is useful for things like css identifiers
... typically css doesn't concern itself with ascii case, but you can't
extend that through other languages than english
... so it's only in the case where you're talking about syntactic
content that ascii casefolding is really useful
... never useful in natural language processing
shepazu: I'm fine with that
ivan: we may want to search in html content that includes a
pre with javascript content
... for those you fall back to the ascii
... you're not in natural language
shepazu: I don't think you will have a conflict
azaroth: a high level question, what would the default be?
just unicode?
... what would be specified?
... if the text was ascii and you wanted to have casefolding, for
example searching all lowercase, you would say casefolding is unicode
rather than (?)
... it's not that all casefolding would go away, just that that option
would go away
... unicode is a superset of ascii therefore we don't need ascii
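A minimal Python sketch of the point being made here, that Unicode case folding covers what ASCII folding does and more. The function names `ascii_fold` and `unicode_fold` are hypothetical illustrations and not part of the FindText draft; `str.casefold()` is the standard library's full Unicode case folding.

```python
def ascii_fold(text: str) -> str:
    """Fold only A-Z to a-z; every other character is left untouched."""
    return "".join(
        chr(ord(ch) + 32) if "A" <= ch <= "Z" else ch
        for ch in text
    )


def unicode_fold(text: str) -> str:
    """Apply full Unicode case folding using the built-in str.casefold()."""
    return text.casefold()


# For pure ASCII input the two agree; they diverge for other scripts.
for word in ["Annotation", "ÉCOLE", "Straße"]:
    print(word, "|", ascii_fold(word), "|", unicode_fold(word))
```

For ASCII-only text both functions give the same result, which is why a separate ASCII option adds little for natural language matching.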
shepazu: we can drill into issues later
... issue 11, unicode equivalent type not all clear
... I added all, you guys informed me there's already a way of doing
most promiscuous match
r12a: there may be additional issues
shepazu: raise as other issues
... issue 12, order of case fold and normalization in algorithm, should
reverse, I'll change that
... the algorithm is going to change dramatically anyway
... the last one is issue 13, several different ones, I'd rather you
broke these out into multiple issues
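The canonical and compatibility equivalence types raised under issues 8 and 11 can be illustrated with Python's standard `unicodedata` module. This is only a sketch of the Unicode normalization forms themselves; the helper name `equivalent` is hypothetical, and nothing here reflects how the FindText draft defines its parameters.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT
ligature = "\ufb01"      # the "fi" ligature


def equivalent(a: str, b: str, form: str) -> bool:
    """Compare two strings after applying the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonical equivalence: the two spellings of "é" match under NFC.
canonical_match = equivalent(composed, decomposed, "NFC")

# Compatibility equivalence: the ligature only matches "fi" under NFKC.
ligature_under_nfc = equivalent(ligature, "fi", "NFC")
ligature_under_nfkc = equivalent(ligature, "fi", "NFKC")
```

Without any normalization the composed and decomposed strings compare as different, which is the situation a matching algorithm has to decide how to handle.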
<fsasaki> [just for the minutes as background for the case folding discussion: unicode provides a case property http://unicode.org/faq/casemap_charprop.html and that is for all characters not just the ascii set]
shepazu: One of them is the oe for german transliteration
... We need to break down into several things I can understand
r12a: there are all sorts of things you can do to match
text. Ignoring accents is a common one. There are other things depending
on language
... Some might be syntactic stuff like fights vs fight
... Recognising grammatical language specific differences
... I thought we had that in charmodnorm already but we don't
... we don't know all the answers yet, but know that there are a lot
more
clapierre: includes punctuation?
<fsasaki> http://www.w3.org/TR/xpath-full-text-10/#ftmatchoptions
shepazu: edit distance would handle a lot of things like that
fsasaki: maybe you're aware of xpath full text
specification
... various options related to language, stemming, useful to look at as
background
shepazu: I do have to say that as a design decision we want
to keep this as simple as possible but no simpler
... Not sure how much the browser vendors will be willing to implement,
two different normalization algorithms, we should talk to them to find
out
r12a: what's important is some of these things like diacritic
stripping
... important for users
... normalization stuff people don't normally know about
... some things people know that they want to do this type of search
shepazu: there should be an option to strip out all diacritics
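One common way to strip diacritics, sketched in Python under the assumption that decomposing to NFD and dropping combining marks is acceptable for the language in question (it is not for every language). The names `strip_diacritics` and `matches_ignoring_accents` are hypothetical and not part of any draft API.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, drop combining marks, then recompose to NFC."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


def matches_ignoring_accents(needle: str, haystack: str) -> bool:
    """True if needle occurs in haystack once accents and case are ignored."""
    folded_needle = strip_diacritics(needle).casefold()
    folded_haystack = strip_diacritics(haystack).casefold()
    return folded_needle in folded_haystack


print(strip_diacritics("élève"))
print(matches_ignoring_accents("francais", "Un élève français"))
```

This covers the French search case mentioned earlier, where a term typed without accents should still find the accented word.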
azaroth: now we relocate
... 1f middle hall a
WebPlatform meeting, then lunch
<fsasaki> http://www.w3.org/2015/10/its2-and-web-annotation/?full#1
tag set often used to state something should be translated or not
Translate, Text Analysis, also includes system for annotating parts of text as language components
showing examples of HTML and XML usage
<span translate="no" its-ta-ident-ref="http://..." its-ta-class-ref="http://...">W3C</span>
translate is from HTML itself
its-ta-* are from ITS
growing interest in using these types of annotations in other formats
RDF, JSON, etc
ITS 2.0 ontology exists
translate attribute becomes an RDF property (for example)
for JSON there is an `annotations` key which contains the ITS annotations
JSON ITS interest is from the localization community primarily
what role could the Web Annotation model have to fulfill these requirements?
what things can Web Annotation use from ITS?
could Web Annotation benefit from the data types in ITS?
it is not the purpose of today's discussion to file issues for Web Annotation, just to inform the group of the ITS efforts with regards to annotations
it is possible via ITS to have the annotations separate from the original content
via Natural Language Processing Interchange Format (NIF)
NIF developed by various EU projects
there are related APIs by these projects--for example FREME API
@types examples include `nif:String` `nif:Context`
methods for text selection (`anchorOf`, `beginIndex`, `endIndex`)
also has a "confidence" rating generated by the tool
@id is a URI which includes a #char=27,30 (for example) fragment selector
perhaps there are general mechanisms that can be shared
interest in feedback on the ITS work from this WG
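A small Python sketch of how a `#char=27,30` fragment like the one mentioned above could be resolved against plain text. It assumes zero-based character positions with an exclusive end, in the style of RFC 5147; the URI, the sample text, and the function names are hypothetical.

```python
from typing import Tuple
from urllib.parse import urlsplit


def char_range(uri: str) -> Tuple[int, int]:
    """Extract (begin, end) from a fragment of the form char=BEGIN,END."""
    fragment = urlsplit(uri).fragment
    scheme, _, positions = fragment.partition("=")
    if scheme != "char":
        raise ValueError("not a char= fragment: " + fragment)
    begin, end = (int(part) for part in positions.split(","))
    return begin, end


def select(text: str, uri: str) -> str:
    """Return the substring of text addressed by the URI's char fragment."""
    begin, end = char_range(uri)
    return text[begin:end]


# Hypothetical document text and identifier, for illustration only.
text = "Annotation work happens at W3C today."
uri = "http://example.org/doc.txt#char=27,30"
print(select(text, uri))
```

The pair of integers plays the same role as the separate `beginIndex` and `endIndex` values listed in the notes.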
ivan: I would like to go back to the markup examples
... from the annotation model point of view these examples show one key
discrepancy
... in the Web Annotation model there must be an identifier of the
separate content
... in the examples, the annotations are inline, they do not have
identifiers, they do not have targets
... that being said this markup approach has many usages
... so do we want to deal with scenarios where there is no target?
... should we have annotations with no target identifiers (i.e. the
target is a blank node)
... there do seem to be scenarios where this would be useful
... in the CSV working group, the targeting is useful
... but I can be forced into a redundancy
... I would have to make an identifier that is circular
... that may be an aspect worth considering
fsasaki: the markup example can not be easily expressed in Web Annotation?
ivan: it can be anchored via our selectors
... but when you want to use JSON (etc), then you force the developer to
use redundant data
azaroth: the higher level point would be what use cases are
there where the information is being provided by someone other than the
content provider?
... of course if you own the content, you can put in whatever sort of
markup you want
... but that scenario is less about what we're building
... which is more about content the person annotating does not control
... does ITS have these use cases?
fsasaki: yes. that's also why exploring Web Annotation is
interesting
... someone produces HTML content, and someone else wants to enrich it
with separate language related content
... if I were to put all that content into the HTML, then it would be
overload
... there can also be overlaps with what's created
... there's no demand for hierarchies
... it also works if you do not own the content, but only export it for
translation
... scenarios where lots of people need to look into the content
azaroth: so in the translate case...
... is this the translation? or is this just related to the act of
translation?
... here is something that could be translated?
... vs. something that points to the translation?
fsasaki: this could also relate to company policy and process---this should be translated, this should not be translated
azaroth: so the target would be the W3C in this case?
ivan: not necessarily, it's the specific found instance of
the word
... which is where FindText API could come in
azaroth: the body of the annotation would state what is and is not translatable
ivan: yes. that sounds right
fsasaki: it would be provided with ITS as additional RDF triples
azaroth: in the current working graph, it would need to be separate from the Annotation expression, but it is possible
ivan: minting a URI for the annotation itself seems unnecessary in the ITS use cases
azaroth: a URI for the annotation is a SHOULD
... the more interesting point here is the NIF URI pattern
... it tries to encode quite a bit of data
fsasaki: it's actually changed. it's also separate as
`beginIndex` and `endIndex`
... there's also a `wasConvertedFrom` which explains where the content
came from--HTML, etc.
azaroth: from my perspective, it looks like it can all be
expressed in Web Annotation currently
... what we don't currently have is confidence score
fsasaki: this `taConfidence` is ITS specific
... there's also a place to put a URI for the actual tool that produced
the ITS annotations
shepazu: there used to be a confidence score in an earlier
version of the FindText API
... I took it out because it was application specific
... by which I mean use case specific
fsasaki: `taConfidence` is about "text analysis confidence"
... there's also provenance information which can be stored
... originally we'd wanted confidence statements everywhere, but they
can prove pointless without provenance info
azaroth: it seems like a useful exercise to pick some
selected examples
... and see what's missing in Web Annotation to express ITS content
... and then for us (WG & ITS folks) to see if that expression is
useful outside of the NIF context
... if so, then we may want to change the model---otherwise, they could
be presented as available extensions for those unique use cases
fsasaki: there's an upcoming meeting in November--a hackathon--where these would be useful
azaroth: yeah, that would be great
... canonical examples of all the features you'd like to discuss
... we can iterate those from there
fsasaki: adding these items to the body seems to make sense
... ITS can be used without NIF
... and it may be that those make more sense in Web Annotation--perhaps
as an extension
... NIF is just one use case for ITS annotations
azaroth: these examples may also prove useful for analyzing our "motivations" / "roles" list
<fsasaki> http://www.w3.org/TR/its20/#basic-concepts-datacategories
azaroth: are there examples of ITS and NIF expressions that
you could link us to?
... for example. we have an "identifying" motivation, but not a
"translating" one
... some alignment there would be helpful
... from the DPUB side, we also have some areas where this would be
useful
... if you had one example per category and what they'd be used for
... then we can see which make sense as motivations/roles
ivan: i'm a little skeptical that these may not be general
enough
... but perhaps as extensions
azaroth: motivations are skos concepts
... if your data categories could be defined as those, then we'd have a
very easy time mapping them
fsasaki: we're currently working on the vocabulary, so this
is good timing
... some of these are text analysis specific
... but some may be broadly applicable
... we've also had a discussion about directionality
... here's a string, here's a substring within it that has a
directionality from right-to-left
... it could be used to provide helpful information for solving
these scenarios
... we're not sure what the mechanism could be, but it is something we
are exploring
... and also JSON annotation aspect
... sometimes people put HTML into JSON to provide language
information---which looks really bad
<azaroth> scribenick: azaroth
bigbluehat: We don't currently have a json selector. Some hacks, like json pointer, but not a finished spec.
ivan: who?
bigbluehat: IETF. Also in json schema, which has also expired. If we can't point in to it, then we can't annotate json objects. Need to look at this
ivan: Make it clear in the model how it can be extended
bigbluehat: Not sure if it's in our charter to stir up the specs?
Ivan: I think we can just point to them. Especially at IETF.
<scribe> scribenick: bigbluehat
azaroth: it's 2 pm. DPUB is showing up after the
break--because coffee
... the topic next on the agenda is testing
... perhaps we stay with i18n topics instead since we have these folks
here
shepazu: here's my reaction. let's storm the castle!
... I was somewhat encouraged, and somewhat discouraged
... we had affirmative, encouraging feedback from Travis at Microsoft
shepazu: we got feedback from someone at Apple, which was
less overtly encouraging
... but he didn't say we're not going to expose FindText
... more, I don't like the shape of the API
... he said he wanted fewer options, but then he also wanted to add a
feature
... it seems encouraging that they gave us feedback at all
... we should hold fast on the most important part--which is the edit
distance, imo
ivan: so, we had a small discussion at lunch, and I was
wondering if it makes sense to have different API entries
... one that would say programmatically give us what you do already
... a find or search API which exposes what browsers already implement
... but which you cannot currently access from JS
shepazu: I think that would tactically be a mistake
... the current implementation does not solve our usecase
... and without exposing the edit distance to implementation, then we
don't get what we need for our use case
azaroth: it solves giving the client api content about what is selected
shepazu: those aren't related in my view
azaroth: we could reasonably get a find api that exposes
finding a range of text in the browser
... the edit distance is a robustness question
ivan: if I have an ebook, for example, the content is frozen
shepazu: you mean, the edition, on that ebook reader, but when you exchange it with someone else, then they won't re-anchor
ivan: sure, but there are also use cases where content is
fixed, and for those, the lack of robustness is not a problem
... text books for example
bigbluehat: robustness can go on top of find api
ivan: the edit distance was the only thing a new or brady
concept for browsers
... the rest, sure, but could probably be done by regex
shepazu: maybe I had a different take
... leaving out this edit distance thing--which is different, etc--the
rest could be done with regex
... I believe his point was about regex, not about edit distance
ivan: separating the two things may make sense, it drives
them to expose search--which is currently not done for developers
... maybe first they will do the simple thing without edit distance
... and then later do the robustness thing
shepazu: ok. if that's what we can get---the find text
api--then....well....every browser does it differently
... I think we should not so easily give up robustness
... we need to push the issue of robust anchoring
David_clarke: i'm not sure the web apps audience fully
understood the value of robust anchoring
... sell them the value proposition first, rather than selling the
solution first
shepazu: paul cotton from Microsoft asked where our use
cases were--which we didn't really present
... and while we provide some in the spec, the interface was mostly
skimmed
ivan: there's a URI issue--the whole mechanism actually
serves two purposes
... 1. finding the text
... 2. serializing the identification of that text
... the identifier aspect is also very important
... and it's not in the document
shepazu: we should annotate the spec to point to use cases
... I think Paul Cotton has a really good point about having use cases
bigbluehat: An opportunity to split the doc into sections
to give info on use cases here, and outside. Such that browser vendors
could see which parts are native, and can give developers access to
... we can start them off in that direction. There won't be consensus
around that, even though there's similarity from the user perspective
... the robustness part shouldn't get caught up in that, and could have
its own set of use cases on top of that
shepazu: I would like to not separate them so they can be
considered as a whole, rather than what they do today.
... The solutions from the whole problem set will be different than if
you start with the narrow set
... Would like the conversation to happen.
... in the context of the specification so we have actionable things to
do. No one else has volunteered to work on it, and we have their
attention
... let's engage directly
bigbluehat: If we mix in robustness, which they see as out
of scope, it'll slow down exposing the find stuff which would be a huge
value add for any implementer
... the people to win over for robustness would be content editable
people
... they deal with it all the time
... translation, internationalization, etc. Find text with static text
is different
shepazu: Would rather we didn't throw the major use case
out before we try to engage
... they haven't said no yet
ivan: Don't know all the use cases off the top of my head,
we have two documents, one from DPUB one here. I'd like to see what the
% of use cases that are based on dynamic content
... and what on static
... Something that would help to decide which way to go
Antonio: Can you explain why the dichotomy matters?
shepazu: FindText now serves static case
Ivan: I can use it for annotations, but if I annotate
content that can change, then we need the robustness
... that's where edit distance comes in
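A rough Python sketch of why edit distance matters for re-anchoring when content changes. It uses plain Levenshtein distance and a brute-force window search purely for illustration; the function names are hypothetical, and a browser implementation would presumably use something far more efficient.

```python
from typing import Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def fuzzy_find(pattern: str, text: str,
               max_distance: int) -> Optional[Tuple[int, int, int]]:
    """Return (start, end, distance) of the closest window, or None."""
    best = None
    shortest = max(1, len(pattern) - max_distance)
    longest = len(pattern) + max_distance
    for start in range(len(text)):
        for length in range(shortest, longest + 1):
            end = start + length
            if end > len(text):
                break
            distance = levenshtein(pattern, text[start:end])
            if distance <= max_distance and (best is None or distance < best[2]):
                best = (start, end, distance)
    return best


# The annotated word was "anchoring"; the document was later edited.
edited = "the robust anchorng of annotations"
print(fuzzy_find("anchoring", edited, max_distance=2))
```

With `max_distance=0` this degenerates to an exact search, which is the static-content case; a larger tolerance is what lets the annotation survive small edits.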
shepazu: If you're going to design something for a limited use case that just does find in page, you might not have the extra params, and then if the API isn't extensible, you'd never get edit distance
bigbluehat: Can design concurrently, to make sure that the
extensibility is there
... core of what I'm saying is that vendors don't have a motivation
... find text to me seems obvious
... if we can ship that sooner, and it makes lives easier...
shepazu: How things are serialized could be a bigger,
longer term problem
... I think there's disagreement on how you serialize out the bit ...
convert DOM to text
ivan: what do we need to test?
... more than one answer ... model, protocol and findtext are all very
different
... what do we test for the model?
ivan: theoretically, what I want to see, is that if two implementations create two graphs, what do they have to do in order to test?
azaroth: so in the LDP group, we provided a set of tests
that you could run internally against your implementation
... there was no centralized validation service
... each test was implemented in Java
... it would check to see if each direct container had hasMemberOf
relation (etc)
... and it would tell you if you had it properly
ivan: we went through the whole RDFa testing, that was an
easy one, because...
... this is the way HTML looks like, this is the graph I have to produce
... we created the markup, and then checked the triples it produced
... the starting point involved a bunch of HTML files
... what is the starting point for Web Annotation?
azaroth: client or server?
ivan: I don't care. however the annotation is made, it
produces a graph, and I have to see if it's correct
... I can't properly say what the process is.
azaroth: brainstorming. we could provide a human
readable set of annotation scenarios
... create a comment on this URL, and then check their annotation with
the testing tool--there would be at least one right answer
... however, there will be more than one right way to do that
... unless we are very very clear about which one we want
... annotate this URL with some text--does it have a language? does it
have a string body? a remote body URI?
bigbluehat: we're going to need lots of tests....
... very specific to the very scenarios
ivan: if we make it too detailed, we lose its value by
being over specific
... but if it's too general, then only a human can validate it
azaroth: the distinction is perhaps the difference between syntactic and semantic
bigbluehat: using json-ld playground is a good testing
ground.
... would be nice to produce along those lines
... may not count as good validation but help decide how much to
implement that
... taking the json-ld playground and adding text graph output.
... only compacted or only in quads
ivan: combining automatic comparison of the graphs was not
something they used because ... complicated to setup a processor /
install.. practical issues
... in the RDFa case it worked really well b/c you gave the graph to the
processor and at the end it produced an output
... playground leaves it back to the human in the end.
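A toy Python sketch of what an automatic comparison of two graphs could look like. It assumes ground triples only (no blank nodes), so a plain set difference is enough; real RDF comparison needs graph isomorphism. The annotation, target, and body URIs and the function name `diff_graphs` are hypothetical.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def diff_graphs(expected: Set[Triple],
                actual: Set[Triple]) -> Dict[str, Set[Triple]]:
    """Report triples missing from, or unexpected in, the actual graph."""
    return {
        "missing": expected - actual,
        "unexpected": actual - expected,
    }


OA = "http://www.w3.org/ns/oa#"
ANNO = "http://example.org/anno1"  # hypothetical annotation URI

expected = {
    (ANNO, OA + "hasTarget", "http://example.org/page"),
    (ANNO, OA + "hasBody", "http://example.org/comment"),
}
actual = {
    (ANNO, OA + "hasTarget", "http://example.org/page"),
}

report = diff_graphs(expected, actual)
print(report["missing"])
```

A report like this tells an implementer what is absent from their output without a human having to read the whole graph.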
bigbluehat: I normally copy/paste around.. and add things to the @context if they don't exist ... in order to validate to check the basic cases at least
ivan: In the case of RDFa we had various versions (b/c of
HTML5..) between 200-300 tests per category
... a full RDFa processor is probably too complex
... if we come up with a reasonable set of things e.g., 60-80 tests
roughly.
... looking at tests by humans is not unreasonable in that case
... I think it is doable
bigbluehat: If I put the hypothesis tests.. 2/3 tests.. at
least for a first test is helpful to understand what is close to an
implementation
... there is also a visualizer to help create
ivan: If you need help, there are people.
... Erik, how do you test implementations?
erikmannens: I have to check with the guys.
... it is at the protocol level
ivan: we will come back to the protocol level :)
erikmannens: will come back to you
takeshi: it is not necessary to implement the selector for
instance for us (Sony)
... rdfdiff helps us with this
... as of today, basic processor to test / how to validate the output...
not sure if the output from the JS is correct or not, but at least
what's stored in the DB is correct.
ivan: when you say check.. how is the check done?
azaroth: IEEE (?) ... they use a tool to give a URL..
essentially creating an internal object structure.. it has its own way
of going at it to check
... here is an example of an annotation of this feature and that it. If
all those working great, and here is what's misformed then ...
ivan: but you check as a human?
azaroth: No.
... tries to generate the structure internally.
ivan: conceptually speaking it is a dedicated ..(?)
azaroth: The take away from earlier work is .. you've got something missing from your implementation, and here is what is required
ivan: we would put an informal annex for framing. we then run the framing algo for the playground. can the playground actually compare that?
bigbluehat: ours just shows it to you.. matching at the graph level is not clear
ivan: if we use framing it is easy for the human
bigbluehat: .. we don't have to give it to the graph if we
use keys.
... Doug we want to make sure the triples are right (i.e., prior
discussion)
ivan: generated structures are relatively small for the
tests so humans can make the comparison fairly easily
... formally speaking we have two approaches.
bigbluehat: what we (hypothesis?) don't have is prebuilt
code
... I'd like to see something like JSON playground.
shepazu: aren't quads in turtle?
bigbluehat: no
shepazu: what about json-ld (re: N3.js)?
erikmannens: will check
ivan: check if N3.js understands N-Quads and JSON-LD
shepazu: what is its output?
bigbluehat: It has an internal model
shepazu: you can write a parsing engine and... test against that.
bigbluehat: we don't want to write into other people's code necessarily
ivan: with N3 we can check the Turtle serialization..
ivan: Annotation, DPUB
clapierre: Accessibility of DPUB
Karen: also on DPUB
azaroth: Annotation
Jeff: DPUB IG
bigbluehat: hypothes.is project..
... read ebooks occasionally
rhiaro: U of Edinburgh, social Web WG
Ann: free agent / Social
erikmannens: .. open source .. to publish ebooks
Antonio: W3C JP
Olivier: BBC
Ralph: W3C
shepazu: W3C
takeshi: Sony
azaroth: We brainstorm for Annotation/DPUB
... DPUB produced a set of use cases (sometime last year) which we
looked at in the Annotation WG. Those use-cases are somewhat transformed
into deliverables.
... recently, some UC for accessibility
<tzviya> DPUB Annotations Use Cases: https://www.w3.org/dpub/IG/wiki/UseCase_Directory#Social_Reading_and_Annotations
azaroth: we could go into more detail on those
<Ralph> Digital Publishing Annotation Use Cases [W3C IG Note 2014-12-04]
azaroth: we don't have to walkthrough Annotation UC as published
Brady: DPUB
... ... since we are not familiar with what you published..
... we would like to talk about accessibility UC
azaroth: we have 3 publications. data model, protocol,
client API
... protocol, we've been discussing but haven't produced a new draft.
... tomorrow we can go over
... the model changes
... we can go over what each sections want to do
... trying to produce normative requirements
... now we have each section with short intro and use-case
... rather than prose
... with JSON-LD/Turtle/Diagram
... technical change to the model is the role to be associated with the
"bodies"
... changes the way tags are produced
... here is a resource and it is "tagging" ... as opposed to a tag
... also for editing.
... the protocol is based around essentially REST and HTTP. Update and
Delete annotations. We have notions for a Container
... and yet to be published, but hopefully after tomorrow.. an
integrated way to say a list of annotations or a collection of
annotations that can be broken into pages
... annotations or lists are of great interest to (IDPF)
... At the moment we are trying to align that UC with the protocol
shepazu: FindText API is basically exposing 'find in page'
functionality as a DOM API
... the current idea is to include an 'edit distance', a fuzzy matching.
... there was some feedback to handle this with regex
... the basic idea is to let the API pass in the params so it
ends up with a text result from the document, and that is returned to you as
a range
... one of the features in a page API is a prefix and suffix.
... in case you have multiple instances of a word
... which instance is of the selection that you are trying to find
... and given that, you have a URL syntax, a fragment identifier, this
string being the set of criteria to search for in the page.
... so you could use it as a primary identifier
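A minimal Python sketch of how a prefix and suffix can pick out one instance among several occurrences of the same text. It is not the FindText API; the function name `find_with_context` and its parameters are hypothetical, and matching is exact rather than fuzzy.

```python
from typing import Optional, Tuple


def find_with_context(text: str, exact: str, prefix: str = "",
                      suffix: str = "") -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of exact that is
    immediately preceded by prefix and followed by suffix, else None."""
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        start = text.find(exact, start + 1)
    return None


text = "the cat sat; the dog sat; the bird flew"

# Without context the first "sat" wins; the prefix selects the second one.
print(find_with_context(text, "sat"))
print(find_with_context(text, "sat", prefix="dog "))
print(find_with_context(text, "the", suffix=" bird"))
```

The three strings (exact, prefix, suffix) are the kind of search terms that could be serialized into a fragment identifier, which is why the resulting string tends to be long.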
ivan: the identifier represents the search
shepazu: terms of the search
... certainly a long string
ivan: well we have CFI
shepazu: CFI
... lets say we are working in a browser, I select something, and store
the selection, and the prefix.. and some other things, and store them as
annotation, and in this case you might not have a body..
... so, then you might share that or keep it for yourself. for the
latter, the next time you go to the next version of the book, the
annotation still stands
ivan: to be very precise, the reference to the selection,
the identifier of the selection, is the spark of the annotation
... annotation means I give you the id to what I annotate and what it is
annotating
<scribe> Unknown: how much of the source document is part of the selection
Brady: I ask from digital publication and ebooks, there are
significant restrictions as to how much you can copy
... at some point you have to stop what you can select and annotate
azaroth: ... There are two types: here is an exact match, and the second is a char offset
Brady: at least for CFI, for char offsets, there are a number of implementation issues
ivan: one of the reasons why this is coming up
Brady: difficult problem
... still painful
shepazu: lets talk about it "off camera"
... or "in camera"
... This was tried. hathitrust case with storage and fair use
... that doesn't remove you from contractual cases, but copyright
... lets not go into that here
bigbluehat: we don't have the perfect solution but not telling you about it :P
Brady: I guess it just takes money :P
ivan: for DPUB people is important.. the model document, the Annotation POV... and epub world there should be a change reference.. (csarven: okay I lost all the references here)
shepazu: Rather than storing strings, we store hashes
... we have thought about the problem but no solution
Range finder was discussed.. which I assume is just text
^^ tzviya
shepazu: maybe range finder or.. to be this text or
non-text to be part of content
... I might have a picture or a face to be highlighted. I think that's a
larger thing. different things for different media types. as long as
covered by media annotations
... we decided to start smaller, and solve the already smaller difficult
task of text search
Brady: can I select 498 image?
azaroth: in current state or..?
Brady: or resource
... I have this collection of images and I want to make sure that I go
to the right page
azaroth: Which scope?
bigbluehat: Within the scope of page 5
Brady: In my case, it will be page media
... and image might appear on different pages so i want to know which
bigbluehat: We call it scope currently.
shepazu: There could be sophisticated URI, selecting part
of the image being stored so that it can be matched against other things
on the page
... an app to compare this fragment on the page to compare with other
images to see if it is included
Brady: I want to bookmark page 50, and create an image only
for that
... i want to be able to find that on a device now
bigbluehat: We are talking about an XPath selector
scribe: any fragment selector
... also exploring CSS/Xpath for canonical selectors
ivan: the model is such that for generic term "selector",
and can be extended
... it is not there yet, but XPath can be such selector
bigbluehat: CFI already is
ivan: CSS can be one of the selectors
bigbluehat: you can encode a complicated selector and never put it in the URI
ivan: I think the CSS selector is under utilized.
... reselector is extremely powerful
... using that to locate the target of an annotation
azaroth: Continue on for 5 more minutes?
tzviya: Talking about.. our vision for what we'd like to see .. portable web pub. offline being not first-class
tzviya: a lot of work going on in CSS WG
... and ARIA
... issues that this raises.. identifiers are prominent.
... we really focused on heading in that direction
... a lot of the work is involved in international publication
... updating EPUB and what W3C is coming from
bigbluehat: From the W3C side, multiple things expressible in a single package.
ivan: That's a question of the packaging
bigbluehat: I mention it in relation to the protocol.
ivan: Don't open all the worms
tzviya: we explored a lot of packaging options
tzviya: definitely open
... talking about it for long time
... right now, to read publications online
... publication object model is a proposal
... welcome to join
... we need to have some sort of an object model, what should be based
on is open
... and conversation with service workers
ivan: to update what's happening is that. there is a
packaging format i.e., EPUB, zip based.
... we know there is some work in W3C, mainly in TAG to create a
packaging format.
... the info I got so far is still a bit distorted.
... unclear what the outcome of that work will be
<azaroth> web packaging: http://www.w3.org/TR/web-packaging/
ivan: from our POV, publishing community can say give us a packaging format..
Brady: from my perspective, we proposed multipart MIME.
... every time we proposed it people were confused and asked why didn't
you use zip? So, we used zip.
... now we have toolchains, and all sorts of code around ZIP.
... if we could turn back time, multipart MIME.
... and given the reality on the web, I prefer service workers :)
Brady: at least for the service workers, not having a
package makes more sense
... especially for delivery, and in that case, we don't care about
streamability
tzviya: I tried doing it with multipart MIME, but when you try making it work, there is not much out there, only some old stuff from StackExchange...
bigbluehat: when not doing W3C stuff, I do CouchDB stuff to stream JSON
bigbluehat: for CouchDB that's super useful b/c we don't have to read the whole disk
Brady: It sounds really interesting, but not interesting enough to do anything about it (so say some people)
azaroth: This has some impact on this WG, b/c it has
offline reading modes
... we should be able to accommodate
<tzviya> DPUB TPAC Agenda https://www.w3.org/dpub/IG/wiki/Oct_2015_F2F_Logistics_and_Details#Schedule
azaroth: at the moment we are looking at ActivityStreams
... to be discussed tomorrow with Social WG people
ivan: In a sense, the way we ... where we are getting ...
with the use of service workers, the main reading system can be fooled
into believing that everything is accessible through the Web (on/offline)
... b/c the service worker will catch the HTTP request and deal with it
(cached or not)
... for the time being, the service worker is... - I don't want to say
scifi - only one browser implements it
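For illustration only, a minimal sketch of the cache-first interception idea described here. This is not the Service Worker API, which is a browser JavaScript API; it is a plain Python stand-in for the logic of catching a request and answering it from a cache or from the network. The names `make_interceptor`, `fake_fetch` and `online`, and the example.org URLs, are hypothetical.

```python
from typing import Callable, Dict

def make_interceptor(cache: Dict[str, bytes],
                     fetch: Callable[[str], bytes]) -> Callable[[str], bytes]:
    """Return a handler that answers from the cache first, else the network."""
    def handle(url: str) -> bytes:
        if url in cache:      # cached: works whether online or offline
            return cache[url]
        body = fetch(url)     # not cached: go to the network (may raise offline)
        cache[url] = body     # remember the response for later offline use
        return body
    return handle

# Hypothetical network that can be switched off to simulate going offline.
online = {"up": True}

def fake_fetch(url: str) -> bytes:
    if not online["up"]:
        raise ConnectionError("offline")
    return ("content of " + url).encode()

handle = make_interceptor({}, fake_fetch)
first = handle("http://example.org/book/ch1.html")   # fetched, then cached
online["up"] = False
second = handle("http://example.org/book/ch1.html")  # served from cache offline
```

The caller (here standing in for the reading system) makes the same call in both cases and cannot tell whether the answer came from the network or the cache.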
<bigbluehat> Web Packing format (based on MIME Multipart) http://www.w3.org/TR/web-packaging/
ivan: so, some fuzziness there
... it may be that the annotation work will be relieved of this issue
Jeff: .. [csarven: sorry, couldn't track this]
ivan: each annotation can have a role
... We want to annotate @alt
tzviya: So it is not visible content
shepazu: I'm not sure how we can do that
scribe: I can't think of how a UX would expose that
bigbluehat: The browser can switch off [stuff]
shepazu: From a UX perspective it makes sense
bigbluehat: If you can turn the images off, it is possible
tzviya: This is beyond the scope of DPUB
shepazu: I'm skeptical
having assumed a web resource
bigbluehat: DOM expression to the user.
charles: maybe @alt is in that content stream
... the visual representation is not the only representation the browser
can make, e.g., it can be audible
shepazu: I think tzviya was talking about what's exposed to the API,
not necessarily what's in a web page
... there are things that can be done for @alt e.g., screen reader, but
hard for a browser to do it
Jeff: can a blind user create it and a visual user can find it?
bigbluehat: It really depends on what the visual text is given
shepazu: There is a visual aspect to text
shepazu: ultimately all text can be considered part of
a range, so not sure if you can't the text. It is possible, but probably
has some issues which need to be figured out
... what you are selecting as a range is the main issue
ivan: The <details> in HTML5, ..
Ralph: Without talking about interfaces, ... which is then available to the user not using that system
Jeff: so we can annotate anything
shepazu: Anything. Even scene descriptions
ivan: We should have a way to incorporate this
... and not throw away
azaroth: also check the model so that we can make an example of @alt and how to represent that.
azaroth: is there anything further from DPUB side to
discuss?
... welcome to hang out with us. Next topic is what we can accomplish
before the next TPAC for the client side
Karen: what would be the top implementations to expect?
shepazu: Realistic for us to say... a data model - finish that with multiple representations/serializations
shepazu: Some find text, polyfill thing? possibly protocol
as well
... browser extensions
... supporting annotation data model, findtext as well
... we can hope for a browser supporting some parts of this
... that's not yet clear
... within a year we could at least get a findtext implemented in a
browser - even if it is a stripped down version
... b/c web annotation is a bunch of moving parts, so I don't think we
are going to have the full annotation system in a year
... if you want the implementation, hypothes.is will probably have it
in a year.. but that's a browser extension
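For illustration only, a minimal sketch of the kind of lookup a stripped-down "find text" feature might perform. The findtext API itself is not specified in these minutes, so this is not that API. It works on a plain string rather than a DOM, and the function name `find_text_quote` and its `exact`/`prefix`/`suffix` parameters are assumptions chosen for the example.

```python
from typing import Optional, Tuple

def find_text_quote(text: str, exact: str,
                    prefix: str = "", suffix: str = "") -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first occurrence of `exact` that is
    immediately preceded by `prefix` and followed by `suffix`, or None."""
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        # The surrounding context disambiguates repeated quotes.
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        start = text.find(exact, start + 1)
    return None

doc = "The cat sat. The cat ran."
print(find_text_quote(doc, "cat"))                 # (4, 7)
print(find_text_quote(doc, "cat", suffix=" ran"))  # (17, 20)
print(find_text_quote(doc, "dog"))                 # None
```

The prefix and suffix matter when the quoted text occurs more than once: without them the first occurrence wins.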
bigbluehat: the selectors we output match spec.
bigbluehat: if we get XPath in, and define how we do lists
in..
... I have written some translation code with WebAnnotation, and give it
to Hypothes.is group...
... the annotations are in public and public domain
... the hope is that, we get up to the protocol as well. least likely to
get done, probably because not done as a draft yet either.
... that part is moving already internally due to other forces
... there are libs that we have ... if we do CSS/XPath, there is plenty
of code already
... if we ship nothing in a year, we can confidently deliver the data
model at least
... ReadiumJS and (?) JS
ivan: essentially implementations may appear in readers
bigbluehat: these are usually last mile problems, but somebody has to tie them together
Karen: Academic publishers/Journals ...
... I will take what you have discussed and go over it.. headline would
be: academic journals will soon be able to do such and such from W3C's
stuff
shepazu: Are you afraid that they'll talk down to you? ;P
shepazu: The way to store and share annotations - that's the data model - the way to write annotations - that's the protocol - I'm not saying teach them these terms...
shepazu: the way to write and store.. in the cloud.
... open up annotations .. there are certainly ways to talk about these
things like real people talk
azaroth: Anything else? Overlapping with DPUB?
azaroth: for the remainder of the day, we can talk about what to do
with the remainder of the charter
...
<bigbluehat> DOM related functions by tilgovi:
<bigbluehat> https://github.com/tilgovi/dom-seek
azaroth: The description of client-side API in the charter.. is slightly different
<bigbluehat> https://github.com/tilgovi/dom-node-iterator
<bigbluehat> https://github.com/tilgovi/dom-anchor-text-quote
<bigbluehat> https://github.com/tilgovi/dom-anchor-fragment
<bigbluehat> https://github.com/tilgovi/dom-anchor-text-position
scribe: So, do we think that .. a Python implementation of that would not be very useful.
shepazu: If we can get one browser implementation, and one polyfill
ivan: that would be ideal
... process wise
azaroth: A server-side findtext in any language...
shepazu: No, it is a client-side API
azaroth: Unless under robustness.
bigbluehat: Things being conflated in findtext
shepazu: If you are talking about the fragment identifier, but I was talking about the findtext API
bigbluehat: I agree, but I don't think that's how it was presented earlier. If we can separate that..
shepazu: Fragment identifier being completed in a year is not clear.
ivan: I have no doubt that we can do that in a year
bigbluehat: That's the easy part
shepazu: it took the media fragment group 3 years to do that.. so, sure we can do that in a year, but I don't think it is trivial
<azaroth> http://www.w3.org/annotation/charter/
shepazu: Can we get it widespread instead of the implementations?
bigbluehat: I don't think we can put it on our shipping b/c
of that
... hopefully changing towards more implementable
shepazu: I think Charles brought up an interesting point. I don't know how to represent a column in HTML.
azaroth: How to annotate a column
bigbluehat: You can do that in a browser right.. clicking random cells
shepazu: Multiple discontinuous selections
... Actually you could use findtext
tzviya: ARIA and CSS are worlds apart
... Between CSS and ARIA roles you could solve that.. It is a very long
conversation.
... So CSS selectors.. you can select whatever, and use ARIA roles to
say this has roles...
... col doesn't exist, but colgroup exists
... the table models are most robustly defined, b/c it is really hard to
navigate a table
... so sit down and talk to ARIA people.
... ARIA/Annotations could be really accessible. Doesn't say much now,
but it can be extremely valuable for a11y
shepazu: Sure.. let's talk about how that world fits in
... the ability to serialize out what you selected or, having been given
a selection, recalling that in the document
... I don't think ARIA is in the right place. It is a way to express
something.. but I could be wrong
azaroth: Back to high level topic
... we can finish the model
... there are some outstanding issues we can deal with tomorrow
ivan: I think we should wait for the others, not sending CR by Dec.
bigbluehat: Discussing before sending to the list
azaroth: For the serializations however, JSON-LD... but HTML something comes up
shepazu: I'll show you later, but I have a prototype.. HTML
addresses some of the things brought up
... for any out of band content
... comment, footnote
... if, for example, it were to be a footnote, a single fn can be referenced
in multiple places in the doc
... if we had a native note element, I'll have a mapping to a note
element.
... something I've been tinkering with
... Two way mapping
ivan: Either using RDFa or using even extra attributes like
ITS did
... there is an existing mechanism to go through
... I am concerned by adding new elements
... we don't have an extension model in HTML
... web components are for the time being up in the air - only one
company doing?
shepazu: This WG won't necessarily accomplish everything in
a year.. so let's concentrate our energy.
... I propose, in parallel, we can start on work that could be done.
... we have a list of areas, but not a list of specs.
... so a set of specs in a year or two would be great
... we might not re-charter, unless we have a concrete list of stuff
(which can map to the spec)
... they might say that if something doesn't match a spec, it might get
dropped
... at least the ground for the next charter
... Does anyone object to that approach?
bigbluehat: Don't mind if we have clear deliverables, but some things will fall through the cracks
shepazu: I want to make sure that... this work might take longer.
ivan: let's separate these discussions
... for tomorrow, what to do for the coming year
... provided that this will get done, how much energy will the other
stuff take .. considering that most active are probably 10 people
shepazu: Do we have a general agreement on what we can deliver ..
bigbluehat: We need to repost the 6 points in the charter
and map it to what we can deliver
... some stuff is super big
shepazu: So, the way that we struck upon is 'selection' is
a pseudo-element in CSS. So, the range we get from the API is ... will
provide as a range, once we have it, it will register in the document
and name it.
... a pseudo-element ... [csarven: I lost it]
azaroth: Data model, protocol, vocabulary (part of data model), JSON-LD (part of protocol), not yet decided, search or notifications as part of the protocol, plain text API
ivan: what about URL?
bigbluehat: if anyone wanted to do it go to IETF and then we can incorporate it
shepazu: We can certainly try, erikmannens you can test how long it took to do media fragments
<bigbluehat> seems DPUB really wants it...so...we should talk! :)
ivan: XPointer like framework...
shepazu: hash something
azaroth: The point is IETF media fragments, .. and existing defs how those specific...
tzviya: Can we have a meeting with DPUB? Sounds like a large deliverable.
shepazu: Probably small with a lot of fight :)
bigbluehat: the call time is important/limited.. and fast track that to get the specs out. That's primary ..
azaroth: Thanks all
Summary of Action Items