14:44:30 ivan has changed the topic to: Meeting Agenda 2022-01-14: https://lists.w3.org/Archives/Public/public-epub-wg/2022Jan/0000.html
14:44:31 Chair: dauwhe
14:44:31 Date: 2022-01-14
14:44:31 Agenda: https://lists.w3.org/Archives/Public/public-epub-wg/2022Jan/0000.html
14:44:31 Meeting: EPUB 3 Working Group Telco
14:44:31 Regrets+ toshiaki
15:01:29 scribe+
15:04:29 dauwhe: today we have 3 issues which were all filed by TAG, but all on the same theme of privacy and security
15:04:52 Topic: TAG's Privacy and Security comments
15:04:57 ... that was the main concern of PING and TAG, that we have not said much about the threat model involved in epub, about handling PII
15:05:02 https://github.com/w3c/epub-specs/issues/1957
15:05:05 ... our goal is to address these
15:05:12 TOPIC: TAG Review Issues
15:05:42 dauwhe: this first issue is about PII, could we do something to discourage collection of PII, can we recommend RS are clear with customers about what is being collected? https://github.com/w3c/epub-specs/wiki/Privacy-and-Security-for-EPUB3
15:06:10 ... wendyreid wrote a summary of the issues for us, and some of our possible responses
15:06:56 ... the industry has seemed to settle on policy of user agreements, but the general public is probably not aware of how much information is being collected in this process
15:07:47 ivan: just to make it clear, from the w3c point of view we need to provide a section that documents problems and guidelines for what authors and RS should do to address the problems
15:08:10 ... there is no requirement to change the spec
15:08:21 wendyreid: i can give us an overview
15:08:57 ... the way I have written this is that, because we are talking about privacy and security, each has two parts: one for content authors and one for rs
15:09:14 ... for both security and privacy, i wanted to lay out our objectives
15:09:49 ... preserve confidentiality, content integrity, transparency
15:10:39 ... threat modelling for content: falsification of creator information, remote resources, etc.
15:11:39 ... recommendations (mostly around privacy aspect): protect users from threats, avoid collection of data, "content processors" should be careful
15:12:21 ack dl
15:12:34 ... for RS threats and recommendations are also set out, with threats like rs spoofing
15:12:54 https://github.com/w3c/epub-specs/issues/1959
15:13:06 dlazin: from disclosure point of view, Apple now requires that apps have a user data collection "nutrition label" in the app store
15:13:41 ... not all rs are apps (i.e. some are stand alone devices), but this means that there are already some higher level requirements in many cases
15:13:59 ... but also, the most common RSes come pre-installed on device, not via app store
15:14:39 wendyreid: other considerations like CCPA also come into play here, but these recommendations apply to
15:14:54 ack ric
15:15:04 dlazin: we could model our response based on apple's list of things they do with user data
15:15:24 rickj: the more specific we get with the current state of things, the more revs of this we will have to do
15:15:45 ack dug
15:15:53 ... so we should err on the side of common sense, and maybe reference other places that deal with this sort of thing, but not get too specific
15:16:23 duga: we should focus on epub, even though rs do lots of things that don't specifically have to do with epub format
15:16:53 ... odd for an epub format spec to try to tell rs what to do with other formats, or as a UA generally
15:17:17 PING Target Privacy Threat Model: https://w3cping.github.io/privacy-threat-model/
15:17:38 ... also, the rs privacy policy doesn't apply to the publisher - e.g. if a publisher includes a tracking pixel, rs can't control that
15:18:05 dauwhe: the industry has gotten in trouble before - e.g. ADE sending unencrypted user information back to Adobe
15:18:27 ... i looked up the policies of a few major epub retailers
15:18:49 ... e.g. Apple says they anonymize everything, but Kobo doesn't
15:19:07 ack dauwhe
15:19:13 ack iv
15:19:14 ... so I'm a little less concerned with how other specs handle privacy, because there are specific user expectations about privacy when it comes to books
15:19:48 ivan: we have 2 specs, content and rs. And wendyreid separated the threat model into these 2 parts.
15:20:14 ... in the rs part, we already say things about origin and other security related policies, so we aren't completely silent
15:21:03 ... indeed there are areas where spec is silent, but the general expectations that a user should have over privacy are probably in scope
15:21:04 ack wen
15:21:11 https://html.spec.whatwg.org/multipage/webstorage.html#privacy
15:21:48 wendyreid: one thing that separates us from a general web browser is that there are book related affordances that RS are expected to provide for the user, but some of these rely on collection of user data
15:22:00 ... but users don't think this way, they just expect that these features are there
15:22:16 ... e.g. collection of data for annotations, cloud syncing
15:22:38 ... so we would be doing our due diligence by providing some guidance to implementers
15:23:08 ... the reality is that these recommendations aren't normative anyway, but we are being good global citizens by doing so
15:23:13 ack ric
15:23:37 rickj: as a reminder, the epub marketplace is also the dominant format for education, and the customer is not the user. It's the institution.
15:23:58 ack geo
15:24:02 ... so we need to be careful when we say things that affect that use case
15:24:21 GeorgeK: there are also rs that track the individual student and how much time they spend reading, progress, etc. and report back
15:24:29 ... and many times teachers and parents can see that info
15:25:08 ack tz
15:25:11 ... i wonder if one of our suggestions would be to have the privacy policy available in a rs, e.g. in the help section, or if there is anything in the content that is phoning home, then it would be the publisher who informs people about that
15:25:59 tzviya: we need to keep in mind that we tend to think of epubs as separate from the web, but there are a lot of websites that do similar things. We're not that different
15:26:38 ... but the "nutrition label" might solve the problem, by clarifying the user's position without scaring them
15:26:58 ... better than a user agreement, where user knows that they just have to click to agree or else the app won't work
15:28:11 ivan: UX people have come up with a vocab that describes the a11y issues that might be present in a given book, can we do something similar?
15:28:28 ... but most of the privacy issues are on the rs side rather than the content side, so it may not be that helpful
15:28:38 ack ivan
15:28:39 ack iv
15:29:33 ivan: since wendyreid has already started, I think this text should become part of the spec
15:29:56 ... so next step should be to open a PR to incorporate it
15:30:25 ... re. applications disclosing privacy features in general, maybe we should incorporate the labels that we want RS to provide
15:30:27 ack wen
15:30:51 wendyreid: I considered including specific examples of data collection behaviours, or how to communicate to users when these things happen
15:31:51 ... but being specific might lead people to think that the examples constitute a closed list, when the recommendations are more like principles
15:32:00 ack dl
15:32:18 dlazin: do we know what the TAG wants? or can we ask?
15:32:44 ... most specs don't touch this. Really we just want to satisfy them.
15:33:05 ack tz
15:33:19 ... can they give us an example of what other specs have done in response to similar concerns? Rather than producing something on our own and risking it not being what they want
15:33:57 tzviya: privacy is increasingly important now. We're not just doing this to check off the privacy review box
15:34:08 Examples for privacy section: https://www.w3.org/TR/did-core/#privacy-considerations and for security section: https://www.w3.org/TR/did-core/#security-considerations
15:34:16 ack iv
15:34:28 ivan: the DID spec recently addressed similar privacy concerns
15:34:51 ... i.e. what implementors should be aware of when they try to implement, what privacy pitfalls are they likely to encounter, etc.
15:35:10 ... we also went through a similar process with audiobooks and pub manifest
15:35:27 https://www.w3.org/TR/audiobooks/#security-privacy <-- quite minimal
15:35:48 https://www.w3.org/TR/pub-manifest/#security-privacy <-- more informative
15:35:51 ... i don't quite agree that we're just doing this to satisfy TAG
15:36:28 ack dau
15:36:29 dauwhe: I can reply in the issues with a link to the document that we already have?
15:36:31 ack ha
15:36:36 ivan: we'll let them know once we have a PR
15:37:19 Hadrien: we have very very different rs, and for some of them, the fact that rs need to be distributed already imposes some requirements (e.g. apps distributed via app store)
15:37:26 ... similar thing will happen with Play store
15:37:56 ... but no analogous process on the web - maybe just a privacy section of the page
15:38:09 q?
15:38:20 ack t
15:38:22 ... we could have best practices section about what to disclose to users, but not sure we can go much further than that
15:39:00 ack iv
15:39:02 Web Authentication has several sections on privacy https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-authenticator, https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-client, https://www.w3.org/TR/webauthn-2/#sctn-privacy-considerations-rp
15:39:15 tzviya: Web Auth has several sections on privacy that might be similar to what we need
15:39:31 ivan: i wonder whether there are things specific to security that we need to call out
15:39:54 ... we know most of rs have been quite averse to using scripts, some don't allow it at all
15:39:59 ... mostly due to security concerns
15:40:18 ... so having a fairly good idea of why rs shouldn't allow scripts might be helpful
15:40:34 ack we
15:40:43 ... maybe say that content authors should really consider whether they need to include scripts in their content
15:41:42 wendyreid: i think the best we can do is identify some common threats that arise because of the way the spec is written, and the way content is likely to be written
15:42:18 ... in terms of recommendations, we could recommend virus checking as part of ingest, checking origin of links
15:42:36 ... security is tricky because we can make recommendations, but it will ultimately come down to the authors
15:43:00 dauwhe: one of the big problems with security is that Hachette might write the script, but then Google executes it, knowing nothing about it
15:43:11 ack dau
15:43:13 ack iv
15:43:40 ivan: you could say that a content creator on the web writes scripts, and then the browser has to execute it
15:43:55 ... but for ebooks, once you put something in content, those books won't be automatically updated
15:44:19 ... so a malicious script could stay out there for a very long time, whereas on a website, the content can be updated
15:44:53 ... the fact that the book becomes its own entity is a difference between book and website - might be worth pointing out, as a reason not to include scripts
15:45:19 ack ge
15:45:28 ... e.g. old versions of js libraries incorporated in ebooks
15:45:56 GeorgeK: what a user is reading is certainly private. Governments knowing what people are reading is something we should point out
15:46:12 ack ri
15:46:14 ... whether or not people are using assistive tech is also sensitive, we should call that out
15:46:50 rickj: perhaps we don't bifurcate by content and rs, but rather content creation and distribution channel
15:47:00 ack ch
15:47:27 CharlesL: the other thing is that the content creator could have done everything right, and then someone in the middle injects malicious code into the epub and repacks it
15:47:40 ... right now we don't have anything to do with signing ebooks, etc.
15:47:52 ack ha
15:47:56 ... so it really falls onto whoever is ingesting this at the end to make sure the content is safe
15:48:14 Hadrien: the top reason why rs don't like js in content is that it can mess with what rs does
15:48:45 ... rs will most likely always inject js to get desired result, which js in content can mess up
15:49:15 ack du
15:50:30 duga: we don't do js because of security, and because it's a pain. When a rs implements something in the webview they are limited in the resources they have available. But a browser operates at a higher level
15:51:34 ... in terms of serving content, when Hachette writes a book with a script, it's different whether that script is run via rs or via browser. When run via rs, the origin is Google.
15:51:48 ... very different security and privacy model
15:52:18 ack iv
15:52:58 ivan: about signatures on epubs, there could be. Would it be a good thing to say in spec that we suggest that publishers make use of signatures?
15:53:33 ... of course, all the intermediates that modify content during ingestion will be against that, but there are all sorts of technology out there for providing signatures
15:53 