01:53:59 RRSAgent has joined #webfonts
01:53:59 logging to https://www.w3.org/2019/09/03-webfonts-irc
01:54:01 RRSAgent, make logs world
01:54:01 Zakim has joined #webfonts
01:54:03 Meeting: Web Fonts Working Group Teleconference
01:54:03 Date: 03 September 2019
01:54:10 ScribeNick: myles
01:54:17 rrsagent, this meeting spans midnight
01:54:25 rsheeter has joined #webfonts
01:55:10 chair: vlad
01:55:11 Vlad: One more administrative item: the future working group telecons.
01:55:28 Vlad: Zoom was a temporary measure. Is everyone happy with Zoom?
01:55:44 Meeting: Web Fonts WG f2f meeting
01:55:48 ned: I'm not the only one who has concerns
01:55:53 Garret: I have no preference
01:56:05 ned has joined #webfonts
01:56:22 chris: The reason WebEx went away is that some group of people found the password for our previous bookings and were racking up $xx,xxx of phone bills, so my account got cancelled.
01:56:49 chris: My certificate expired, so it will take a while. I'm in Japan until the 26th.
01:57:01 KenADBE has joined #webfonts
01:57:04 Vlad: I'm perfectly happy to continue using my Zoom account. I like Zoom. So far it's been reliable.
01:57:29 myles: Maybe we can switch to WebEx once it becomes possible.
01:57:33 Vlad: Okay.
01:57:37 Vlad: Same time, same day?
01:58:25 Vlad: We had a Doodle poll to see which times worked well, and 9:00 am PDT was best for everyone.
01:58:53 ned: We expected more international attendees. We couldn't find a time that worked for everyone, or someone who indicated a preference doesn't regularly join.
01:59:00 Vlad: Only one person, and that person is an outlier.
01:59:12 Vlad: This time resonated the most with everyone.
01:59:41 rsheeter: Monday at 9:00 am PDT is best for me.
01:59:49 Vlad: Okay, we'll continue with this time.
02:00:04 Vlad: We can keep this until the end of the year; hopefully WebEx will be working by then.
02:00:09 Topic: Evaluation report
02:00:32 Vlad: Because chris has been out of the loop for some time, I was expecting you would want to pick up the work. So let's have a recap.
02:00:45 chris: I did get a recap on a recent call, but any additional information would be good.
02:00:53 Vlad: Would it be useful if we had a quick summary right here?
02:00:55 chris: yes
02:01:38 Vlad: I wanted to recap the work that we've done. Myles?
02:03:34 chris: File attachments will be kept by the archive; inline/HTML may be dropped.
02:03:39 Vlad: Most of the data was inline images, which aren't available on the mailing list.
02:03:45 ScribeNick: rsheeter
02:03:51 (so myles's images may be gone from the archive)
02:03:52 Once again, apologies for the W3C list archive trashing emails with embedded images. This affects many WGs. Profuse apologies for our 1990s email archive software.
02:04:34 myles: I made 2 reports to the group: 1) the possibility of a range request solution, 2) the size of glyph closures (by count of glyph IDs).
02:08:28 https://lists.w3.org/Archives/Public/public-webfonts-wg/2019Feb/0028.html *does* show some images, seemingly contrary to expectation
02:09:54 chris: If the mail client sends a text alternative, the archive will favor it.
02:11:15 myles: A few interesting conclusions: 1) CJK fonts are usually big; others aren't.
02:11:34 garret: emoji
02:12:21 ned: not sure if we should consider emoji in scope
02:12:55 rsheeter has joined #webfonts
02:13:30 myles: It seems like people mostly use images for this use case.
02:14:01 ken: Trajan has a color engraved (SVG) edition.
02:15:17 myles: It's important to the range request approach.
02:16:03 garret: Not just emoji; also icon fonts.
02:16:51 ned: Maybe only support fonts with a single glyph representation.
02:17:35 rod: Kick this down the road to post-simulation?
02:17:42 chris: it's still doable
02:18:20 Ranges on sbix are no worse than ranges on the SVG or COLR tables.
02:19:17 vlad: The first goal is to analyze for user experience.
02:19:59 vlad: Once we have an approach we can start to impose limitations (say, no bitmap font support).
02:20:49 ken: On the emoji committee, people specifically want the same rendering platform to platform.
02:21:13 rod: People ask us for exactly that. We tell them to go away because we can't do it efficiently.
02:21:44 ned: You still need system support at some level.
02:22:53 ned: There are few emoji fonts, so there is little data.
02:23:44 present+
02:23:47 vlad: Once we narrow down which solution gives the best user experience, we can discuss limitations.
02:23:48 present+
02:24:24 Andy has joined #webfonts
02:24:40 present+
02:25:27 present+
02:25:36 myles: Ignoring emoji, large == CJK (more or less).
02:25:53 myles: The larger the font, the lower the percentage taken up by non-outline data.
02:26:28 (based on the whole font, not one reduced to fit the page)
02:27:19 myles: Therefore range requests make sense: download everything non-glyph plus the set of glyphs you need.
02:27:51 chris: Does Korean use lots of standalone glyphs?
02:28:21 ken: Basic Korean is usually precomposed; there is a very large set of additional characters.
02:29:13 ken: Example of pan-CJK table sizes: the non-glyph data is relatively small.
02:29:52 topic: Potential Approaches
02:30:39 garret: Subset & patch approach. The client knows what it has and what it needs; the server sends a binary patch. Probably by computing subset-before and subset-after, and then the diff between them to send back to the client.
02:30:56 garret: The subsetter lives on the server, so it can support closure and thus make all possible glyphs accessible.
02:31:29 myles: Range request approach. Requires font pre-processing to be ready to stream. Puts everything except outlines at the beginning; no interdependent glyphs.
02:32:07 myles: The browser downloads / parses until it hits the first outline.
Now it has all the shaping info + the locations of the glyphs, and can load them as needed.
02:32:18 chris: If it's WOFF2, then you toss loca?
02:32:27 garret: WOFF2 probably doesn't work with this approach.
02:33:54 myles: Can either do block-based compression, e.g. so a range of input bytes can only alter a known range of output bytes. Or drop compression; you still come out ahead.
02:34:04 s/subject:/topic:/g
02:34:33 (various ideas about how to compress bits and pieces to allow range requests)
02:34:49 ken: How are the missing glyphs represented on the client?
02:34:58 myles: implementation detail
02:35:26 ken: Adobe's solution makes an initial instantiation of the font with base glyphs plus zeroed-out missing ones, then can patch in others easily.
02:36:01 myles: Browser/text stack interaction details should be out of scope for the spec. Just the transfer mechanism.
02:37:10 myles: Because we download all the shaping rules up front, we can identify the needed glyphs rather than the closure of codepoints.
02:37:43 garret: The set of glyphs for closure can be larger; I have data for later today on how this performs on real content.
02:38:09 vlad: We have documents describing prior approaches in reasonable detail.
02:38:30 vlad: Hybrid Approach.
02:39:36 vlad: Glyph closure can be hard to compute given features, e.g. small caps, stylistic sets, Arabic positional forms, etc.
02:41:07 vlad: The idea is to put everything needed for shaping first, then let the client shape and select glyphs so it can ask for glyphs rather than codepoints (by something like subset+patch).
02:41:17 vlad: That means two serial calls to get content on the screen.
02:41:30 vlad: The positive is you ask for significantly less data.
02:42:00 vlad: Asking for glyphs reduces the privacy consideration, since you aren't leaking codepoints [at least not as directly].
02:42:36 garret: We can do the first request on codepoints, including *all* layout data + those codepoints, to remove the serialization concern. Then requests 2+ are glyph-based.
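The range-request mechanics described above can be sketched in a few lines. This is an illustrative toy, not the WG's design: it assumes a preprocessed font whose non-outline data all precedes the outlines, and a loca-style offset array; the function names and layout are hypothetical.

```python
# Toy model of the range-request approach: given loca-style per-glyph
# offsets and the file position where outline data begins, compute the
# coalesced byte ranges a client would fetch for a set of glyph IDs.

def glyph_byte_ranges(loca, glyf_start, glyph_ids):
    """Return coalesced (start, end) byte ranges (end exclusive) in the
    whole file that cover the requested glyphs."""
    spans = sorted((loca[g], loca[g + 1]) for g in glyph_ids)
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:          # adjacent or overlapping
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return [(glyf_start + lo, glyf_start + hi) for lo, hi in merged]

def range_header(ranges):
    """Format ranges as an HTTP Range header value (inclusive ends)."""
    return "bytes=" + ", ".join(f"{lo}-{hi - 1}" for lo, hi in ranges)
```

For example, with `loca = [0, 100, 250, 400, 480]` and outlines starting at byte 46000, requesting glyphs 1 and 2 coalesces into the single header `bytes=46100-46399`.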
02:43:12 garret: enthused about testing these approaches
02:43:26 myles: If we design the hybrid right, it can work even with servers that don't support subset+patch.
02:44:06 myles: For example, if we sent an optional header as a sentinel, the server could respond with subset+patch; otherwise we just stream.
02:44:15 jason: Would that adequately obfuscate the content on the page?
02:44:34 myles: We need to obfuscate (likely by requesting more content) in all cases.
02:44:56 rod: It would require preprocessing the font?
02:45:30 myles: yes.
02:45:56 ned: This has precedent; e.g. streamable PDF has a special table of contents.
02:47:10 ken: Adobe sends default glyphs based on language in the first response. This may serve the obfuscation need.
02:47:53 garret: I think the concern is mostly rare characters that are strongly suggestive. The latest proposal was to obfuscate only low-frequency codepoints.
02:48:26 myles: The degree of obfuscation is an interesting question; we should specify a minimum.
02:48:54 rod: We saw orders of magnitude between common/rare.
02:49:37 ned: Domain-specific characters, e.g. chemistry, are strongly indicative.
02:49:52 rod: What is our threat model here?
02:50:33 ned: I think we assume requests are readable in transit.
02:51:06 vlad: The threat may apply to all solutions, in which case it doesn't alter the approach selection?
02:51:24 jason: I think it could change the selection.
02:52:21 myles: New Approach!
02:53:10 myles: With subset+patch, the client has to recognize it needs new characters, interact with the server, and decompress.
02:53:52 myles: The browser has to know when the set of characters needed changes. The browser needs a way to replace an existing font with a new one (can do today with CSS Font Loading).
02:54:19 myles: Does that mean all we need is for the browser to offer a JS callback with the codepoint info?
02:54:40 grieger: I think TachyFont already does this, roughly.
02:54:48 myles: can't do it without a DOM scan today
02:55:07 grieger: performance implications of JS were a concern
02:55:16 myles: web assembly?
02:55:36 garret: interested in trying subset+patch in web asm :)
02:56:33 jason: It's nice that today webfonts don't require JS.
02:57:16 vlad: We have prior art on why this group didn't want a JS solution.
02:58:38 garret: performance and caching implications, especially cross-domain
02:58:54 rod: Not having to load the scaffolding is a significant win.
02:59:24 jason: JS on low-end devices may not be ideal (expensive processing).
03:00:00 myles: We should test it if we claim performance.
03:00:08 garret: (agreed)
03:01:16 myles: Imagine you can preprocess a font to make it more amenable to streaming. The output is still a valid OpenType font file. So... the browser could just fetch the parts it wants today.
03:01:25 myles: possible problem with dynamic text
03:01:41 myles: so add a flag to try to stream (opt-in)
03:02:02 rod: agree, CSS opt-in
03:02:42 garret: rod and I discussed having this in the src: property
03:04:02 (discussion of needing opt-in, which everyone seems to think is desired)
03:04:09 CSS Fonts 4 has the supports() addition to src. https://drafts.csswg.org/css-fonts-4/#font-face-src-parsing
03:04:48 myles: So then I can implement range requests today!
03:05:43 myles: The format is purely for recognition; it has no influence on how we interpret the bytes that are delivered.
03:06:25 ned: At last I can serve TrueType fonts only to browsers that support WOFF!
03:08:44 (discussion of compression of the range request approach)
03:09:28 vlad: Monotype has a modified MicroType Express that allows rendering from the compressed data stream, with Huffman encoding using known tokens.
03:11:14 (digression into video streaming compression)
03:12:27 myles: I now actually have time scheduled for this!
03:12:41 garret: vlad, can you show us your toys?
03:13:13 vlad: yes, ISO standard
03:14:36 The first font compression format standardized by ISO allows rendering font glyphs from a compressed data stream: ISO/IEC 14496-18
03:15:51 vlad: By the end of this meeting I would like to have a timeline of what should happen next and who will do each thing.
03:15:58 ScribeNick: myles
03:16:17 Vlad: I think by the end of this meeting we should have a timeline with names attributed to each item.
03:19:06 ISO/IEC 14496-18: https://www.iso.org/standard/40151.html
03:30:47 jpamental has joined #webfonts
03:30:53 Garret has joined #webfonts
03:30:59 myles has joined #webfonts
03:35:31 hi
03:35:40 Garret: I'm sharing the document!
03:35:51 https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit
03:36:04 ned has joined #webfonts
03:36:12 Andy has joined #webfonts
03:36:30 Garret: I wrote this proposal to try to outline what an analysis framework would look like. First, we want to list success criteria: what we want to measure.
03:36:46 Garret: 1) We want to load font data fairly. As much as possible, we want to load only the data for the current page.
03:37:18 Garret: 2) We want to minimize user-perceptible delay.
03:37:57 Garret: "User-perceptible" means we don't care about the timeline of which events happen when; we only care about how long it takes until the user's content renders.
03:39:05 Garret: We need input data to drive the analysis. We should have a set of page view sequences. We don't need to know the URLs; we just need to know which characters are rendered with which font.
03:39:38 Garret: Once we have that, we need a corpus of fonts. The hope here is to cover lots of languages, font technologies, OpenType, TrueType, etc. We want the whole spectrum of fonts in use. Maybe this means emoji and icon fonts too.
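The analysis loop Garret outlines (walk a page-view sequence, assume font data fetched earlier stays cached, record what each view costs) can be sketched minimally. This is purely illustrative; the real framework would operate on bytes and real fonts, but for simplicity this counts codepoints as the unit of cost.

```python
# Minimal sketch of the proposed per-family analysis loop: for each page
# view, only the codepoints not already cached trigger a request, and the
# fetched data is assumed to remain cached for later views.

def simulate(page_views):
    """page_views: list of sets of codepoints needed per page.
    Returns (total_requests, new_units_per_view)."""
    cached = set()
    requests = 0
    costs = []
    for needed in page_views:
        missing = needed - cached
        if missing:
            requests += 1      # one augmentation request for this view
            cached |= missing  # the patch stays cached between views
        costs.append(len(missing))
    return requests, costs
```

For example, the sequence `[set("hello"), set("help"), set("xyz")]` costs 4 new codepoints on the first view, only 1 on the second (h, e, l are cached), and 3 on the third.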
03:40:29 Garret: For the analysis: for each font family, go through the page sequence, calculate how much network delay is imposed for each view, and assume that fonts are cached between page views. This will tell us, at minimum, how many requests there are and how many bytes they will cost.
03:40:38 chris: This doesn't include any styling information.
03:40:45 Garret: Yes, we want to simplify this.
03:41:44 Garret: We might need to capture the set of font features that are needed.
03:41:53 myles: really?
03:42:11 jpamental: Does this happen before or after the CSS is loaded and parsed? That could contain content.
03:42:26 Garret: Figuring out how we turn a set of URLs into a code point set is an open problem.
03:42:37 Garret: We'll try to include things like layout features.
03:44:48 rsheeter: We will probably need to expose used-glyph information too.
03:45:46 Garret: So we'll be listing the number of network requests and the size of each one. Then we turn that into an estimated network delay. To do this, we will do some handwaving and assume some network parameters, like throughput, RTT, etc. We would probably have multiple profiles and run this same analysis for each profile (e.g. see what it would be like on 3G, broadband, etc.).
03:46:34 myles: Can we make sure this is experimentally gathered?
03:46:48 Garret: WebPageTest has a data set that can do this.
03:47:28 Garret: The page view sequences will only include sequences which happened multiple times (for anonymization).
03:48:58 Garret: Then we need to aggregate all the delays across all the page views. To do this, we need to use a cost function. We haven't proposed a particular one, but here are some characteristics: 1) it is always increasing; 2) it isn't linear (hitting 3 s is really bad because browsers show fallback fonts); 3) the total number of bytes is secondary, but still somewhat important.
Even if the delays are nice, people pay $$ for bytes.
03:50:03 rsheeter: We did an exercise to find solutions that are bad but that our metrics would say are very good.
03:50:54 Garret: Proposal: there's some time point P, which is the time to load the page. If the time is less than that, the cost is 0. Beyond that, it increases exponentially. Perhaps the group can come up with something better, but this is a starting point.
03:51:30 Garret: We are also interested in bytes transferred. So we can calculate an efficiency metric by comparing the number of bytes downloaded with the "optimal" amount.
03:52:03 Garret: The network delay: there's some base round-trip time, plus some time-per-byte. We can just add them.
03:52:09 Myles: Is this experimentally gathered?
03:52:14 Garret: We can gather some data for this.
03:53:18 Garret: In addition to the proposed PFE methods, we should test using whole font files, unicode-range and language-based subsetting, and Adobe's solution.
03:54:02 Garret: If we learn more about Adobe's solution, we can feed that into the analysis.
03:54:08 KenADBE: I just put in a request for that.
03:54:58 Garret: HTTP range transfers, and two more "optimal" solutions which give us a baseline. This computes the optimal subset and assigns its cost to each page by just dividing it by the number of pages.
03:57:20 chris: This incremental thing is also a privacy leak. If I make a page with a few interesting characters on it, and I'm monitoring what gets requested, I know the user's browsing history.
03:57:27 : yep
03:57:53 Garret: We want to represent Arabic and Indic scripts, lots of types of fonts, and CJK too.
03:58:03 Garret: We also might want to test variable as well as non-variable fonts.
03:58:23 jpamental: This technology is what will enable variable fonts to work.
03:58:31 chris: Being able to strip out unused axes might be helpful.
03:58:37 KenADBE: What about font collections?
03:58:44 Garret: Yes, we should add .ttc files to this too.
03:59:08 jpamental: From the web design world, most of my audience is western, but interest in what we're doing combined with variable fonts is through the roof.
03:59:18 rsheeter: We have a self-interest in this too.
03:59:44 rsheeter: It can end up being a huge penalty to send a whole variable font when only one point in the design space is used.
03:59:50 jpamental: Variable fonts aren't actually that big.
03:59:53 rsheeter: yes they are
04:00:02 (at least some of them)
04:00:11 chris: If you've got variable fonts, and it's CFF2, then it may be more or less amenable.
04:00:24 chris: I think CFF2 is more tractable for that approach.
04:00:32 ned: It requires the same transformations, but the modifications...
04:00:38 chris: It doesn't duplicate a bunch of stuff that's already in the other tables.
04:01:19 ned: Wellllllllll... traditionally, Type 1 multiple master fonts baked everything into the glyph description. CFF2 uses the same mechanisms, but it removes the data and puts it in a sidecar table within the CFF2 SFNT. We would need to address that as well.
04:01:37 Garret: The Google Fonts collection is a good starting point, but we'll need more. We have holes in our collection.
04:02:31 Vlad: Which fonts can be donated? We need some rare scripts, certain shaping types... I heard that most of the elaborate and exotic fonts are present in the Noto font family. If we want to make sure it works everywhere, we can use Noto.
04:03:19 Garret: We have basically no CFF fonts.
04:03:38 Garret: Our technology stack didn't support CFF until recently.
04:03:51 Garret: Noto CJK is one of the first ones.
04:04:58 myles: What about font files that are actually used on the web?
04:05:08 rsheeter: Yes, we should determine the top used fonts on the web.
04:05:25 KenADBE: You would need permission to use arbitrary fonts from the web.
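The metrics proposed above (additive delay model, a cost that is zero up to page-load time P and grows sharply beyond it, and a bytes-efficiency ratio) can be written down concretely. All constants here (RTT, throughput, P, the exponential shape) are placeholder assumptions; the group explicitly has not fixed a cost function yet.

```python
import math

# Hedged sketch of the evaluation metrics discussed in the minutes.
# Every constant is an illustrative default, not a WG decision.

def network_delay(num_bytes, rtt_s=0.1, bytes_per_s=1_000_000):
    """Base round-trip time plus time-per-byte, simply added."""
    return rtt_s + num_bytes / bytes_per_s

def cost(delay_s, page_load_s=1.0):
    """Zero while the font arrives within the page-load time P,
    then increasing exponentially beyond it."""
    if delay_s <= page_load_s:
        return 0.0
    return math.exp(delay_s - page_load_s) - 1.0

def efficiency(bytes_transferred, optimal_bytes):
    """Ratio of the 'optimal' subset size to what was actually sent
    (1.0 means no wasted bytes)."""
    return optimal_bytes / bytes_transferred
```

Note the design property rsheeter warns about: any single aggregate can be gamed, which is why bytes (efficiency) are tracked separately from delay (cost) rather than folded into one number.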
04:06:36 myles: Even if we couldn't use the "real" fonts because of licensing, could we at least characterize the fonts that are used, so we could try to make our own collection mirror the real collection?
04:07:21 jpamental: Aspirationally, this solution will enable, or at least not restrict, people designing more freely. I want people to be able to use more weights than just 2.
04:07:29 rsheeter: Yes, this shouldn't be too difficult to show.
04:09:06 In case I haven't shared it before, our Source Han Serif promotional website makes use of our dynamic augmentation → https://source.typekit.com/source-han-serif/
04:09:30 Garret: Is this the way we want to do this analysis?
04:09:45 Vlad: this analysis covers everything that matters
04:09:59 chris: this is the right way
04:11:36 myles: We need to actually measure some of these scenarios on real networks. If they match what the analysis produces, then that's good. If not, we need to go back and redesign something.
04:11:46 Garret: rsheeter: yes, good idea
04:12:06 Garret: Practical concerns: 1) repository location, and 2) assigning owners.
04:12:16 chris: W3C has GitHub repos.
04:12:30 Garret: programming language?
04:12:37 Garret: Python?
04:12:51 ned: Python already has a bunch of libraries for manipulating font data.
04:13:53 Garret: It's easy to run C code from Python, so we can do that (to run the HarfBuzz stuff).
04:15:06 ned: If there is performance-sensitive code being tested, we shouldn't use Python. But if you just want a bunch of tools for examining font data, then Python is great.
04:15:19 Garret: For things like "what code points does this font support", that's easy.
04:15:22 ned: Do we want to write a new toolkit?
04:15:36 ned: let's just use FontTools
04:16:15 ned: font production engineers use Python.
04:16:27 rod: +1 for this toolchain
04:17:40 rsheeter: What happens if we can't make the browsing data public?
04:17:47 myles: We just figure out a way to get other data.
04:18:07 Garret: We can run the analysis on the internal data set and the external one.
04:18:18 chris: And they should match. If they don't, we throw out the private data.
04:18:44 chris: It would be nice if font foundries could run these tools on their own font corpora.
04:19:19 Vlad: Most groups in the W3C strive to operate in public, but lots of groups have some private discussions.
04:19:31 Vlad: If we cannot share our data set publicly, maybe that's okay?
04:19:57 chris: Nowadays, it would require extraordinary persuasive power to make a group that is member-only.
04:21:22 chris: In general, open publishing and verification is the way science is going. I would like us to proceed in that manner.
04:22:16 rsheeter: So private data would be useful, but not sufficient.
04:23:49 ned: We have a tool that would be run on a private data set. If that tool can be distributed and run on any data set, private or public, then theoretically there's no need for this group to publish results that rely on a private data set.
04:24:23 ned: If we can run that tool on a private and a public data set, and show that there's no divergence, then that shows the public data set is sufficient.
04:26:33 KenADBE: We serve both open source fonts and proprietary fonts. The public data set would include those open source fonts. If we see the numbers are off for the proprietary fonts, then...
04:26:46 ned: Then we need to see what the difference is between the proprietary fonts and the public fonts.
04:28:02 chris: If both data sets get the same results, that's good to know. Otherwise, we'd still like to know.
04:28:45 chris: The evaluation report is the justification for the work we do here. We have to prove that our work is viable.
04:29:21 Vlad: I'm trying to open up Monotype's font corpus for this analysis. But this will absolutely not be available publicly.
04:29:57 Garret: It's valuable for Monotype to measure their internal data set and share their results.
04:30:57 jpamental: During this process, if everyone gets different results, we need to work more on the tool.
04:31:05 Garret: Anyone volunteering?
04:31:28 ....
04:31:31 .... lunchtime!
04:33:17 rrsagent, make minutes
04:33:17 I have made the request to generate https://www.w3.org/2019/09/03-webfonts-minutes.html chris
04:34:06 https://www.yelp.com/biz/%E3%82%AF%E3%82%A2%E3%82%A2%E3%82%A4%E3%83%8A-%E3%82%A2%E3%82%AF%E3%82%A2%E3%82%B7%E3%83%86%E3%82%A3%E3%81%8A%E5%8F%B0%E5%A0%B4%E5%BA%97-%E6%B8%AF%E5%8C%BA?osq=kua%27aina
04:34:33 https://www.yelp.com/map/%E3%82%AF%E3%82%A2%E3%82%A2%E3%82%A4%E3%83%8A-%E3%82%A2%E3%82%AF%E3%82%A2%E3%82%B7%E3%83%86%E3%82%A3%E3%81%8A%E5%8F%B0%E5%A0%B4%E5%BA%97-%E6%B8%AF%E5%8C%BA
04:36:36 Zakim has left #webfonts
06:16:34 Garret has joined #webfonts
06:18:12 myles has joined #webfonts
06:20:02 Andy has joined #webfonts
06:21:06 rsheeter has joined #webfonts
06:21:54 Vlad has joined #webfonts
06:22:06 trackbot, start meeting
06:22:09 RRSAgent, make logs world
06:22:09 Zakim has joined #webfonts
06:22:10 Meeting: Web Fonts Working Group Teleconference
06:22:10 Date: 03 September 2019
06:23:02 jpamental has joined #webfonts
06:23:58 ScribeNick: myles
06:24:38 Garret: As a follow-on to Myles's analysis of glyph closures, I did something similar but more realistic that looks at text on real webpages.
06:24:42 chris has joined #webfonts
06:25:11 zakim, remind us in 5 hours to go home
06:25:11 ok, chris
06:25:41 Garret: I simulated the glyphs needed for sample page views, looking at both glyphs and glyph closures. For the layout approach, we shape all the words in the text (using HarfBuzz) and collect the set of unique glyph IDs. For the closure approach, we compute the unique set of code points in the text, and the size of the glyph closure.
06:25:54 I grabbed a collection of Wikipedia pages, and looked at 6 fonts.
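The two glyph-set computations Garret compares can be modeled in miniature. A real implementation would shape with HarfBuzz and compute closures with a subsetter; here the cmap and the GSUB-like substitution rules are tiny hand-made stand-ins, purely to show why the closure set can only be equal to or larger than the shaped set.

```python
# Toy glyph-closure computation: start from the cmap glyphs for the
# codepoints, then repeatedly add any glyph reachable through a
# substitution rule whose input glyphs are all already present.
# cmap: {codepoint_char: glyph_id}; subst_rules: [((input_gids...), output_gid)].

def closure_glyphs(codepoints, cmap, subst_rules):
    glyphs = {cmap[cp] for cp in codepoints if cp in cmap}
    changed = True
    while changed:
        changed = False
        for inputs, output in subst_rules:
            if output not in glyphs and set(inputs) <= glyphs:
                glyphs.add(output)
                changed = True
    return glyphs
```

With `cmap = {'f': 1, 'i': 2}` and an 'fi' ligature rule `((1, 2), 3)`: shaping the actual text "if" would need only glyphs {1, 2}, but the closure over codepoints {f, i} must also include ligature glyph 3 since the server can't know the ligature never fires — which is why the closure sets in this analysis are never smaller than the layout sets.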
06:26:01 Garret: This is a small sample, but it's still interesting.
06:26:30 The results of the analysis: the number of glyphs that are needed, and an estimate of the font size under the two different approaches.
06:26:31 Garret: So this is a mini analysis.
06:26:57 KenADBE has joined #webfonts
06:27:13 Garret: For the layout approach, I estimated the font size by taking the size of everything other than outlines and adding only the outlines that were needed. For the glyph closure size, we just count the glyph subset.
06:27:57 Garret: So we need 64 glyphs if we're shaping, and if we're looking at closure, we need 90 glyphs. All the non-glyph tables are 46 KB, and the glyf table is 125 KB in the original font.
06:28:05 Garret: The font using the layout approach is 46 KB + 9 KB = 56 KB.
06:28:30 Garret: Using the glyph closure approach, the font is 17 KB, because we don't need the full non-glyph tables. We subset all the tables.
06:29:03 Garret: If we go to a second page, we need 8 more glyphs for both methods. This increases both fonts by 1 KB.
06:29:18 Garret: Caveats: this is a very small data set, so take it as you will. I used Wikipedia pages, which are not representative.
06:29:30 Garret: I only tested with a small handful of fonts.
06:29:36 Garret: I didn't consider font compression.
06:30:38 Garret: For languages, I tested English, Arabic (Persian), and Indic (Devanagari).
06:31:29 Garret: If we run through 6 page views, this is the number of glyphs for each approach. The whole font file is significantly larger than just the glyphs we need. The closure-based method universally needs more glyphs than the layout method, but they were roughly the same. The worst case was 50% more glyphs.
06:31:59 Garret: Then we looked at the final font size. For Latin, the closure approach produced a significantly smaller font file.
For Open Sans,
06:32:05 for almost all fonts, there's a reduction in font size.
06:32:30 Garret: The closure size is smaller because the non-glyf tables were subset.
06:32:45 Garret: e.g. Roboto has a large GPOS table: 24 KB.
06:33:08 Garret: Even though we sent more glyphs, we overall won out because the other tables were smaller.
06:33:41 Garret: Arabic: somewhat similar results. Definitely more glyphs for the closure approach, but still a reduction in the final font size.
06:34:31 Garret: The final font sizes in Arabic are similar to the Latin results.
06:35:10 Garret: In Amiri, GPOS is huge.
06:35:42 Garret: But we only pull a fraction of it into the result font.
06:36:05 Garret: Lastly: Devanagari. Here, the glyph closure explodes even higher, approaching a 3x increase.
06:36:40 Garret: This has the opposite behavior: the layout approach is universally winning. This is likely because this font is glyph-dominated and the closure explodes more extremely.
06:37:36 Garret: For Devanagari, a lot of the shaping rules are in the shaping software, not the GPOS tables, which means the GPOS tables are smaller than for, e.g., Arabic, where it's all in the GPOS tables.
06:38:03 Garret: These Indic fonts benefit a lot from compression: 600 KB compresses to 150 KB. Compression will play a big part in the final story.
06:38:18 jpamental: This underscores the need for any viable solution to work with compression.
06:38:40 Garret: I suspect there are a lot of similar glyph shapes or outline shapes in the font. Those compress well.
06:39:22 Garret: Key things: 1) the good news: both methods were better than sending the whole font (even for Latin). 2) The question of closure vs layout depends on the script.
06:39:54 myles: No CJK?
06:40:30 Garret: Nope. CJK fonts are basically all glyph data, and usually very little layout rules, so both approaches will probably be close to a tie.
06:40:34 KenADBE: Except for pan-CJK fonts, because they have extensive LOCL GSUB features.
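The size-estimation method Garret describes reduces to simple arithmetic, reproduced here with the rounded Latin figures from the minutes (note the raw sum of 46 KB + 9 KB is 55 KB; the minutes' 56 KB presumably reflects rounding of the underlying numbers):

```python
# Back-of-envelope version of the two size estimates used in the mini
# analysis. All figures are the rounded KB values quoted in the minutes.

def layout_size(non_glyph_bytes, needed_outline_bytes):
    # Layout approach: ship all non-outline tables intact, plus only
    # the outlines the shaped text actually needs.
    return non_glyph_bytes + needed_outline_bytes

def closure_size(subset_font_bytes):
    # Closure approach: every table is subset, so the estimate is just
    # the size of the fully subset font.
    return subset_font_bytes

full_font = 46_000 + 125_000          # non-glyph tables + glyf table
layout = layout_size(46_000, 9_000)   # layout-approach estimate
closure = closure_size(17_000)        # closure-approach estimate
```

The Devanagari result flips this comparison: when a font is glyph-dominated and its closure explodes (~3x), the `needed_outline_bytes` term stays small for the layout approach while the closure subset balloons, so layout wins.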
06:40:46 KenADBE: Also, some of the higher-end Japanese fonts will have significantly large GSUB features.
06:40:49 Garret: We don't have any fonts like that.
06:41:03 KenADBE: I can't make it public; it's proprietary.
06:41:54 KenADBE: I also made a font that has the same shaping rules as such a font, but has blank glyph outlines. This makes it not proprietary.
06:42:30 ned: We're talking about fonts where we wouldn't be using subroutines anyway, so build a font where each glyph contains the numeric representation of the glyph ID. Then you could have the layout features that substitute for different glyphs, and the outlines would be different enough.
06:43:03 rsheeter: Or we could take the glyphs from another font.
06:43:23 ned: yes, that sounds better
06:43:30 Garret: I can run some CJK numbers.
06:44:30 Garret: Another takeaway: there is some value in transferring tables other than glyf. We should look into doing it. If we could work that into the range approach, it would be beneficial. We would have to find tables that don't participate in shaping but are still per-glyph.
06:44:53 Garret: This tells me a patch-based approach, codepoints or glyphs, could do better than a pure codepoint-based approach. We need the full analysis framework to draw conclusions.
06:49:26 Garret: I have a spreadsheet of the raw data. If you want, I can share it.
06:49:55 Garret: Other interesting things: you can see how this behaves over a sequence. Closure will usually grab almost everything up front and then add small numbers of glyphs, while the layout approach takes less up front and also requests less down the line.
06:50:01 Garret: Similar with file size.
06:50:06 Garret: I'll share this with the WG.
06:51:54 myles: CJK fonts are large, so we need a solution for these big fonts. How do we know whether we need a single solution or two solutions, one for large fonts and one for medium fonts?
06:52:02 Garret: rsheeter: That will fall out of the performance analysis.
06:52:45 ned: You mentioned Roboto having a large GPOS table. Our system font's positioning information is 75% of the font. If you're talking about the limits of a font design, it is possible to have a Latin font with a lot of shaping rules.
06:53:24 jpamental: It does bring up something. Unless we do the same analysis on something like Proxima Nova, or other really well-developed professional fonts that have been worked on and have a ton of kerning pairs, there are going to be fonts not....
06:53:35 rsheeter: I'd love to hear nominations for specific fonts.
06:53:58 jpamental: It was interesting that different fonts have wildly differing glyph counts. Any font used for print that has been slaved over would be a candidate for having a ton of that information.
06:56:43 myles: A solution that works really well for big fonts but not that well for small fonts may be more valuable than a solution that works really well for small fonts but not that well for big fonts.
06:57:52 Vlad: 250 milliseconds spent to transfer a subset versus 300 milliseconds spent to transfer the whole font. We have to include this in the analysis.
06:59:14 myles: What I was trying to say is that we should be measuring absolute values like time and bytes, not percentages.
06:59:42 jpamental: With other common web things like DNS prefetch and the other things you can do to get assets to load faster, and the way HTTP/2 works now, I wonder if that scenario has changed in what you were describing, Vlad.
06:59:46 Vlad: I don't know. maybe?
07:00:06 Garret: Network optimization in browsers is fairly complex: DNS prefetch prediction, etc.
07:00:37 jpamental: Looking at the work that Harry Roberts does and what he's laid out for good performance practices, I don't think people are following those for web font hosting. We should make sure that we're setting up a test that takes those things into account and makes that part of a solution.
07:00:52 Vlad: my one point is that we need to be careful not to see the results the way we want to see them. 07:02:05 jpamental: I'm thinking about making sure that however we set up the test, we set it up using all of the current best possible practices, rather than just dropping in a line of CSS. If we want to see if this can be a high performance thing, where 100ms might be enough for someone to make a decision about this, let's make sure that we're testing a solution that uses all of the possible performance tricks that a web developer could put into practice 07:02:30 Garret: We're not literally going to fire up network requests 07:02:31 myles: We should literally be firing up network requests 07:02:59 jpamental: If we're going to be testing this in different scenarios, we want to cover those things so that.... if what we're trying to do is improve the state of the art, we should set the bar at the state of the art. 07:03:03 Vlad: let's move on. 07:03:16 rsheeter: Let's talk about variable fonts 07:03:32 https://docs.google.com/spreadsheets/d/1WA5F9GBTqtIWCy1zo1xHz87_ohUWbxvJwDXcSuaLb_A/edit#gid=195687802 07:04:16 rsheeter: We took what variable fonts we could get our hands on, and tested with a partial instancer in fontTools. We can see how much bigger a font gets if we delete axes. We learned that adding axes is really expensive. It varies by axis somewhat, but adding an axis can easily double the size of your font. 07:04:30 rsheeter: We should be contemplating the partial transfer of variable font axes. 07:05:20 jpamental: You can't remove the axis without putting it somewhere on the axis.... you have to snapshot the axis. 07:05:43 Garret: If you want the default position, you get lucky, but we could pin it somewhere in the space, and then later during the enrichment just throw that away. 07:06:01 rsheeter: Vertical fonts could save considerable bytes 07:06:05 jpamental: which fonts? 
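(Scribe note: the "pin it somewhere in the space" idea above can be sketched as follows. This is a toy model with made-up data, not fontTools' actual instancer: a point coordinate is its default value plus an axis scalar times a per-master delta, and pinning an axis folds the delta at a fixed location into the default outline so the axis data can be dropped.)

```python
# Toy sketch of pinning a variable-font axis (hypothetical data, not
# the fontTools instancer API). Each axis contributes per-point deltas;
# pinning evaluates those deltas at a fixed normalized location and
# folds the result into the default outline, after which the axis
# (and its delta data) can be removed from the font.

def pin_axis(default_coords, deltas, location):
    """Fold per-axis deltas at `location` into the default coordinates."""
    pinned = list(default_coords)
    for axis, scalar in location.items():
        for i, d in enumerate(deltas[axis]):
            pinned[i] += scalar * d
    return pinned

# Two x-coordinates of a contour under a single 'wght' axis:
default = [100.0, 250.0]          # outline at the default instance
deltas = {"wght": [40.0, -10.0]}  # change at normalized wght = 1.0

# Snapshot the design space at wght = 0.5; the axis can then be dropped.
print(pin_axis(default, deltas, {"wght": 0.5}))  # [120.0, 245.0]
```

(Enrichment, as Garret describes it, would later replace this snapshot with real delta data rather than trying to un-pin it.)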
07:06:30 rsheeter: 07:07:02 Garret: Oswald is a variable font, and it's one of our top fonts. 07:07:10 myles: how commonly used are these fonts on the web? 07:07:20 https://fonts.google.com/specimen/Oswald 07:08:39 chris has joined #webfonts 07:08:42 jpamental: In my mind this proves variable fonts' worth in that scenario, where one can be a drop-in replacement for these other fonts. 07:09:08 Garret: Oswald is our 4th most popular font 07:09:27 overall stats @ https://fonts.google.com/analytics (featuring Oswald!) 07:10:10 https://github.com/google/fonts/tree/master/ofl/oswald 07:10:30 Vlad: A variable font can be used in 2 cases: 1) the same font is used in multiple different variation settings to make elaborate designs in a single page. 2) A font with optical size, where things may change if the window layout changes, or if you change orientation from portrait to landscape. 07:11:04 Vlad: If you serve the original default view, and no variation changes are ever involved, that seems a win. But if I do move to landscape, that triggers variation changes. How will you respond? Will you calculate a new instance to send? 07:11:48 rsheeter: We're assuming at this point that the UA provides information to the server. The client could send a request for multiple desired points in the design space 07:12:46 rsheeter: it's worth considering whether we can progressively enhance based on variable data. 07:13:07 Garret: We could talk about the design doc for patch'n'subset, or I prepared a talk about HarfBuzz 07:13:33 Vlad: the design document would be good. I just want to make sure that we create a timeline. 07:15:26 Vlad: for the byte range and font preprocessor. Do you have a timeline, Myles? 07:15:37 Myles: I will match whatever Google does on the test harness 07:15:42 Garret: maybe a quarter or two? 07:16:14 Myles: March sounds probably okay 07:17:14 rsheeter: We need to know how to do compression with the byte range method. 07:17:21 myles: yes, that has to be done first. 
Maybe a monthish? 07:18:06 Garret: the work on the data set will happen in parallel with the work on the framework. The data set should be done before the implementation is finished. 07:21:25 scribenick: rsheeter 07:21:39 Topic: Subset & Patch design doc 07:21:55 https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit# 07:22:47 garret: a protocol for simulation, not a final protocol. So it uses shortcuts, e.g. protobufs, which we wouldn't use in a final protocol. 07:23:22 garret: imagining POST requests because the body can be substantial 07:23:52 garret: has fingerprints (hashes) so we can confirm both sides are operating on the same thing 07:23:58 myles: call it something else 07:24:33 (discussion suggesting hash or checksum) 07:24:39 vlad: font id! 07:26:02 garret: request has hash of original font, hash of current client font, sets of codepoints desired & needed, hash of alternate indexing for compression 07:26:25 myles: what are the hashes on the first request? 07:26:36 garret: omitted (discussion of proto optional fields) 07:27:19 garret: response has patch, codepoint remapping 07:27:53 (discussion of sending full codepoint set vs unicode-range to help client with font fallback) 07:28:37 garret: first request gives a full font, subsequent ones give a patch onto a full font. 07:29:09 garret: discussed how we handle discrepancies between client and server 07:29:34 myles: this seems complex, why would we get two independent implementations? 07:29:47 garret: don't think it's as bad as it looks 07:30:21 rod: compare to brotli, which is quite complex and has two implementations 07:30:37 andy: what's the point of the remapping 07:30:48 garret: shrinks the size of the codepoint set 07:31:17 garret: also allows grouping for obfuscation 07:31:42 garret: recommended a specific fingerprint 07:32:13 myles: what happens when the server sends something unexpected? 
07:32:38 garret: checksums allow detection, client should essentially reset / start from scratch 07:33:20 garret: appreciated if any missing error handling is suggested 07:33:26 myles: what happens when it fails again? 07:33:52 garret: should work 07:33:58 myles: eventually won't, will have to fail over 07:34:09 garret: also supports upgrade to the underlying font (optional) 07:34:27 myles: seems unnecessary 07:34:58 myles: they don't change very often 07:35:08 andy, garret: we change them from time to time 07:35:26 jason: very few people / sites would notice 07:36:13 rod: we (google fonts) probably change more than most and it's still relatively rare 07:36:49 jason: desirable to patch forward 07:37:32 garret: hopefully public implementation helps with complexity 07:37:41 myles: concerned about complexity for UA 07:37:51 myles: has Chrome seen this? 07:38:03 garret: not *this* but we've talked about PFE with them 07:38:16 garret: we can tradeoff complexity against efficiency to some extent 07:39:16 (discussion of over-flexibility of all optional, is an aspect of simulation approach using protobufs not a final solution) 07:39:45 myles: this is a unique thing to be specifying 07:39:48 rod: what aspect? 07:39:53 myles: binary rpc protocol 07:40:12 (discussion of binary vs text in http2) 07:41:24 ned: woff2 is binary 07:41:28 myles: it's not communication 07:41:51 ned: concern is binary communication? 
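(Scribe note: the fingerprint-and-reset behavior discussed above can be sketched roughly as follows. Field and function names here are hypothetical, not the protobuf schema from the design doc: the client reports the hash of the font it holds, the server builds a patch against that base, and the client verifies the patched result and starts from scratch with a full font on mismatch.)

```python
# Rough sketch of patch application guarded by fingerprints, per the
# design-doc discussion above. Names are illustrative, not the actual
# protocol. A real implementation would use a binary diff format; here
# a callable stands in for the patch.
import hashlib

def fingerprint(font_bytes: bytes) -> str:
    """Hash used by both sides to confirm they hold the same bytes."""
    return hashlib.sha256(font_bytes).hexdigest()

def apply_patch_or_reset(client_font: bytes, patch, expected_hash: str):
    """Apply `patch`; if the result doesn't match the server's stated
    hash, signal the caller to reset and re-request a full font."""
    patched = patch(client_font)
    if fingerprint(patched) != expected_hash:
        return None  # discrepancy detected: client should start over
    return patched

base = b"subset with A-Z"
extended = b"subset with A-Z plus 0-9"
grow = lambda _base: extended  # stand-in for applying a real patch

assert apply_patch_or_reset(base, grow, fingerprint(extended)) == extended
assert apply_patch_or_reset(base, grow, fingerprint(b"corrupt")) is None
```

(This mirrors Garret's answer: checksums make discrepancies detectable, and the recovery path is a reset rather than trying to reconcile state.)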
07:42:03 myles: just want to suggest it's complex 07:42:27 andy: we avoid complexity in internal font handling in return 07:43:41 andy: can imagine extending have/want on codepoints to more 07:43:53 (discussion of glyph ids, codepoints, variable space, etc) 07:44:29 myles: dislike telling the server what you already have, request size grows 07:44:37 garret: set compression is meant to alleviate this 07:45:11 garret: hitting 1 bit per codepoint, can reduce further via blocking 07:45:20 garret: it actually declines as you get more codepoints 07:45:34 rod: tl;dr runs are cheap 07:46:14 myles: what's next 07:46:22 garret: I can talk about HarfBuzz if of interest 07:47:19 rod: hb-subset of interest because we believe it alleviates concerns with FontTools subsetter speed 07:48:40 rod: public link to presentation? 07:48:43 garret: will send later 07:49:50 garret: subsetter takes codepoints, glyph ids, layout features, etc as input. Reduces the font to only what's requested to be retained. 07:50:33 garret: glyph closure is a key step, walking through layout tables, composite glyphs, etc to identify the complete set of glyph ids we need to keep 07:50:59 myles: can it cycle? 07:51:05 ned: yes. but parsers guard against it. 07:51:18 garret: also cmap 14 can flip glyphs 07:51:25 ken: two types in that table 07:51:32 garret: GSUB! 07:51:41 garret: the most expensive table to process 07:52:08 garret: find all the reachable lookups, iteratively applied to the glyph set until the glyph set stops changing [or we hit a stop limit] 07:52:29 garret: lookups can recurse / invoke each other, track through all these 07:52:56 myles: can it go forever? 07:53:11 garret: yes! 
but there are op limits in implementation 07:53:15 rod: fuzzer helps 07:53:31 garret: remap glyph ids, unless specifically requested otherwise 07:54:02 garret: then we actually prune table by table, keeping only data associated with glyph ids [usually] that are actually retained 07:54:21 garret: some tables work as pairs, classic example is loca/glyf 07:54:36 garret: other than layout (G*) it's usually simple 07:54:56 garret: G* tables are complex because it's a packing problem 07:55:12 garret: now we have all the subset tables, glue them together and declare victory 07:55:32 garret: that's generic, what's unusual about HarfBuzz? 07:55:46 garret: typically a library parses the font into an abstract representation 07:55:56 (which often means copying data around, reading the whole font, etc) 07:56:38 garret: HarfBuzz mmaps and casts memory => data structures. No parsing, no memory for copies of structures. Because it's mmap'd we don't load parts of the font we don't use. 07:56:47 (so subsetting things like CJK fonts is VERY efficient) 07:57:06 garret: fun fact! magic endian conversion types 07:57:51 garret: no need to decompile explicitly. Custom iterator (range in proposed C++ std) gives a view of subset data. 07:58:13 garret: FontTools subset cost is a function of the size of the input. With HarfBuzz it's a function of the size of the output. 07:58:34 (walkthrough example code for hdmx) 07:59:30 (magic of C++ pretending to be Python :) 07:59:44 garret: So, how does it perform? 07:59:54 garret: Google Fonts uses this in production 08:00:15 garret: FontTools CJK fonts 400-600ms, hb-subset 0.8-1.2ms 08:00:37 garret: Noto Nastaliq [chosen for complex layout] 800ms with FontTools, hb-subset 4.6ms 08:00:54 garret: So, how complete is hb-subset? 
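(Scribe note: the closure loop Garret describes, iterating reachable lookups over the glyph set until it stops changing, guarded by an op limit, can be sketched like this. Toy data and names, not the HarfBuzz implementation; substitution lookups are reduced to a simple glyph-to-glyphs map.)

```python
# Minimal sketch of glyph closure with an operation limit, per the
# discussion above (illustrative only, not hb-subset's algorithm).
# Apply the reachable substitution lookups to the glyph set until a
# fixed point is reached; the op limit guards against lookups that
# recurse or invoke each other forever.
def glyph_closure(glyphs, lookups, max_ops=1000):
    closure = set(glyphs)
    for _ in range(max_ops):
        added = set()
        for src, targets in lookups.items():
            if src in closure:
                added |= set(targets) - closure
        if not added:
            return closure  # fixed point: glyph set stopped changing
        closure |= added
    return closure  # hit the op limit; return what we have so far

# 'f' ligature chain: f -> f_f, and f_f -> f_f_i via a second lookup.
lookups = {"f": ["f_f"], "f_f": ["f_f_i"]}
print(sorted(glyph_closure({"f", "i"}, lookups)))
# ['f', 'f_f', 'f_f_i', 'i']
```

(Composite glyphs and cmap format 14 variation selectors feed the same loop in practice; only the "targets reachable from this glyph" relation differs per table.)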
08:01:10 garret: non-layout tables done for TrueType and CFF 08:01:20 rsheeter: GSUB/GPOS in progress 08:01:25 s/rsheeter/garret/ 08:01:35 garret: Adobe working on variation subsetting 08:01:43 garret: live in prod at scale 08:01:56 garret: intended to be complete (all G* layout types) by end of year 08:02:48 rod: in the sequence subset(before), subset(after), compute delta, the delta is actually the slow part (surprisingly, at least to me) 08:03:57 jason: FontTools subset was breaking my italic axes two months ago 08:04:40 garret: bug please 08:05:33 rod: ideally a repro on an open source font 08:06:10 garret: but please report it regardless 08:08:26 ken: I have a font where 99% of the font is the cmap 14 subtable 08:09:48 rrsagent, make minutes 08:09:48 I have made the request to generate https://www.w3.org/2019/09/03-webfonts-minutes.html chris 08:10:41 present+ myles 08:10:51 present+ Jason 08:10:55 Present+ Ned 08:11:01 Present+ Vlad 08:11:33 rrsagent, make minutes 08:11:33 I have made the request to generate https://www.w3.org/2019/09/03-webfonts-minutes.html chris 08:48:46 jpamental has joined #webfonts 11:25:11 chris, you asked to be reminded at this time to go home 12:04:20 Zakim has left #webfonts