01:53:59 RRSAgent has joined #webfonts
01:53:59 logging to https://www.w3.org/2019/09/03-webfonts-irc
01:54:01 RRSAgent, make logs world
01:54:01 Zakim has joined #webfonts
01:54:03 Meeting: Web Fonts Working Group Teleconference
01:54:03 Date: 03 September 2019
01:54:10 ScribeNick: myles
01:54:17 rrsagent, this meeting spans midnight
01:54:25 rsheeter has joined #webfonts
01:55:10 chair: vlad
01:55:11 Vlad: One more administrative item: the future working group telecons.
01:55:28 Vlad: Zoom was a temporary measure. Is everyone happy with Zoom?
01:55:44 Meeting: Web Fonts WG f2f meeting
01:55:48 ned: I'm not the only one who has concerns
01:55:53 Garret: I have no preference
01:56:05 ned has joined #webfonts
01:56:22 chris: The reason WebEx went away is that some group of people found the password for our previous bookings and were racking up $xx,xxx of phone bills, so my account got cancelled.
01:56:49 chris: My certificate expired, so it will take a while. I'm in Japan until the 26th.
01:57:01 KenADBE has joined #webfonts
01:57:04 Vlad: I'm perfectly happy to continue using my Zoom account. I like Zoom. So far it's been reliable.
01:57:29 myles: Maybe we can switch to WebEx once it becomes possible.
01:57:33 Vlad: Okay.
01:57:37 Vlad: Same time, same day?
01:58:25 Vlad: We had a Doodle poll to see which times worked well, and 9:00 am PDT was best for everyone.
01:58:53 ned: We expected more international attendees. We couldn't find a time that worked for everyone, or someone who indicated a preference doesn't regularly join.
01:59:00 Vlad: Only one person, and that person is an outlier.
01:59:12 Vlad: This time resonated the most with everyone.
01:59:41 rsheeter: Monday at 9:00 am PDT is best for me.
01:59:49 Vlad: Okay, we'll continue with this time.
02:00:04 Vlad: We can keep this until the end of the year; hopefully WebEx will be working by then.
02:00:09 Topic: Evaluation report
02:00:32 Vlad: Because chris has been out of the loop for some time, I was expecting you would want to pick up the work. So let's have a recap.
02:00:45 chris: I did get a recap on a recent call, but any additional information would be good.
02:00:53 Vlad: Would it be useful if we had a quick summary right here?
02:00:55 chris: yes
02:01:38 Vlad: I wanted to recap the work that we've done. Myles?
02:03:34 chris: File attachments will be kept by the archive; inline/HTML may be dropped.
02:03:39 Vlad: Most of the data was inline images, which aren't available on the mailing list.
02:03:45 ScribeNick: rsheeter
02:03:51 (so myles's images may be gone from the archive)
02:03:52 Once again, apologies for the W3C list archive trashing emails with embedded images. This affects many WGs. Profuse apologies for our 1990s email archive software.
02:04:34 myles: I made 2 reports to the group: 1) the possibility of a range request solution, 2) the size of glyph closures (by count of glyph IDs).
02:08:28 https://lists.w3.org/Archives/Public/public-webfonts-wg/2019Feb/0028.html *does* show some images, seemingly contrary to expectation
02:09:54 chris: If the mail client sends a text alternative, the archive will favor it.
02:11:15 myles: A few interesting conclusions: 1) CJK fonts are usually big; others aren't.
02:11:34 garret: emoji
02:12:21 ned: not sure if we should consider emoji in scope
02:12:55 rsheeter has joined #webfonts
02:13:30 myles: It seems like people mostly use images for this use case.
02:14:01 ken: Trajan has a color engraved (SVG) edition.
02:15:17 myles: It's important to the range request approach.
02:16:03 garret: Not just emoji; also icon fonts.
02:16:51 ned: Maybe only support fonts with a single glyph representation.
02:17:35 rod: Kick this down the road to post-simulation?
02:17:42 chris: it's still doable
02:18:20 Ranges on sbix are no worse than ranges on the SVG or COLR tables.
02:19:17 vlad: The first goal is to analyze for user experience.
02:19:59 vlad: Once we have an approach we can start to impose limitations (say, no bitmap font support).
02:20:49 ken: On the emoji committee, people specifically want the same rendering platform to platform.
02:21:13 rod: People ask us for exactly that. We tell them to go away because we can't do it efficiently.
02:21:44 ned: You still need system support at some level.
02:22:53 ned: There are few emoji fonts, so there is little data.
02:23:44 present+
02:23:47 vlad: Once we narrow down which solution gives the best user experience, we can discuss limitations.
02:23:48 present+
02:24:24 Andy has joined #webfonts
02:24:40 present+
02:25:27 present+
02:25:36 myles: Ignoring emoji, large == CJK (more or less).
02:25:53 myles: The larger the font, the lower the percentage taken up by non-outline data.
02:26:28 (based on the whole font, not one reduced to fit the page)
02:27:19 myles: Therefore range requests make sense: download everything non-glyph plus the set of glyphs you need.
02:27:51 chris: Does Korean use lots of standalone glyphs?
02:28:21 ken: Basic Korean is usually precomposed; there is a very large set of additional characters.
02:29:13 ken: Example of pan-CJK table sizes: the non-glyph data is relatively small.
02:29:52 topic: Potential Approaches
02:30:39 garret: Subset & patch approach. The client knows what it has and what it needs; the server sends a binary patch. Probably by computing subset-before and subset-after, and then the diff between them to send back to the client.
02:30:56 garret: The subsetter lives on the server, so it can support closure and thus make all possible glyphs accessible.
02:31:29 myles: Range request approach. Requires font pre-processing to be ready to stream. Puts everything except outlines at the beginning; no interdependent glyphs.
02:32:07 myles: The browser downloads / parses until it hits the first outline.
Now it has all the shaping info + the locations of the glyphs, and can load them as needed.
02:32:18 chris: If it's WOFF2, then you toss loca?
02:32:27 garret: WOFF2 probably doesn't work with this approach.
02:33:54 myles: Can either do block-based compression, e.g. so a range of input bytes can only alter a known range of output bytes. Or drop compression; you still come out ahead.
02:34:04 s/subject:/topic:/g
02:34:33 (various ideas about how to compress bits and pieces to allow range requests)
02:34:49 ken: How are the missing glyphs represented on the client?
02:34:58 myles: implementation detail
02:35:26 ken: Adobe's solution makes an initial instantiation of the font with base glyphs plus zeroed-out missing ones, then can patch in others easily.
02:36:01 myles: Browser/text stack interaction details should be out of scope for the spec. Just the transfer mechanism.
02:37:10 myles: Because we download all the shaping rules up front, we can identify the needed glyphs rather than the closure of codepoints.
02:37:43 garret: The set of glyphs for closure can be larger; I have data for later today on how this performs on real content.
02:38:09 vlad: We have documents describing prior approaches in reasonable detail.
02:38:30 vlad: Hybrid Approach.
02:39:36 vlad: Glyph closure can be hard to compute given features, e.g. small caps, stylistic sets, Arabic positional forms, etc.
02:41:07 vlad: The idea is to put everything needed for shaping first, then let the client shape and select glyphs so it can ask for glyphs rather than codepoints (by something like subset+patch).
02:41:17 vlad: That means two serial calls to get content on the screen.
02:41:30 vlad: The positive is you ask for significantly less data.
02:42:00 vlad: Asking for glyphs reduces the privacy consideration, since you aren't leaking codepoints [at least not as directly].
02:42:36 garret: We can do the first request on codepoints, including *all* layout data + those codepoints, to remove the serialization concern. Then requests 2+ are glyph-based.
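The range-request mechanics described above can be sketched in a few lines. This is an illustrative toy, not the WG's design: it assumes a preprocessed font whose non-outline data all precedes the outlines, and a loca-style offset array; the function names and layout are hypothetical.

```python
# Toy model of the range-request approach: given loca-style per-glyph
# offsets and the file position where outline data begins, compute the
# coalesced byte ranges a client would fetch for a set of glyph IDs.

def glyph_byte_ranges(loca, glyf_start, glyph_ids):
    """Return coalesced (start, end) byte ranges (end exclusive) in the
    whole file that cover the requested glyphs."""
    spans = sorted((loca[g], loca[g + 1]) for g in glyph_ids)
    merged = []
    for lo, hi in spans:
        if merged and lo <= merged[-1][1]:          # adjacent or overlapping
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return [(glyf_start + lo, glyf_start + hi) for lo, hi in merged]

def range_header(ranges):
    """Format ranges as an HTTP Range header value (inclusive ends)."""
    return "bytes=" + ", ".join(f"{lo}-{hi - 1}" for lo, hi in ranges)
```

For example, with `loca = [0, 100, 250, 400, 480]` and outlines starting at byte 46000, requesting glyphs 1 and 2 coalesces into the single header `bytes=46100-46399`.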
02:43:12 garret: enthused about testing these approaches
02:43:26 myles: If we design the hybrid right, it can work even with servers that don't support subset+patch.
02:44:06 myles: For example, if we sent an optional header as a sentinel, the server could respond with subset+patch; otherwise we just stream.
02:44:15 jason: Would that adequately obfuscate the content on the page?
02:44:34 myles: We need to obfuscate (likely by requesting more content) in all cases.
02:44:56 rod: It would require preprocessing the font?
02:45:30 myles: yes.
02:45:56 ned: This has precedent; e.g. streamable PDF has a special table of contents.
02:47:10 ken: Adobe sends default glyphs based on language in the first response. This may serve the obfuscation need.
02:47:53 garret: I think the concern is mostly rare characters that are strongly suggestive. The latest proposal was to obfuscate only low-frequency codepoints.
02:48:26 myles: The degree of obfuscation is an interesting question; we should specify a minimum.
02:48:54 rod: We saw orders of magnitude between common/rare.
02:49:37 ned: Domain-specific characters, e.g. chemistry, are strongly indicative.
02:49:52 rod: What is our threat model here?
02:50:33 ned: I think we assume requests are readable in transit.
02:51:06 vlad: The threat may apply to all solutions, in which case it doesn't alter the approach selection?
02:51:24 jason: I think it could change the selection.
02:52:21 myles: New Approach!
02:53:10 myles: With subset+patch, the client has to recognize it needs new characters, interact with the server, and decompress.
02:53:52 myles: The browser has to know when the set of characters needed changes. The browser needs a way to replace an existing font with a new one (can do today with CSS Font Loading).
02:54:19 myles: Does that mean all we need is for the browser to offer a JS callback with the codepoint info?
02:54:40 grieger: I think TachyFont already does this, roughly.
02:54:48 myles: can't do it without a DOM scan today
02:55:07 grieger: performance implications of JS were a concern
02:55:16 myles: web assembly?
02:55:36 garret: interested in trying subset+patch in web asm :)
02:56:33 jason: It's nice that today webfonts don't require JS.
02:57:16 vlad: We have prior art on why this group didn't want a JS solution.
02:58:38 garret: performance and caching implications, especially cross-domain
02:58:54 rod: Not having to load the scaffolding is a significant win.
02:59:24 jason: JS on low-end devices may not be ideal (expensive processing).
03:00:00 myles: We should test it if we claim performance.
03:00:08 garret: (agreed)
03:01:16 myles: Imagine you can preprocess a font to make it more amenable to streaming. The output is still a valid OpenType font file. So... the browser could just fetch the parts it wants today.
03:01:25 myles: possible problem with dynamic text
03:01:41 myles: so add a flag to try to stream (opt-in)
03:02:02 rod: agree, CSS opt-in
03:02:42 garret: rod and I discussed having this in the src: property
03:04:02 (discussion of needing opt-in, which everyone seems to think is desired)
03:04:09 CSS Fonts 4 has the supports() addition to src. https://drafts.csswg.org/css-fonts-4/#font-face-src-parsing
03:04:48 myles: So then I can implement range requests today!
03:05:43 myles: The format is purely for recognition; it has no influence on how we interpret the bytes that are delivered.
03:06:25 ned: At last I can serve TrueType fonts only to browsers that support WOFF!
03:08:44 (discussion of compression of the range request approach)
03:09:28 vlad: Monotype has a modified MicroType Express that allows rendering from the compressed data stream, with Huffman encoding using known tokens.
03:11:14 (digression into video streaming compression)
03:12:27 myles: I now actually have time scheduled for this!
03:12:41 garret: vlad, can you show us your toys?
03:13:13 vlad: yes, ISO standard
03:14:36 The first font compression format standardized by ISO allows rendering font glyphs from a compressed data stream: ISO/IEC 14496-18
03:15:51 vlad: By the end of this meeting I would like to have a timeline of what should happen next and who will do each thing.
03:15:58 ScribeNick: myles
03:16:17 Vlad: I think by the end of this meeting we should have a timeline with names attributed to each item.
03:19:06 ISO/IEC 14496-18: https://www.iso.org/standard/40151.html
03:30:47 jpamental has joined #webfonts
03:30:53 Garret has joined #webfonts
03:30:59 myles has joined #webfonts
03:35:31 hi
03:35:40 Garret: I'm sharing the document!
03:35:51 https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit
03:36:04 ned has joined #webfonts
03:36:12 Andy has joined #webfonts
03:36:30 Garret: I wrote this proposal to try to outline what an analysis framework would look like. First, we want to list success criteria: what we want to measure.
03:36:46 Garret: 1) We want to load font data fairly. As much as possible, we want to load only the data for the current page.
03:37:18 Garret: 2) We want to minimize user-perceptible delay.
03:37:57 Garret: "User-perceptible" means we don't care about the timeline of which events happen when; we only care about how long it takes until the user's content renders.
03:39:05 Garret: We need input data to drive the analysis. We should have a set of page view sequences. We don't need to know the URLs; we just need to know which characters are rendered with which font.
03:39:38 Garret: Once we have that, we need a corpus of fonts. The hope here is to cover lots of languages, font technologies, OpenType, TrueType, etc. We want the whole spectrum of fonts in use. Maybe this means emoji and icon fonts too.
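The analysis loop Garret outlines (walk a page-view sequence, assume font data fetched earlier stays cached, record what each view costs) can be sketched minimally. This is purely illustrative; the real framework would operate on bytes and real fonts, but for simplicity this counts codepoints as the unit of cost.

```python
# Minimal sketch of the proposed per-family analysis loop: for each page
# view, only the codepoints not already cached trigger a request, and the
# fetched data is assumed to remain cached for later views.

def simulate(page_views):
    """page_views: list of sets of codepoints needed per page.
    Returns (total_requests, new_units_per_view)."""
    cached = set()
    requests = 0
    costs = []
    for needed in page_views:
        missing = needed - cached
        if missing:
            requests += 1      # one augmentation request for this view
            cached |= missing  # the patch stays cached between views
        costs.append(len(missing))
    return requests, costs
```

For example, the sequence `[set("hello"), set("help"), set("xyz")]` costs 4 new codepoints on the first view, only 1 on the second (h, e, l are cached), and 3 on the third.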
03:40:29 Garret: For the analysis: for each font family, go through the page sequence, calculate how much network delay is imposed for each view, and assume that fonts are cached between page views. This will tell us, at minimum, how many requests there are and how many bytes they will cost.
03:40:38 chris: This doesn't include any styling information.
03:40:45 Garret: Yes, we want to simplify this.
03:41:44 Garret: We might need to capture the set of font features that are needed.
03:41:53 myles: really?
03:42:11 jpamental: Does this happen before or after the CSS is loaded and parsed? That could contain content.
03:42:26 Garret: Figuring out how we turn a set of URLs into a code point set is an open problem.
03:42:37 Garret: We'll try to include things like layout features.
03:44:48 rsheeter: We will probably need to expose used-glyph information too.
03:45:46 Garret: So we'll be listing the number of network requests and the size of each one. Then we turn that into an estimated network delay. To do this, we will do some handwaving and assume some network parameters, like throughput, RTT, etc. We would probably have multiple profiles and run this same analysis for each profile (e.g. see what it would be like on 3G, broadband, etc.).
03:46:34 myles: Can we make sure this is experimentally gathered?
03:46:48 Garret: WebPageTest has a data set that can do this.
03:47:28 Garret: The page view sequences will only include sequences which happened multiple times (for anonymization).
03:48:58 Garret: Then we need to aggregate all the delays across all the page views. To do this, we need to use a cost function. We haven't proposed a particular one, but here are some characteristics: 1) it is always increasing; 2) it isn't linear (hitting 3 s is really bad because browsers show fallback fonts); 3) the total number of bytes is secondary, but still somewhat important.
Even if the delays are nice, people pay $$ for bytes.
03:50:03 rsheeter: We did an exercise to find solutions that are bad but that our metrics would say are very good.
03:50:54 Garret: Proposal: there's some time point P, which is the time to load the page. If the time is less than that, the cost is 0. Beyond that, it increases exponentially. Perhaps the group can come up with something better, but this is a starting point.
03:51:30 Garret: We are also interested in bytes transferred. So we can calculate an efficiency metric by comparing the number of bytes downloaded with the "optimal" amount.
03:52:03 Garret: The network delay: there's some base round-trip time, plus some time-per-byte. We can just add them.
03:52:09 Myles: Is this experimentally gathered?
03:52:14 Garret: We can gather some data for this.
03:53:18 Garret: In addition to the proposed PFE methods, we should test using whole font files, unicode-range and language-based subsetting, and Adobe's solution.
03:54:02 Garret: If we learn more about Adobe's solution, we can feed that into the analysis.
03:54:08 KenADBE: I just put in a request for that.
03:54:58 Garret: HTTP range transfers, and two more "optimal" solutions which give us a baseline. This computes the optimal subset and assigns its cost to each page by just dividing it by the number of pages.
03:57:20 chris: This incremental thing is also a privacy leak. If I make a page with a few interesting characters on it, and I'm monitoring what gets requested, I know the user's browsing history.
03:57:27 : yep
03:57:53 Garret: We want to represent Arabic and Indic scripts, lots of types of fonts, and CJK too.
03:58:03 Garret: We also might want to test variable as well as non-variable fonts.
03:58:23 jpamental: This technology is what will enable variable fonts to work.
03:58:31 chris: Being able to strip out unused axes might be helpful.
03:58:37 KenADBE: What about font collections?
03:58:44 Garret: Yes, we should add .ttc files to this too.
03:59:08 jpamental: From the web design world, most of my audience is western, but interest in what we're doing combined with variable fonts is through the roof.
03:59:18 rsheeter: We have a self-interest in this too.
03:59:44 rsheeter: It can end up being a huge penalty to send a whole variable font when only one point in the design space is used.
03:59:50 jpamental: Variable fonts aren't actually that big.
03:59:53 rsheeter: yes they are
04:00:02 (at least some of them)
04:00:11 chris: If you've got variable fonts, and it's CFF2, then it may be more or less amenable.
04:00:24 chris: I think CFF2 is more tractable for that approach.
04:00:32 ned: It requires the same transformations, but the modifications...
04:00:38 chris: It doesn't duplicate a bunch of stuff that's already in the other tables.
04:01:19 ned: Wellllllllll... traditionally, Type 1 multiple master fonts baked everything into the glyph description. CFF2 uses the same mechanisms, but it removes the data and puts it in a sidecar table within the CFF2 SFNT. We would need to address that as well.
04:01:37 Garret: The Google Fonts collection is a good starting point, but we'll need more. We have holes in our collection.
04:02:31 Vlad: Which fonts can be donated? We need some rare scripts, certain shaping types... I heard that most of the elaborate and exotic fonts are present in the Noto font family. If we want to make sure it works everywhere, we can use Noto.
04:03:19 Garret: We have basically no CFF fonts.
04:03:38 Garret: Our technology stack didn't support CFF until recently.
04:03:51 Garret: Noto CJK is one of the first ones.
04:04:58 myles: What about font files that are actually used on the web?
04:05:08 rsheeter: Yes, we should determine the top used fonts on the web.
04:05:25 KenADBE: You would need permission to use arbitrary fonts from the web.
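The metrics proposed above (additive delay model, a cost that is zero up to page-load time P and grows sharply beyond it, and a bytes-efficiency ratio) can be written down concretely. All constants here (RTT, throughput, P, the exponential shape) are placeholder assumptions; the group explicitly has not fixed a cost function yet.

```python
import math

# Hedged sketch of the evaluation metrics discussed in the minutes.
# Every constant is an illustrative default, not a WG decision.

def network_delay(num_bytes, rtt_s=0.1, bytes_per_s=1_000_000):
    """Base round-trip time plus time-per-byte, simply added."""
    return rtt_s + num_bytes / bytes_per_s

def cost(delay_s, page_load_s=1.0):
    """Zero while the font arrives within the page-load time P,
    then increasing exponentially beyond it."""
    if delay_s <= page_load_s:
        return 0.0
    return math.exp(delay_s - page_load_s) - 1.0

def efficiency(bytes_transferred, optimal_bytes):
    """Ratio of the 'optimal' subset size to what was actually sent
    (1.0 means no wasted bytes)."""
    return optimal_bytes / bytes_transferred
```

Note the design property rsheeter warns about: any single aggregate can be gamed, which is why bytes (efficiency) are tracked separately from delay (cost) rather than folded into one number.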
04:06:36 myles: Even if we couldn't use the "real" fonts because of licensing, could we at least characterize the fonts that are used, so we could try to make our own collection mirror the real collection?
04:07:21 jpamental: Aspirationally, this solution will enable, or at least not restrict, people designing more freely. I want people to be able to use more weights than just 2.
04:07:29 rsheeter: Yes, this shouldn't be too difficult to show.
04:09:06 In case I haven't shared it before, our Source Han Serif promotional website makes use of our dynamic augmentation → https://source.typekit.com/source-han-serif/
04:09:30 Garret: Is this the way we want to do this analysis?
04:09:45 Vlad: this analysis covers everything that matters
04:09:59 chris: this is the right way
04:11:36 myles: We need to actually measure some of these scenarios on real networks. If they match what the analysis produces, then that's good. If not, we need to go back and redesign something.
04:11:46 Garret: rsheeter: yes, good idea
04:12:06 Garret: Practical concerns: 1) repository location, and 2) assigning owners.
04:12:16 chris: W3C has GitHub repos.
04:12:30 Garret: programming language?
04:12:37 Garret: Python?
04:12:51 ned: Python already has a bunch of libraries for manipulating font data.
04:13:53 Garret: It's easy to run C code from Python, so we can do that (to run the HarfBuzz stuff).
04:15:06 ned: If there is performance-sensitive code being tested, we shouldn't use Python. But if you just want a bunch of tools for examining font data, then Python is great.
04:15:19 Garret: For things like "what code points does this font support", that's easy.
04:15:22 ned: Do we want to write a new toolkit?
04:15:36 ned: let's just use FontTools
04:16:15 ned: font production engineers use Python.
04:16:27 rod: +1 for this toolchain
04:17:40 rsheeter: What happens if we can't make the browsing data public?
04:17:47 myles: We just figure out a way to get other data.
04:18:07 Garret: We can run the analysis on the internal data set and the external one.
04:18:18 chris: And they should match. If they don't, we throw out the private data.
04:18:44 chris: It would be nice if font foundries could run these tools on their own font corpora.
04:19:19 Vlad: Most groups in the W3C strive to operate in public, but lots of groups have some private discussions.
04:19:31 Vlad: If we cannot share our data set publicly, maybe that's okay?
04:19:57 chris: Nowadays, it would require extraordinary persuasive power to make a group that is member-only.
04:21:22 chris: In general, open publishing and verification is the way science is going. I would like us to proceed in that manner.
04:22:16 rsheeter: So private data would be useful, but not sufficient.
04:23:49 ned: We have a tool that would be run on a private data set. If that tool can be distributed and run on any data set, private or public, then theoretically there's no need for this group to publish results that rely on a private data set.
04:24:23 ned: If we can run that tool on a private and a public data set, and show that there's no divergence, then that shows the public data set is sufficient.
04:26:33 KenADBE: We serve both open source fonts and proprietary fonts. The public data set would include those open source fonts. If we see the numbers are off for the proprietary fonts, then...
04:26:46 ned: Then we need to see what the difference is between the proprietary fonts and the public fonts.
04:28:02 chris: If both data sets get the same results, that's good to know. Otherwise, we'd still like to know.
04:28:45 chris: The evaluation report is the justification for the work we do here. We have to prove that our work is viable.
04:29:21 Vlad: I'm trying to open up Monotype's font corpus for this analysis. But this will absolutely not be available publicly.
04:29:57 Garret: It's valuable for Monotype to measure their internal data set and share their results.
04:30:57 jpamental: During this process, if everyone gets different results, we need to work more on the tool.
04:31:05 Garret: Anyone volunteering?
04:31:28 ....
04:31:31 .... lunchtime!
04:33:17 rrsagent, make minutes
04:33:17 I have made the request to generate https://www.w3.org/2019/09/03-webfonts-minutes.html chris
04:34:06 https://www.yelp.com/biz/%E3%82%AF%E3%82%A2%E3%82%A2%E3%82%A4%E3%83%8A-%E3%82%A2%E3%82%AF%E3%82%A2%E3%82%B7%E3%83%86%E3%82%A3%E3%81%8A%E5%8F%B0%E5%A0%B4%E5%BA%97-%E6%B8%AF%E5%8C%BA?osq=kua%27aina
04:34:33 https://www.yelp.com/map/%E3%82%AF%E3%82%A2%E3%82%A2%E3%82%A4%E3%83%8A-%E3%82%A2%E3%82%AF%E3%82%A2%E3%82%B7%E3%83%86%E3%82%A3%E3%81%8A%E5%8F%B0%E5%A0%B4%E5%BA%97-%E6%B8%AF%E5%8C%BA
04:36:36 Zakim has left #webfonts
06:16:34 Garret has joined #webfonts
06:18:12 myles has joined #webfonts
06:20:02 Andy has joined #webfonts
06:21:06 rsheeter has joined #webfonts
06:21:54 Vlad has joined #webfonts
06:22:06 trackbot, start meeting
06:22:09 RRSAgent, make logs world
06:22:09 Zakim has joined #webfonts
06:22:10 Meeting: Web Fonts Working Group Teleconference
06:22:10 Date: 03 September 2019
06:23:02 jpamental has joined #webfonts
06:23:58 ScribeNick: myles
06:24:38 Garret: As a follow-on to Myles's analysis of glyph closures, I did something similar but more realistic that looks at text on real webpages.
06:24:42 chris has joined #webfonts
06:25:11 zakim, remind us in 5 hours to go home
06:25:11 ok, chris
06:25:41 Garret: I simulated the glyphs needed for sample page views, looking at both glyphs and glyph closures. For the layout approach, we shape all the words in the text (using HarfBuzz) and collect the set of unique glyph IDs. For the closure approach, we compute the unique set of code points in the text, and the size of the glyph closure.
06:25:54 I grabbed a collection of Wikipedia pages, and looked at 6 fonts.
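The two glyph-set computations Garret compares can be modeled in miniature. A real implementation would shape with HarfBuzz and compute closures with a subsetter; here the cmap and the GSUB-like substitution rules are tiny hand-made stand-ins, purely to show why the closure set can only be equal to or larger than the shaped set.

```python
# Toy glyph-closure computation: start from the cmap glyphs for the
# codepoints, then repeatedly add any glyph reachable through a
# substitution rule whose input glyphs are all already present.
# cmap: {codepoint_char: glyph_id}; subst_rules: [((input_gids...), output_gid)].

def closure_glyphs(codepoints, cmap, subst_rules):
    glyphs = {cmap[cp] for cp in codepoints if cp in cmap}
    changed = True
    while changed:
        changed = False
        for inputs, output in subst_rules:
            if output not in glyphs and set(inputs) <= glyphs:
                glyphs.add(output)
                changed = True
    return glyphs
```

With `cmap = {'f': 1, 'i': 2}` and an 'fi' ligature rule `((1, 2), 3)`: shaping the actual text "if" would need only glyphs {1, 2}, but the closure over codepoints {f, i} must also include ligature glyph 3 since the server can't know the ligature never fires — which is why the closure sets in this analysis are never smaller than the layout sets.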
06:26:01 Garret: This is a small sample, but it's still interesting.
06:26:30 The results of the analysis: the number of glyphs that are needed, and an estimate of the font size under the two different approaches.
06:26:31 Garret: So this is a mini analysis.
06:26:57 KenADBE has joined #webfonts
06:27:13 Garret: For the layout approach, I estimated the font size by taking the size of everything other than outlines and adding only the outlines that were needed. For the glyph closure size, we just count the glyph subset.
06:27:57 Garret: So we need 64 glyphs if we're shaping, and if we're looking at closure, we need 90 glyphs. All the non-glyph tables are 46 KB, and the glyf table is 125 KB in the original font.
06:28:05 Garret: The font using the layout approach is 46 KB + 9 KB = 56 KB.
06:28:30 Garret: Using the glyph closure approach, the font is 17 KB, because we don't need the full non-glyph tables. We subset all the tables.
06:29:03 Garret: If we go to a second page, we need 8 more glyphs for both methods. This increases both fonts by 1 KB.
06:29:18 Garret: Caveats: this is a very small data set, so take it as you will. I used Wikipedia pages, which are not representative.
06:29:30 Garret: I only tested with a small handful of fonts.
06:29:36 Garret: I didn't consider font compression.
06:30:38 Garret: For languages, I tested English, Arabic (Persian), and Indic (Devanagari).
06:31:29 Garret: If we run through 6 page views, this is the number of glyphs for each approach. The whole font file is significantly larger than just the glyphs we need. The closure-based method universally needs more glyphs than the layout method, but they were roughly the same. The worst case was 50% more glyphs.
06:31:59 Garret: Then we looked at the final font size. For Latin, the closure approach produced a significantly smaller font file.
For Open Sans,
06:32:05 for almost all fonts, there's a reduction in font size.
06:32:30 Garret: The closure size is smaller because the non-glyf tables were subset.
06:32:45 Garret: e.g. Roboto has a large GPOS table: 24 KB.
06:33:08 Garret: Even though we sent more glyphs, we overall won out because the other tables were smaller.
06:33:41 Garret: Arabic: somewhat similar results. Definitely more glyphs for the closure approach, but still a reduction in the final font size.
06:34:31 Garret: The final font sizes in Arabic are similar to the Latin results.
06:35:10 Garret: In Amiri, GPOS is huge.
06:35:42 Garret: But we only pull a fraction of it into the result font.
06:36:05 Garret: Lastly: Devanagari. Here, the glyph closure explodes even higher, approaching a 3x increase.
06:36:40 Garret: This has the opposite behavior: the layout approach is universally winning. This is likely because this font is glyph-dominated and the closure explodes more extremely.
06:37:36 Garret: For Devanagari, a lot of the shaping rules are in the shaping software, not the GPOS tables, which means the GPOS tables are smaller than for, e.g., Arabic, where it's all in the GPOS tables.
06:38:03 Garret: These Indic fonts benefit a lot from compression: 600 KB compresses to 150 KB. Compression will play a big part in the final story.
06:38:18 jpamental: This underscores the need for any viable solution to work with compression.
06:38:40 Garret: I suspect there are a lot of similar glyph shapes or outline shapes in the font. Those compress well.
06:39:22 Garret: Key things: 1) the good news: both methods were better than sending the whole font (even for Latin). 2) The question of closure vs layout depends on the script.
06:39:54 myles: No CJK?
06:40:30 Garret: Nope. CJK fonts are basically all glyph data, and usually very little layout rules, so both approaches will probably be close to a tie.
06:40:34 KenADBE: Except for pan-CJK fonts, because they have extensive LOCL GSUB features.
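The size-estimation method Garret describes reduces to simple arithmetic, reproduced here with the rounded Latin figures from the minutes (note the raw sum of 46 KB + 9 KB is 55 KB; the minutes' 56 KB presumably reflects rounding of the underlying numbers):

```python
# Back-of-envelope version of the two size estimates used in the mini
# analysis. All figures are the rounded KB values quoted in the minutes.

def layout_size(non_glyph_bytes, needed_outline_bytes):
    # Layout approach: ship all non-outline tables intact, plus only
    # the outlines the shaped text actually needs.
    return non_glyph_bytes + needed_outline_bytes

def closure_size(subset_font_bytes):
    # Closure approach: every table is subset, so the estimate is just
    # the size of the fully subset font.
    return subset_font_bytes

full_font = 46_000 + 125_000          # non-glyph tables + glyf table
layout = layout_size(46_000, 9_000)   # layout-approach estimate
closure = closure_size(17_000)        # closure-approach estimate
```

The Devanagari result flips this comparison: when a font is glyph-dominated and its closure explodes (~3x), the `needed_outline_bytes` term stays small for the layout approach while the closure subset balloons, so layout wins.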
06:40:46 KenADBE: Also, some of the higher-end Japanese fonts will have significantly large GSUB features.
06:40:49 Garret: We don't have any fonts like that.
06:41:03 KenADBE: I can't make it public; it's proprietary.
06:41:54 KenADBE: I also made a font that has the same shaping rules as such a font, but has blank glyph outlines. This makes it not proprietary.
06:42:30 ned: We're talking about fonts where we wouldn't be using subroutines anyway, so build a font where each glyph contains the numeric representation of the glyph ID. Then you could have the layout features that substitute for different glyphs, and the outlines would be different enough.
06:43:03 rsheeter: Or we could take the glyphs from another font.
06:43:23 ned: yes, that sounds better
06:43:30 Garret: I can run some CJK numbers.
06:44:30 Garret: Another takeaway: there is some value in transferring tables other than glyf. We should look into doing it. If we could work that into the range approach, it would be beneficial. We would have to find tables that don't participate in shaping but are still per-glyph.
06:44:53 Garret: This tells me a patch-based approach, codepoints or glyphs, could do better than a pure codepoint-based approach. We need the full analysis framework to draw conclusions.
06:49:26 Garret: I have a spreadsheet of the raw data. If you want, I can share it.
06:49:55 Garret: Other interesting things: you can see how this behaves over a sequence. Closure will usually grab almost everything up front and then add small numbers of glyphs, while the layout approach takes less up front and also requests less down the line.
06:50:01 Garret: Similar with file size.
06:50:06 Garret: I'll share this with the WG.
06:51:54 myles: CJK fonts are large, so we need a solution for these big fonts. How do we know whether we need a single solution or two solutions, one for large fonts and one for medium fonts?
06:52:02 Garret: rsheeter: That will fall out of the performance analysis.
06:52:45 ned: You mentioned Roboto having a large GPOS table. Our system font's positioning information is 75% of the font. If you're talking about the limits of a font design, it is possible to have a Latin font with a lot of shaping rules.
06:53:24 jpamental: It does bring up something. Unless we do the same analysis on something like Proxima Nova, or other really well-developed professional fonts that have been worked on and have a ton of kerning pairs, there are going to be fonts not....
06:53:35 rsheeter: I'd love to hear nominations for specific fonts.
06:53:58 jpamental: It was interesting that different fonts have wildly differing glyph counts. Any font used for print that has been slaved over would be a candidate for having a ton of that information.
06:56:43 myles: A solution that works really well for big fonts but not that well for small fonts may be more valuable than a solution that works really well for small fonts but not that well for big fonts.
06:57:52 Vlad: 250 milliseconds spent to transfer a subset versus 300 milliseconds spent to transfer the whole font. We have to include this in the analysis.
06:59:14 myles: What I was trying to say is that we should be measuring absolute values like time and bytes, not percentages.
06:59:42 jpamental: With other common web things like DNS prefetch and the other things you can do to get assets to load faster, and the way HTTP/2 works now, I wonder if that scenario has changed in what you were describing, Vlad.
06:59:46 Vlad: I don't know. maybe?
07:00:06 Garret: Network optimization in browsers is fairly complex: DNS prefetch prediction, etc.
07:00:37 jpamental: Looking at the work that Harry Roberts does and what he's laid out for good performance practices, I don't think people are following those for web font hosting. We should make sure that we're setting up a test that takes those things into account and makes that part of a solution.
07:00:52 Vlad: my one point is that we need to be careful not to see the results the way we want to see them. 07:02:05 jpamental: I'm thinking about making sure that however we set up the test, we set it up using all of the current best possible practices, rather than just dropping in a line of CSS. If we want to see if this can be a high performance thing, where 100ms might be enough for someone to make a decision about this, let's make sure that we're testing a solution that uses all of the possible performance tricks that a web developer could put into practice 07:02:30 Garret: We're not literally going to fire up network requests 07:02:31 myles: We should literally be firing up network requests 07:02:59 jpamental: If we're going to be testing this in different scenarios, we want to cover those things so that.... if what we're trying to do is improve the state of the art, we should set the bar at the state of the art. 07:03:03 Vlad: let's move on. 07:03:16 rsheeter: Let's talk about variable fonts 07:03:32 https://docs.google.com/spreadsheets/d/1WA5F9GBTqtIWCy1zo1xHz87_ohUWbxvJwDXcSuaLb_A/edit#gid=195687802 07:04:16 rsheeter: We took what variable fonts we could get our hands on, and tested with a partial instancer in fontTools. We can see how much bigger a font gets if we delete axes. We learned that adding axes is really expensive. It varies by axis somewhat, but adding an axis can easily double the size of your font. 07:04:30 rsheeter: We should be contemplating the partial transfer of variable font axes. 07:05:20 jpamental: You can't remove the axis without putting it somewhere on the axis.... you have to snapshot the axis. 07:05:43 Garret: If you want the default position, you get lucky, but we could pin it somewhere in the space, and then later during the enrichment just throw that away. 07:06:01 rsheeter: Vertical fonts could save considerable bytes 07:06:05 jpamental: which fonts? 
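(Scribe note: the "pin it somewhere in the space" idea above can be sketched as follows. This is a toy model with made-up data, not fontTools' actual instancer: a point coordinate is its default value plus an axis scalar times a per-master delta, and pinning an axis folds the delta at a fixed location into the default outline so the axis data can be dropped.)

```python
# Toy sketch of pinning a variable-font axis (hypothetical data, not
# the fontTools instancer API). Each axis contributes per-point deltas;
# pinning evaluates those deltas at a fixed normalized location and
# folds the result into the default outline, after which the axis
# (and its delta data) can be removed from the font.

def pin_axis(default_coords, deltas, location):
    """Fold per-axis deltas at `location` into the default coordinates."""
    pinned = list(default_coords)
    for axis, scalar in location.items():
        for i, d in enumerate(deltas[axis]):
            pinned[i] += scalar * d
    return pinned

# Two x-coordinates of a contour under a single 'wght' axis:
default = [100.0, 250.0]          # outline at the default instance
deltas = {"wght": [40.0, -10.0]}  # change at normalized wght = 1.0

# Snapshot the design space at wght = 0.5; the axis can then be dropped.
print(pin_axis(default, deltas, {"wght": 0.5}))  # [120.0, 245.0]
```

(Enrichment, as Garret describes it, would later replace this snapshot with real delta data rather than trying to un-pin it.)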
07:06:30 rsheeter: 07:07:02 Garret: Oswald is a variable font, and it's one of our top fonts. 07:07:10 myles: how commonly used are these fonts on the web? 07:07:20 https://fonts.google.com/specimen/Oswald 07:08:39 chris has joined #webfonts 07:08:42 jpamental: In my mind this proves variable fonts' worth in that scenario, where one can be a drop-in replacement for these other fonts. 07:09:08 Garret: Oswald is our 4th most popular font 07:09:27 overall stats @ https://fonts.google.com/analytics (featuring Oswald!) 07:10:10 https://github.com/google/fonts/tree/master/ofl/oswald 07:10:30 Vlad: A variable font can be used in 2 cases: 1) the same font is used in multiple different variation settings to make elaborate designs in a single page. 2) A font with optical size, where things may change if the window layout changes, or if you change orientation from portrait to landscape. 07:11:04 Vlad: If you serve the original default view, and no variation changes are ever involved, that seems a win. But if I do move to landscape, that triggers variation changes. How will you respond? Will you calculate a new instance to send? 07:11:48 rsheeter: We're assuming at this point that the UA provides information to the server. The client could send a request for multiple desired points in the design space 07:12:46 rsheeter: it's worth considering whether we can progressively enhance based on variable data. 07:13:07 Garret: We could talk about the design doc for patch'n'subset, or I prepared a talk about HarfBuzz 07:13:33 Vlad: the design document would be good. I just want to make sure that we create a timeline. 07:15:26 Vlad: for the byte range and font preprocessor. Do you have a timeline, Myles? 07:15:37 Myles: I will match whatever Google does on the test harness 07:15:42 Garret: maybe a quarter or two? 07:16:14 Myles: March sounds probably okay 07:17:14 rsheeter: We need to know how to do compression with the byte range method. 07:17:21 myles: yes, that has to be done first. 
Maybe a monthish? 07:18:06 Garret: the work on the data set will happen in parallel with the work on the framework. The data set should be done before the implementation is finished. 07:21:25 scribenick: rsheeter 07:21:39 Topic: Subset & Patch design doc 07:21:55 https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit# 07:22:47 garret: a protocol for simulation, not a final protocol. So it uses shortcuts, e.g. protobufs, which we wouldn't use in a final protocol. 07:23:22 garret: imagining POST requests because the body can be substantial 07:23:52 garret: has fingerprints (hashes) so we can confirm both sides are operating on the same thing 07:23:58 myles: call it something else 07:24:33 (discussion suggesting hash or checksum) 07:24:39 vlad: font id! 07:26:02 garret: request has hash of original font, hash of current client font, sets of codepoints desired & needed, hash of alternate indexing for compression 07:26:25 myles: what are the hashes on the first request? 07:26:36 garret: omitted (discussion of proto optional fields) 07:27:19 garret: response has patch, codepoint remapping 07:27:53 (discussion of sending full codepoint set vs unicode-range to help client with font fallback) 07:28:37 garret: first request gives a full font, subsequent ones give a patch onto a full font. 07:29:09 garret: discussed how we handle discrepancies between client and server 07:29:34 myles: this seems complex, why would we get two independent implementations? 07:29:47 garret: don't think it's as bad as it looks 07:30:21 rod: compare to brotli, which is quite complex and has two implementations 07:30:37 andy: what's the point of the remapping 07:30:48 garret: shrinks the size of the codepoint set 07:31:17 garret: also allows grouping for obfuscation 07:31:42 garret: recommended a specific fingerprint 07:32:13 myles: what happens when the server sends something unexpected? 
07:32:38 garret: checksums allow detection, client should essentially reset / start from scratch 07:33:20 garret: appreciated if any missing error handling is suggested 07:33:26 myles: what happens when it fails again? 07:33:52 garret: should work 07:33:58 myles: eventually won't, will have to fail over 07:34:09 garret: also supports upgrade to the underlying font (optional) 07:34:27 myles: seems unnecessary 07:34:58 myles: they don't change very often 07:35:08 andy, garret: we change them from time to time 07:35:26 jason: very few people / sites would notice 07:36:13 rod: we (google fonts) probably change more than most and it's still relatively rare 07:36:49 jason: desirable to patch forward 07:37:32 garret: hopefully public implementation helps with complexity 07:37:41 myles: concerned about complexity for UA 07:37:51 myles: has Chrome seen this? 07:38:03 garret: not *this* but we've talked about PFE with them 07:38:16 garret: we can tradeoff complexity against efficiency to some extent 07:39:16 (discussion of over-flexibility of all optional, is an aspect of simulation approach using protobufs not a final solution) 07:39:45 myles: this is a unique thing to be specifying 07:39:48 rod: what aspect? 07:39:53 myles: binary rpc protocol 07:40:12 (discussion of binary vs text in http2) 07:41:24 ned: woff2 is binary 07:41:28 myles: it's not communication 07:41:51 ned: concern is binary communication? 
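(Scribe note: the fingerprint-and-reset behavior discussed above can be sketched roughly as follows. Field and function names here are hypothetical, not the protobuf schema from the design doc: the client reports the hash of the font it holds, the server builds a patch against that base, and the client verifies the patched result and starts from scratch with a full font on mismatch.)

```python
# Rough sketch of patch application guarded by fingerprints, per the
# design-doc discussion above. Names are illustrative, not the actual
# protocol. A real implementation would use a binary diff format; here
# a callable stands in for the patch.
import hashlib

def fingerprint(font_bytes: bytes) -> str:
    """Hash used by both sides to confirm they hold the same bytes."""
    return hashlib.sha256(font_bytes).hexdigest()

def apply_patch_or_reset(client_font: bytes, patch, expected_hash: str):
    """Apply `patch`; if the result doesn't match the server's stated
    hash, signal the caller to reset and re-request a full font."""
    patched = patch(client_font)
    if fingerprint(patched) != expected_hash:
        return None  # discrepancy detected: client should start over
    return patched

base = b"subset with A-Z"
extended = b"subset with A-Z plus 0-9"
grow = lambda _base: extended  # stand-in for applying a real patch

assert apply_patch_or_reset(base, grow, fingerprint(extended)) == extended
assert apply_patch_or_reset(base, grow, fingerprint(b"corrupt")) is None
```

(This mirrors Garret's answer: checksums make discrepancies detectable, and the recovery path is a reset rather than trying to reconcile state.)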
07:42:03 myles: just want to suggest it's complex 07:42:27 andy: we avoid complexity in internal font handling in return 07:43:41 andy: can imagine extending have/want on codepoints to more 07:43:53 (discussion of glyph ids, codepoints, variable space, etc) 07:44:29 myles: dislike telling the server what you already have, request size grows 07:44:37 garret: set compression is meant to alleviate this 07:45:11 garret: hitting 1 bit per codepoint, can reduce further via blocking 07:45:20 garret: it actually declines as you get more codepoints 07:45:34 rod: tl;dr runs are cheap 07:46:14 myles: what's next 07:46:22 garret: I can talk about HarfBuzz if of interest 07:47:19 rod: hb-subset of interest because we believe it alleviates concerns with FontTools subsetter speed 07:48:40 rod: public link to presentation? 07:48:43 garret: will send later 07:49:50 garret: subsetter takes codepoints, glyph ids, layout features, etc as input. Reduces the font to only what's requested to be retained. 07:50:33 garret: glyph closure is a key step, walking through layout tables, composite glyphs, etc to identify the complete set of glyph ids we need to keep 07:50:59 myles: can it cycle? 07:51:05 ned: yes. but parsers guard against it. 07:51:18 garret: also cmap 14 can flip glyphs 07:51:25 ken: two types in that table 07:51:32 garret: GSUB! 07:51:41 garret: the most expensive table to process 07:52:08 garret: find all the reachable lookups, iteratively applied to the glyph set until the glyph set stops changing [or we hit a stop limit] 07:52:29 garret: lookups can recurse / invoke each other, track through all these 07:52:56 myles: can it go forever? 07:53:11 garret: yes! 
but there are op limits in implementation 07:53:15 rod: fuzzer helps 07:53:31 garret: remap glyph ids, unless specifically requested otherwise 07:54:02 garret: then we actually prune table by table, keeping only data associated with glyph ids [usually] that are actually retained 07:54:21 garret: some tables work as pairs, classic example is loca/glyf 07:54:36 garret: other than layout (G*) it's usually simple 07:54:56 garret: G* tables are complex because it's a packing problem 07:55:12 garret: now we have all the subset tables, glue them together and declare victory 07:55:32 garret: that's generic, what's unusual about HarfBuzz? 07:55:46 garret: typically a library parses the font into an abstract representation 07:55:56 (which often means copying data around, reading the whole font, etc) 07:56:38 garret: HarfBuzz mmaps and casts memory => data structures. No parsing, no memory for copies of structures. Because it's mmap'd we don't load parts of the font we don't use. 07:56:47 (so subsetting things like CJK fonts is VERY efficient) 07:57:06 garret: fun fact! magic endian conversion types 07:57:51 garret: no need to decompile explicitly. Custom iterator (range in proposed C++ std) gives a view of subset data. 07:58:13 garret: FontTools subset cost is a function of the size of the input. With HarfBuzz it's a function of the size of the output. 07:58:34 (walkthrough example code for hdmx) 07:59:30 (magic of C++ pretending to be Python :) 07:59:44 garret: So, how does it perform? 07:59:54 garret: Google Fonts uses this in production 08:00:15 garret: FontTools CJK fonts 400-600ms, hb-subset 0.8-1.2ms 08:00:37 garret: Noto Nastaliq [chosen for complex layout] 800ms with FontTools, hb-subset 4.6ms 08:00:54 garret: So, how complete is hb-subset? 
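(Scribe note: the closure loop Garret describes, iterating reachable lookups over the glyph set until it stops changing, guarded by an op limit, can be sketched like this. Toy data and names, not the HarfBuzz implementation; substitution lookups are reduced to a simple glyph-to-glyphs map.)

```python
# Minimal sketch of glyph closure with an operation limit, per the
# discussion above (illustrative only, not hb-subset's algorithm).
# Apply the reachable substitution lookups to the glyph set until a
# fixed point is reached; the op limit guards against lookups that
# recurse or invoke each other forever.
def glyph_closure(glyphs, lookups, max_ops=1000):
    closure = set(glyphs)
    for _ in range(max_ops):
        added = set()
        for src, targets in lookups.items():
            if src in closure:
                added |= set(targets) - closure
        if not added:
            return closure  # fixed point: glyph set stopped changing
        closure |= added
    return closure  # hit the op limit; return what we have so far

# 'f' ligature chain: f -> f_f, and f_f -> f_f_i via a second lookup.
lookups = {"f": ["f_f"], "f_f": ["f_f_i"]}
print(sorted(glyph_closure({"f", "i"}, lookups)))
# ['f', 'f_f', 'f_f_i', 'i']
```

(Composite glyphs and cmap format 14 variation selectors feed the same loop in practice; only the "targets reachable from this glyph" relation differs per table.)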
08:01:10 garret: non-layout tables done for TrueType and CFF 08:01:20 rsheeter: GSUB/GPOS in progress 08:01:25 s/rsheeter/garret/ 08:01:35 garret: Adobe working on variation subsetting 08:01:43 garret: live in prod at scale 08:01:56 garret: intended to be complete (all G* layout types) by end of year 08:02:48 rod: in the sequence subset(before), subset(after), compute delta, the delta is actually the slow part (surprisingly, at least to me) 08:03:57 jason: FontTools subset was breaking my italic axes two months ago 08:04:40 garret: bug please 08:05:33 rod: ideally a repro on an open source font 08:06:10 garret: but please report it regardless 08:08:26 ken: I have a font where 99% of the font is the cmap 14 subtable 08:09:48 rrsagent, make minutes 08:09:48 I have made the request to generate https://www.w3.org/2019/09/03-webfonts-minutes.html chris 08:10:41 present+ myles 08:10:51 present+ Jason 08:10:55 Present+ Ned 08:11:01 Present+ Vlad 08:11:33 rrsagent, make minutes 08:11:33 I have made the request to generate https://www.w3.org/2019/09/03-webfonts-minutes.html chris 08:48:46 jpamental has joined #webfonts 11:25:11 chris, you asked to be reminded at this time to go home 12:04:20 Zakim has left #webfonts