<myles> ScribeNick: myles
Vlad: one more administrative
item: The future working group telecons
... Zoom was a temporary measure. Is everyone happy with
Zoom?
<chris> Meeting: Web Fonts WG f2f meeting
ned: I'm not the only one who has concerns
Garret: I have no preference
chris: The reason WebEx went away is
that some group of people found the password for our previous
bookings and were racking up $xx,xxx of phone bills, so my
account got cancelled.
... My certificate expired, so it will take a while. I'm in
Japan until the 26th
Vlad: I'm perfectly happy to continue using my Zoom account. I like Zoom. So far it's been reliable.
myles: Maybe we can switch to WebEx once it becomes possible
Vlad: Okay.
... same time, same day?
... We had a doodle poll to see which times worked well,
and 9:00 am PDT was best for everyone
ned: We expected more international attendees. We couldn't find a time that worked for everyone, and the one person whose preference it doesn't match doesn't regularly join anyway
Vlad: That's one person, and that person is an outlier.
... This time resonated the most with everyone.
rsheeter: Monday at 9:00am PDT is best for me
Vlad: Okay, we'll continue with
this time.
... We can keep this until the end of the year, hopefully WebEx
will be working then.
Vlad: Because chris has been out of the loop for some time, I was expecting you'd want to pick the work back up. So let's have a recap
chris: I did get a recap on a recent call. But any additional information would be good
Vlad: Would it be useful if we had a quick summary right here?
chris: yes
Vlad: I wanted to recap work that we've done. Myles?
<rsheeter> chris: file attachments will be kept by archive, inline/html may be dropped
Vlad: Most of the data was inline images, which aren't available on the mailing list
<scribe> ScribeNick: rsheeter
(so myles images may be gone from archive)
<chris> Once again, apologies for the W3C list archive trashing emails with embedded images. This affects many WGs. Profuse apologies for our 1990s email archive software
myles: I made 2 reports to group. 1) possibility of range request solution 2) size of glyph closure (by count of glyph id)
https://lists.w3.org/Archives/Public/public-webfonts-wg/2019Feb/0028.html *does* show some images, seemingly contrary to expectation
chris: if the mail client sends a text alternative, the archive will favor it
myles: a few interesting conclusions. 1) CJK fonts are usually big, others aren't.
garret: emoji
ned: not sure if we should consider emoji in scope
myles: seems like people mostly use images for this use case
ken: Trajan has a color engraved (SVG) edition
myles: it's important to the range request approach
garret: not just emoji, also for icon fonts
ned: maybe only support fonts with a single glyph representation
rod: kick this down the road to post simulation?
chris: it's still doable
<chris> ranges on sbix are no worse than ranges on the svg or COLR tables
vlad: first goal is to analyze
for user experience
... once we have an approach we can start to impose limitations
(say no bitmap font support)
ken: people on the emoji committee specifically want the same rendering platform to platform
rod: people ask us for exactly that. we tell them to go away because we can't do it efficiently.
ned: you still need system
support at some level
... there are few emoji fonts so there is little data
vlad: once we narrow to which solution gives best user experience we can discuss limitations
myles: ignoring emoji, large ==
CJK (more or less)
... the larger the font the lower the % taken up by non-outline
data
(based on the whole font, not one reduced to fit the page)
myles: therefore range request makes sense, download everything non-glyph plus the set of glyphs you need
chris: does Korean use lots of standalone glyphs?
ken: basic Korean is usually
precomposed, with a very large set of additional chars
... example of pan-CJK table sizes, non-glyph data is relatively
small
garret: subset & patch
approach. Client knows what they have and what they need,
server sends a binary patch. Probably by computing
subset-before, subset-after, and then diff between them to send
back to the client.
... subsetter lives on server, can support closure and thus all
possible glyphs accessible
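A minimal server-side sketch of that subset-before/subset-after/diff flow, assuming the hb-subset CLI is installed and using the bsdiff4 package as a stand-in binary differ (both tools are illustrative choices, not part of the proposal):

```python
import subprocess
import bsdiff4  # stand-in binary differ, purely illustrative

def make_subset(source: str, unicodes: str, out: str) -> None:
    # hb-subset retains the glyph closure of the requested codepoints
    subprocess.run(
        ["hb-subset", source, f"--unicodes={unicodes}",
         f"--output-file={out}"],
        check=True)

def make_patch(font: str, have: str, need: str) -> bytes:
    # subset-before = what the client already has,
    # subset-after = what it needs now; the diff goes back to the client
    make_subset(font, have, "subset_before.ttf")
    make_subset(font, need, "subset_after.ttf")
    with open("subset_before.ttf", "rb") as f:
        before = f.read()
    with open("subset_after.ttf", "rb") as f:
        after = f.read()
    return bsdiff4.diff(before, after)
```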
myles: range request approach.
Requires font pre-processing to be ready to stream. Puts
everything except outlines at beginning, no interdependent
glyphs.
... browser downloads / parses until it hits first outline. now
it has all the shaping info + the locations of the glyphs and
can load them as needed.
chris: if it's woff2 then you toss loca?
garret: woff2 probably doesn't work with this approach
myles: can either do block based, e.g. so a range of input bytes can only alter a known range of output bytes. Or drop compression; you still come out ahead.
(various ideas about how to compress bits and pieces to allow range requests)
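As a sketch of the client side, assuming a preprocessed font whose non-outline tables come first and whose per-glyph byte offsets are already known from the parsed loca data (the URL and offsets below are hypothetical):

```python
import urllib.request

def fetch_bytes(url: str, start: int, end: int) -> bytes:
    # A capable server answers with 206 Partial Content holding
    # only the requested byte range.
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

font_url = "https://example.com/font.otf"       # hypothetical
header = fetch_bytes(font_url, 0, 46_000 - 1)   # shaping data first
glyph = fetch_bytes(font_url, 51_200, 51_899)   # one glyph's outline
```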
ken: how are the missing glyphs represented on the client?
myles: implementation detail
ken: Adobe solution makes initial instantiation of font with base glyphs plus zeroed out missing ones, then can patch in others easily.
myles: browser/text stack
interaction details should be out of scope for spec. Just the
transfer mechanism.
... because we download all the shaping rules up front we can
identify needed glyphs rather than the closure of codepoints
garret: the set of glyphs for closure can be larger; I have data for later today on how this performs on real content.
vlad: we have documents
describing prior approaches in reasonable detail
... Hybrid Approach.
... glyph closure can be hard to compute given features, ex
small caps, stylistic sets, arabic positional, etc
... idea is to combine everything for shaping coming first,
then letting client shape and select glyphs so it can ask for
glyphs rather than codepoints (by something like
subset+patch).
... means two - serial - calls to get content on the
screen
... positive is you ask for significantly less data
... asking for glyphs reduces privacy consideration so you
aren't leaking codepoints [at least not as directly]
garret: can do first request on
codepoints, include *all* layout data + those codepoints to
remove the serialization concern. Then requests 2+ are glyph
based.
... enthused about testing these approaches
myles: if we design hybrid right
it can work even with servers that don't support
subset+patch
... for example, if we sent an optional header as a sentinel it
could respond subset+patch, otherwise we just stream
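That negotiation could look roughly like the sketch below; the "Patch-Subset" header name is invented here purely for illustration:

```python
import urllib.request

req = urllib.request.Request(
    "https://example.com/font.otf",  # hypothetical preprocessed font
    headers={"Patch-Subset": "1",    # invented sentinel header
             "Range": "bytes=0-65535"})
with urllib.request.urlopen(req) as resp:
    if resp.headers.get("Patch-Subset"):
        pass  # server understood: continue with subset+patch requests
    else:
        data = resp.read()  # plain server: just keep streaming ranges
```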
jason: would that adequately obfuscate content on page?
myles: we need to obfuscate (likely request more content) in all cases
rod: it would require preprocessing the font?
myles: yes.
ned: has precedent, ex streamable pdf has a special table of contents
ken: adobe sends default glyphs based on language on first response. this may serve the obfuscation need.
garret: I think maybe concern is mostly rare characters that are strongly suggestive. Latest proposal was to obfuscate only low frequency codepoints.
myles: degree of obfuscation is an interesting question, should specify minimum
rod: we saw orders of magnitude between common/rare
ned: domain specific characters, ex chemistry, are strongly indicative
rod: what is our threat model here?
ned: I think we assume requests are readable in transit
vlad: threat may apply to all solutions, in which case it doesn't alter the approach selection?
jason: I think it could change the selection
myles: New Approach!
... subset+patch client has to recognize it needs new chars,
interact with server, client has to decompress
... browser has to know when set of chars needed changes.
browser needs way to replace existing font with new one (can do
today with css font loading).
... does that mean all we need is for browser to offer a js
callback with the codepoint info?
grieger: I think TachyFont already does this, roughly
myles: can't do it without a DOM scan today
grieger: performance implications of js were a concern
myles: web assembly?
garret: interested in trying subset+patch in web asm :)
jason: it's nice that today webfonts doesn't require js
vlad: we have prior art on why this group didn't want a js solution
garret: performance and caching implications, especially cross-domain
rod: not having to load the scaffolding is a significant win
jason: js on low end devices may not be ideal (expensive processing)
myles: we should test it if we claim performance
garret: (agreed)
myles: imagine you can preprocess
a font to make it more amenable to streaming. Output is still a
valid opentype font file. So ... browser could just fetch the
parts it wants today.
... possible problem with dynamic text
... so add a flag to try to stream (opt-in)
rod: agree, css opt-in
garret: rod and I discussed having this in the src: property
(discussion of needing opt-in, which everyone seems to think is desired)
<chris> CSS Fonts 4 has the supports() addition to src. https://drafts.csswg.org/css-fonts-4/#font-face-src-parsing
myles: so then I can implement
range requests today!
... format is purely for recognition, no influence on how we
interpret the bytes that are delivered
ned: at last I can serve truetype fonts only to browsers that support woff!
(discussion of compression of range request approach)
vlad: Monotype has a modified MicroType Express that allows rendering from the compressed data stream, with Huffman encoding using known tokens.
(digression into video streaming compression)
myles: I now actually have time scheduled for this!
garret: vlad can you show us your toys?
vlad: yes, ISO standard
<Vlad> The first font compression format standardized by ISO allows rendering font glyphs from a compressed data stream: ISO/IEC 14496-18
vlad: by end of this meeting I would like to have a timeline of what should happen next and who will do each thing
<myles> ScribeNick: myles
Vlad: I think by the end of this meeting we should have a timeline with names attributed to each item
<Vlad> ISO/IEC 14496-18: https://www.iso.org/standard/40151.html
Garret: I'm sharing the document!
<Garret> https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit
Garret: I wrote this proposal! To
try to outline what an analysis framework would look like.
First, we want to list success criteria, what we want to
measure
... 1) We want to load font data fairly. As much as possible,
we want to load only the data for the current page.
... 2) We want to minimize user-perceptible delay
... "user perceptible" means we don't care what the timeline of
what events happen when, but we only care about how long it
takes until the user's content renders
... We need input data to drive the analysis. We should have a
set of page view sequences. We don't need to know the URLs, but
instead just need to know which characters are rendered with
which font.
... Once we have that, then we need a corpus of fonts. The hope
here is to cover lots of languages, font technologies,
OpenType, TrueType, etc. We want the whole spectrum of fonts in
use. Maybe this means emoji and icon fonts too.
... For the analysis: for each font family, go through the
page sequence, calculate how much network delay is imposed for
each view, and assume that fonts are cached between page views.
This will tell us at minimum how many requests and how many
bytes each view will cost.
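A rough sketch of that per-family simulation loop; `method.bytes_for_view` is an assumed interface for whichever transfer method is being evaluated, not anything specified:

```python
def simulate(page_views, method):
    # Fonts stay cached between page views, so each view only pays
    # for data the method has not already transferred.
    cache_state = None
    total_requests, total_bytes = 0, 0
    for view in page_views:  # each view: the codepoints a page renders
        requests, nbytes, cache_state = method.bytes_for_view(
            view, cache_state)
        total_requests += requests
        total_bytes += nbytes
    return total_requests, total_bytes
```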
chris: This doesn't include any styling information.
Garret: Yes, we want to simplify
this.
... We might need to capture the set of font features that are
needed
myles: really?
jpamental: Does this happen before or after the CSS is loaded and parsed? That could contain content.
Garret: Figuring out how we turn
a set of URLs into a code point set is an open problem.
... We'll try to include things like layout features.
rsheeter: we will probably need to expose used glyph information too
Garret: So we'll be listing the number of network requests and the size of each one. Then, we turn that into an estimated network delay. To do this, we will do some handwaving, and assume some network parameters, like throughput, RTT, etc. We would probably have multiple profiles and run this same analysis for each profile (e.g. see what it would be like on 3G, broadband, etc.)
myles: Can we make sure this is experimentally gathered?
Garret: WebPageTest has a data
set that can do this
... The page view sequences will only include sequences which
happened multiple times (For anonymization)
... Then we need to aggregate all the delays across all the
page views. To do this, we need to use a cost function. We
haven't proposed a particular one, but here are some
characterizations: 1) It is always increasing 2) it isn't
linear (hitting 3s is real bad b/c browsers show fallback
fonts) 3) the total number of bytes is secondary, but still
somewhat important. Even if the delays are nice, people pay $$
for bytes
rsheeter: We did an exercise to find solutions that are bad but that our metrics would rate as very good.
Garret: Proposal: There's some
time point P which is the time to load the page. If the time is
less than that, the cost is 0. Beyond that, it increases
exponentially. Perhaps the group could come up with something
better, but this is a starting point
... We also are interested in bytes transferred. So we can
calculate an efficiency metric by comparing number of bytes
downloaded with the "optimal" amount
... The network delay: There's some base round trip time, plus
some time-per-byte. We can just add them.
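Those two pieces might be sketched as follows; every constant is a placeholder assumption, and the exponential shape is only the starting-point proposal from above:

```python
import math

RTT_SECONDS = 0.1            # assumed per-request round trip time
SECONDS_PER_BYTE = 8 / 1e6   # assumed ~1 Mbit/s throughput profile

def network_delay(num_requests: int, num_bytes: int) -> float:
    # base round trip time(s) plus time-per-byte, simply added
    return num_requests * RTT_SECONDS + num_bytes * SECONDS_PER_BYTE

def cost(delay: float, p: float) -> float:
    # zero up to the page-load time P, then increasing exponentially
    return 0.0 if delay <= p else math.exp(delay - p) - 1.0
```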
Myles: is this experimentally gathered?
Garret: We can gather some data
for this
... In addition to the proposed PFE methods, we should test
using whole font files, unicode-range and language-based
subsetting, and Adobe's solution
... If we learned more about Adobe's solution, we can feed that
into the analysis
KenADBE: I just put in a request for that
Garret: HTTP range transfers, and two more "optimal" solutions which give us a baseline. This computes the optimal subset and assigns its cost to each page by dividing it by the number of pages
chris: This incremental thing is also a privacy leak. If I make a page with a few interesting characters on it, and I'm monitoring what gets requested, i know the user's browsing history
<all> : yep
Garret: We want to represent
arabic and indic, lots of types of fonts, and CJK too.
... We also might want to test variable and non-variable fonts
as well
jpamental: This technology is what will enable variable fonts to work
chris: Being able to strip out unused axes might be helpful
KenADBE: What about font collections?
Garret: Yes, we should add .ttc files to this too
jpamental: From the web design world, most of my audience is western, but interest in what we're doing combined with variable fonts is through the roof.
rsheeter: We have a self interest
for this too.
... It can end up being a huge penalty to send a whole variable
font when only one point on the design space is used
jpamental: Variable fonts aren't actually that big
rsheeter: yes they are
<rsheeter> (at least some of them)
chris: If you've got variable
fonts, and it's CFF2, then it may be more or less
amenable
... I think CFF2 is more tractable for that approach
ned: It requires the same transformations, but the modifications ....
chris: It doesn't duplicate a bunch of stuff that's already in the other tables
ned: wellllllllll..... traditionally, Type 1 multiple master fonts baked everything into the glyph description. CFF 2 uses the same mechanisms, but it removes the data from the glyph descriptions and puts it in a sidecar table within the CFF 2 SFNT. We would need to address that as well.
Garret: The google fonts collection is a good starting point, but we'll need more. We have holes in our collection
Vlad: Which fonts can be donated? We need some rare scripts, certain shaping types.... I heard that most of the elaborate and exotic fonts are present in the Noto fonts family. If we want to make sure it works everywhere, we can use Noto
Garret: We have basically no CFF
fonts.
... Our technology stack didn't support CFF until
recently
... Noto CJK is one of the first ones
myles: what about font files that are actually used on the web?
rsheeter: Yes, we should determine what are the top used fonts on the web
KenADBE: You would need permission to use arbitrary fonts from the web.
myles: Even if we couldn't use the "real" fonts because of licensing, could we at least characterize the fonts that are used so we could try to make our own collection mirror the real collection
jpamental: Aspirationally, this solution will enable people to design more freely, or at least not restrict them. I want people to be able to use more weights than just 2.
rsheeter: Yes, this shouldn't be too difficult to show.
<discussion about font licensing>
<KenADBE> In case I haven't shared it before, our Source Han Serif promotional website makes use of our dynamic augmentation → https://source.typekit.com/source-han-serif/
Garret: Is this the way we want to do this analysis?
Vlad: this analysis covers everything that matters
chris: this is the right way
myles: We need to actually measure some of these scenarios, on real networks. If they match what the analysis produces, then that's good. if not, we need to go back and redesign something
Garret, rsheeter: yes, good
idea
... Practical concerns: 1) Repository location, and 2)
assigning owners
chris: W3C has github repos
Garret: programming
language?
... Python?
ned: Python already has a bunch of libraries for manipulating font data
Garret: it's easy to run C code from python, so we can do that (to run the harfbuzz stuff)
ned: if the code being tested is performance sensitive, we shouldn't use Python. But if you just want a bunch of tools for examining font data, then Python is great.
Garret: For things like "what code points does this font support" that's easy
ned: Do we want to write a new
toolkit?
... let's just use FontTools
... font production engineers use python.
<rsheeter> rod: +1 for this toolchain
rsheeter: What happens if we can't make the browsing data public?
myles: We just figure out a way to get other data
Garret: We can run the analysis on the internal data set and external one
chris: And they should match. If
they don't, we throw out the private data
... It would be nice if font foundries could run these tools on
their own font corpuses
Vlad: Most groups in the W3C
strive to operate in public. But lots of groups have some
private discussions
... If we cannot share our data set publicly, maybe that's
okay?
chris: Nowadays, that would
require extraordinary persuasive power to make a group that is
member-only
... In general, open publishing and verification is the way that
science is going. I would like us to proceed in that
manner.
rsheeter: So private data would be useful, but non-sufficient
ned: we have a tool that would be
run on a private data set. If that tool can be distributed and
run on any data set, private or public, then theoretically
there's no need for this group to publish results that rely on
a private data set
... if we can run that tool on a private and a public data set,
and show that there's no divergence, then that shows the public
data set is sufficient
KenADBE: we serve both open source fonts and proprietary fonts. The public data set would include those open source fonts. If we see the numbers are off for the proprietary fonts, then ....
ned: Then we need to see what the difference is between the proprietary fonts and the public fonts.
chris: If both data sets get the
same results, that's good to know. Otherwise, we'd still like
to know
... The evaluation report is the justification for the work we
do here. We have to prove that our work is viable
Vlad: I'm trying to open up Monotype's font corpus for this analysis. But this will absolutely not be available publicly.
Garret: it's valuable for Monotype to measure their internal data set and share their results
jpamental: During this process, if everyone gets different results, we need to work more on the tool.
Garret: Anyone volunteering?
<all> ....
<all> .... lunchtime!
<Vlad> trackbot, start meeting
<trackbot> Meeting: Web Fonts Working Group Teleconference
<trackbot> Date: 03 September 2019
<scribe> ScribeNick: myles
Garret: as a follow-on to Myles's
analysis about glyph closures, I did something similar but more
realistic, looking at text on real webpages.
... I simulated the glyphs needed for sample page views, by
looking at both glyphs and glyph closures. For the layout
approach, we shape all the words in the text (using harfbuzz)
and collect the set of unique glyph IDs. For the closure
approach, we compute the unique set of code points in the text
and the size of the glyph closure
... I grabbed a collection of wikipedia pages, and looked at 6 fonts.
Garret: This is a small sample, but it's still interesting.
... The result of the analysis: the number of glyphs that are needed, and an estimate of what the size of a font would be with the two different approaches.
Garret: So this is a mini
analysis
... For the layout approach, I estimated the font size by
taking the size of everything other than outlines and adding
only the outlines that were needed. For the glyph closure size,
we just count the glyph subset
... So we need 64 glyphs if we're shaping, and if we're looking
at closure, we need 90 glyphs. All the non-glyph tables are 46 KB
and the glyf table is 125 KB in the original font
... The font using the layout approach is 46kb + 9kb =
56kb
... The font using the glyph closure approach is 17 KB, because
we don't keep the non-glyph tables in full: we subset all the
tables
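Those two estimates can be reproduced roughly as below, assuming the uharfbuzz and fontTools packages; this is one plausible way to compute them, not the script used for the analysis:

```python
import uharfbuzz as hb
from fontTools.subset import Subsetter
from fontTools.ttLib import TTFont

def shaped_glyphs(font_path, words):
    # Layout approach: shape every word, collect the glyph ids used.
    with open(font_path, "rb") as f:
        face = hb.Face(f.read())
    font = hb.Font(face)
    gids = set()
    for word in words:
        buf = hb.Buffer()
        buf.add_str(word)
        buf.guess_segment_properties()
        hb.shape(font, buf)
        gids.update(info.codepoint for info in buf.glyph_infos)
    return gids

def closure_glyphs(font_path, text):
    # Closure approach: subset by the unique codepoints and see
    # which glyphs the subsetter's closure keeps.
    font = TTFont(font_path)
    subsetter = Subsetter()
    subsetter.populate(unicodes={ord(c) for c in text})
    subsetter.subset(font)
    return set(font.getGlyphOrder())
```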
Garret: If we go to a second
page. Here, we need 8 more glyphs for both methods. This
increases both fonts by 1 kb
... Caveats: This is a very small data set, so take it as you
will. I used Wikipedia pages, which are not
representative.
... I only tested with a small handful of fonts
... I didn't consider font compression
... for languages, I tested English, Arabic (Persian), and
Indic (Devanagari)
... If we run through 6 page views, this is the number of
glyphs for each approach. The whole font file is significantly
larger than just the glyphs we need. The closure-based method
universally needs more glyphs than the layout method, but
they were roughly the same. The worst case was 50% more
glyphs
... Then we looked at the final font size. For latin, the
closure approach produced a significantly smaller font file.
For Open Sans, <missed>
For almost all fonts, there's a reduction in font size
Garret: The closure size is
smaller, because the non-glyf tables were subset.
... e.g. Roboto has a large GPOS table: 24 KB.
... Even though we sent more glyphs, we overall won out because
the other tables were smaller.
... Arabic: Somewhat similar results. Definitely more glyphs
for the closure approach. But still a reduction in the final
font size.
... The final font sizes in Arabic are similar to the Latin
results
... In Amiri, GPOS is huge
... but we only pull out a fraction of it into the result
font
... lastly: Devanagari. But here, the glyph closure explodes
even higher. Approaches a 3x increase.
... This has the opposite behavior. The layout is universally
winning. This is likely because this font is glyph dominated
and the closure explodes more extremely.
... for Devanagari, a lot of the shaping rules are in the
shaping software, not the GPOS tables, which means the GPOS
tables are smaller than for, e.g. Arabic, where it's all in the
GPOS tables.
... These indic fonts benefit a lot from Compression. 600 KB
compresses to 150KB. Compression will play a big part in the
final story.
jpamental: This underscores the need for any viable solution to work with compression.
Garret: I suspect there are a lot
of similar glyph shapes or outline shapes in the font. Those
compress well
... Key things: 1) The good news: Both methods were better than
sending the whole font (even for Latin). 2) The question of
closure vs layout depends on the question of script.
myles: No CJK?
Garret: nope, CJK fonts are basically all glyph data, and usually have very few layout rules, so both approaches will probably be close to a tie.
KenADBE: Except for pan-CJK
fonts, because they have extensive LOCL GSUB features.
... Also, some of the higher end Japanese fonts will have
significantly large GSUB features.
Garret: We don't have any fonts like that
KenADBE: I can't make it public,
it's proprietary
... I also made a font that has the same shaping rules as such
a font, but has blank glyph outlines. This makes it not
proprietary.
ned: We're talking about fonts where we wouldn't be using subroutines anyway, to build a font where each glyph contains the numeric representation of the glyph ID. Then you could have the layout features that substitute for different glyphs, and the outlines would be different enough.
rsheeter: Or we could take the glyphs from another font
ned: yes, that sounds better
Garret: I can run some CJK
numbers
... another takeaway: There is some value in transferring tables
other than glyf. We should look into doing it. If we could work
that into the range approach, it would be beneficial. We would
have to find tables that don't participate in shaping but are
still per-glyph.
... This tells me a patch based approach, codepoints or glyphs,
could do better than a code points based approach. We need the
full analysis framework to draw some conclusions
... I have a spreadsheet of the raw data. If you want, I can
share it.
... Other interesting things: You can see how this behaves over
a sequence. So closure will usually grab almost everything up
front, and have small additional glyphs added. But the layout
approach takes less up front and requests less down the
line.
... Similar with file size
... I'll share this with the WG
myles: CJK fonts are large, so we need a solution for these big fonts. How do we know whether we need a single solution or two solutions, one for large fonts and one for medium fonts?
Garret, rsheeter: That will fall out of the performance analysis
ned: You mentioned roboto having a large GPOS table. Our system font's positioning information is 75% of the font. If you're talking about the limits of font design, it is possible to have a latin font with a lot of shaping rules
jpamental: It does bring up something. Unless we do the same analysis on something like Proxima Nova, or other really well developed professional fonts that have really been worked on and have a ton of kerning pairs, there are going to be fonts not ....
rsheeter: I'd love to hear nominations for specific fonts.
jpamental: It was interesting that different fonts have wildly differing glyph counts. Any font used for print that has been slaved over would be a candidate for having a ton of that information
myles: A solution that works really well for big fonts but not that well for small fonts may be more valuable than a solution that works really well for small fonts but not that well for big fonts
Vlad: 250 milliseconds spent to transfer subset or 300 milliseconds spent to transfer whole font. We have to include this in the analysis
myles: What I was trying to say is that we should be measuring absolute values like time and bytes, not percentages
jpamental: With other common web things like DNS prefetch, the kinds of things you can do to get assets to load faster, and the way HTTP2 works now, I wonder if that scenario has changed in what you were describing, Vlad
Vlad: I don't know. maybe?
Garret: network optimization in browsers are fairly complex. DNS prefetch prediction, etc.
jpamental: Looking at the work that Harry Roberts does, and what he's laid out for good performance practices, I don't think people are following those for web font hosting. We should make sure that we're setting up a test that takes those things into account, and make that part of a solution.
Vlad: my one point is that we need to be careful not to see the results the way we want to see them.
jpamental: I'm thinking about making sure that however we set up the test, we set it up using all of the current best possible practices, rather than just dropping in a line of CSS. If we want to see if this can be a high performance thing, where 100ms might be enough for someone to make a decision about this, let's make sure that we're testing a solution that uses all of the possible performance tricks that a web developer could put into practice
Garret: We're not literally going to fire up network requests
myles: We should literally be firing up network requests
jpamental: If we're going to be testing this in different scenarios, we want to cover those things so that.... if what we're trying to do is improve the state of the art, we should set the bar at the state of the art.
Vlad: let's move on.
rsheeter: Let's talk about variable fonts
rsheeter: We took what variable
fonts we could get our hands on, and tested with a partial
instancer in font tools. We can see how much bigger a font gets
if we delete axes. We learned that adding axes is really
expensive. It varies by axis somewhat, but it can easily double
the size of your font by adding an axis.
... We should be contemplating the partial transfer of variable
font axes.
jpamental: You can't remove an axis without pinning it somewhere on that axis.... you have to snapshot the axis.
Garret: If you want the default position, you get lucky, but we could pin it somewhere in the space, and then later during the enrichment just throw that away.
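Pinning an axis with the fontTools partial instancer mentioned above looks roughly like this (the file names and pinned value are illustrative):

```python
from fontTools.ttLib import TTFont
from fontTools.varLib.instancer import instantiateVariableFont

font = TTFont("Oswald-VF.ttf")  # hypothetical file name
# Pin wght at 400; any axes not mentioned stay variable.
partial = instantiateVariableFont(font, {"wght": 400})
partial.save("Oswald-wght400.ttf")
```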
rsheeter: Vertical fonts could save considerable bytes
jpamental: which fonts?
rsheeter: <lists fonts>
Garret: Oswald is a variable font, and it's one of our top fonts.
myles: how commonly used are these fonts on the web?
<rsheeter> https://fonts.google.com/specimen/Oswald
jpamental: In my mind that proves variable fonts' worth in that scenario, where one can be a drop-in replacement for these other fonts.
Garret: Oswald is our 4th most popular font
<rsheeter> overall stats @ https://fonts.google.com/analytics (featuring Oswald!)
<Garret> https://github.com/google/fonts/tree/master/ofl/oswald
Vlad: A variable font can be used
in 2 cases: 1) the same font is used in multiple different
variation settings to make elaborate designs in a single page.
2) A font with optical size, where things may change if the
window layout change, or if you change orientation from a
portrait to landscape.
... If you serve the original default view, and no variation
changes are ever involved, that seems a win. But if I do move to
landscape, that triggers variation changes. How will you
respond? Will you calculate a new instance to send?
rsheeter: We're assuming at this
point that the UA provides information to the server. The
client could send a request for multiple desired points in the
design space
... it's worth considering if we can progressively enhance
based on variable data.
Garret: We could talk about the design doc about patch'n'subset, or I prepared a talk about Harfbuzz
Vlad: the design document would
be good. I just want to make sure that we create a
timeline.
... for the byte range and font preprocessor. Do you have a
timeline, Myles?
Myles: I will match whatever Google does on the test harness
Garret: maybe a quarter or two?
Myles: March sounds probably okay
rsheeter: We need to know how to do compression with the byte range method.
myles: yes, that has to be done first. Maybe a monthish?
Garret: the work on the data set will happen in parallel with the work on the framework. The data set should be done before the implementation is finished.
<rsheeter> scribenick: rsheeter
<Garret> https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit#
garret: a protocol for
simulation, not a final protocol. So uses shortcuts, ex
protobufs which we wouldn't use in a final protocol.
... imagining post requests because the body can be
substantial
... has fingerprints (hash) so we can confirm both sides are
operating on the same thing
myles: call it something else
(discussion suggesting hash or checksum)
vlad: font id!
garret: request has hash of original font, hash of current client font, sets of codepoints desired & needed, hash of alternate indexing for compression
myles: what are the hashes on first request?
garret: omitted (discussion of
proto optional fields)
... response has patch, codepoint remapping
(discussion of sending full codepoint set vs unicode-range to help client with font fallback)
garret: first request gives a
full font, subsequent ones give a patch onto a full font.
... discussed how we handle discrepancies between client and
server
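As a reading aid, the request and response fields described above might be modeled as below; all names are illustrative stand-ins, not the actual protobuf schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PatchRequest:
    original_font_fingerprint: Optional[bytes] = None  # omitted at first
    client_font_fingerprint: Optional[bytes] = None    # omitted at first
    codepoints_have: set = field(default_factory=set)
    codepoints_needed: set = field(default_factory=set)
    remapping_fingerprint: Optional[bytes] = None  # compression indexing

@dataclass
class PatchResponse:
    patch: bytes = b""                # a full font first, patches after
    patched_fingerprint: bytes = b""  # detects client/server divergence
    codepoint_remapping: dict = field(default_factory=dict)
```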
myles: this seems complex, why would we get two independent implementations?
garret: don't think it's as bad as it looks
rod: compare to brotli, which is quite complex and has two implementations
andy: what's the point of the remapping
garret: shrinks the size of the
codepoint set
... also allows grouping for obfuscation
... recommended a specific fingerprint
myles: what happens when the server sends something unexpected?
garret: checksums allow
detection, client should essentially reset / start from
scratch
... suggestions of any missing error handling would be appreciated
myles: what happens when it fails again?
garret: should work
myles: eventually won't, will have to fail over
garret: also supports upgrade to the underlying font (optional)
myles: seems unnecessary
... they don't change very often
andy, garret: we change them from time to time
jason: very few people / sites would notice
rod: we (google fonts) probably change more than most and it's still relatively rare
jason: desirable to patch forward
garret: hopefully public implementation helps with complexity
myles: concerned about complexity
for UA
... has Chrome seen this?
garret: not *this* but we've
talked about PFE with them
... we can tradeoff complexity against efficiency to some
extent
(discussion of the over-flexibility of everything being optional; this is an aspect of the simulation approach using protobufs, not a final solution)
myles: this is a unique thing to be specifying
rod: what aspect?
myles: binary rpc protocol
(discussion of binary vs text in http2)
ned: woff2 is binary
myles: it's not communication
ned: concern is binary communication?
myles: just want to suggest it's complex
andy: we avoid complexity in
internal font handling in return
... can imagine extending have/want on codepoints to more
(discussion of glyph ids, codepoints, variable space, etc)
myles: dislike telling the server what you already have, request size grows
garret: set compression is meant
to alleviate this
... hitting 1 bit per codepoint, can reduce further via
blocking
... it actually declines as you get more codepoints
rod: tl;dr runs are cheap
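A toy illustration of why runs are cheap under a bitmap-style set encoding (not the proposed format, just the intuition behind the ~1 bit per codepoint figure):

```python
def encode_codepoints(cps: set) -> bytes:
    # One bit per codepoint across the covered range, so a dense
    # run of adjacent codepoints costs almost nothing extra.
    lo, hi = min(cps), max(cps)
    bits = bytearray((hi - lo) // 8 + 1)
    for cp in cps:
        offset = cp - lo
        bits[offset // 8] |= 1 << (offset % 8)
    return lo.to_bytes(3, "big") + bytes(bits)  # 3-byte base + bitmap

# 1000 consecutive CJK codepoints -> 128 bytes, about 1 bit each
print(len(encode_codepoints(set(range(0x4E00, 0x4E00 + 1000)))))
```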
myles: what's next
garret: I can talk about harfbuzz if of interest
rod: hb-subset of interest
because we believe it alleviates concerns with fonttools
subsetter speed
... public link to presentation?
garret: will send later
... subsetter takes codepoints, glyph ids, layout features, etc
as input. Reduces font to only what's requested to be
retained.
... glyph closure is a key step, walking through layout tables,
composite glyphs, etc to identify the complete set of glyph ids
we need to keep
myles: can it cycle?
ned: yes. but parsers guard against it.
garret: also cmap 14 can flip glyphs
ken: two types in that table
garret: GSUB!
... the most expensive table to process
... find all the reachable lookups, iteratively applied to the
glyph set until the glyph set stops changing [or we hit a stop
limit]
... lookups can recurse / invoke each other, track through all
these
myles: can go forever?
garret: yes! but there are op limits in implementation
rod: fuzzer helps
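That closure step is essentially a fixed-point iteration with an operation limit; the sketch below uses a toy lookup model (a dict from input glyph id to output glyph ids), not HarfBuzz's real data structures:

```python
def gsub_closure(initial, lookups, max_ops=100_000):
    # Toy model: each lookup maps {input_gid: {output_gids}}.
    glyphs, ops = set(initial), 0
    while ops < max_ops:
        new = set()
        for lookup in lookups:
            for gid in glyphs:
                new |= lookup.get(gid, set()) - glyphs
            ops += 1
        if not new:
            break  # glyph set stopped changing: fixed point reached
        glyphs |= new
    return glyphs

# a substitution reachable from glyph 10 pulls glyph 300 into the set
print(gsub_closure({10, 20}, [{10: {300}}]))  # -> {10, 20, 300}
```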
garret: remap glyph ids, unless
specifically requested otherwise
... then we actually prune table by table, keeping only data
associated with glyph ids [usually] that are actually
retained
... some tables work as pairs, classic example is
loca/glyf
... other than layout (G*) it's usually simple
... G* tables are complex because it's a packing problem
... now we have all the subset tables, glue them together and
declare victory
... that's generic, what's unusual about HarfBuzz?
... typically a library parses the font into an abstract
representation
(which often means copying data around, reading the whole font, etc)
garret: harfbuzz mmaps and casts memory => data structures. No parsing, no memory for copies of structures. Because mmap'd we don't load parts of the font we don't use.
(so subsetting things like CJK fonts is VERY efficient)
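The zero-copy point in miniature: map the file and read big-endian fields in place, instead of parsing the whole font up front (offsets follow the sfnt header layout; the file name is a placeholder):

```python
import mmap
import struct

with open("font.ttf", "rb") as f:  # any local sfnt font file
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# numTables is the big-endian uint16 at byte offset 4 of the sfnt
# header; nothing else in the file needs to be touched or paged in.
num_tables = struct.unpack_from(">H", mm, 4)[0]
print(num_tables)
```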
garret: fun fact! magic endian
conversion types
... no need to decompile explicitly. Custom iterator (range in
proposed C++ std) gives view of subset data.
... FontTools subset cost is a function of the size of the
input; with HarfBuzz it's a function of the size of the output.
(walkthrough example code for HDMX)
(magic of C++ pretending to be Python :)
garret: So, how does it
perform?
... Google Fonts uses this in production
... FontTools CJK fonts 400-600ms, hb-subset 0.8-1.2ms
... Noto Nastaliq [chosen for complex layout] 800ms with
FontTools, hb-subset 4.6ms
... So, how complete is hb-subset?
... non-layout tables done for truetype and cff
... GSUB/GPOS in progress
... Adobe working on variation subsetting
... live in prod at scale
... intended to be complete (all G* layout types) by end of
year
rod: in subset(before), subset(after), compute delta, the delta computation is actually the slow part (surprisingly, at least to me)
jason: FontTools subset was breaking my italic axes two months ago
garret: bug please
rod: ideally a repro on an open source font
garret: but please report it regardless
ken: I have a font where 99% of the font is cmap 14 subtable