<myles> ScribeNick: myles
Vlad: one more administrative
item: The future working group telecons
... Zoom was a temporary measure. Is everyone happy with
Zoom?
<chris> Meeting: Web Fonts WG f2f meeting
ned: I'm not the only one who has concerns
Garret: I have no preference
chris: The reason WebEx went away is
that some group of people found the password for our previous
bookings and were racking up $xx,xxx of phone bills, so my
account got cancelled.
... My certificate expired, so it will take a while. I'm in
Japan until the 26th
Vlad: I'm perfectly happy to continue using my Zoom account. I like Zoom. So far it's been reliable.
myles: Maybe we can switch to WebEx once it becomes possible
Vlad: Okay.
... same time, same day?
... We had a doodle poll to see which times worked well,
and 9:00 am PDT was best for everyone
ned: We expected more international attendees. We couldn't find a time that worked for everyone, and the one person whose preference it doesn't match doesn't regularly join anyway
Vlad: That's one person, and that person is an outlier.
... This time resonated the most with everyone.
rsheeter: Monday at 9:00am PDT is best for me
Vlad: Okay, we'll continue with
this time.
... We can keep this until the end of the year, hopefully WebEx
will be working then.
Vlad: Because chris has been out of the loop for some time, I was expecting you'd want to pick the work back up. So let's have a recap
chris: I did get a recap on a recent call. But any additional information would be good
Vlad: Would it be useful if we had a quick summary right here?
chris: yes
Vlad: I wanted to recap work that we've done. Myles?
<rsheeter> chris: file attachments will be kept by archive, inline/html may be dropped
Vlad: Most of the data was inline images, which aren't available on the mailing list
<scribe> ScribeNick: rsheeter
(so myles images may be gone from archive)
<chris> Once again, apologies for the W3C list archive trashing emails with embedded images. This affects many WGs. Profuse apologies for our 1990s email archive software
myles: I made 2 reports to group. 1) possibility of range request solution 2) size of glyph closure (by count of glyph id)
https://lists.w3.org/Archives/Public/public-webfonts-wg/2019Feb/0028.html *does* show some images, seemingly contrary to expectation
chris: if the mail client sends a text alternative, the archive will favor it
myles: a few interesting conclusions. 1) CJK fonts are usually big, others aren't.
garret: emoji
ned: not sure if we should consider emoji in scope
myles: seems like people mostly use images for this use case
ken: Trajan has a color engraved (SVG) edition
myles: it's important to the range request approach
garret: not just emoji, also for icon fonts
ned: maybe only support fonts with a single glyph representation
rod: kick this down the road to post simulation?
chris: it's still doable
<chris> ranges on sbix are no worse than ranges on the svg or COLR tables
vlad: first goal is to analyze
for user experience
... once we have an approach we can start to impose limitations
(say no bitmap font support)
ken: people on the emoji committee specifically want the same rendering platform to platform
rod: people ask us for exactly that. we tell them to go away because we can't do it efficiently.
ned: you still need system
support at some level
... there are few emoji fonts so there is little data
vlad: once we narrow to which solution gives best user experience we can discuss limitations
myles: ignoring emoji, large ==
CJK (more or less)
... the larger the font the lower the % taken up by non-outline
data
(based on the whole font, not one reduced to fit the page)
myles: therefore range request makes sense, download everything non-glyph plus the set of glyphs you need
chris: does Korean use lots of standalone glyphs?
ken: basic Korean is usually
precomposed, with a very large set of additional chars
... example of pan-CJK table sizes, non-glyph data is relatively
small
garret: subset & patch
approach. Client knows what they have and what they need,
server sends a binary patch. Probably by computing
subset-before, subset-after, and then diff between them to send
back to the client.
... subsetter lives on server, can support closure and thus all
possible glyphs accessible
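A minimal server-side sketch of that subset-before/subset-after/diff flow, assuming the hb-subset CLI is installed and using the bsdiff4 package as a stand-in binary differ (both tools are illustrative choices, not part of the proposal):

```python
import subprocess
import bsdiff4  # stand-in binary differ, purely illustrative

def make_subset(source: str, unicodes: str, out: str) -> None:
    # hb-subset retains the glyph closure of the requested codepoints
    subprocess.run(
        ["hb-subset", source, f"--unicodes={unicodes}",
         f"--output-file={out}"],
        check=True)

def make_patch(font: str, have: str, need: str) -> bytes:
    # subset-before = what the client already has,
    # subset-after = what it needs now; the diff goes back to the client
    make_subset(font, have, "subset_before.ttf")
    make_subset(font, need, "subset_after.ttf")
    with open("subset_before.ttf", "rb") as f:
        before = f.read()
    with open("subset_after.ttf", "rb") as f:
        after = f.read()
    return bsdiff4.diff(before, after)
```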
myles: range request approach.
Requires font pre-processing to be ready to stream. Puts
everything except outlines at beginning, no interdependent
glyphs.
... browser downloads / parses until it hits first outline. now
it has all the shaping info + the locations of the glyphs and
can load them as needed.
chris: if it's woff2 then you toss loca?
garret: woff2 probably doesn't work with this approach
myles: can either do block based, e.g. so a range of input bytes can only alter a known range of output bytes. Or drop compression; you still come out ahead.
(various ideas about how to compress bits and pieces to allow range requests)
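As a sketch of the client side, assuming a preprocessed font whose non-outline tables come first and whose per-glyph byte offsets are already known from the parsed loca data (the URL and offsets below are hypothetical):

```python
import urllib.request

def fetch_bytes(url: str, start: int, end: int) -> bytes:
    # A capable server answers with 206 Partial Content holding
    # only the requested byte range.
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

font_url = "https://example.com/font.otf"       # hypothetical
header = fetch_bytes(font_url, 0, 46_000 - 1)   # shaping data first
glyph = fetch_bytes(font_url, 51_200, 51_899)   # one glyph's outline
```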
ken: how are the missing glyphs represented on the client?
myles: implementation detail
ken: Adobe solution makes initial instantiation of font with base glyphs plus zeroed out missing ones, then can patch in others easily.
myles: browser/text stack
interaction details should be out of scope for spec. Just the
transfer mechanism.
... because we download all the shaping rules up front we can
identify needed glyphs rather than the closure of codepoints
garret: the set of glyphs for closure can be larger; I have data for later today on how this performs on real content.
vlad: we have documents
describing prior approaches in reasonable detail
... Hybrid Approach.
... glyph closure can be hard to compute given features, ex
small caps, stylistic sets, arabic positional, etc
... idea is to combine everything for shaping coming first,
then letting client shape and select glyphs so it can ask for
glyphs rather than codepoints (by something like
subset+patch).
... means two - serial - calls to get content on the
screen
... positive is you ask for significantly less data
... asking for glyphs reduces privacy consideration so you
aren't leaking codepoints [at least not as directly]
garret: can do first request on
codepoints, include *all* layout data + those codepoints to
remove the serialization concern. Then requests 2+ are glyph
based.
... enthused about testing these approaches
myles: if we design hybrid right
it can work even with servers that don't support
subset+patch
... for example, if we sent an optional header as a sentinel it
could respond subset+patch, otherwise we just stream
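That negotiation could look roughly like the sketch below; the "Patch-Subset" header name is invented here purely for illustration:

```python
import urllib.request

req = urllib.request.Request(
    "https://example.com/font.otf",  # hypothetical preprocessed font
    headers={"Patch-Subset": "1",    # invented sentinel header
             "Range": "bytes=0-65535"})
with urllib.request.urlopen(req) as resp:
    if resp.headers.get("Patch-Subset"):
        pass  # server understood: continue with subset+patch requests
    else:
        data = resp.read()  # plain server: just keep streaming ranges
```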
jason: would that adequately obfuscate content on page?
myles: we need to obfuscate (likely request more content) in all cases
rod: it would require preprocessing the font?
myles: yes.
ned: has precedent, ex streamable pdf has a special table of contents
ken: adobe sends default glyphs based on language on first response. this may serve the obfuscation need.
garret: I think maybe concern is mostly rare characters that are strongly suggestive. Latest proposal was to obfuscate only low frequency codepoints.
myles: degree of obfuscation is an interesting question, should specify minimum
rod: we saw orders of magnitude between common/rare
ned: domain specific characters, ex chemistry, are strongly indicative
rod: what is our threat model here?
ned: I think we assume requests are readable in transit
vlad: threat may apply to all solutions, in which case it doesn't alter the approach selection?
jason: I think it could change the selection
myles: New Approach!
... subset+patch client has to recognize it needs new chars,
interact with server, client has to decompress
... browser has to know when set of chars needed changes.
browser needs way to replace existing font with new one (can do
today with css font loading).
... does that mean all we need is for browser to offer a js
callback with the codepoint info?
grieger: I think TachyFont already does this, roughly
myles: can't do it without a DOM scan today
grieger: performance implications of js were a concern
myles: web assembly?
garret: interested in trying subset+patch in web asm :)
jason: it's nice that today webfonts doesn't require js
vlad: we have prior art on why this group didn't want a js solution
garret: performance and caching implications, especially cross-domain
rod: not having to load the scaffolding is a significant win
jason: js on low end devices may not be ideal (expensive processing)
myles: we should test it if we claim performance
garret: (agreed)
myles: imagine you can preprocess
a font to make it more amenable to streaming. Output is still a
valid opentype font file. So ... browser could just fetch the
parts it wants today.
... possible problem with dynamic text
... so add a flag to try to stream (opt-in)
rod: agree, css opt-in
garret: rod and I discussed having this in the src: property
(discussion of needing opt-in, which everyone seems to think is desired)
<chris> CSS Fonts 4 has the supports() addition to src. https://drafts.csswg.org/css-fonts-4/#font-face-src-parsing
myles: so then I can implement
range requests today!
... format is purely for recognition, no influence on how we
interpret the bytes that are delivered
ned: at last I can serve truetype fonts only to browsers that support woff!
(discussion of compression of range request approach)
vlad: Monotype has a modified MicroType Express that allows rendering from the compressed data stream, with Huffman encoding using known tokens.
(digression into video streaming compression)
myles: I now actually have time scheduled for this!
garret: vlad can you show us your toys?
vlad: yes, ISO standard
<Vlad> The first font compression format standardized by ISO allows rendering font glyphs from a compressed data stream: ISO/IEC 14496-18
vlad: by end of this meeting I would like to have a timeline of what should happen next and who will do each thing
<myles> ScribeNick: myles
Vlad: I think by the end of this meeting we should have a timeline with names attributed to each item
<Vlad> ISO/IEC 14496-18: https://www.iso.org/standard/40151.html
Garret: I'm sharing the document!
<Garret> https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit
Garret: I wrote this proposal! To
try to outline what an analysis framework would look like.
First, we want to list success criteria, what we want to
measure
... 1) We want to load font data fairly. As much as possible,
we want to load only the data for the current page.
... 2) We want to minimize user-perceptible delay
... "user perceptible" means we don't care what the timeline of
what events happen when, but we only care about how long it
takes until the user's content renders
... We need input data to drive the analysis. We should have a
set of page view sequences. We don't need to know the URLs, but
instead just need to know which characters are rendered with
which font.
... Once we have that, then we need a corpus of fonts. The hope
here is to cover lots of languages, font technologies,
OpenType, TrueType, etc. We want the whole spectrum of fonts in
use. Maybe this means emoji and icon fonts too.
... For the analysis: for each font family, go through the
page sequence, calculate how much network delay is imposed for
each view, and assume that fonts are cached between page views.
This will tell us at minimum how many requests and how many
bytes each view will cost.
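A rough sketch of that per-family simulation loop; `method.bytes_for_view` is an assumed interface for whichever transfer method is being evaluated, not anything specified:

```python
def simulate(page_views, method):
    # Fonts stay cached between page views, so each view only pays
    # for data the method has not already transferred.
    cache_state = None
    total_requests, total_bytes = 0, 0
    for view in page_views:  # each view: the codepoints a page renders
        requests, nbytes, cache_state = method.bytes_for_view(
            view, cache_state)
        total_requests += requests
        total_bytes += nbytes
    return total_requests, total_bytes
```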
chris: This doesn't include any styling information.
Garret: Yes, we want to simplify
this.
... We might need to capture the set of font features that are
needed
myles: really?
jpamental: Does this happen before or after the CSS is loaded and parsed? That could contain content.
Garret: Figuring out how we turn
a set of URLs into a code point set is an open problem.
... We'll try to include things like layout features.
rsheeter: we will probably need to expose used glyph information too
Garret: So we'll be listing the number of network requests and the size of each one. Then, we turn that into an estimated network delay. To do this, we will do some handwaving, and assume some network parameters, like throughput, RTT, etc. We would probably have multiple profiles and run this same analysis for each profile (e.g. see what it would be like on 3G, broadband, etc.)
myles: Can we make sure this is experimentally gathered?
Garret: WebPageTest has a data
set that can do this
... The page view sequences will only include sequences which
happened multiple times (For anonymization)
... Then we need to aggregate all the delays across all the
page views. To do this, we need to use a cost function. We
haven't proposed a particular one, but here are some
characterizations: 1) It is always increasing 2) it isn't
linear (hitting 3s is real bad b/c browsers show fallback
fonts) 3) the total number of bytes is secondary, but still
somewhat important. Even if the delays are nice, people pay $$
for bytes
rsheeter: We did an exercise to find solutions that are bad but that our metrics would rate as very good.
Garret: Proposal: There's some
time point P which is the time to load the page. If the time is
less than that, the cost is 0. Beyond that, it increases
exponentially. Perhaps the group could come up with something
better, but this is a starting point
... We also are interested in bytes transferred. So we can
calculate an efficiency metric by comparing number of bytes
downloaded with the "optimal" amount
... The network delay: There's some base round trip time, plus
some time-per-byte. We can just add them.
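Those two pieces might be sketched as follows; every constant is a placeholder assumption, and the exponential shape is only the starting-point proposal from above:

```python
import math

RTT_SECONDS = 0.1            # assumed per-request round trip time
SECONDS_PER_BYTE = 8 / 1e6   # assumed ~1 Mbit/s throughput profile

def network_delay(num_requests: int, num_bytes: int) -> float:
    # base round trip time(s) plus time-per-byte, simply added
    return num_requests * RTT_SECONDS + num_bytes * SECONDS_PER_BYTE

def cost(delay: float, p: float) -> float:
    # zero up to the page-load time P, then increasing exponentially
    return 0.0 if delay <= p else math.exp(delay - p) - 1.0
```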
Myles: is this experimentally gathered?
Garret: We can gather some data
for this
... In addition to the proposed PFE methods, we should test
using whole font files, unicode-range and language-based
subsetting, and Adobe's solution
... If we learned more about Adobe's solution, we can feed that
into the analysis
KenADBE: I just put in a request for that
Garret: HTTP range transfers, and two more "optimal" solutions which give us a baseline. This computes the optimal subset and assigns its cost to each page by dividing it by the number of pages
chris: This incremental thing is also a privacy leak. If I make a page with a few interesting characters on it, and I'm monitoring what gets requested, i know the user's browsing history
<all> : yep
Garret: We want to represent
arabic and indic, lots of types of fonts, and CJK too.
... We also might want to test variable and non-variable fonts
as well
jpamental: This technology is what will enable variable fonts to work
chris: Being able to strip out unused axes might be helpful
KenADBE: What about font collections?
Garret: Yes, we should add .ttc files to this too
jpamental: From the web design world, most of my audience is western, but interest in what we're doing combined with variable fonts is through the roof.
rsheeter: We have a self interest
for this too.
... It can end up being a huge penalty to send a whole variable
font when only one point on the design space is used
jpamental: Variable fonts aren't actually that big
rsheeter: yes they are
<rsheeter> (at least some of them)
chris: If you've got variable
fonts, and it's CFF2, then it may be more or less
amenable
... I think CFF2 is more tractable for that approach
ned: It requires the same transformations, but the modifications ....
chris: It doesn't duplicate a bunch of stuff that's already in the other tables
ned: wellllllllll..... traditionally, Type 1 multiple master fonts baked everything into the glyph description. CFF 2 uses the same mechanisms, but it removes the data from the glyph descriptions and puts it in a sidecar table within the CFF 2 SFNT. We would need to address that as well.
Garret: The google fonts collection is a good starting point, but we'll need more. We have holes in our collection
Vlad: Which fonts can be donated? We need some rare scripts, certain shaping types.... I heard that most of the elaborate and exotic fonts are present in the Noto fonts family. If we want to make sure it works everywhere, we can use Noto
Garret: We have basically no CFF
fonts.
... Our technology stack didn't support CFF until
recently
... Noto CJK is one of the first ones
myles: what about font files that are actually used on the web?
rsheeter: Yes, we should determine what are the top used fonts on the web
KenADBE: You would need permission to use arbitrary fonts from the web.
myles: Even if we couldn't use the "real" fonts because of licensing, could we at least characterize the fonts that are used so we could try to make our own collection mirror the real collection
jpamental: Aspirationally, this solution will enable people to design more freely, or at least not restrict them. I want people to be able to use more weights than just 2.
rsheeter: Yes, this shouldn't be too difficult to show.
<discussion about font licensing>
<KenADBE> In case I haven't shared it before, our Source Han Serif promotional website makes use of our dynamic augmentation → https://source.typekit.com/source-han-serif/
Garret: Is this the way we want to do this analysis?
Vlad: this analysis covers everything that matters
chris: this is the right way
myles: We need to actually measure some of these scenarios, on real networks. If they match what the analysis produces, then that's good. if not, we need to go back and redesign something
Garret, rsheeter: yes, good
idea
... Practical concerns: 1) Repository location, and 2)
assigning owners
chris: W3C has github repos
Garret: programming
language?
... Python?
ned: Python already has a bunch of libraries for manipulating font data
Garret: it's easy to run C code from python, so we can do that (to run the harfbuzz stuff)
ned: if the code being tested is performance sensitive, we shouldn't use Python. But if you just want a bunch of tools for examining font data, then Python is great.
Garret: For things like "what code points does this font support" that's easy
ned: Do we want to write a new
toolkit?
... let's just use FontTools
... font production engineers use python.
<rsheeter> rod: +1 for this toolchain
rsheeter: What happens if we can't make the browsing data public?
myles: We just figure out a way to get other data
Garret: We can run the analysis on the internal data set and external one
chris: And they should match. If
they don't, we throw out the private data
... It would be nice if font foundries could run these tools on
their own font corpuses
Vlad: Most groups in the W3C
strive to operate in public. But lots of groups have some
private discussions
... If we cannot share our data set publicly, maybe that's
okay?
chris: Nowadays, that would
require extraordinary persuasive power to make a group that is
member-only
... In general, open publishing and verification is the way that
science is going. I would like us to proceed in that
manner.
rsheeter: So private data would be useful, but non-sufficient
ned: we have a tool that would be
run on a private data set. If that tool can be distributed and
run on any data set, private or public, then theoretically
there's no need for this group to publish results that rely on
a private data set
... if we can run that tool on a private and a public data set,
and show that there's no divergence, then that shows the public
data set is sufficient
KenADBE: we serve both open source fonts and proprietary fonts. The public data set would include those open source fonts. If we see the numbers are off for the proprietary fonts, then ....
ned: Then we need to see what the difference is between the proprietary fonts and the public fonts.
chris: If both data sets get the
same results, that's good to know. Otherwise, we'd still like
to know
... The evaluation report is the justification for the work we
do here. We have to prove that our work is viable
Vlad: I'm trying to open up Monotype's font corpus for this analysis. But this will absolutely not be available publicly.
Garret: it's valuable for Monotype to measure their internal data set and share their results
jpamental: During this process, if everyone gets different results, we need to work more on the tool.
Garret: Anyone volunteering?
<all> ....
<all> .... lunchtime!
<Vlad> trackbot, start meeting
<trackbot> Meeting: Web Fonts Working Group Teleconference
<trackbot> Date: 03 September 2019
<scribe> ScribeNick: myles
Garret: as a follow-on to Myles's
analysis about glyph closures, I did something similar but more
realistic, looking at text on real webpages.
... I simulated the glyphs needed for sample page views, by
looking at both glyphs and glyph closures. For the layout
approach, we shape all the words in the text (using harfbuzz)
and collect the set of unique glyph IDs. For the closure
approach, we compute the unique set of code points in the text
and the size of the glyph closure
... I grabbed a collection of wikipedia pages, and looked at 6 fonts.
Garret: This is a small sample, but it's still interesting.
... The result of the analysis: the number of glyphs that are needed, and an estimate of what the size of a font would be with the two different approaches.
Garret: So this is a mini
analysis
... For the layout approach, I estimated the font size by
taking the size of everything other than outlines and adding
only the outlines that were needed. For the glyph closure size,
we just count the glyph subset
... So we need 64 glyphs if we're shaping, and if we're looking
at closure, we need 90 glyphs. All the non-glyph tables are 46 KB
and the glyf table is 125 KB in the original font
... The font using the layout approach is 46kb + 9kb =
56kb
... The font using the glyph closure approach is 17 KB, because
we don't keep the non-glyph tables in full: we subset all the
tables
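Those two estimates can be reproduced roughly as below, assuming the uharfbuzz and fontTools packages; this is one plausible way to compute them, not the script used for the analysis:

```python
import uharfbuzz as hb
from fontTools.subset import Subsetter
from fontTools.ttLib import TTFont

def shaped_glyphs(font_path, words):
    # Layout approach: shape every word, collect the glyph ids used.
    with open(font_path, "rb") as f:
        face = hb.Face(f.read())
    font = hb.Font(face)
    gids = set()
    for word in words:
        buf = hb.Buffer()
        buf.add_str(word)
        buf.guess_segment_properties()
        hb.shape(font, buf)
        gids.update(info.codepoint for info in buf.glyph_infos)
    return gids

def closure_glyphs(font_path, text):
    # Closure approach: subset by the unique codepoints and see
    # which glyphs the subsetter's closure keeps.
    font = TTFont(font_path)
    subsetter = Subsetter()
    subsetter.populate(unicodes={ord(c) for c in text})
    subsetter.subset(font)
    return set(font.getGlyphOrder())
```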
Garret: If we go to a second
page. Here, we need 8 more glyphs for both methods. This
increases both fonts by 1 kb
... Caveats: This is a very small data set, so take it as you
will. I used Wikipedia pages, which are not
representative.
... I only tested with a small handful of fonts
... I didn't consider font compression
... for languages, I tested English, Arabic (Persian), and
Indic (Devanagari)
... If we run through 6 page views, this is the number of
glyphs for each approach. The whole font file is significantly
larger than just the glyphs we need. The closure-based method
universally needs more glyphs than the layout method, but
they were roughly the same. The worst case was 50% more
glyphs
... Then we looked at the final font size. For latin, the
closure approach produced a significantly smaller font file.
For Open Sans, <missed>
For almost all fonts, there's a reduction in font size
Garret: The closure size is
smaller, because the non-glyf tables were subset.
... e.g. Roboto has a large GPOS table: 24 KB.
... Even though we sent more glyphs, we overall won out because
the other tables were smaller.
... Arabic: Somewhat similar results. Definitely more glyphs
for the closure approach. But still a reduction in the final
font size.
... The final font sizes in Arabic are similar to the Latin
results
... In Amiri, GPOS is huge
... but we only pull out a fraction of it into the result
font
... lastly: Devanagari. But here, the glyph closure explodes
even higher. Approaches a 3x increase.
... This has the opposite behavior. The layout is universally
winning. This is likely because this font is glyph dominated
and the closure explodes more extremely.
... for Devanagari, a lot of the shaping rules are in the
shaping software, not the GPOS tables, which means the GPOS
tables are smaller than for, e.g. Arabic, where it's all in the
GPOS tables.
... These indic fonts benefit a lot from Compression. 600 KB
compresses to 150KB. Compression will play a big part in the
final story.
jpamental: This underscores the need for any viable solution to work with compression.
Garret: I suspect there are a lot
of similar glyph shapes or outline shapes in the font. Those
compress well
... Key things: 1) The good news: Both methods were better than
sending the whole font (even for Latin). 2) The question of
closure vs layout depends on the question of script.
myles: No CJK?
Garret: nope, CJK fonts are basically all glyph data, and usually have very few layout rules, so both approaches will probably be close to a tie.
KenADBE: Except for pan-CJK
fonts, because they have extensive LOCL GSUB features.
... Also, some of the higher end Japanese fonts will have
significantly large GSUB features.
Garret: We don't have any fonts like that
KenADBE: I can't make it public,
it's proprietary
... I also made a font that has the same shaping rules as such
a font, but has blank glyph outlines. This makes it not
proprietary.
ned: We're talking about fonts where we wouldn't be using subroutines anyway, to build a font where each glyph contains the numeric representation of the glyph ID. Then you could have the layout features that substitute for different glyphs, and the outlines would be different enough.
rsheeter: Or we could take the glyphs from another font
ned: yes, that sounds better
Garret: I can run some CJK
numbers
... another takeaway: There is some value in transferring tables
other than glyf. We should look into doing it. If we could work
that into the range approach, it would be beneficial. We would
have to find tables that don't participate in shaping but are
still per-glyph.
... This tells me a patch based approach, codepoints or glyphs,
could do better than a code points based approach. We need the
full analysis framework to draw some conclusions
... I have a spreadsheet of the raw data. If you want, I can
share it.
... Other interesting things: You can see how this behaves over
a sequence. So closure will usually grab almost everything up
front, and have small additional glyphs added. But the layout
approach takes less up front and requests less down the
line.
... Similar with file size
... I'll share this with the WG
myles: CJK fonts are large, so we need a solution for these big fonts. How do we know whether we need a single solution or two solutions, one for large fonts and one for medium fonts?
Garret, rsheeter: That will fall out of the performance analysis
ned: You mentioned roboto having a large GPOS table. Our system font's positioning information is 75% of the font. If you're talking about the limits of font design, it is possible to have a latin font with a lot of shaping rules
jpamental: It does bring up something. Unless we do the same analysis on something like Proxima Nova, or other really well developed professional fonts that have really been worked on and have a ton of kerning pairs, there are going to be fonts not ....
rsheeter: I'd love to hear nominations for specific fonts.
jpamental: It was interesting that different fonts have wildly differing glyph counts. Any font used for print that has been slaved over would be a candidate for having a ton of that information
myles: A solution that works really well for big fonts but not that well for small fonts may be more valuable than a solution that works really well for small fonts but not that well for big fonts
Vlad: 250 milliseconds spent to transfer subset or 300 milliseconds spent to transfer whole font. We have to include this in the analysis
myles: What I was trying to say is that we should be measuring absolute values like time and bytes, not percentages
jpamental: With other common web things like DNS prefetch, the kinds of things you can do to get assets to load faster, and the way HTTP2 works now, I wonder if that scenario has changed in what you were describing, Vlad
Vlad: I don't know. maybe?
Garret: network optimization in browsers are fairly complex. DNS prefetch prediction, etc.
jpamental: Looking at the work that Harry Roberts does, and what he's laid out for good performance practices, I don't think people are following those for web font hosting. We should make sure that we're setting up a test that takes those things into account, and make that part of a solution.
Vlad: my one point is that we need to be careful not to see the results the way we want to see them.
jpamental: I'm thinking about making sure that however we set up the test, we set it up using all of the current best possible practices, rather than just dropping in a line of CSS. If we want to see if this can be a high performance thing, where 100ms might be enough for someone to make a decision about this, let's make sure that we're testing a solution that uses all of the possible performance tricks that a web developer could put into practice
Garret: We're not literally going to fire up network requests
myles: We should literally be firing up network requests
jpamental: If we're going to be testing this in different scenarios, we want to cover those things so that.... if what we're trying to do is improve the state of the art, we should set the bar at the state of the art.
Vlad: let's move on.
rsheeter: Let's talk about variable fonts
rsheeter: We took what variable
fonts we could get our hands on, and tested with a partial
instancer in font tools. We can see how much bigger a font gets
if we delete axes. We learned that adding axes is really
expensive. It varies by axis somewhat, but it can easily double
the size of your font by adding an axis.
... We should be contemplating the partial transfer of variable
font axes.
jpamental: You can't remove an axis without pinning it somewhere on that axis.... you have to snapshot the axis.
Garret: If you want the default position, you get lucky, but we could pin it somewhere in the space, and then later during the enrichment just throw that away.
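Pinning an axis with the fontTools partial instancer mentioned above looks roughly like this (the file names and pinned value are illustrative):

```python
from fontTools.ttLib import TTFont
from fontTools.varLib.instancer import instantiateVariableFont

font = TTFont("Oswald-VF.ttf")  # hypothetical file name
# Pin wght at 400; any axes not mentioned stay variable.
partial = instantiateVariableFont(font, {"wght": 400})
partial.save("Oswald-wght400.ttf")
```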
rsheeter: Vertical fonts could save considerable bytes
jpamental: which fonts?
rsheeter: <lists fonts>
Garret: Oswald is a variable font, and it's one of our top fonts.
myles: how commonly used are these fonts on the web?
<rsheeter> https://fonts.google.com/specimen/Oswald
jpamental: In my mind that proves variable fonts' worth in that scenario, where one can be a drop-in replacement for these other fonts.
Garret: Oswald is our 4th most popular font
<rsheeter> overall stats @ https://fonts.google.com/analytics (featuring Oswald!)
<Garret> https://github.com/google/fonts/tree/master/ofl/oswald
Vlad: A variable font can be used
in 2 cases: 1) the same font is used in multiple different
variation settings to make elaborate designs in a single page.
2) A font with optical size, where things may change if the
window layout change, or if you change orientation from a
portrait to landscape.
... If you serve the original default view, and no variation
changes are ever involved, that seems a win. But if I do move to
landscape, that triggers variation changes. How will you
respond? Will you calculate a new instance to send?
rsheeter: We're assuming at this
point that the UA provides information to the server. The
client could send a request for multiple desired points in the
design space
... it's worth considering if we can progressively enhance
based on variable data.
Garret: We could talk about the design doc about patch'n'subset, or I prepared a talk about Harfbuzz
Vlad: the design document would
be good. I just want to make sure that we create a
timeline.
... for the byte range and font preprocessor. Do you have a
timeline, Myles?
Myles: I will match whatever Google does on the test harness
Garret: maybe a quarter or two?
Myles: March sounds probably okay
rsheeter: We need to know how to do compression with the byte range method.
myles: yes, that has to be done first. Maybe a monthish?
Garret: the work on the data set will happen in parallel with the work on the framework. The data set should be done before the implementation is finished.
<rsheeter> scribenick: rsheeter
<Garret> https://docs.google.com/document/d/1DJ6VkUEZS2kvYZemIoX4fjCgqFXAtlRjpMlkOvblSts/edit#
garret: a protocol for
simulation, not a final protocol. So uses shortcuts, ex
protobufs which we wouldn't use in a final protocol.
... imagining post requests because the body can be
substantial
... has fingerprints (hash) so we can confirm both sides are
operating on the same thing
myles: call it something else
(discussion suggesting hash or checksum)
vlad: font id!
garret: request has hash of original font, hash of current client font, sets of codepoints desired & needed, hash of alternate indexing for compression
myles: what are the hashes on first request?
garret: omitted (discussion of
proto optional fields)
... response has patch, codepoint remapping
(discussion of sending full codepoint set vs unicode-range to help client with font fallback)
garret: first request gives a
full font, subsequent ones give a patch onto a full font.
... discussed how we handle discrepancies between client and
server
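As a reading aid, the request and response fields described above might be modeled as below; all names are illustrative stand-ins, not the actual protobuf schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PatchRequest:
    original_font_fingerprint: Optional[bytes] = None  # omitted at first
    client_font_fingerprint: Optional[bytes] = None    # omitted at first
    codepoints_have: set = field(default_factory=set)
    codepoints_needed: set = field(default_factory=set)
    remapping_fingerprint: Optional[bytes] = None  # compression indexing

@dataclass
class PatchResponse:
    patch: bytes = b""                # a full font first, patches after
    patched_fingerprint: bytes = b""  # detects client/server divergence
    codepoint_remapping: dict = field(default_factory=dict)
```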
myles: this seems complex, why would we get two independent implementations?
garret: don't think it's as bad as it looks
rod: compare to brotli, which is quite complex and has two implementations
andy: what's the point of the remapping
garret: shrinks the size of the
codepoint set
... also allows grouping for obfuscation
... recommended a specific fingerprint
myles: what happens when the server sends something unexpected?
garret: checksums allow
detection, client should essentially reset / start from
scratch
... suggestions of any missing error handling would be appreciated
myles: what happens when it fails again?
garret: should work
myles: eventually won't, will have to fail over
garret: also supports upgrade to the underlying font (optional)
myles: seems unnecessary
... they don't change very often
andy, garret: we change them from time to time
jason: very few people / sites would notice
rod: we (google fonts) probably change more than most and it's still relatively rare
jason: desirable to patch forward
garret: hopefully public implementation helps with complexity
myles: concerned about complexity
for UA
... has Chrome seen this?
garret: not *this* but we've
talked about PFE with them
... we can tradeoff complexity against efficiency to some
extent
(discussion of the over-flexibility of everything being optional; this is an aspect of the simulation approach using protobufs, not a final solution)
myles: this is a unique thing to be specifying
rod: what aspect?
myles: binary rpc protocol
(discussion of binary vs text in http2)
ned: woff2 is binary
myles: it's not communication
ned: concern is binary communication?
myles: just want to suggest it's complex
andy: we avoid complexity in
internal font handling in return
... can imagine extending have/want on codepoints to more
(discussion of glyph ids, codepoints, variable space, etc)
myles: dislike telling the server what you already have, request size grows
garret: set compression is meant
to alleviate this
... hitting 1 bit per codepoint, can reduce further via
blocking
... it actually declines as you get more codepoints
rod: tl;dr runs are cheap
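A toy illustration of why runs are cheap under a bitmap-style set encoding (not the proposed format, just the intuition behind the ~1 bit per codepoint figure):

```python
def encode_codepoints(cps: set) -> bytes:
    # One bit per codepoint across the covered range, so a dense
    # run of adjacent codepoints costs almost nothing extra.
    lo, hi = min(cps), max(cps)
    bits = bytearray((hi - lo) // 8 + 1)
    for cp in cps:
        offset = cp - lo
        bits[offset // 8] |= 1 << (offset % 8)
    return lo.to_bytes(3, "big") + bytes(bits)  # 3-byte base + bitmap

# 1000 consecutive CJK codepoints -> 128 bytes, about 1 bit each
print(len(encode_codepoints(set(range(0x4E00, 0x4E00 + 1000)))))
```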
myles: what's next
garret: I can talk about harfbuzz if of interest
rod: hb-subset of interest
because we believe it alleviates concerns with fonttools
subsetter speed
... public link to presentation?
garret: will send later
... subsetter takes codepoints, glyph ids, layout features, etc
as input. Reduces font to only what's requested to be
retained.
... glyph closure is a key step, walking through layout tables,
composite glyphs, etc to identify the complete set of glyph ids
we need to keep
myles: can it cycle?
ned: yes. but parsers guard against it.
garret: also cmap 14 can flip glyphs
ken: two types in that table
garret: GSUB!
... the most expensive table to process
... find all the reachable lookups, iteratively applied to the
glyph set until the glyph set stops changing [or we hit a stop
limit]
... lookups can recurse / invoke each other, track through all
these
myles: can go forever?
garret: yes! but there are op limits in implementation
rod: fuzzer helps
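That closure step is essentially a fixed-point iteration with an operation limit; the sketch below uses a toy lookup model (a dict from input glyph id to output glyph ids), not HarfBuzz's real data structures:

```python
def gsub_closure(initial, lookups, max_ops=100_000):
    # Toy model: each lookup maps {input_gid: {output_gids}}.
    glyphs, ops = set(initial), 0
    while ops < max_ops:
        new = set()
        for lookup in lookups:
            for gid in glyphs:
                new |= lookup.get(gid, set()) - glyphs
            ops += 1
        if not new:
            break  # glyph set stopped changing: fixed point reached
        glyphs |= new
    return glyphs

# a substitution reachable from glyph 10 pulls glyph 300 into the set
print(gsub_closure({10, 20}, [{10: {300}}]))  # -> {10, 20, 300}
```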
garret: remap glyph ids, unless
specifically requested otherwise
... then we actually prune table by table, keeping only data
associated with glyph ids [usually] that are actually
retained
... some tables work as pairs, classic example is
loca/glyf
... other than layout (G*) it's usually simple
... G* tables are complex because it's a packing problem
... now we have all the subset tables, glue them together and
declare victory
... that's generic, what's unusual about HarfBuzz?
... typically a library parses the font into an abstract
representation
(which often means copying data around, reading the whole font, etc)
garret: harfbuzz mmaps and casts memory => data structures. No parsing, no memory for copies of structures. Because mmap'd we don't load parts of the font we don't use.
(so subsetting things like CJK fonts is VERY efficient)
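The zero-copy point in miniature: map the file and read big-endian fields in place, instead of parsing the whole font up front (offsets follow the sfnt header layout; the file name is a placeholder):

```python
import mmap
import struct

with open("font.ttf", "rb") as f:  # any local sfnt font file
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# numTables is the big-endian uint16 at byte offset 4 of the sfnt
# header; nothing else in the file needs to be touched or paged in.
num_tables = struct.unpack_from(">H", mm, 4)[0]
print(num_tables)
```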
garret: fun fact! magic endian
conversion types
... no need to decompile explicitly. Custom iterator (range in
proposed C++ std) gives view of subset data.
... FontTools subset cost is a function of the size of the
input; with HarfBuzz it's a function of the size of the output.
(walkthrough example code for HDMX)
(magic of C++ pretending to be Python :)
garret: So, how does it
perform?
... Google Fonts uses this in production
... FontTools CJK fonts 400-600ms, hb-subset 0.8-1.2ms
... Noto Nastaliq [chosen for complex layout] 800ms with
FontTools, hb-subset 4.6ms
... So, how complete is hb-subset?
... non-layout tables done for truetype and cff
... GSUB/GPOS in progress
... Adobe working on variation subsetting
... live in prod at scale
... intended to be complete (all G* layout types) by end of
year
rod: in subset(before), subset(after), compute delta, the delta computation is actually the slow part (surprisingly, at least to me)
jason: FontTools subset was breaking my italic axes two months ago
garret: bug please
rod: ideally a repro on an open source font
garret: but please report it regardless
ken: I have a font where 99% of the font is cmap 14 subtable