W3C

- DRAFT -

Web Fonts Working Group Teleconference

14 Feb 2019

Attendees

Present
chris, sergeym, myles, jpamental, Garret_Rieger, Vlad, ChristopherChapman, jfkthame, KenLundeADBE
Regrets
Rod Sheeter
Chair
Vlad
Scribe
myles

has anyone volunteered to be scribe?

<scribe> ScribeNick: myles

The optimized version of one was using a dynamic font subset containing only those characters that were present in the text bubbles. The third one was the same subset embedded inline in the CSS file.

Vlad: As you can see from those slides, they decreased data transfer time, and there was a huge benefit by doing this. The end result wasn't that exciting because when you compare against entire page load time, you don't always have the same level of benefits when you do something clever
... Those numbers would be useful for our discussions
... We're ready to start
... As you've seen through the email list, we had an interesting discussion that was started as a set of questions
... Those questions were answered with additional details provided.
... I'd like to make two clarifications. Regarding shaping breaks between two adjacent characters: 1) When the static subsetting is done poorly, you lose data, and something that should work is broken. If two characters are in the same subset, then there's no reason to consider that problem. However, when we consider characters that were provided in two different subsets, then that's important

myles: you're right

Vlad: Then, yes, font breaks are considered separate text spans, because shaping can't work between fonts.
... Jumping ahead, that's why I like the solution that was prototyped by Garret and Rod. When they produce an incremental subset, the reality is that they produce a larger subset which uses incremental data transfer that utilizes delta. We don't repeat the same data. The result is a larger subset, not just a new subset
... That's why I would like to have some data on the Adobe solution. Up until now we didn't have any insight into what has been done
... Gentle reminder to Ken and <inaudible> to consider it.
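
As a rough illustration of the incremental idea Vlad describes above (the client's subset grows, and only the missing data is transferred), here is a minimal sketch. It models a subset as a plain set of code points; it is not the prototype built by Garret and Rod, and the function names `delta_to_transfer` and `apply_delta` are hypothetical. A real implementation would have to patch font tables, not just track code points.

```python
def delta_to_transfer(client_codepoints, needed_codepoints):
    """Code points the client still lacks; only these need to be sent."""
    return set(needed_codepoints) - set(client_codepoints)


def apply_delta(client_codepoints, delta):
    """The client's subset grows to the union; nothing already held is re-sent."""
    return set(client_codepoints) | set(delta)


# Hypothetical example: the client already holds the glyphs for "Hello"
# and now encounters the text "Help!".
have = {ord(c) for c in "Hello"}
need = {ord(c) for c in "Help!"}

delta = delta_to_transfer(have, need)
have_after = apply_delta(have, delta)

print(sorted(chr(cp) for cp in delta))
print(len(have_after))
```

The result is a larger subset rather than a second, separate one, so shaping between old and new characters happens within a single font.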

<KenLundeADBE> Persa

<ChristopherChapman> Gentle reminder to Vlad that Persa & Bram are tied up until the end of February.

myles: For large fonts, the data seems to show that static subsetting seems to work pretty well.

<Vlad> Thank you :)

Garret_Rieger: Yes, that's what we've seen. It breaks some lesser-used features, but in general it works pretty well. What we'd be hoping for in a solution is something where all layout works no matter what. That's important for us.

myles: can you characterize the breakage?

Garret_Rieger: In our analysis, if the frequency of pairs of shaped characters was below some threshold, we allowed them to break. There were things below the threshold.
... I can see if we can get a list of example features that would have been broken.
... there's no way to divide Roboto and keep features together. There are no divisible units. Kerning is responsible for this for Latin fonts
... Roboto supports Latin, Greek, Cyrillic. And Latin Extended
... Possibly Vietnamese as well
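
The threshold idea Garret describes above can be sketched roughly as follows. This is only an assumed shape for such an analysis, not Google's actual method: the function name `split_pairs`, the pair frequencies, the subset assignment, and the threshold value are all hypothetical.

```python
def split_pairs(pair_freq, subset_of, threshold):
    """Classify character pairs that a static subsetting would separate.

    Returns (tolerated, unacceptable): pairs split across subsets whose
    frequency is below the threshold, and those at or above it.
    """
    tolerated, unacceptable = [], []
    for (a, b), freq in pair_freq.items():
        if subset_of[a] == subset_of[b]:
            continue  # same subset: shaping between a and b still works
        if freq < threshold:
            tolerated.append((a, b))
        else:
            unacceptable.append((a, b))
    return tolerated, unacceptable


# Hypothetical data: relative frequency of adjacent pairs in some corpus.
pair_freq = {("A", "V"): 0.02, ("A", "Я"): 0.00001, ("Т", "о"): 0.01}
subset_of = {"A": "latin", "V": "latin", "Я": "cyrillic", "Т": "cyrillic", "о": "cyrillic"}

tolerated, unacceptable = split_pairs(pair_freq, subset_of, threshold=0.001)
print(tolerated, unacceptable)
```

In this toy data the only pair that crosses subsets is the rare Latin/Cyrillic one, so the split would be accepted, at the cost of losing kerning for that pair.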

jfkthame: You'd have problems as soon as the font does dynamic accent positioning.

Garret_Rieger: We split Latin and Cyrillic, but this means you lose kerning between Latin and Cyrillic
... Streaming is a good solution for Roboto

<Vlad> * Note to all - please record your attendance on IRC

Garret_Rieger: Our current solution for CJK isn't optimal. There's lots of room for improvement. A streaming approach would outperform what we're doing today. I'd love to do an analysis.

<jfkthame> (was late, sorry)

chris: This is a major area where people aren't using webfonts today because it's a nonstarter. It needs to be investigated

Garret_Rieger: We made tradeoffs when we investigated it. We allowed ourselves to not have perfect rendering of the original fonts

jpamental: I wonder if there's a progressive approach where a subset is created but if there's something in the page that's getting broken, then that could trigger a subsequent download of more stuff

Garret_Rieger: Our proposed approach works well for this. We want the server to send back more than what the client asked for.
... Today, if you see rare characters, you need the entire extended subset.

myles: the browser can make the same inference, right?

Garret_Rieger: yes
... we like it on the server because it allows us to change those decisions more quickly.
... we've had success with that approach before

Vlad: the added benefit is that different font providers might want different strategies

myles: we need to do analysis to determine whether there is one true solution or whether servers would need to experiment

Vlad: let's say you have a client with some needs and the server might want to send additional characters, too. One provider might want to always extend the required subset and another might not.

myles: good performance should be on by default. if there's a single good solution, it should be deployed everywhere

Vlad: What if one provider wants some smarts?

jpamental: Isn't that counter to our mission in the W3C?

chris: one implementation in the initial phase is a good idea, and allowing for quality of implementation is good. There are tradeoffs between compression quality and CPU usage. For the W3C, two implementations need to be compatible

Vlad: Extending the data that the client is asking for, that should be free for the font provider to make a decision

<jpamental> I'm sorry - have to drop off. Thanks all - looking forward to reading the notes

Garret_Rieger: A new solution could work like subsetting or a new thing depending on the situation. In places where subsetting works great today, we can keep doing static subsets. In other situations, we can do incremental buildups

myles: we should make our decision informed by data.

Garret_Rieger: yes
... I'd love to see comparisons between the different approaches.

Vlad: If you look at the slides I shared yesterday, the most useful part is the performance data that was measured. When you ask for a resource that you need, the total time from the moment you send the request to the end isn't just determined by data size; there are other factors that can take an order of magnitude longer. Even if you have the most optimized subset, you'll still be waiting for 200ms for the first byte to come. That was the reason why a solution that may be suboptimal, one that includes more characters and so prevents one extra download, is preferred.
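
The argument above can be made concrete with a back-of-the-envelope model. This is only a sketch: the 200 ms wait for the first byte comes from the discussion, while the bandwidth (about 10 Mbit/s), the subset sizes, and the assumption that the two requests happen one after the other are hypothetical.

```python
def fetch_time_ms(size_bytes, latency_ms=200.0, bandwidth_bytes_per_ms=1250.0):
    """Time for one request: fixed wait for the first byte plus transfer time."""
    return latency_ms + size_bytes / bandwidth_bytes_per_ms


# Option A: one request for a deliberately oversized 30 kB subset.
one_larger_request = fetch_time_ms(30_000)

# Option B: a tight 10 kB subset, then a second 10 kB request when
# characters missing from the first subset show up.
two_tight_requests = fetch_time_ms(10_000) + fetch_time_ms(10_000)

print(one_larger_request, two_tight_requests)
```

Under these assumptions the oversized subset costs 224 ms, while the two tight subsets cost 416 ms despite moving fewer bytes, because the fixed wait is paid twice.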

Garret_Rieger: This speaks to the importance of whether we are supporting cross-domain caching. The best optimization we've done to date is cross-domain caching.

Vlad: my intention wasn't to split fonts into three sets. I was using two different graphs as two projections into a 3D space. If you visualize the space, you have two sets of graphs that let us do something like that. In both cases, the results are comparable to these graphs.
... Any optimization that we do on a complex data set that doesn't provide a clear benefit, that doesn't provide a data size benefit, would have diminishing returns
... Back in time when Monotype did dynamic subsetting, we looked at performance data. We realized that doing dynamic subsetting may not always make sense. It required time to make it available on the client side. 180 / 350 ms is just waiting. With a dynamic subsetting solution, you wait almost the same amount. You save a lot of time on data transfer, but the waiting time isn't that much different.
... In the long run it doesn't make too much of a difference.
... we need to be conscious about this. What we want to do and actual perf improvement may not be in direct relation.
... We may sometimes decide that we want to do significant effort. If we come up with a dedicated solution for multi-script and complex shaping rules, that would be a much bigger effort, and the performance improvements are uncertain.
... So, if we focus on fonts that have significant data in them (large fonts) and optimize data sizes, and don't break scripts, that will give us more perf

myles: I agree

Vlad: There's a threshold below which doing anything might not help

myles: what do other people think?

ChristopherChapman: From our side, for CJK it made a big difference. For smaller scripts, it didn't make sense to subset.
... Consistent with this finding.

Vlad: We need to consider client side complexity for any decision we make.
... For example, I don't know how Adobe implemented their dynamic augmentation solution, but I would speculate as follows. If one solution depends on smarts being on the server and the client is simple (you get the data transferred to you and use what you get), that's one thing. If a different solution requires the data transferred to the client to be processed before you make use of it, that additional complexity on the client may be a big factor to consider. Merging two subsets into a single usable one is complex. Asking clients to do it might be hard.

myles: We should talk to implementors when thinking about what is implementable.

Vlad: Yes, we should investigate all solutions

ChristopherChapman: At Adobe we would like to keep the client side simple.

chris: It's easier to upgrade server infrastructure. Server upgrades are a path to success.

Vlad: Based on what I hear up until now, we are converging on a solution that might serve us well

myles: Are we in agreement on focusing on big fonts?

Garret_Rieger: Yes.

Vlad: Yes, with a caveat. Whatever solution we have should not break things. It might not be optimal, it may produce more than is necessary, but as long as it's not breaking shaping, it's fine.

Garret_Rieger: Medium-sized fonts are probably also worth looking into. Roboto has a few thousand code points.

KenLundeADBE: Korean fonts are medium too

Garret_Rieger: Yes, or large.

KenLundeADBE: Most Korean fonts have 3000+ characters. That's medium, not large.

myles: how good is static subsetting? Maybe we're done

Garret_Rieger: We break shaping.
... there's no good solution that doesn't break layout.

chris: Our charter says that things that break layout aren't good solutions

Vlad: When you're talking about breaking layout, are you talking about an approach or implementation?

myles: approach

Garret_Rieger: approach
... We've found it's impossible to subset Roboto without breaking layout. This is because of kerning.
... punctuation characters are in Latin. If there is layout between Latin and the other scripts, that won't work.

myles: Could Google Fonts put together a list of font candidates?

Garret_Rieger: yes
... all our fonts are on GitHub
... 1000 families on GitHub
... They're already .ttfs

chris: Can you do this analysis on commercial fonts?

<Garret_Rieger> https://github.com/google/fonts

myles: All Swift, all the time
... If people use macs, anyone can run it

<ChristopherChapman> https://www.google.com/get/noto/

ChristopherChapman: What about Noto? ^^^

myles: Why was Noto designed to be so large?

Garret_Rieger: Noto as it exists today is a series of font files, one for each script. We want to present it to users as a single family. They may have text in any language showing up unpredictably. They want a single Noto family that would just work. Incremental would be beneficial

<ChristopherChapman> Also, here's the GitHub link for Noto: https://github.com/googlei18n/noto-fonts

Garret_Rieger: the file-per-script approach is difficult for Indic, because there are shared codepoints in many Indic languages
... Shared codepoints between fonts confuse browsers

myles: How do other fonts solve that last problem?

Garret_Rieger: put all Indic in a single file
... it could be too big

ned: The core characters for Indic tend to be in the same block, but there are some exceptions. There is another block that can be used by a few scripts. On a standalone basis, each font handles just the characters it supports in shaping. That breaks down

sergeym: You want to find a solution for browser and CSS, for downloaded fonts, right?

ned: On Windows, it's handled by a UI font that covers all the Indic scripts

sergeym: I don't know why it's in a single font. Maybe 5 years ago we had different fonts for each Indic script

ned: yes

Vlad: we're about done.
... but not completely done
... I would like to have this subject covered in full before we transition into any next step that involves discussing possible solutions, or finalizing or converging on a solution. With that in mind, I'm interested in the information ChristopherChapman mentioned in IRC. Two main contributors are tied up until the end of Feb. What should we do?
... We should have the investigation done before we pick a solution.

myles: there's not much I can do without talking to the group now

Vlad: Gathering data is the most important step
... We're still trying to do that.

myles: We should investigate how text is used on the web

Vlad: We should collect language-specific data on character frequencies. If we want to consider shaping rules, then this is orthogonal and script-specific
... If we can collect this data it would be informative
... let's do this over the next few weeks while waiting for additional information that we need to finish this discussion

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2019/02/14 18:04:11 $
