W3C

- DRAFT -

Web Fonts Working Group Teleconference

03 Feb 2020

Attendees

Present
Vlad, Garret, Myles, jpamental, Persa, ChrisL, Ned
Regrets
Chair
SV_MEETING_CHAIR
Scribe
Garret

Vlad: the first three agenda items are Myles's.

<jpamental> https://www.w3.org/Fonts/WG/track/actions/open

Myles: I haven't started on the 1st and 3rd; for the 2nd I have a presentation.
... going to talk about the efforts to sort files for CJK content
... need a fitness function. Grades an ordering.
... used average percentage of file skipped when loading a document.
... gives one number per document. Average this for all documents. Want this value to be higher.
... previously gathered documents using a crawler (no sequences just documents).
... gathered 100,000 documents. Turns out this was too many.
... so used a sample of 1000 documents for most of the work.
... documents are independent. Implemented it on the GPU using Metal.
... will rewrite in OpenCL or Vulkan if it's needed since Metal is only available on Mac.
... more accurate than the previous effort, correctly counting the # of bytes per glyph.
... trying to figure out the input that produces the highest output.
... very large number of inputs.
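
A minimal sketch of the fitness function described above: the fraction of the file skipped per document, averaged over all documents. The minutes do not record how a document's needed glyphs translate into bytes loaded, so the fixed-size chunk model, the `chunk_size` parameter, and all function names here are assumptions for illustration only.

```python
from typing import Dict, Iterable, Sequence, Set


def fraction_skipped(order: Sequence[int], glyph_sizes: Dict[int, int],
                     needed: Set[int], chunk_size: int) -> float:
    """Fraction of the file one document does not have to load.

    ASSUMPTION: the file is fetched in fixed-size chunks, and a chunk is
    loaded whenever any byte of a needed glyph falls inside it.
    Assumes the total file size is greater than zero.
    """
    total = sum(glyph_sizes[g] for g in order)
    loaded_chunks = set()
    offset = 0
    for g in order:
        size = glyph_sizes[g]
        if g in needed and size > 0:
            first = offset // chunk_size
            last = (offset + size - 1) // chunk_size
            loaded_chunks.update(range(first, last + 1))
        offset += size
    # The final chunk may be shorter than chunk_size.
    loaded_bytes = sum(min(chunk_size, total - c * chunk_size)
                       for c in loaded_chunks)
    return 1.0 - loaded_bytes / total


def fitness(order: Sequence[int], glyph_sizes: Dict[int, int],
            documents: Iterable[Set[int]], chunk_size: int) -> float:
    """Average fraction skipped over all documents (higher is better)."""
    docs = list(documents)
    return sum(fraction_skipped(order, glyph_sizes, d, chunk_size)
               for d in docs) / len(docs)


# Toy data: four 4-byte glyphs, two documents that each use two glyphs.
sizes = {0: 4, 1: 4, 2: 4, 3: 4}
docs = [{0, 1}, {2, 3}]
grouped = fitness([0, 1, 2, 3], sizes, docs, chunk_size=8)
interleaved = fitness([0, 2, 1, 3], sizes, docs, chunk_size=8)
```

Under this model, the ordering that keeps glyphs used together adjacent scores higher than the interleaved one, which is the property an ordering search would try to exploit.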

10^27754

Myles: how many characters are ever used. Turns out to be about 4000 glyphs.
... no point trying to sort glyphs I don't have info for.
... that brings the number of orderings to a smaller number, but still too big.
... six different strategies.
... hyperparameter optimization.
... the red dotted line is the true function we want to represent.
... every glyph has a position and can be moved. So its position gets smaller or bigger. The position is somewhat independent. Consider the position as a floating point.
... what this means is that we have a 4000-dimensional search space.
... hyperparameter optimization is a way of intelligently picking parts of the space that are potentially good.
... about trying to find a good place to do the next sample.
... has some intelligence about that. As you add more samples the uncertainty gets smaller.
... used a python library to do it.
... when we've picked a point to evaluate the cost to evaluate it is 7 seconds.
... constructing the search space in 4000 dimensions takes a long time.
... not a good fit for the problem since it's too slow.
... want to be able to run it as fast as possible. 100 or thousands of times.
... word vectors. Openly available data. Has vectors for many words. Can be added.
... they use a 300 dimension space.
... the position is semantically meaningful
... want to collapse to a 1d space. Glyphs that are close are close in the space.
... doing a matrix multiplication.
... 4000 chars x 300 vector entries
... multiply by the green matrix to get the orange matrix
... green is 300x1, much smaller.
... can use the same approach but now the search space is much smaller
... now at 1.5 it/sec
... still too slow
... so didn't get any good answers.
... gradient descent. If you have a topographic map you can find the pits by always moving in the downward direction.
... in order to calculate the steepest slope you calculate the gradient vector.
... the number of items in this vector is the # of items in the search space.
... each value is a partial derivative. So you evaluate the function twice in a small window
... each of those is an invocation of the fitness function. Need 8000 invocations at each point.
... still too slow
... genetic algorithm. Start with a population of points to evaluate.
... the idea is: if one of the points has properties that are good, in the next generation you want to spread those out.
... so you select solutions to mate and they generate offspring. You select based on the performance of the fitness function. The qualities of the high performers will spread.
... so I implemented this. Didn't work very well. The reason: if you pick a random ordering and score it, then pick another and score it, you get another value. You get a collection of random fitnesses.
... all of the random fitnesses are nearly identical.
... so this means when you mate your first generation you haven't spread any high performers since there are none.
... when you mate different orderings you can't have the same glyph at the same position.
... so this means when you mate them the output is pretty random.
... so even if I started with some high performers because the way the mating algorithm works, the successive generations devolved.
... so this didn't work either.
... simulated annealing.
... if you're making a metal and the molecules are moving fast: if you cool it rapidly you freeze things where they are and they get stuck spread out.
... if you cool slowly they settle in a denser configuration.

Myles: so you start with a particular ordering. Perturb it in a random direction.
... over time you move less and less. Move to the new location if it's better based on some probability.
... if the new result is better you want to move there; if it's worse you probably don't move there, with a probability that depends on how much worse it is.
... can tune this based on what gives us the best result.
... over time the slope changes.
... next part is what type of perturbation can be made.
... ran a bunch of experiments. Higher is better. If you use swapping that works better than rotate and reverse.
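
A minimal sketch of the simulated annealing loop as outlined above, using swap perturbations. The minutes do not record the acceptance rule or cooling schedule that was actually used, so the Metropolis-style acceptance test, the linear cooling, the parameter names, and the toy fitness function below are all assumptions for illustration.

```python
import math
import random
from typing import Callable, List, Tuple


def anneal(order: List[int], fitness: Callable[[List[int]], float],
           steps: int = 2000, start_temp: float = 1.0,
           seed: int = 0) -> Tuple[List[int], float]:
    """Maximize `fitness` over orderings by simulated annealing."""
    rng = random.Random(seed)
    current = list(order)
    current_score = fitness(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # ASSUMPTION: linear cooling; the small offset avoids dividing by 0.
        temp = start_temp * (1.0 - step / steps) + 1e-9
        # Perturb: swap two randomly chosen positions.
        i, j = rng.sample(range(len(current)), 2)
        candidate = list(current)
        candidate[i], candidate[j] = candidate[j], candidate[i]
        score = fitness(candidate)
        delta = score - current_score
        # Always accept improvements; accept a worse candidate with a
        # probability that shrinks with how much worse it is and with
        # falling temperature (Metropolis criterion, assumed here).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, score
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


def toy_fitness(order: List[int]) -> float:
    """Hypothetical stand-in: fraction of items already at their own index."""
    return sum(1 for k, g in enumerate(order) if k == g) / len(order)


start = [3, 1, 4, 0, 2]
best, score = anneal(start, toy_fitness)
```

In practice `toy_fitness` would be replaced by the real fitness function, and the number of steps and starting temperature tuned against the results.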

Myles: rotate - take a sub range and rotate.
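
The three perturbation types named in this part of the discussion (swap, rotate, reverse) could be sketched as below. The exact definitions used in the experiments are not recorded in the minutes; the function names, the half-open `start:end` sub-range convention, and the left-rotation direction are assumptions.

```python
from typing import List


def swap(order: List[int], i: int, j: int) -> List[int]:
    """Exchange the glyphs at positions i and j."""
    out = list(order)
    out[i], out[j] = out[j], out[i]
    return out


def rotate(order: List[int], start: int, end: int, shift: int) -> List[int]:
    """Rotate the sub-range order[start:end] left by `shift` positions."""
    sub = order[start:end]
    if not sub:
        return list(order)
    k = shift % len(sub)
    return order[:start] + sub[k:] + sub[:k] + order[end:]


def reverse(order: List[int], start: int, end: int) -> List[int]:
    """Reverse the sub-range order[start:end] in place within the ordering."""
    return order[:start] + order[start:end][::-1] + order[end:]


base = [0, 1, 2, 3, 4]
swapped = swap(base, 1, 3)        # [0, 3, 2, 1, 4]
rotated = rotate(base, 1, 4, 1)   # [0, 2, 3, 1, 4]
reversed_ = reverse(base, 1, 4)   # [0, 3, 2, 1, 4]
```

Each function returns a new list, so a rejected candidate never modifies the current ordering.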

Myles: reverse similar.
... here are the initial results
... fitness goes up fast and then more slowly over time.
... ran 25 times, graph has them all stacked. All start at the same point roughly.
... the interesting fact is that all the lines stay on top of each other through the duration.
... for the amount that I ran doesn't matter where you start.
... on the y axis, the fitness values are getting significantly better.
... if you look at the end the results are still going up. So if we run it longer we can find something even better. Didn't have time to run it for very long.
... best result I've been able to get is 78%
... final finding of this presentation
... able to skip 78% of a particular font file. That's pretty good.
... not done yet.
... interested to see what happens if you run for long enough that it plateaus. Interested if there are different plateaus
... if they all end up at the same place then we can start anywhere
... if they get stuck in different places, we may need multiple trials to find a good answer.
... another interesting thing is once I get to the local maxima want to see if the orderings are similar.
... if they all end up in the same place then it's likely a global maximum
... haven't finished work on the particle system.
... all the work I've done to date is on a single font file. Hoping that new fonts won't change the numbers much.
... haven't looked at compression much yet.
... want to contribute the work I've done to the repo.
... whole lot of tries that didn't work and one that looks promising.

Vlad: would it be possible to create an optimal ordering that can be shared across many fonts? Meaning we can produce a recommendation to order in a specific way.

Myles: that's a good idea, should pursue doing that. Glyph IDs aren't meaningful outside of a single font. Would need to do an ordering of Unicode code points.
... that works in a font where glyphs and codepoints match 1:1. In a font with tons of shaping this won't work well. CJK fonts tend not to have a lot.
... so that could work well.
... but may not be applicable to all fonts.
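
One way to picture a shared codepoint ordering being applied to an individual font, as a sketch only. It assumes a simple 1:1 codepoint-to-glyph mapping, as discussed above, represented by a hypothetical `cmap` dictionary; the function name and the choice to append unmapped glyphs at the end are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def glyph_order_from_codepoints(codepoint_order: Sequence[int],
                                cmap: Dict[int, int],
                                num_glyphs: int) -> List[int]:
    """Turn a font-independent codepoint ordering into a glyph ordering.

    ASSUMPTION: `cmap` maps each supported codepoint to exactly one glyph ID.
    Codepoints the font does not support are skipped; glyphs with no
    codepoint in the shared ordering keep their original relative order
    at the end.
    """
    seen = set()
    ordered: List[int] = []
    for cp in codepoint_order:
        gid = cmap.get(cp)
        if gid is not None and gid not in seen:
            seen.add(gid)
            ordered.append(gid)
    ordered.extend(g for g in range(num_glyphs) if g not in seen)
    return ordered


# Toy font with five glyphs, three of them mapped from CJK codepoints.
toy_cmap = {0x4E00: 2, 0x4E8C: 1, 0x4E09: 3}
shared_order = [0x4E09, 0x4E00, 0x9999, 0x4E8C]
glyph_order = glyph_order_from_codepoints(shared_order, toy_cmap, num_glyphs=5)
```

Fonts with heavy shaping would need more than this, since many of their glyphs are reached through substitution rather than directly from a codepoint.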

Vlad: start with a document, know what the content is, and know how to optimally re-sort the font.

Myles: yeah.

Vlad: Myles, do you have an estimate for a realistic due date for the next iteration on these open action items?

Myles: can I get back to you on that?

Vlad: the charter says we should have eval results by Q3 this year.

<myles> Garret: A lot happened

<myles> Garret: I'll go through my notes

<myles> Garret: for the analysis framework, I've been contributing a bunch of code.

<myles> Garret: I've implemented codepoint remapping. For patch-n-subset, it's helpful if we reorder the indices used for the code points, because that helps compress the size of the requests and responses for the codepoints. I implemented a basic one that collapses (41, 42, 43) to (0, 1, 2)
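
A minimal sketch of the basic remapping described above, where the codepoints a font supports are collapsed onto dense indices. The function name and return shape are assumptions; the actual implementation in the analysis framework may differ.

```python
from typing import Dict, Iterable, List, Tuple


def build_remapping(codepoints: Iterable[int]) -> Tuple[Dict[int, int], List[int]]:
    """Map each codepoint to a dense index in ascending codepoint order.

    Returns (codepoint -> index, index -> codepoint).
    """
    ordered = sorted(set(codepoints))
    to_index = {cp: i for i, cp in enumerate(ordered)}
    return to_index, ordered


to_index, from_index = build_remapping([43, 41, 42])
indices = [to_index[cp] for cp in (41, 42, 43)]  # [0, 1, 2]
```

Small dense indices tend to encode more compactly than sparse codepoint values, which is the motivation given above for smaller requests and responses.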

<myles> Garret: I have PRs, they aren't merged. The test I ran with that is pretty promising. I have more work to do on more advanced ordering (frequency)

<myles> Garret: I won't go as far as Myles because it's less important

<myles> Garret: I modified the cost function. Previously it was an exponential that ramped to infinity. This was too extreme and started overflowing integers. For most browsers, if a font takes more than 3s to load, the browser will just give up. Now I'm modelling the cost function as a sigmoid (like an "S"). That works better for the analysis.
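
A sketch of what a sigmoid-shaped cost function could look like. The midpoint at the 3 s mark, the `steepness` and `max_cost` parameters, and the function name are all hypothetical; the minutes only state that the cost is now modelled as a sigmoid rather than an unbounded exponential.

```python
import math


def load_cost(seconds: float, timeout: float = 3.0,
              steepness: float = 2.0, max_cost: float = 1000.0) -> float:
    """Bounded, S-shaped cost of a font load taking `seconds`.

    ASSUMPTION: the curve is centred on `timeout` and saturates at
    `max_cost`, so very slow loads cannot overflow the way an
    exponential cost would. Expects seconds >= 0.
    """
    x = steepness * (seconds - timeout)
    return max_cost / (1.0 + math.exp(-x))


fast = load_cost(0.5)
at_timeout = load_cost(3.0)
very_slow = load_cost(1e6)
```

The cost rises slowly for fast loads, steeply around the timeout, and then flattens, so loads far past the timeout all cost roughly the same.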

<myles> Garret: I got approval to share metrics from Chrome on typical connections and RTTs

<myles> Garret: They are all in the repo now. We are using that in the analysis. Those are measured from real clients.

<myles> Garret: I added parallelization support for the analysis. It's not using GPUs but it is spread across multiple cores

<myles> Garret: The analysis is now at the point where you can run it end-to-end. I ran it with some preliminary data that we gathered internally and got some results.

<myles> Garret: We're not ready to share the numbers (the data is preliminary)

<myles> Garret: The good news is that the incremental transfer approach will beat unicode-range. So that's promising.

<myles> Garret: For HarfBuzz, we got color subsetting support in HarfBuzz, so we can do incremental for icon fonts or emoji fonts

<myles> Vlad: What about SVG and CPAL?

<myles> Garret: I didn't work on it. The person working on it said he wasn't going to do SVG and CPAL

<myles> Garret: SVG interacts with CPAL so they have to be done together. Don't want to do one without the other

<myles> Garret: Next, I'll work on improving the tools for the analysis, to help you parse the results that come out of the end. How to get a few different kinds of metrics like cost function values, request / response sizes, etc.

<myles> Garret: codepoint blocking, codepoint remapping

<myles> Garret: Some work on improving the unicode-range simulation (now it's not including the cost of CSS which is a significant factor on CJK fonts)

<myles> Myles: BTW, CSSWG just resolved to add keywords into the unicode-range descriptor

<myles> chris: The keywords are ISO script codes

<myles> Garret: that's interesting

<myles> jpamental: Is that like "latin 1 extended" or something else?

<myles> chris: I think so.

<myles> Garret: Will keep an eye on it.

<myles> Garret: for my open action items, "work on analysis framework" is continuing, not done. "Implement patch-n-subset" is not fully done yet. I can get back to you with a good date for when those will be done

<myles> Vlad: Brief update: I mentioned already we have a special account on Monotype's systems for the WebFonts WG. That account is not connected to real data yet. My goal is to identify two classes of fonts that we should use. One is CJK fonts that have highly used composite features, and the other is anything that has other features.

Vlad: working on identifying two types of fonts: CJK fonts with heavy use of layout, and fonts with an interesting setup of features.

<myles> Vlad: My colleagues need to help me figure out what those fonts are so we can pull them from the database into the account

<myles> Vlad: In ISO, there has been a change in the communication venue for this communication group. We used to use Yahoo groups. But Yahoo recently made those groups useless for us.

<myles> Vlad: We have an alternative that's set up by [missed] university. We are moving there.

<myles> Vlad: It allows us the same functionality. There's an archive. We were more-or-less able to transfer the Yahoo emails over to the new system.

<myles> Vlad: I will send additional information to this group's email list.

<myles> Vlad: This is ISO-specific, but any changes made in one migrate to the other [myles assumes he's talking about the Microsoft OpenType spec]

<myles> Garret: When should we meet next?

<myles> Vlad: Can we set the new expected dates on the action items before deciding?

<myles> Garret: ok

<myles> Vlad: If we are comfortable with doing weekly calls, I'm fine with that

<myles> Garret: let's wait and see

<myles> myles: let's wait until we have deadlines

<myles> Vlad: okay

<PersaZula> myles - awesome presentation

<PersaZula> and thank you to Garret as well

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.154 (CVS log)
$Date: 2020/02/03 18:47:28 $
