How do I set myself as scribe?
<myles> ScribeNick: Persa_Zula
scribe Persa_Zula
<chris> scribenick Persa_Zula
<scribe> ScribeNick: Persa_Zula
I need a scribe cheat sheet
Stateless vs Stateful - will update by F2F
https://www.w3.org/Fonts/WG/track/actions/208 - will be updated by F2F
Overview of https://www.w3.org/Fonts/WG/track/actions/209
myles: There are two approaches
that the group has been considering. This is not about smart
server vs range requests.
... whether the client should be asking for codepoints from the
server or glyphs from the server
... if the client asks for codepoints, the server has to reply
with every set of glyphs it can possibly reference. on the other
hand, if the client requests glyphs, the client needs to know
the relationships up front. it has to know the shaping info
before it gets any glyphs at all
... it runs shaping locally without any glyphs, and as a
followup sends another request for glyphs
... analysis is not specific to a format - weighing the upfront
costs vs the cost spread over every request
... the relationship between the size of the outlines & the
size of everything else. for large fonts, outlines are
almost all of the font, and everything else is just a few % of
the font file. for the glyph-based approach there is one round
trip but it's small compared to the rest of the size of the
font file itself
... the other side, for real fonts: if a client requests a set
of codepoints, how many actual glyphs will there be that would
have to get sent back over the wire? interested in the metric
of comparing the total number of glyphs sent to the number of
glyphs requested. in the glyph model, only the glyphs needed
are requested. in the first (codepoint) model, if you need 5
glyphs, the server might give 100.
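A minimal sketch of the metric described here, assuming a toy font in which shaping is reduced to single-glyph substitutions. The tables CMAP and SUBSTITUTIONS and the helper names are hypothetical and are not taken from any real font or library; a real closure would also have to follow ligatures and contextual rules.

```python
# Toy font data: every name and value here is hypothetical.
CMAP = {0x0628: 10, 0x062A: 20}  # codepoint -> nominal glyph ID

# glyph ID -> glyph IDs that shaping rules may substitute for it
SUBSTITUTIONS = {
    10: {11, 12, 13},  # e.g. positional forms of one letter
    20: {21, 22, 23},
    11: {30},          # a further glyph reachable from a positional form
}


def glyph_closure(codepoints):
    """Return every glyph ID reachable from the requested codepoints."""
    pending = [CMAP[cp] for cp in codepoints if cp in CMAP]
    closure = set()
    while pending:
        gid = pending.pop()
        if gid in closure:
            continue
        closure.add(gid)
        pending.extend(SUBSTITUTIONS.get(gid, ()))
    return closure


def overhead_ratio(codepoints, glyphs_needed):
    """Glyphs the server must send divided by glyphs the text actually uses."""
    return len(glyph_closure(codepoints)) / len(glyphs_needed)
```

In this toy data, a page that only needs glyph 11 still pulls in five glyphs when it asks by codepoint, which is the kind of ratio the comparison is about.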
chris: how would this work with variable fonts
myles: work is orthogonal to variable fonts. only interested in shaping information
Vlad: if there is a font with
different stylistic sets defined, you'd have to include all of
them. alternatively, you'd need CSS data to find out which
stylistic set is in use
... you might also need features applied.
... in Arabic, you have up to 4 glyphs for each character
depending on the location in the word
myles: results - most of the
extra 95 glyphs are not due to optional features - it turns out
they are due to required language shaping rules
... relationship between these two numbers. example font. if we
request glyphs 0-5, how many end up in the closure? if we
request GIDs 0-7, how many end up in the closure? do it until we
reach the end of the file
... is the function linear? if so, the two options work
similarly. if the function scales superlinearly, the
codepoint-based approach doesn't work very well.
... for a particular font, you can see how superlinear the
result is. did this for every font on a Windows machine. and
did it for every font in the GFonts corpus.
... in the first email, first graph - every line represents a
font. higher a line gets, the worse that font would work for
codepoint approach
... next graph - for every font, we can associate the area
under the curve - higher the area under the curve is worse for
the codepoint approach
... table for the worst offenders
... in the google fonts corpus - similar results. except that
the graph looks worse. the very last graph - the higher the
dots are, the worse the codepoint approach is. many complex
fonts don't work well for this codepoint approach
... overall result of this is that we can choose either system.
but the codepoint based approach is really terrible for some
languages. glyph approach has an equal cost for every language
- and the cost turns out to be small.
... comparing a system that falls down on some languages vs an
approach that has an equal cost for all languages
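The closure curve and the area-under-the-curve score could be computed roughly as follows. This is a sketch under stated assumptions: the substitution table is a hypothetical stand-in for real shaping rules, and dividing the area by the square of the glyph count is a normalization chosen for this example, not necessarily the one used for the graphs in the emails.

```python
def closure_size(requested, substitutions):
    """Number of glyphs reachable from the requested glyph IDs."""
    pending, seen = list(requested), set()
    while pending:
        gid = pending.pop()
        if gid not in seen:
            seen.add(gid)
            pending.extend(substitutions.get(gid, ()))
    return len(seen)


def closure_curve(num_glyphs, substitutions):
    """Closure size when requesting GIDs 0..n-1, for n = 1..num_glyphs."""
    return [closure_size(range(n), substitutions)
            for n in range(1, num_glyphs + 1)]


def area_under_curve(curve):
    """Normalized area; higher means worse for the codepoint approach."""
    num_glyphs = len(curve)
    return sum(curve) / (num_glyphs * num_glyphs)


# A font with no substitutions grows linearly; one where glyph 0 drags in
# three more glyphs jumps up early and scores a larger area.
linear = closure_curve(6, {})
complex_font = closure_curve(6, {0: {3, 4, 5}})
```

A font whose curve hugs the diagonal behaves about the same under either approach, while a curve that saturates early means almost any request pulls in most of the font.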
Garret: the glyph approach might
blow up as well because it could request almost all of the font
at the beginning. we need to analyze sequence of the
codepoints. can't answer this until we have a browsing data
set.
... it's premature to assume that it's unworkable to use
codepoints
<Vlad> https://docs.google.com/document/d/1kx62tpy5hGIbHh6tHMAryon9Sgye--W_IsHTeCMlmEo/edit?usp=sharing
Vlad: we have 3 approaches to
consider
... 1. patch-based approach. 2. byte range approach suggested
by myles - dependent on a standalone tool. 3. hybrid approach,
using the patch update mechanism so it doesn't need font data
optimized but still has a resemblance to the glyph-based
approach with byte ranges consolidated in a single patch
... we need to consider if the original patch-based approach
can satisfy the original request. the approach that myles
suggested might require more than one request for the first
page render
... in order to be able to evaluate in real life, our framework
has to account for this, and having real life data that we can
use to mimic what we see out there
Garret: working on getting a
dataset, might have info by the F2F
... will add hybrid approach to the doc and an explanation of
how it would work
... working on a prototype for the patch based approach in time
for the F2F
myles: I need a subsetter that isn't super slow if I am to continue experiments
Garret: the harfbuzz subsetter is under development, 300x faster than fonttools for this
Vlad: framework analysis - does the cost function take into account the subsetter speed?
Garret: the cost to calculate the
patch vs creating the subset. the cost of computing the patch
is higher
... fonttools parses the entire font into an object tree. HB
generates almost no memory allocations when working with the
font
<scribe> ACTION: Garret to include cost of subsetting in the model
<trackbot> Created ACTION-210 - Include cost of subsetting in the model [on Garret Rieger - due 2019-08-26].
Garret: Updates on codepoint
compression. Tried new methods in the 2nd round.
... Arithmetic coding (entropy coding) with the codepoint
sets.
... Codepoint remapping. Use codepoint values and remap them to
a continuous space. Points are much closer together than when
using Unicode values
... also reordered based on frequency. effort to get things to
cluster together to use ranges
... used this on the top techniques that came from the first
round and re-ran and the results look very good
... originally 4 bits per codepoint, now for CJK 1 bit per
codepoint. very promising. can efficiently transfer things back
and forth between client and server
... can be used for glyph sets just the same - just encoding
integers.
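A rough sketch of the remapping idea, assuming a per-font table that assigns each codepoint in the font a dense index, optionally ordered by frequency, so that requested sets collapse into a few ranges. The function names and the sample frequency data are hypothetical, and the arithmetic coding step is not shown.

```python
def build_remap(font_codepoints, frequency=None):
    """Map each codepoint in the font to a dense index 0..N-1.

    Codepoints with higher frequency get lower indices; ties and
    codepoints with no frequency data fall back to codepoint order.
    """
    frequency = frequency or {}
    ordered = sorted(font_codepoints,
                     key=lambda cp: (-frequency.get(cp, 0), cp))
    return {cp: index for index, cp in enumerate(ordered)}


def to_ranges(values):
    """Collapse integers into sorted inclusive (start, end) ranges."""
    ranges = []
    for value in sorted(set(values)):
        if ranges and value == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], value)
        else:
            ranges.append((value, value))
    return ranges


font = [0x4E00, 0x4E8C, 0x4E09, 0x56DB]
wanted = [0x4E00, 0x4E09, 0x4E8C]

remap = build_remap(font)
raw_ranges = to_ranges(wanted)                           # sparse in Unicode
dense_ranges = to_ranges(remap[cp] for cp in wanted)     # contiguous
```

Because the functions only see integers, the same code applies unchanged to sets of glyph IDs.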
myles: mapping is done per font ?
Garret: yes. first response from server sends a guide for client on how to remap. the frequency data can be ignored if needed.
myles: first request is special. should the first response include shaping info?
Garret: smart server approach
allows us to send whatever we want back and forth. hybrid
approach -> 1st request, codepoints. get shaping info, then
talk glyphs going forward
... only incur extra penalty on the first request
... hopefully we can build this into the protocol. for a CJK
maybe you always talk codepoints. Maybe in Arabic you talk
glyphs instead. You can be flexible and do what's best for a
given font
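The hybrid flow might look roughly like this in outline: the first request names codepoints and the reply carries shaping info, after which the client shapes locally and asks for glyph IDs. Every class, method, and field name below is hypothetical; this is an illustration of the message sequence only, not a proposed wire format, and the shaper is a trivial stand-in.

```python
class ToyServer:
    def __init__(self, cmap, substitutions):
        self.cmap = cmap                    # codepoint -> glyph ID
        self.substitutions = substitutions  # glyph ID -> reachable glyph IDs

    def _closure(self, glyph_ids):
        pending, seen = list(glyph_ids), set()
        while pending:
            gid = pending.pop()
            if gid not in seen:
                seen.add(gid)
                pending.extend(self.substitutions.get(gid, ()))
        return seen

    def first_request(self, codepoints):
        """Codepoint request: reply with the closure plus shaping info."""
        nominal = [self.cmap[cp] for cp in codepoints if cp in self.cmap]
        return {
            "glyphs": self._closure(nominal),
            "shaping_info": {"cmap": dict(self.cmap),
                             "substitutions": dict(self.substitutions)},
        }

    def glyph_request(self, glyph_ids):
        """Glyph request: reply with exactly the glyphs asked for."""
        return {"glyphs": set(glyph_ids)}


def nominal_shape(codepoints, shaping_info):
    """Stand-in for a real shaper: one nominal glyph per codepoint."""
    cmap = shaping_info["cmap"]
    return {cmap[cp] for cp in codepoints if cp in cmap}


class ToyClient:
    def __init__(self, server):
        self.server = server
        self.shaping_info = None
        self.glyphs = set()
        self.log = []  # which kind of request was sent each time

    def load(self, codepoints, shape=nominal_shape):
        if self.shaping_info is None:
            reply = self.server.first_request(codepoints)
            self.shaping_info = reply["shaping_info"]
            self.log.append("codepoints")
        else:
            needed = shape(codepoints, self.shaping_info) - self.glyphs
            reply = self.server.glyph_request(needed)
            self.log.append("glyphs")
        self.glyphs |= reply["glyphs"]
```

Only the first call pays the closure overhead; a per-font switch between the two request kinds would sit in the `load` branch.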
myles: is there ever a situation
where the codepoints will be better than the glyphs?
... would requests 2-n ever want to request codepoints instead
of glyphs?
Garret: for latin fonts, the
non-glyph data could be bigger. if using codepoints for latin
we wouldn't have to send everything up front
... we need to look at it with real world data
Vlad: including codepoints in the first request in the hybrid approach is a good twist. from the performance - you avoid costs assoc. with calculating follow up glyph closures
Garret: not that subsetting will add too much cost, but it can have the benefit of keeping it simpler
Vlad: if we just update one specific big table in the binary font with everything else being the same, would the binary patching be smaller
Garret: don't know internals of
brotli but we should explore it
... we should include all of this in analysis
... close to being done with codepoint compression. close to
where we'll be able to get for now.
... one last thing we can do to shrink the cost of encoding
sizes - is to block up the codepoints or glyphs with one value.
the justification is there in doing unicode range for CJK. the
tail for CJK is infrequently used. can apply to glyphs or
codepoints
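One possible reading of the blocking idea, sketched under assumptions: IDs are ordered by frequency, a frequently used head stays individually addressable, and the infrequent tail is cut into fixed-size blocks that are each requested with a single value. The function names, head size, and block size are all hypothetical.

```python
def build_blocks(ordered_ids, head_size, block_size):
    """Split frequency-ordered IDs into an addressable head and tail blocks."""
    head = ordered_ids[:head_size]
    tail = ordered_ids[head_size:]
    blocks = [tail[i:i + block_size]
              for i in range(0, len(tail), block_size)]
    return head, blocks


def encode_request(needed, head, blocks):
    """Return (individual IDs, block indices) that cover the needed IDs."""
    needed = set(needed)
    head_set = set(head)
    singles = sorted(i for i in needed if i in head_set)
    block_ids = [index for index, members in enumerate(blocks)
                 if any(member in needed for member in members)]
    return singles, block_ids


head, blocks = build_blocks(list(range(10)), head_size=4, block_size=3)
```

Asking for any member of a tail block fetches the whole block, which trades some over-fetching for a much smaller request encoding; the integers can be codepoints or glyph IDs.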
Vlad: almost 2 weeks from day of
F2F. thanks, Jason, for venue info.
... hard stop at 5pm
jpamental: The one thing we know for certain is the hard stop. That will give us the option to continue at dinner.
Vlad: dinner probably won't be meaningful technical discussion
myles: We need to gather an agenda for the timeslot
Vlad: Ambitious wish for the F2F. Want to end up with a defined schedule and timeline for what has to happen next. By F2F we need to finish everything we said we want to finish. For the day of discussion, please submit Agenda+ requests and the time period you need
John_Hudson: will there be dial-in?
jpamental: we should be able to set something up
Vlad: can set up a zoom meeting - same capability now for the whole day
Vlad: not hard set on Monday
meeting, if the discussion won't be beneficial. let's prepare
for F2F this week. If anything else arises we can discuss next
week, otherwise see you at F2F
... Jason, please keep us posted on F2F details as you find out
more
jpamental: I will own that and make sure folks have all info they need for access to venue
Present: KenADBE, Vlad, chris, myles, Persa_Zula, jpamental, Garret
Scribe: Persa_Zula
Date: 19 Aug 2019
People with action items: Garret