See MSE, DataCue and TextTrackCue slides
cyril: There have been a few
discussions this week on DataCue, TextTrackCue, I put together
a few slides to summarize them.
... DataCue, goal is to expose in-band data to
applications.
... The major use case is emsg
cpn: I would say it's not only
about in-band events.
... For user agents that feature native support of HLS or DASH,
events could be in the manifest.
... We also want to be able to support arbitrary objects that
applications may want to synchronize.
cyril: DataCue essentially flows
from the media to the application.
... TextTrackCue flows the other way around.
... You let the browser do the synchronization, where the cue
should be displayed, but the application prepares the
cue.
... And then MSE for TextTrack is enabling end-to-end
synchronized processing and rendering, from the container to
the display.
... Browser vendors do not like additional parsing, which may
be an issue here.
... [showing diagram of MSE/EME pipeline]
cpn: Different points of view of
how much parsing would be done by the user agent.
... For certain well-known event formats, the user agent could
expose a structured object, or just the raw data.
cyril: TextTrack for MSE would hand the parsing to the JS, but the user agent would still have the cues and handle the synchronization. We don't even need to give the time to the JS app, because it's the same time when the event is handed over to the app and when it comes back for rendering.
glenn: The call to the app would be synchronous?
cyril: Either way.
glenn: Asynchronous may be simpler, you'd need a handle.
cyril: The whole thing seems similar to WebCodecs.
padenot: In fact, that's the
exact opposite.
... Things are still fuzzy though
... We did some experiments; that went well.
cyril: Yes, you don't depend on "time marches on"
jer: In a way, for custom
parsing, the user agent could just produce a DataCue and the JS
app would create the right cue that gets fed back into the
rendering
... Strawman: push data in MSE, get metadata samples exposed as
DataCue, without needing to expose any specific interfaces to
define.
... I'm just throwing that out as an alternative.
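[Scribe note: as a rough illustration of this strawman, an app-side helper might map an emsg box's timing fields onto a cue-like object. The descriptor shape below is an assumption based on the DASH emsg (version 1) field layout, not a proposed interface:]

```javascript
// Hypothetical helper: convert a parsed DASH emsg (version 1,
// absolute presentation time) into a cue-like descriptor that the
// app could feed back into the rendering pipeline.
function emsgToCueDescriptor(emsg) {
  const startTime = emsg.presentationTime / emsg.timescale;
  return {
    startTime,
    endTime: startTime + emsg.eventDuration / emsg.timescale,
    value: {
      schemeIdUri: emsg.schemeIdUri,
      value: emsg.value,
      id: emsg.id,
      data: emsg.messageData, // ArrayBuffer in a real pipeline
    },
  };
}
```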
[discussion on putting decryption out of the picture since events are not encrypted]
jer: In summary, a mechanism to add custom support for currently unsupported timed events?
cyril: Yes
padenot: Metadata has been a
problem for years. People routinely demux things in JS, e.g.
to get ID3 out of it. Easy for MP3s.
... The load is not really complex in that case.
jer: It does require specific knowledge about timed event formats.
padenot: Yes, and the UA already parses the data, so double-parsing happens here.
cyril: Question is what's next?
cpn: We have a DataCue repo in
the WICG. This would be great input there.
... In the IG, we ran use cases and requirements. This proposal
gives us much more of what we're looking for.
... The ability to get events ahead of time is baked in.
Separated from the triggering of the cue.
... I'm suggesting you use the explainer in the repo to iterate
on this design and shape the API.
andreas: I think it could be one option. Your slides show that everything is connected. Possibly something that needs to be discussed together. I'm just worried that if it's just in the DataCue repository, it might be limited to the topic.
nigel: What's needed there is
that the architectural components need to be separated.
... I have a similar point: it would be a real shame
if we did all of this and didn't solve the
synchronization aspects.
... One thing I'm conscious of is that the rendering side for
audio sends samples to your digital-to-analog converter. The
rendering for video puts pixels in your video buffer. The
rendering model for TextTrack is to parse JSON, create DOM
fragments and apply styles, which seems to take longer. I'm wondering if
we need to do that earlier to prepare things in advance.
David: Isn't that a bit tricky?
cyril: It works if the only thing allowed to change the CSS is the in-band data.
jer: There's always a validation
and caching problem.
... The web browsers are meant to render things very
quickly.
... I don't know what the requirements are.
nigel: We've discussed thresholds.
We sort of ended up with 20ms.
... The metric is to measure when the text is there on the
screen.
jer: One of the problems we have
is JS.
... One of the points is that TextTrackCue v2 absorbs JS to
process the cue.
cpn: It's not only for text track cue placement, so we need a general solution
jer: We do as much as we can to
keep things out of JS for this reason, not to have to get back
to the main thread.
... That said, I ran an experiment. The average latency is 4ms
on main browsers between when the event is triggered and when
it's handled by JS.
... So things may already have been addressed.
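[Scribe note: a minimal sketch of such a latency measurement; the event wiring (commented out) is illustrative, and only the arithmetic is concrete:]

```javascript
// Latency, in milliseconds, between a cue's scheduled start time and
// the media clock observed when the JS handler actually runs.
function dispatchLatencyMs(cueStartTime, observedCurrentTime) {
  return (observedCurrentTime - cueStartTime) * 1000;
}

// Illustrative wiring (browser only):
// cue.addEventListener('enter', () => {
//   console.log(dispatchLatencyMs(cue.startTime, video.currentTime));
// });
```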
wolenetz_: Couple of questions. Trying to get reliable synchronization between in-band events and what? JS?
nigel: The strong sync
requirement is related to output.
... There's good sync for handling of the input.
... The metric we need is for the output.
... Changing display of a subtitle caption.
... Or real-time sync with audio handled with Web Audio
API.
cpn: There's also the ad-insertion case where the event triggers a switch in the video.
jer: It's not the time required to parse the metadata. It's more about display and rendering.
wolenetz_: If I understand correctly, to get strong sync, we need to offload the custom processing of data cues to JS on a separate thread, or media types that the MSE parser should understand. There's a large gap between the two.
jer: At the MSE level, the sync issue seems not as important as rendering
GregFreedman: There seem to be two things here. TextTrackCue v2 and this thing. If it's all done in advance, do we really need MSE?
jer: The thing is that there may be timed events in the media stream already and you don't want to do the demux twice.
nigel: Alternately, it would also make sense to push all of our components to the same pipeline.
jer: Yes, we are kind of combining 2 discussions in one use case. It's good to have an overview picture.
nigel: Wondering if there's a
model we can think of to change the firing of cues.
... Instead of going through the "time marches on" algorithm
and the browser's idea of where the time probably is.
chcunningham: Through the MSE proposal, you'd have a separate source buffer for the text track data?
cyril: I would think so
chcunningham: Is this imagining new types of metadata that don't exist yet, or exposing metadata that already exist?
cyril: All the specs exist to do
that. In practice, not a lot of people do that.
... I don't know about others.
jer: One possible use case is 608
captions.
... 608 will carry things in-band. Currently, in Safari, it
shows up as a TextTrack.
... That would be one type of currently existing text
track.
cpn: On that note, there is this
Sourcing In-band Media Resource Tracks from Media Containers into HTML document that is in an unofficial state.
... Is it something that we'd want to rationalize?
cpn: From my reading, there are a
number of things referenced from HTML where it's not clear whether
they are supported.
... E.g. Media fragments, advanced media fragments.
... The fact that it is not REC is a concern for me.
wolenetz_: We in Chrome have not shipped widely in-band parsing through MSE. Was behind a flag, now the old code is removed.
jer: Neither Webkit.
<cpn> https://www.w3.org/TR/media-frags/
<cpn> https://www.w3.org/TR/2011/WD-media-frags-recipes-20111201/
Yongjun: We always use out-of-band. In-band has never worked in our experience.
wolenetz_: Getting back to the
DataCue use case, seen some use cases around emsg.
... If there could be agreement about specific types, would
that satisfy most needs?
jer: It seems that it's more difficult to implement than the naive approach to expose events when you get them.
nigel: I feel that the
distinction between in-band and not in-band is not always
clear.
... MPEG-DASH is fetching audio, video and text tracks from
separate URLs for instance.
... Is that in-band or not in-band?
... There are schemes in DVB for putting TTML in the transport
stream. Mandated for set-top boxes in Nordic countries.
... If you fragment that and send that through, you'd like that
to be exposed.
... On the other end, the BBC always does its captioning stuff
out of band.
chcunningham: Two worlds. Broadcast use cases would like to leverage in-band. With the DASH question, it seems it should have a clear answer.
yongjun: Both are possible in DASH. You can put things in-band but no one does that in practice.
David: DASH manifest is essentially in-band. The fact that it's processed in JS is secondary.
[side discussion on the definition of in-band and out-of-band]
andreas: Appropriate rendering of cues is a priority. For me TextTrackCue v2 goes in the right direction.
chcunningham: Yes, I'm trying to understand what the priority needs are.
andreas: The question is how to
separate the different activities in different groups.
... WICG DataCue repository. A TextTrackCue proposal for
WICG.
... PAL mentioned yesterday that he wanted to make
responsibilities clear.
cpn: The new generic cue proposal would go in WICG.
hober: Yes, the main goal is to end up with updates on WebVTT specs
cpn: We kind of need a place where
we can do the overall architectural piece.
... I guess we could use one or the other to do that.
jer: It seems that the
architectural discussion does not need to generate technical
spec. It could belong in Media & Entertainment IG.
... The DataCue portion, end goal is to do it here.
... For generic TextTrackCue, end goal is Timed Text
cpn: How would the interaction with WHATWG work?
hober: The currently envisioned
working mode with WHATWG is to have them react when needed on
CG repos.
... This room seems like a good place to discuss effective
changes.
nigel: I'm just recognizing that there is a lot of media activity going on in different groups. No group is chartered to do horizontal reviews of media specs.
hober: Unofficially, the Media WG is the right group to do that.
jer: It's going to be the job of
Mounir and I to coordinate discussions.
... To make sure that the different groups are aware of
discussions when needed.
yongjun: CMAF, WebM, other file formats, what's the integration story? Do we cover all of them?
nigel: In Timed Text, we have
liaisons with a bunch of external organizations.
... e.g. CMAF might say "subtitles shall have IMSC1"
tidoust: regarding the sourcing in-band tracks, is anyone interested in working on it?
jer: it would naturally fit in scope for this group
tidoust: I'm wondering if there are people willing to update the document, and whether there's implementer interest should updates be made.
[silence heard]
[Media WG charter allows for group to take the spec on board through DataCue]
See Media Source Extensions repo
Matt: For MSE v.Next, we
currently are trying to figure out the editors. I'm happy to
edit MSE
... Netflix will find someone, also Microsoft will try to find
someone
... Is anyone else interested?
... How to discuss MSE on calls? Last time, we had dedicated
calls
Jer: We'll rotate which specs
need attention for the monthly calls, and can have topic
specific calls
... If MSE needs more time, we'll figure it out
Matt: Some maintenance work is
happening on the W3C repo, and incubation in WICG repo with
branches for each v.Next feature
... Would like a better idea of the process for incubating
v.Next features, and how to merge upstream
Jer: I suggest upstreaming the
existing WICG work as the starting point, then we can do PRs
against the newly upstreamed spec
... Versioning will be the hard part
Matt: It's more complex for MSE
than EME, as there are some old things
... How can we simplify? Will follow up with the team about how
to manage the branches and v.Next
... The only incubation feature with a shipped implementation
is codec switching
... We have some tests in WPT for that feature
... There's a clarification added to MSE for codec parameters for
addSourceBuffer and changeType, the browser is not required to
accept them; we're relaxing Chrome's requirements around
that
... Are there IPR considerations around v.Next MSE?
... Can we bulk upstream everything from WICG into the W3C
spec?
Francois: No problem to merge
upstream. At some point we'll publish FPWD. We have full IPR
commitment with the Rec
... So don't worry for now. Publishing FPWD will trigger call
for exclusions
Matt: The tests are in the same media source repo, is this expected procedure? Do we want a folder for V.Next features?
Mounir: We could keep that, for backwards compatibility. Want to avoid breaking stuff. Can put things into MSE, don't see the need to separate them
Francois: I think the people maintaining the tests prefer to avoid versioning
<mounir> ACTION: mounir to talk to foolip to double check whether versioning is needed for MSE v2 WPT
Matt: We'll continue using Respec. What were the problems with EME regarding Respec?
Mounir: There's no problem
<tidoust> ACTION: tidoust to exchange with wolenetz on setting up MSE repo, updating boilerplate, ReSpec, etc.
Matt: There was some related MSE
discussion around reducing overhead for applications that take
media, containerise it, only for MSE to decontainerise and
play
... There's a proposal for adding new byte stream formats to allow injection of demuxed or raw audio and video frames.
Matt: We can follow this up after
the meeting
... Yesterday, we discussed latency hint. It seems this is
meant to describe what happens after decode. These actions
don't need to depend on what the source was
... I think a latency hint on the media element is for
playback. We could think of use cases wanting to tie this hint
to MSE behaviour, e.g., play through gaps
... Don't want to bind that hint to any of those things. Also
garbage collection regimes
... Prefer to see this done on the media element, rather than
MSE
Mounir: This seems to agree with the conclusion from yesterday
Matt: The next proposal, working
on a prototype, is using MSE from a dedicated or shared Worker
context
... Had a demo at FOMS, found some severe problems with it,
related to implementation rather than the spec
... As it stands, the prototype doesn't use a new
MediaSourceHandle object, it uses a URL to communicate the
identity from the worker context to the main context
... Should have more to say, and an improved demo, by FOMS
Mounir: Do you have service worker in scope? Is it working in the prototype?
Matt: Service worker is
different, it's about intercepting requests and servicing from
a cache
... It's unrelated to Workers which is about threading
Mounir: I think it would make sense to use it from Service Worker
Jer: SW are shared across pages, what's the use case?
Mounir: I see the SW as the thing
that does networking for the page, it's alive when the page is
closed. Enables some offline use cases. When the page tries to
play, it can serve segments from the SW cache
... I think there's benefit of doing that. From a spec point of
view, SW is a kind of Worker
... Is SW out of scope for the spec? What do other implementers
think?
Paul: I think it's not good, I
don't see MSE in SW
... I'd have to see real use cases, I'm skeptical
Jer: There are implementation
restrictions. The difficulty would be connecting the two
processes
... I agree with Paul, we'd have to have concrete use cases to
judge this proposal against
Mounir: Do we have many APIs not available in SW in the platform?
Paul: Yes
Matt: There seems to be consensus that exposing MSE in SW increases complexity, source buffers and GC issues. I agree about seeing use cases that can't be polyfilled
Mounir: Can you fetch from a DedicatedWorker?
Jer: Yes, then the fetch is interposed by a SW if it exists
Paul: Yes, and that does not mean there are going to be a lot of memory copying
Matt: I believe Facebook does the fetching from a Worker context and hands it off to the main context for MSE
Matt: I'm also working on
eviction policies
... Please read the proposal
... A use case is game streaming, where low latency live is
critical, and want to minimise delay by having a single
keyframe, infinite GOP
... MSE doesn't work well with that, it has to buffer
everything, a keyframe and all dependent frames are treated as
one unit for buffering and GC
... I'm working on simplifying this to the core. Should seeking
be allowed? What about seekable and buffered ranges
... I'll have a prototype implementation in Chrome to look at
in more detail at next FOMS
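[Scribe note: a sketch of what an app-driven eviction policy for the low-latency case might compute; the "keep window" knob is an assumption, not part of the proposal:]

```javascript
// Compute the range to evict behind the playhead, keeping only a
// small window of already-played media. Returns null when there is
// nothing worth evicting yet.
function evictionRange(currentTime, keepBehindSeconds) {
  const end = currentTime - keepBehindSeconds;
  return end > 0 ? { start: 0, end } : null;
}

// Illustrative use (browser only):
// const r = evictionRange(video.currentTime, 2);
// if (r) sourceBuffer.remove(r.start, r.end);
```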
Paul: What's the use case for seeking?
Matt: Seeking to the live head,
if you've fallen behind.
... If you're using the infinite GOP mode, the keyframe may
already have been dropped from the buffered range, so it may
not be available for seeking
... There's potential for race conditions between seek, decode,
and playback
Jer: Could solve this in the spec
by disallowing seek with infinite GOP
... Could set playbackRate to Infinity to catch up
... Decode as fast as possible
Matt: Could disallow it, or allow
but stall if seeking to a range without a nearby keyframe
... I'm investigating the complexities in Chrome
... There's a policy that could collect everything before the
currently playing GOP
... GOP is codec specific, so we'll need to update the proposal
and spec to be less specific
... I'd like to get help with purgeable or pre-emptive GC
... This could be used to prevent the UA from running out of
memory, by not waiting for an explicit remove call
... I would like help with the spec for that
... Not all implementations may be able to do that, and we
wouldn't want it to become the default mode
... Jer, would appreciate your help with that
Jer: OK
Jean-Yves: It's hard to know
before implementing, there can be nasty surprises
... so it's hard to comment on what we may need. I'm looking
forward to seeing the prototype
Matt: What's a keyframe,
something that's signalled as such, or something that actually
is a keyframe?
... Issue #156. When MSE was first worked on,
createObjectURL created URLs that were auto-revoking
... The implementation would revoke immediately, so couldn't
use in a later event handler
... From discussion at FOMS, Firefox still does this.
... Now there's a
createFor method for auto-revoking object URLs
... If we're using
these from a Worker context to communicate to a main thread
media element, there's a race condition if we use the original
form of object URLs
... Chrome doesn't do auto-revocation currently, so there's an
issue with media elements and objects being kept alive
... Working on a new form where things can be removed, and
delay auto-revocation.
... One complexity from auto-revocation, it's diverged from the
MSE spec, will need to coordinate with the File URL folks
Matt: Issue #160
discusses ways to solve how an app can tell an implementation
what to do when it hits a buffered range gap
... Solving interop issues, as well as trying to prevent
stalling, and make the implementation more relaxed with respect
to gaps
... And seeking forward in infinite GOP. Would like this in
v.Next implementation, but not at top of my priority list
Jer: The new editors could take a pass through the issue list, and bring the list to the group for further triage
Matt: Sounds good to me
Jean-Yves: For MSE v.Next, the most requested feature we see is dealing with missing data or gaps, and eviction policy for low latency video
<Zakim> mounir, you wanted to talk about MSE in Workers a bit more
Jean-Yves: There was a bug from
David at BBC, it would stall on one browser and not on
another
... having a uniform approach to dealing with gaps, so should
we wait for data to be appended, or should we skip over
it
... In HLS.js, if they see a gap, they seek over it
ChrisN: Want to keep all viewers at the live playhead as much as possible
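[Scribe note: the seek-over-gap behaviour mentioned for HLS.js can be sketched as a pure check over the element's buffered ranges; the 0.5s threshold below is an arbitrary assumption:]

```javascript
// Given buffered ranges as sorted [start, end] pairs, find a small
// gap just ahead of `time` that playback could seek over.
function findGapAhead(ranges, time, maxGap = 0.5) {
  for (let i = 0; i < ranges.length - 1; i++) {
    const end = ranges[i][1];
    const nextStart = ranges[i + 1][0];
    const gap = nextStart - end;
    if (time >= end - 0.01 && time < nextStart && gap > 0 && gap <= maxGap) {
      return { from: end, to: nextStart };
    }
  }
  return null;
}

// Illustrative use (browser only; toPairs is a hypothetical helper
// that flattens a TimeRanges object into [start, end] pairs):
// const gap = findGapAhead(toPairs(video.buffered), video.currentTime);
// if (gap) video.currentTime = gap.to;
```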
Jer: How does it interact with I
frames?
... The spec says you must pause at the end of the buffered
range. Could specify a time limit
Matt: Some kinds of gaps may not be full gaps, maybe the audio could play through but not have enough video
jernoble: there are two kinds of gaps: known by the application, and unknown. one potential to solve the application-case would be to allow the application to explicitly coalesce ranges.
Matt: Should we coalesce the buffered ranges? The app would have to poll for unexpected buffered ranges
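[Scribe note: the explicit coalescing jernoble suggests could look like this pure merge; the epsilon tolerance is an assumption:]

```javascript
// Merge buffered ranges whose gaps are at most `epsilon` seconds,
// approximating what an app-controlled "coalesce" option might report.
function coalesceRanges(ranges, epsilon) {
  const out = [];
  for (const [start, end] of ranges) {
    const last = out[out.length - 1];
    if (last && start - last[1] <= epsilon) {
      last[1] = Math.max(last[1], end); // absorb into previous range
    } else {
      out.push([start, end]);
    }
  }
  return out;
}
```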
Jean-Yves: If the gap is small and will be ignored, should we reflect this in the buffered ranges?
Jer: I think we do already, it's
a CPU problem to poll for the buffered ranges
... If we decide to add spec language on which ranges to skip,
we'll also specify how the buffered ranges would reflect
them.
Matt: Two ways of looking at it.
One idea is to let the media element continue to describe what
the playback behaviour would be
... Or maybe the sourceBuffer is the place to look at the gaps
and see how they've been coalesced.
... Proposal didn't allow apps to report the gaps
Jean-Yves: If you have no video but audio can play through, you don't want to have to wait for the video
Jer: Should we have different gap-skipping behaviour for audio vs video tracks?
Jean-Yves: With gaps within the same sourceBuffer, they're reflected in the source's buffered range. Then there are gaps due to missing data at the intersection of two buffered ranges; this is data that will not come
Jer: The ability for a client to bridge gaps on a source buffer basis...
Matt: Most MSE players use one
track per sourcebuffer, but there's no ability in a multi-track
source buffer, so you'll see gaps
... Should file an MSE issue to get some notion of track
buffered
... Useful for implementations using muxed content
Jer: CPU usage was high due to
requirement to create new buffered range objects from a polling
loop
... HTML seems to have changed such that bufferedRanges doesn't
require a new object to be created, may want this in MSE as
well
Matt: That would help
<tidoust> See the definition
of the buffered attribute in HTML
... and note "The buffered
attribute must return a **new** static normalized TimeRanges
object"
... followed by the warning "Returning a new object each time is a bad pattern for
attribute getters and is only enshrined here as it would be
costly to change it. It is not to be copied to new APIs."
Matt: There's room for improvement for the app to tell the implementation what to do. Should it stop, or let time march forward, or skip to the earliest buffered thing, lots of options.
Matt: I'd like some concrete use
cases. Keeping up with the live edge is a good one
... May not be solved by what's proposed so far. I haven't had
time to look at this, concrete proposals are welcome.
Jean-Yves: Eviction policy, can only evict when you get new data
Jer: It's bad at the end of video
playback where we hold onto the buffered data unnecessarily. It
has been requested by people at Apple concerned by memory usage
on limited memory devices
... This one might be worth prioritising by the editors
Matt: We experimented in Chrome with pre-emptive eviction, but didn't see much improvement in the playback metrics
Jean-Yves: Also out-of band evictions
Jer: We can't change behaviour of existing applications
Matt: Bad for apps already tuned to the existing eviction policy
<jya> for information: sourcebuffer.buffered needing to return the same object if it hasn't changed
Matt: The newer eviction policies would certainly be more aggressive
Mounir: I talked to one of the
editors of SW, his rule of thumb for exposing APIs is to expose
everything unless it's a foot-gun
... createObjectURL is disallowed in SW, because it's linked to
the timeframe of the Worker
... What's the latest on how we pass the data back to the page
from the worker?
... An object URL or a transferable object?
... for any kind of Worker
Matt: The Worker creates an MSE
object URL and postMessages it to the main thread (although can
have transitive Workers)
... It's just a string
... We can't use createObjectURL from a SW, so would prevent us
from using this API from a SW
... Could we create a MediaSource object from a SW, and what
would it mean to attach this to a media element?
... Trying to attach to multiple media elements will fail
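[Scribe note: a minimal sketch of the prototype flow Matt describes, where the Worker posts an MSE object URL to the main thread; the message shape is an assumption:]

```javascript
// worker.js (illustrative, browser only):
// const mediaSource = new MediaSource();
// postMessage({ type: 'mse-url', url: URL.createObjectURL(mediaSource) });

// main thread: validate the message before assigning the URL to the
// media element. Pure, so the protocol check is testable on its own.
function extractMseUrl(message) {
  return message && message.type === 'mse-url' && typeof message.url === 'string'
    ? message.url
    : null;
}

// worker.onmessage = (e) => {
//   const url = extractMseUrl(e.data);
//   if (url) video.src = url;
// };
```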
Mounir: I'm worried that we design something that can't be used from this new part of the platform
Jer: Seems to be a question for the SW group. If we created a transferable object, I see issues with SW lifetimes and long-lived objects
Mounir: I want to avoid MSE being unexpectedly unavailable in SW
Jer: We could file an issue on
whether to expose a MediaSource in a SW context, then discuss
with the SW group
... This could also be a WebIDL issue, would need to check
that
Matt: What scenario where MSE in
SW is required, that couldn't just be solved by SW as proxy and
offline cache?
... It would have to be some small amount of media, because of
buffering in SW cache
Paul: One scenario is extremely tight real-time video where you want to avoid context switches. But this may be better solved in WebCodecs
Mounir: It could be that scenarios arise in future, so we need a strong argument not to do it
Jer: I disagree. This is going to
be hard to specify, so we do need use cases to drive it
... It'll need a lot of spec language
Paul: It's similar to putting AudioContext into SW, or WebGL, why do it?
Matt: The object-URL-based approach doesn't lock us into a design where we can't later use a transferable MediaSource
Jer: We can use Issue #236 to collect use cases
See [Proposal] Allow Media Source Extensions to support demuxed and raw frames
Jean-Yves: [introduces demuxed and raw frames proposal], lots of settings are not present in the current proposal, e.g. for image decoding, you need decoded size and display size. Also crypto may be per-sample, etc.
Matt: How would encryption
information be transmitted with this proposal?
... Extensibility is a good question. A web app on a TV might
have a very long lifetime, what if content providers need an
extension to the format, what happens with previous
implementations?
MarkW: Need a way to append with raw data
Jean-Yves: I see this as an important feature
Matt: This is a problem people
hit all the time, remuxing in JS just to pass to MSE
... Extensibility is a concern, but how valid is the concern?
We could add a new byte stream format to the registry
Yongjun: So we're changing MSE and EME to more of a frame player?
Jean-Yves: It's enabling use without a container
Yongjun: The container needs to be there for timestamps and init data, we still need these
Jean-Yves: You also need the decoding time stamp for h.264. It's used in MSE, to determine if you have a gap
Yongjun: Try to avoid using the term "frame", prefer "access unit", it's a more comprehensive term
Matt: In MSE we use the term "coded frames"
Jean-Yves: yes, "sample" and "coded frame"
Matt: MSE tries to abstract itself from specific codecs
[discussion of terminology, PES packets]
Yongjun: What about pass through mode?
Jean-Yves: All this deals with compressed data, which may contain video, audio, text. At this stage we don't know
<jya> Gecko calls them MediaRawData
Matt: Thank you, looking forward to collaborating with you on MSE v.Next
Jer: Thank you too
See Media Playback Quality repo
jer: Relatively small spec. Pretty solid, not many issues.
chcunningham: Copied and pasted
from MSE.
... Chrome never launched the API. We have the webkit-prefixed
things though, which is weird. I'd like to fix that.
... Some of the issues are pretty straightforward. Some of them
could trigger a backward-incompatible change
chcunningham: For issue #7, I think we
should, because it has shipped in different browsers.
... For issue #3, apps may be interested in hearing about changes
instead of polling, which could be done with an event, but that's not
possible with a read-only object that obviously cannot
change.
chcunningham: Any appetite to changing the API and adding an event, or something else? I don't have strong opinions.
GregFreedman: I like the idea of having an event, e.g. when it's dropping.
padenot: Can we create a new member?
jya: With regards to the buffered range, the comment is that if it's an attribute, then you should always return the same object.
padenot: Maybe the right road. What constitutes a meaningful change may be hard. We observe dropping frames on load, when it does not really matter.
jer: You're thinking of avoiding false signals?
padenot: Yes.
Richard: Resizing windows can also cause dropped frames
chcunningham: It's interesting to get into that even if we don't add any kind of eventing.
jya: Example of playing a video at twice the rate. Only one in every 2 frames gets displayed. Are the other ones dropped or not dropped?
chcunningham: They were reported as dropped frames. And that's bad. Not a signal of hardware performance.
jer: The spec is pretty clear
here that something is dropped when it misses the
display.
... It didn't miss the deadline, therefore it's not
dropped.
padenot: Yes, it's not composited.
chcunningham: To close on the issue, there seems to be an appetite for eventing but not for breaking backward compatibility, I'll take an action to come up with a proposal.
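[Scribe note: a sketch of what such a backward-compatible, app-level eventing shim might look like; the polling interval and threshold are assumptions, and only the delta arithmetic is concrete:]

```javascript
// Per-interval deltas between two VideoPlaybackQuality-shaped
// snapshots; a shim could fire its own event when `dropped` > 0.
function droppedDelta(prev, next) {
  return {
    dropped: next.droppedVideoFrames - prev.droppedVideoFrames,
    total: next.totalVideoFrames - prev.totalVideoFrames,
  };
}

// Illustrative polling loop (browser only):
// let prev = video.getVideoPlaybackQuality();
// setInterval(() => {
//   const next = video.getVideoPlaybackQuality();
//   if (droppedDelta(prev, next).dropped > 0) onFramesDropped();
//   prev = next;
// }, 1000);
```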
jer: Interface of read-only
objects is fairly complicated. A dictionary would be
better.
... An option is to have a callback when a value passes a
certain threshold.
<scribe> ACTION: chcunningham to work with jer and propose an API that does some sort of eventing in a backward compatible way, and that converts the VideoPlaybackQuality object to a dictionary
GregFreedman: What happens when video is not in the foreground?
chcunningham: Chrome's smart
about this.
... You should not be observing troubles in that
situation.
... Are there any other dropped frame cases that we should talk
about?
jer: Good question for the content table.
GregFreedman: Sometimes, we see dropped frames when the video starts. Not sure whether that needs to be reported.
jya: For me, the smoothest bit
should be the start. That's the only time when we can guarantee
that we have the info.
... To report canPlayThrough, we wait until we have 5 seconds
of data and 10 frame buffers
tidoust: What happens with Picture-in-Picture?
chcunningham: That's a good
question. To be looked at.
... Moving on to corrupted frames.
jya: Does it ever happen?
chcunningham: That's my point. Chrome does not have that notion.
jya: We don't have the concept of corrupted frame either.
mounir: As far as I can tell, no one has.
chcunningham: I propose that we remove corruptedFrames from the spec.
PROPOSED RESOLUTION: Remove corruptedFrames from Media Playback Quality
RESOLUTION: Remove corruptedFrames from Media Playback Quality
[discussion on variable framerates]
jya: If we're late, we skip forward to the next keyframe. Chrome, by contrast, will actually pause. We'll always try to play the audio. Similar behavior to what the Flash plugin used to do.
chcunningham: That makes
sense.
... The case I wonder about is people using MSE for live.
jya: It actually makes more sense
for real-time playback, because audio is the most important
there.
... For 4K videos, Firefox may drop a good number of frames.
But then Chrome pauses the video, so no dropped frames. Which
is better?
chcunningham: We should file a GitHub issue and exchange about that
jya: It would be good to define what a smooth media experience is.
jer: Whatever we decide, I believe I should be able to convince the team that does this to adjust behavior.
mounir: I just want to remind
people that changing the object to a dictionary may be backward
incompatible. Interfaces are exposed today, mostly used for
feature detection. I don't personally care because we didn't
ship the interface. But others may.
... I don't know if you wanted that to happen.
padenot: We may have usage
counters. I can check.
... We removed the moz-prefixed properties.
mounir: We have the webkit- prefixed properties. It is used in practice.
See Media Session
jer: A new API incubated in WICG
to provide integration with platform media controls
... play, pause, skip etc. intercepted by JS and implemented by
page
mounir: Chrome launched
MediaSession on desktop for hardware keys
... working on UI in Chrome that would also use those
keys
... stop action added to spec
... playback state too, so you can define how long the playback
is, playback rate and position
... can design UI with scrubber
... spec changes not in Chrome yet. Planned.
... working on plugin to enable Websites to benefit from
this
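The action handlers and the newer `playbackState` / `setPositionState()` additions described above combine roughly as follows. This is a sketch: `session` is `navigator.mediaSession` in a browser, injected here so the logic can be exercised with a mock, and `player` is a hypothetical page-side player object.

```javascript
// Sketch: wiring a page's player to the Media Session API so hardware
// keys and platform UI (including a scrubber) reflect playback.
function wireMediaSession(session, player) {
  session.setActionHandler('play', () => player.play());
  session.setActionHandler('pause', () => player.pause());
  session.setActionHandler('stop', () => player.stop());
  session.playbackState = 'playing';
  // Position state lets the UA render a scrubber with duration/position.
  session.setPositionState({
    duration: player.duration,
    playbackRate: 1.0,
    position: player.currentTime,
  });
}
```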
paul: Gecko shipping this too
jer: WebKit also interested
<tidoust> Issue #233 Add "seek to start" and "seek to live" actions
cpn: doing live content, 24/7,
segmented into shows
... have feature to restart from start of the show within live
stream
... also seek to live position
... would be interesting to us to support these actions
too
... more generally, what kind of actions should be added to MS
beyond current set
... current set based on fixed duration on demand
... want to use with live content too
jer: do you present the live stream as an infinite-duration stream, or with a duration for the current show?
cpn: latter - DASH stream with available time range
yongjun: Do you have all the future segments in the DASH manifest
<scribe missed answer>
jer: also up to page to implement the behaviour when a hardware button is pressed
cpn: concern is "next track" and "previous track" buttons might mean next / previous program vs beginning / live position in current
mounir: actually previous track
often used to go back to beginning of current thing
... try to use this API to expose hardware buttons without
implying what they are
... on Android we expose these keys to Android
MediaSession
... if something not supported ... ?
... don't know if 'start' and 'live' are common
... talked about defining names for these
cpn: issue would then be the UI
jer: may not have control of the
UI to demonstrate the button would be skip-to-live
... e.g. can't change label of skip buttons on YouTube
... but if you have a touch bar we might be able to label it -
similar for PIP
... range of possible UIs means it would be hard to require
buttons like this to be reflected in UI
mounir: have limited set of icons we can use
jer: however, if you let the UI know what actions are supported the browser can choose what best to show
cpn: so we would say we
want
... next track / live track
mounir: recommend that. You'll get something on those platforms which have suitable buttons
jer: did discuss enum value for
skip
... don't want to open it to arbitrary string
... limited set of localizable values
... e.g. select skip button and choose from set of allowed
labels
... one of those could be skip-to-live
... could look into that, then BBC could prioritize the
skip-to-live button and let the UI show it
mounir: replied to issue
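mounir's recommendation above (reuse the existing track actions rather than wait for new enum values) would look roughly like this. A hedged sketch: `restartShow()` and `seekToLive()` are hypothetical page functions, and `session` stands in for `navigator.mediaSession`.

```javascript
// Sketch: mapping previoustrack/nexttrack to "restart current show" /
// "jump to live" for a 24/7 segmented live stream, as discussed.
function wireLiveActions(session, player) {
  session.setActionHandler('previoustrack', () => player.restartShow());
  session.setActionHandler('nexttrack', () => player.seekToLive());
}
```

The trade-off raised in the discussion stands: the platform's buttons will still be labeled as track skips, not as live-stream navigation.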
<tidoust> Issue #191 TAG Feedback: of all the potential metadata...?
cpn: In the TAG feedback, Travis asks why we pick
artist / title / album /
... my comment: those very music track specific. What are they
for ? Display ? Might want more general purpose display
fields
... or is the platform making semantic use ? e.g.
recommendation based on artist. This is a whole space of media
metadata.
... there are semantic web vocabs for this, schema.org
mounir: why specific to music:
when Chrome did this mostly focussed on music. reflected
priorities of the time. can change
... presentation vs semantic: not semantic, but do pass the
information back to the OS on mobile
... metadata matters. Watch etc. tries to display based on what
it thinks is important
... maybe it will favor artist over album. can't do that if you
supply line 1, line 2, ...
... but actually we just try to display it all
cpn: concern is that if I have radio show I need to do my own mapping and then rendering will be different on different devices
jer: same information might be
displayed in multiple places and we can't say e.g. how much
space
... that said, artist / title / album could be
generalized
... right now implementations are simple, though. space to
improve
mfoltzgoogle: how many lines do you get today
mounir: today we show
everything
... notifications more complex
mfoltzgoogle: if we add more semantic tags which we can't display all at once. Who prioritizes ? Browser, Page ?
jer: existing problem with
actions - no priority score on which actions page thinks are
more important. UA chooses
... only choice page has is binary to advertize support for the
action or metadata or not
... don't think there is an alternative
... if we had schema.org syntax I'd hope we could use it
... how much metadata is on an iTunes track, for example ?
Doesn't all show up: already have this problem.
... doesn't get harder if we add more data now
mounir: schema.org - weren't
aware of this enough when writing spec. did start a project to
use schema.org as default values for media session in
Chrome
... not shipping soon, but wonder if we should incorporate into
Media Session spec. Not sure how easy this would be ?
cpn: v.interesting for us
mounir: yep, looking into it in Chrome. Can show a nice UI today, but just the title
cpn: if we annotated using schema.org we'd like all UAs to pick that up
jer: are you saying alternative to broadening the schema for metadata would be just to use page level schemes
mounir: MediaSession is
imperative not declarative, but schema.org is targeted at
search engines
... YouTube does not provide schema.org data if you are not a
search engine
... need to have a better understanding or relationship between
schema.org and the content metadata
... don't know if TAG would like this
cpn: Travis referenced schema.org
mounir: but it's a strange
JSON-LD thing that browsers don't support
... don't want to copy-paste from there. good point. we need to
look into this
cpn: would we be constrained by Android MediaSession ?
mounir: no, we would just convert
if we are missing anything
... mostly title / artist
tidoust: can really go a long way
describing these things with just a few base properties from
schema.org "Thing"
... can already do a lot. CreativeWork underneath Thing
... very useful way to normalize everything in the world in a
shallow structure
... including artwork
... more specific Things too with further properties
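The "schema.org as default values" idea mounir mentions could be sketched as a mapping from a page's JSON-LD block to a `MediaMetadata` init dictionary. The property mapping below is an assumption for illustration, not anything specified; `metadataFromJsonLd` is a hypothetical helper.

```javascript
// Sketch: deriving Media Session metadata defaults from a schema.org
// MusicRecording-style JSON-LD object already present on the page.
function metadataFromJsonLd(ld) {
  const init = { title: ld.name || '' };
  if (ld.byArtist && ld.byArtist.name) init.artist = ld.byArtist.name;
  if (ld.inAlbum && ld.inAlbum.name) init.album = ld.inAlbum.name;
  if (ld.image) init.artwork = [{ src: ld.image }];
  return init; // in a browser: new MediaMetadata(init)
}
```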
mounir: my read is that no one else has anything to add
paul: there are some issues raised by Mozilla, #235, #237, #238, others (?)
mounir: #237 is WebIDL Boris bug
paul: #238
editorial, there is a patch
... PR#235
... constructor change needed everywhere
mounir: <merged the patch>
paul: Issue #234 is more substantial: MediaSessionActionHandler doesn't work for seek operations
mounir: had this issue with
Permissions API and ended up using object
... made me very sad
... permissions API you have a descriptor and WebIDL - object
bypasses the WebIDL
jer: same as generic TextTrackCue - derived IDL interface can't accept a different type for the same method as superclass
eric: you can use any
... we had a tag in the base class that you have to key off
paul: I like same solution for all occurrences of this problem. object used in the past
mounir: should we ask TAG ?
jer: might also be q for IDL people
paul: let's punt until we have the relevant people
mlamouri: pip api is launched in
chrome
... proprietary (different) api launched in safari as
well
... Mozilla has chosen to defer
jer: apple's api is not entirely the same, but it is compatible
mounir: companies have built
polyfill to use either api
... the api nowadays is not very controversial
... considered a skip button, didn't pan out
... considering an auto-pip behavior
... chrome has some code for auto-pip, incomplete, behind
flag
... also looking at integrating media-session and pip in chrome
(integration detail)
... we should also discuss arbitrary content in pip
... lets talk about v1 topics first
scott: issue #119
... we want to define when the controls should show in the pip
window
jer: clarifying, the api would let the page tell the UA what controls to show
mounir: we (UA) may not know perfectly what will be shown
scott: the idea - when there are
action handlers associated, is that when we should show the
buttons for those actions?
... or should it be up to ua
jer: up to UA
... say UA is trying to implement but has no control over what
shows up in pip window
... if they can control, then they can look at the installed
handlers
mounir: site could disable the action via media session to hide controls in pip
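mounir's point suggests the page already has a lever: clearing a Media Session action handler gives the UA no reason to surface that control in the PiP window. A sketch under that assumption:

```javascript
// Sketch: a site toggles a skip control by registering or clearing the
// corresponding Media Session action handler; a UA that keys PiP
// controls off installed handlers would hide the button when cleared.
function setSkipControl(session, enabled, onSkip) {
  session.setActionHandler('nexttrack', enabled ? onSkip : null);
}
```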
jer: issue #167, someone wants to flip camera rendering to simulate mirror mode while capturing in pip
mounir: this is v2
... the way it's implemented so far, we ignore any transforms
eric: it could theoretically be implemented
mounir: UAs don't have a clear strategy to go about it
jer: if they need to do this
without a spec change they could do video -> canvas ->
transform -> media stream -> video -> pip
... crazy, but doable
fbeaufort: spotify folks use this canvas trick to display a pop up video player
paul: that's very
inefficient
... lots of main thread, memory thrashing
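jer's workaround pipeline (video → canvas → transform → media stream → video → PiP) can be sketched as below. Browser-only APIs (`captureStream`, `requestPictureInPicture`, `requestAnimationFrame`); the element handles are injected for illustration, and as paul notes, the per-frame canvas copy is why it's expensive.

```javascript
// Sketch: mirroring a capture video into PiP via a canvas redraw loop.
function mirrorIntoPip(srcVideo, canvas, pipVideo) {
  const ctx = canvas.getContext('2d');
  function draw() {
    ctx.save();
    ctx.scale(-1, 1); // horizontal flip = mirror mode
    ctx.drawImage(srcVideo, -canvas.width, 0, canvas.width, canvas.height);
    ctx.restore();
    requestAnimationFrame(draw); // repaint every frame (the costly part)
  }
  draw();
  pipVideo.srcObject = canvas.captureStream();
  return pipVideo.requestPictureInPicture();
}
```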
mounir: issue #163 about
maximizing the window
... not a good idea for security reasons
eric: i agree
mounir: issue #166, add option to go
full screen
... like the idea, but UA would show its own controls, which
sites may not like.
jer: is this bug asking for full screen on closing the pip, or go straight from pip -> full screen?
mounir: the latter
markf: has full screen been considered as a media session control?
paul: on android, if you tap on the pip window there are os controls to go back to the previous non-pip state, which may have been full screen
mounir: here it looks like user wants to do full screen from pip. real solution is to have a full screen media session action
chrisn: what would the interaction be if you requested full screen while pip is up?
mounir: if you click full screen in the player while pip is open outside, we swap to full screen
chrisn: similar question for remote playback
mounir: i think we don't re-inline window if you initiate remote playback
chrisn: so pip window would remain, just empty state?
mounir: yes, ideally we should fix that
jer: i think its a UX choice. designers call what to do with pip window on fullscreen/remote playback
mounir: we could add some non-normative suggestions for these
<scribe> ACTION: mounir to add non-normative suggestions for behavior for pip window if remote/full screen engaged from main player
jer: issue 156, integration with HTML
mounir: we got this feedback for a number of specs on moving from wicg to media wg
mounir: something they suggested
was to merge into html spec to avoid monkey patching
... i'm opposed. we should avoid monkeypatching, but opposed to
merging into html spec because its already a fairly large spec
and it will split the pip spec into two places
... what does the group think
chrisn: i've seen a tag issue on this, concern about monkey patching in general
paul: html spec is already so big
mounir: what I hear is we could consider changing interfaces, but we generally don't want to?
paul: yes, i oppose changing the
interface for theoretical purity
... i see dominic's point, but media is special. everything
hangs off the media element, but we have a ton of specs that
are different/complicated extending this outside html
... his concern is monkey patching, not extending the
interfaces?
... can we do the opposite? take html media element to a
different spec?
mounir: open to it, but suspect its not up for debate
jer: at some point ian conceded
we could move to a separate spec, but it never happened
... agree it is a possibility
paul: is it self contained
enough?
... could we grab a clean section?
eric: it does seem possible
jer: the media section is ~70 pages of the 900+ pages of the multipage html spec
mounir: so the room seems opposed to merging things back to html; we're open to moving media out of html
<scribe> ACTION: jer to discuss moving media out of html w/ hober
mounir: moving on to pip v2
... first topic: who is interested in an API that can do more
than show a video?
... bbc, microsoft
jer: is netflix not interested?
greg: we tested it, but it's perceived as not a net benefit to users. still a possibility, but slim
jer: does netflix app on ipad do auto-pip
jya: yes it does, when you press home button
greg: netflix pip testing in general, haven't made final conclusion, but lean toward reserving the toolbar space for other things
mounir: most of feedback we got
initially was for non-video additions to pip
... new buttons,
etc (e.g.
mute)
... Youtube, twitch, similar feedback
... folks clearly wanted to customize the window and the
existing api was too restrictive
greg: let me add, web UI felt strongly about pip (positive), but it didn't improve our streaming metrics
mounir: netflix may not be the core pip use case - many cannot multitask with a pip movie and something else
mark: we have folks who are interested, will continue to provide input, but can't commit to a roll out at this point
mounir: we tried initially to do
custom controls
... thought this would solve most issues (mute, full screen)
etc
... but it had drawbacks: couldn't let folks use their own
icons
... we pivoted to an API that lets you put anything in a pip
window
... not a pop-up, still requested via pip api
... API looked weird, but it was feasible
... worried about dev experience, moving objects between
documents
... most ua's reset video when it leaves a document
... for EME, would mean resetting keys (possibly not
fixable)
... next: could we take any part of the dom and show in pip
(similar to full screen today)
... killed this idea, pip window would inherit screen/window
attributes of its parent, makes position/sizing very hard
jer: have you considered doing presentation API -> present to pip window?
markf: yes, in second screen
group, explored a second window object
... resolved some of the issues for opening second window
... but still needed a post msg api to talk between the
windows
... seemed too high a barrier to adoption
eric: js in the current page couldn't access the pip
jer: sites that might adopt pip
v2, might also adopt presentation api for casting
... they might just work together automatically. post msg could
work the same for remote cast as to a separate window?
markf: there is a second browser context loaded, but presentation API users mostly run in 2UA mode, where cast is not necessarily presenting the same content as the original player
mounir: presentation api even for
going full screen on a separate window seems difficult for
developers
... the latest thinking (early draft), is to do an element that
you could use to write some content, like an iframe, but not.
would have different window/screen instances.
... can integrate with cast and full screen api
... aim: keep dev experience smooth as possible, let sites say
the player is this special object, which changes how its
rendered depending on its pip/fullscreen/inline mode
... very similar to iframe-seamless - old idea that was never
implemented
... its a very big change for pip strategy
... but hope it resonates with how sites actually position
players
eric: off top of head, sounds
like a huge amount of work to impl
... looking at incremental benefit vs other issues, I'm not
sure cost-vs-benefit is worth it
jer: maybe easier if it were actually an iframe?
eric: ppl tried to do this with magic iframe, it was a disaster
mounir: the inner frame would be a different execution context, but you could see/manipulate the doc from parent frame
jer: it might be easier impl/spec
if we didn't try for seamless iframe here
... exploring an alternative: original proposal seemed very
spec heavy. but leveraging an iframe instead, letting it go
full screen, could solve problems
... if we had said earlier that the only thing that can
fullscreen is a document element, this would solve a lot of
problems for mac UI (full screening inner elements is awkward
for multi-desktop)
... if we do the same thing for pip, we may avoid similar
issues
mounir: this implies every site
would wrap a player in an iframe
... sites would not do this because it adds load latency
... having this seamless element is easier for sites to use,
just harder for us to implement
markw: the amount of work required to impl needs to match the value
tidoust: could we simply do overlays on pip?
jer: our HI teams won't agree to
even less invasive proposals
... no way they will agree to this
mounir: also considered a pip where we could paint whatever you want, but no interaction
jer: pip 1.1 could be to take a video element with associated gl canvas on top?
mounir: similar to what we
proposed, sending part of the dom
... would require us to create a window with web content
... reintroduces the issue where screen/window info is wrong
jer: if we had generic text cue,
we would make it work in pip
... but not interactive
ericc: yes
... some open questions still
jer: so canvas overlay on pip will come naturally from our impl of generic text cue
mounir: this won't be enough for
some sites
... in summary, apple seems to think this is too complex
... mozilla goes back to "defer" position
... technical aspect?
paul: I see the value, but seems
very complex
... if it were implemented, it seems useful, but it seems very
hard
jer: take a look at the history of magic iframe (failed)
eric: it was maybe a mode for iframe, definitely a webkit thing, not sure of time, caused all kinds of issues
See Autoplay Policy Detection repo
padenot: On the Web, videos are
now often blocked from playing automatically, with sound.
... You can know you're blocked by calling play(). When
playback is disallowed, the promise is rejected.
... This requires having a source, and requires mutating the
state of the media element.
... Parties have requested an API to not mutate the video
element, to know if autoplay would be allowed.
... For example, fetching different media with subtitles if
autoplay is not allowed.
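Today's feature test described above can be sketched as follows: call `play()` and fall back to muted playback when the promise rejects. This mutates the element, which is exactly the drawback being raised; the return values mirror the enum discussed later.

```javascript
// Sketch: probing autoplay by attempting playback, falling back to
// muted playback, then giving up. Requires a source to be loaded.
async function tryAutoplay(video) {
  try {
    await video.play();
    return 'allowed';
  } catch (e) {
    video.muted = true; // side effect: element state has been mutated
    try {
      await video.play();
      return 'allowed-muted';
    } catch (e2) {
      return 'disallowed';
    }
  }
}
```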
dbaron: Seems to be a property of the media, if it has sound.
adenot: Not a property of the content, whether it's muted.
Tess: Safari detects whether the media has an audio track to determine autoplay.
mounir: The use case from FOMS is to ask whether to show an ad with sound or without sound, the checks take too much time.
padenot: 2 ways to achieve. Per-document or per-element.
[Reviewing proposal from TPAC 2018]
padenot: API would return what
play() would return.
... Metadata is important. You don't know if you have an audio
track or not without it.
... Webaudio has an event. Needs to know if it's slow to start,
or won't start at all.
... document-level API resolves this.
... there is a new readonly attribute on document that returns
a value for the enum.
... allowed, allowed-muted, disallowed, unknown.
... unknown is for a
per-media-element policy.
... Firefox can ban all autoplay (even w/o sound).
jer: So does Safari.
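The proposed document-level attribute and its enum would let a page branch before loading any media. A sketch, assuming the proposal under discussion (`allowed` / `allowed-muted` / `disallowed` / `unknown`, read from a hypothetical `document.autoplayPolicy`); the source names are illustrative and match the ads use case mentioned above.

```javascript
// Sketch: picking media up front from the proposed document-level
// autoplay policy, e.g. an ad with sound vs. a captioned muted ad.
function chooseSource(policy) {
  switch (policy) {
    case 'allowed':       return { src: 'ad-with-sound.mp4', muted: false };
    case 'allowed-muted': return { src: 'ad-with-captions.mp4', muted: true };
    case 'disallowed':    return { src: 'poster.jpg', muted: true };
    default:              return null; // 'unknown': check per element
  }
}
// In a browser: chooseSource(document.autoplayPolicy)
```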
padenot: We need a thing that
returns a value.
... 2 ways. A sync API, readonly attribute that hangs off the
document.
jer: Answer can change over time based on user interaction. Answer will change from call to call.
mwatson: Will the spec define
enum values, what autoplay means?
... Make it clear to developers what is allowed and not
allowed.
mounir: Would like to go into a state where it's better defined (like muted).
jer: [Question about user
gesture] It's not universal. We didn't have a definition of
user activation until recently.
... would allow us to make a normative reference.
jya: Mozilla used to have a banner for autoplay. We replaced it with a user gesture.
mwatson: Sites broke because behavior was not well defined.
Tess: Distinction between defining the browser behavior, and writing the spec.
jya: This one is designed for our users, versus what content providers want.
mounir: User activation will
define how long the gesture is allowed, through a Promise
chain.
... today when you click, some browsers will work in event
handler, others will use a timeout.
... musta's work will use a timer or propagate in a
setTimeout.
... just exploration at the moment.
padenot: Contention is shape of API. Pros and cons.
mounir: sync is clear and simple.
con is that policy impl is complicated, overhead.
... pro for async: does not require prior computation. Con is
more complicated, some delay.
jya: With async, user actions may change the answer.
... Answer could change in an event loop.
mounir: Permissions API has an event. Site can be aware of change if they want to.
jya: if you had a sync attribute,
and event, how would Chrome implement it.
... Whenever someone read it, it returns unknown, read it
again, it returns a different value.
... we could add a new state since unknown means check the
element.
mounir: Today Chrome has to do a
lot of work to get the autoplay policy, we have settings, we
have a pre-seed list.
... we have an override mechanism for users, and a pre-seed
list.
... we need the content settings, we need the database, and we
need the pre-defined values.
... today it's every time we go to a page. We would like to do
this lazily.
... there is some expectation that paused stays true when you
start play().
... could change since play returns a Promise.
... a browser in the future could need an async API. The only
problem here is a microtask.
ericc: Doesn't the browser need it for the autoplay attribute?
mounir: We could do a roundtrip in the browser. Chrome doesn't use the attribute until it's visible, no timing expectation.
<Zakim> markw, you wanted to ask another question about the definition of autoplay, not to comment on the a/synchronous issue
David_Baron: I feel like it's close to the boundary
padenot: This is implementable
when doing complicated things, by getting info on page
load.
... Already a cross-process message. We hit the database when
you fetch the website.
... We send this one bit, autoplay allowed or not. User won't
have a delay.
Tess: Unclear that autoplay processes can be improved by disk access. People seem to do well with small databases.
jernoble: The stakes of this argument are low. We don't have a clear answer for design principles.
Tess: Action item to make principles clearer.
jernoble: World with more
databases, more processes to load a page. If something
intrinsically requires a DB, or a fetch, then a Promise is the
right choice.
... If it doesn't require a network access.
... It may not need a Promise.
mounir: Permissions API, TAG
asked us to make the API async. If query, which is much simpler
than autoplay, is async, why would autoplay be sync?
... autoplay is more complex than permissions. More states than
a key-value entry.
... Permissions had to be async, because it's a database
access. Not reasonable to dump autoplay DB into renderer.
... If we have a good reason to believe it will evolve, then
async.
jya: Complexity of the
implementation in Chrome.
... If "improving" means the result will be slower, it's not an improvement.
David_Baron: Implementer needs are not always lowest. Our ability to evolve the platform, don't make one thing perfect at the cost of other attributes.
padenot: The solution is already implemented. As a user it works.
mounir: It slows down the loading of Chrome. It increases memory usage. Load data we don't need.
jernoble: We care about page load
time, from a fresh launch. If something affects page load time,
someone will care.
... Even if the memory usage is amortized over page
loads.
... We will need to take care of edge cases with an async API,
it's not free
mounir: On iOS you need to wait for a user gesture.
Sanghwan_Moon: From a user
perspective, sync makes more sense. Impl complexity argument
doesn't make a lot of sense, personally.
... Would making it async introduce anti-patterns, reading
multiple times.
jya: Typical use, if <blah> then loading my page.
Sanghwan_Moon: What if it took 200ms?
mounir: Most implementations would call play on the result.
jya: Typical: can I autoplay, or
show a popup to allow autoplay.
... news page, please click to enable your sound.
mounir: Would a delay be negative for the user? Ads use case, need to show right away.
jya: Ease of implementation, add a message if the policy should change. Event when the policy is known.
mounir: We won't do that.
David_Baron: Writing content for Chrome, others using event and not looking at value.
mwatson: What is not an autoplay?
When the user has requested playback.
... I've put up a play button, the user experience could be a
sequence of videos.
... Our definition of autoplay needs to account for that.
That's an experience we want to deliver.