This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Some media frameworks do not make raw metadata available, but only a parsed form (which we can expose on .value as proposed in bug 25353). Because UAs built on such media frameworks may not have access to the raw metadata, we should make .data optional.
What is the benefit of providing raw AND parsed forms? I don't believe we do this for WebVTT today, do we? ISTM that the UA either knows how to parse it or it doesn't, and we should represent these two situations separately. I was always under the impression that DataCue represented something the UA didn't know how to parse, so it was passing the bytes off to the application to see if it could deal with it.
I think the idea is that we would generally provide either .data or .value, but if a UA gains the ability to parse the data and adds .value, it should continue providing .data if possible to avoid breaking existing applications. I'm not opposed to splitting this into two different cue types if the consensus is that that's better though. For Apple's use-cases, we might want to start by defining an ID3Cue (or maybe more generically named, since name/value pairs are a common kind of metadata).
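As an illustration of the "either .data or .value, and keep .data if possible" idea in the comment above, here is a minimal TypeScript sketch. The `DataCueLike` interface, the `CuePayload` type, and the `readCue` function are hypothetical names invented for this example; only the attribute names `.type`, `.data`, and `.value` come from the discussion.

```typescript
// Hypothetical shape: `type`, `data`, and `value` mirror the attributes
// discussed in this thread; the interface itself is illustrative only.
interface DataCueLike {
  type: string;
  data?: ArrayBuffer; // raw bytes; absent if the media framework hides them
  value?: unknown;    // parsed form; absent if the UA cannot parse the payload
}

type CuePayload =
  | { kind: "parsed"; value: unknown }
  | { kind: "raw"; bytes: Uint8Array }
  | { kind: "empty" };

// Use the parsed form when the UA provides one, otherwise fall back to the
// raw bytes. A cue with neither attribute is reported as empty.
function readCue(cue: DataCueLike): CuePayload {
  if (cue.value !== undefined) {
    return { kind: "parsed", value: cue.value };
  }
  if (cue.data !== undefined) {
    return { kind: "raw", bytes: new Uint8Array(cue.data) };
  }
  return { kind: "empty" };
}
```

An application that only ever reads `.data` is unaffected when a later UA version starts filling in `.value` as well, which is the compatibility argument made above.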
(In reply to Brendan Long from comment #2)
> I think the idea is that we would generally provide either .data or .value,
> but if a UA gains the ability to parse the data and adds .value, it should
> continue providing .data if possible to avoid breaking existing applications.
>
> I'm not opposed to splitting this into two different cue types if the
> consensus is that that's better though. For Apple's use-cases, we might want
> to start by defining an ID3Cue (or maybe more generically named, since
> name/value pairs are a common kind of metadata).
For known well defined things like ID3 I think it is better to define and expose an ID3Cue instead of using something generic like DataCue. I would be interested in participating in the discussion around exposing ID3 data via an ID3Cue.
(In reply to Aaron Colwell from comment #3)
> (In reply to Brendan Long from comment #2)
> > I think the idea is that we would generally provide either .data or .value,
> > but if a UA gains the ability to parse the data and adds .value, it should
> > continue providing .data if possible to avoid breaking existing applications.
> >
> > I'm not opposed to splitting this into two different cue types if the
> > consensus is that that's better though. For Apple's use-cases, we might want
> > to start by defining an ID3Cue (or maybe more generically named, since
> > name/value pairs are a common kind of metadata).
>
> For known well defined things like ID3 I think it is better to define and
> expose an ID3Cue instead of using something generic like DataCue. I would be
> interested in participating in the discussion around exposing ID3 data via
> an ID3Cue.
I wouldn't necessarily call ID3 a "well defined" format. An ID3 metadata item has a key and a value, and while you can infer the value's data type from the key, in reality it can be anything at all.
ID3 is not the only metadata format with *exactly* these characteristics (e.g. QuickTime, MPEG-4, MPEG-7, etc). Are you really suggesting that we create different Cue types for every one?
(In reply to Eric Carlson from comment #4)
> (In reply to Aaron Colwell from comment #3)
> > (In reply to Brendan Long from comment #2)
> > > I think the idea is that we would generally provide either .data or .value,
> > > but if a UA gains the ability to parse the data and adds .value, it should
> > > continue providing .data if possible to avoid breaking existing applications.
> > >
> > > I'm not opposed to splitting this into two different cue types if the
> > > consensus is that that's better though. For Apple's use-cases, we might want
> > > to start by defining an ID3Cue (or maybe more generically named, since
> > > name/value pairs are a common kind of metadata).
> >
> > For known well defined things like ID3 I think it is better to define and
> > expose an ID3Cue instead of using something generic like DataCue. I would be
> > interested in participating in the discussion around exposing ID3 data via
> > an ID3Cue.
>
> I wouldn't necessarily call ID3 a "well defined" format. An ID3 metadata
> item has a key and a value, and while you can infer the value's data type
> from the key in reality it can be anything at all.
Fair enough, but certain keys have particular meanings when considered in the ID3 context. It is pretty straightforward to define what an ID3Cue would look like, and applications would know how to handle these key/value pairs when they received an ID3Cue.
>
> ID3 is not the only metadata format with *exactly* these characteristics
> (eg. QuickTime, MPEG-4, MPEG-7, etc). Are you really suggesting that we
> create different Cue types for every one?
It depends on which problem we are trying to solve. If we are trying to solve the "How do I get song title, track, album data out for any media type?" problem, then it would probably make sense to have a specific cue type for this and define mappings from all those formats into specific attributes.
If we are trying to solve the "Just expose the raw metadata in the underlying format" problem, then I think something like format-specific cues or the combination of TextTrack.inBandMetadataTrackDispatchType and DataCue w/ nothing but a .data attribute is sufficient. I think any of these paths are easier to reason about than sequences of DataCue<type, data>, DataCue<type, data, value>, or DataCue<type, value>.
(In reply to Aaron Colwell from comment #5)
> (In reply to Eric Carlson from comment #4)
> > (In reply to Aaron Colwell from comment #3)
> > > (In reply to Brendan Long from comment #2)
> > > > I think the idea is that we would generally provide either .data or .value,
> > > > but if a UA gains the ability to parse the data and adds .value, it should
> > > > continue providing .data if possible to avoid breaking existing applications.
> > > >
> > > > I'm not opposed to splitting this into two different cue types if the
> > > > consensus is that that's better though. For Apple's use-cases, we might want
> > > > to start by defining an ID3Cue (or maybe more generically named, since
> > > > name/value pairs are a common kind of metadata).
> > >
> > > For known well defined things like ID3 I think it is better to define and
> > > expose an ID3Cue instead of using something generic like DataCue. I would be
> > > interested in participating in the discussion around exposing ID3 data via
> > > an ID3Cue.
> >
> > I wouldn't necessarily call ID3 a "well defined" format. An ID3 metadata
> > item has a key and a value, and while you can infer the value's data type
> > from the key in reality it can be anything at all.
>
> Fair enough, but certain keys have particular meanings when considered in
> the ID3 context. It is pretty straightforward to define what an ID3Cue would
> look like and applications would know how to handle these key/value pairs
> when it received an ID3Cue.
>
ID3 metadata values can be text, an image ("APIC") with optional MIME type, a defined structure (eg. "POPM", "AENC", etc), a number (eg. "PCNT"), or arbitrary binary data (GEOB) with optional MIME type.
It seems to me that will require an ID3Cue attribute(s) to provide at least String, Object, and ArrayBuffer values. This is exactly what Ted is suggesting.
> >
> > ID3 is not the only metadata format with *exactly* these characteristics
> > (eg. QuickTime, MPEG-4, MPEG-7, etc). Are you really suggesting that we
> > create different Cue types for every one?
>
> It depends on which problem we are trying to solve.
I think we are trying to provide script access to arbitrary metadata carried in media files. Nothing more.
> If we are trying to
> solve the "How do I get song title, track, album data out for any media
> type?", then it would probably make sense to have a specific cue type for
> this and define mappings from all those formats into specific attributes. If
> we are trying to solve the "Just expose the raw metadata in the underlying
> format" then I think something like format specific cues or the combination
> of TextTrack.inBandMetadataTrackDispatchType and DataCue w/ nothing but a
> .data attribute is sufficient. I think any of these paths are easier to
> reason about than sequences of DataCue<type, data>, DataCue<type, data,
> value>, or DataCue<type, value>.
http://id3.org/id3v2.4.0-frames defines more than 40 keys for "text" values alone. Do you suggest we define key names for all of them?
We will have to have an escape mechanism for unknown keys, why not just use the ID3 keys as-is?
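The range of ID3 value kinds listed in the comment above (text, image, defined structure, number, arbitrary binary) can be sketched as a TypeScript discriminated union. This is only one possible modelling: the type name `Id3FrameValue`, its field names, and the `describeFrame` helper are assumptions made for illustration and do not come from any spec or from the thread.

```typescript
// Hypothetical modelling of the value kinds named in the comment above.
// None of these type or field names come from a specification.
type Id3FrameValue =
  | { kind: "text"; text: string }                            // e.g. "TIT2"
  | { kind: "number"; count: number }                         // e.g. "PCNT"
  | { kind: "image"; mimeType?: string; bytes: ArrayBuffer }  // e.g. "APIC"
  | { kind: "binary"; mimeType?: string; bytes: ArrayBuffer } // e.g. "GEOB"
  | { kind: "structure"; fields: { [name: string]: unknown } }; // e.g. "POPM"

// Produces a short human-readable summary of a frame, keyed by the literal
// ID3 frame name so that unknown keys need no separate escape mechanism.
function describeFrame(key: string, v: Id3FrameValue): string {
  switch (v.kind) {
    case "text":
      return key + ": text \"" + v.text + "\"";
    case "number":
      return key + ": number " + v.count;
    case "image":
    case "binary":
      return key + ": " + v.bytes.byteLength + " bytes (" + (v.mimeType || "no MIME type") + ")";
    case "structure":
      return key + ": structure with " + Object.keys(v.fields).length + " fields";
  }
}
```

A script consuming such values needs at least string, object, and ArrayBuffer handling, which matches the point made in the comment.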
(In reply to Eric Carlson from comment #6)
> (In reply to Aaron Colwell from comment #5)
> > (In reply to Eric Carlson from comment #4)
> > > (In reply to Aaron Colwell from comment #3)
> > > > (In reply to Brendan Long from comment #2)
> > > > > I think the idea is that we would generally provide either .data or .value,
> > > > > but if a UA gains the ability to parse the data and adds .value, it should
> > > > > continue providing .data if possible to avoid breaking existing applications.
> > > > >
> > > > > I'm not opposed to splitting this into two different cue types if the
> > > > > consensus is that that's better though. For Apple's use-cases, we might want
> > > > > to start by defining an ID3Cue (or maybe more generically named, since
> > > > > name/value pairs are a common kind of metadata).
> > > >
> > > > For known well defined things like ID3 I think it is better to define and
> > > > expose an ID3Cue instead of using something generic like DataCue. I would be
> > > > interested in participating in the discussion around exposing ID3 data via
> > > > an ID3Cue.
> > >
> > > I wouldn't necessarily call ID3 a "well defined" format. An ID3 metadata
> > > item has a key and a value, and while you can infer the value's data type
> > > from the key in reality it can be anything at all.
> >
> > Fair enough, but certain keys have particular meanings when considered in
> > the ID3 context. It is pretty straightforward to define what an ID3Cue would
> > look like and applications would know how to handle these key/value pairs
> > when it received an ID3Cue.
> >
>
> ID3 metadata values can be text, an image ("APIC") with optional MIME type,
> a defined structure (eg. "POPM", "AENC", etc), a number (eg. "PCNT"), or
> arbitrary binary data (GEOB) with optional MIME type.
>
> It seems to me that will require an ID3Cue attribute(s) to provide at least
> String, Object, and ArrayBuffer values. This is exactly what Ted is
> suggesting.
Are you planning on creating a DataCue for each individual frame in an ID3 tag? I was assuming that a single ID3Cue would contain all the values in a single tag. I was expecting ID3Cue to expose a map<DOMString, any> type interface.
Was your plan for ID3 in DataCue to have .type be "ID3" and .value return an Object with keys representing the frame name and the values being the frame contents?
At the end of the day, I'd like there to be a way to force agreement on how particular metadata is exposed so we don't end up with multiple ways of exposing stuff like ID3. I'd also like to provide a clear way to differentiate between the UA having no clue what the data is vs. it having parsed the data into the agreed standardized structure for the application.
Do we actually need .data AND .value? Couldn't we just represent this as something like .type = "unknown" (or "application/octet-stream" or "") and .value = an ArrayBuffer() w/ the data? It seems like that would be better than having everything optional.
>
> > >
> > > ID3 is not the only metadata format with *exactly* these characteristics
> > > (eg. QuickTime, MPEG-4, MPEG-7, etc). Are you really suggesting that we
> > > create different Cue types for every one?
> >
> > It depends on which problem we are trying to solve.
>
> I think we are trying to provide script access to arbitrary metadata carried
> in media files. Nothing more.
>
> > If we are trying to
> > solve the "How do I get song title, track, album data out for any media
> > type?", then it would probably make sense to have a specific cue type for
> > this and define mappings from all those formats into specific attributes. If
> > we are trying to solve the "Just expose the raw metadata in the underlying
> > format" then I think something like format specific cues or the combination
> > of TextTrack.inBandMetadataTrackDispatchType and DataCue w/ nothing but a
> > .data attribute is sufficient. I think any of these paths are easier to
> > reason about than sequences of DataCue<type, data>, DataCue<type, data,
> > value>, or DataCue<type, value>.
>
> http://id3.org/id3v2.4.0-frames defines more than 40 keys for "text" values
> alone. Do you suggest we define key names for all of them?
>
> We will have to have an escape mechanism for unknown keys, why not just use
> the ID3 keys as-is?
No. I was assuming the frame names would be used as the map keys.
(In reply to Aaron Colwell from comment #7)
> (In reply to Eric Carlson from comment #6)
> >
> > ID3 metadata values can be text, an image ("APIC") with optional MIME type,
> > a defined structure (eg. "POPM", "AENC", etc), a number (eg. "PCNT"), or
> > arbitrary binary data (GEOB) with optional MIME type.
> >
> > It seems to me that will require an ID3Cue attribute(s) to provide at least
> > String, Object, and ArrayBuffer values. This is exactly what Ted is
> > suggesting.
>
> Are you planning on creating a DataCue for each individual frame in an ID3
> tag? I was assuming that a single ID3Cue would contain all the values in a
> single tag. I was expecting ID3Cue to expose a map<DOMString, any> type
> interface.
>
> Was your plan for ID3 in DataCue to have .type be "ID3" and .value return an
> Object with keys representing the frame name and the values being the frame
> contents?
>
Exactly: I was thinking .type would be "ID3" (or maybe "org.id3") and .value would return an Object with .key containing the literal frame name and .data containing an Object with the frame contents.
> At the end of the day, I'd like there to be a way to force agreement on how
> particular metadata is exposed so we don't end up with multiple ways for
> exposing stuff like ID3. I'd also like provide a clear way to differentiate
> between the UA having no clue what the data is vs it has parsed the data
> into the agreed standardized structure for the application.
>
Agreed
> Do we actually need .data AND .value? Couldn't we just represent this as
> something like .type = "unknown" (or "application/octet-stream" or "") and
> .value = an ArrayBuffer() w/ the data? It seems like that would be better
> than having everything optional.
>
That was our original idea, but when we presented it at the F2F last week Maciej pointed out that that would make it difficult for a UA to return a typed value for a key after shipping a version where it returned it as an ArrayBuffer.
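The compatibility concern at the end of the comment above can be made concrete with a small TypeScript sketch. Everything here is hypothetical: `SingleValueCue`, `TwoAttrCue`, `firstByte`, and `firstByteCompat` are names invented for this example, under the assumption that applications would test the runtime type of `.value`.

```typescript
// Hypothetical single-attribute cue, as in the ".value = ArrayBuffer" idea.
interface SingleValueCue {
  type: string;
  value: unknown;
}

// Application code written against a UA that cannot yet parse the payload.
// It starts throwing once a later UA version returns a typed value instead.
function firstByte(cue: SingleValueCue): number {
  if (!(cue.value instanceof ArrayBuffer)) {
    throw new TypeError("expected raw bytes in .value");
  }
  return new Uint8Array(cue.value)[0];
}

// Hypothetical two-attribute cue: raw bytes stay on .data when .value appears.
interface TwoAttrCue {
  type: string;
  data?: ArrayBuffer;
  value?: unknown;
}

// Application code that reads only .data keeps working across UA versions,
// as long as the UA is able to keep supplying the raw bytes.
function firstByteCompat(cue: TwoAttrCue): number | undefined {
  return cue.data ? new Uint8Array(cue.data)[0] : undefined;
}
```

With separate attributes, a UA can add a parsed `.value` later without changing what existing `.data` readers observe.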
(In reply to Eric Carlson from comment #8)
> (In reply to Aaron Colwell from comment #7)
> > Do we actually need .data AND .value? Couldn't we just represent this as
> > something like .type = "unknown" (or "application/octet-stream" or "") and
> > .value = an ArrayBuffer() w/ the data? It seems like that would be better
> > than having everything optional.
> >
> That was our original idea, but when we presented it at the F2F last week
> Maciej pointed out that that would make it difficult for a UA to return a
> typed value for a key after shipping a version where it returned it as an
> ArrayBuffer.
I guess I was under the assumption that .type would be different for the raw unparsed vs parsed. Is the idea that it is better to roll out a single .type that a UA can potentially backfill .value in a later revision instead of not outputting a .type until the UA can support a parsed .value?
(In reply to Aaron Colwell from comment #9)
> (In reply to Eric Carlson from comment #8)
> > (In reply to Aaron Colwell from comment #7)
> > > Do we actually need .data AND .value? Couldn't we just represent this as
> > > something like .type = "unknown" (or "application/octet-stream" or "") and
> > > .value = an ArrayBuffer() w/ the data? It seems like that would be better
> > > than having everything optional.
> > >
> > That was our original idea, but when we presented it at the F2F last week
> > Maciej pointed out that that would make it difficult for a UA to return a
> > typed value for a key after shipping a version where it returned it as an
> > ArrayBuffer.
>
> I guess I was under the assumption that .type would be different for the raw
> unparsed vs parsed. Is the idea that it is better to roll out a single .type
> that a UA can potentially backfill .value in a later revision instead of not
> outputting a .type until the UA can support a parsed .value?
.type identifies the .key namespace. It should not change depending on whether or not the UA is able to parse the cue value.
(In reply to Eric Carlson from comment #10)
> (In reply to Aaron Colwell from comment #9)
> > (In reply to Eric Carlson from comment #8)
> > > (In reply to Aaron Colwell from comment #7)
> > > > Do we actually need .data AND .value? Couldn't we just represent this as
> > > > something like .type = "unknown" (or "application/octet-stream" or "") and
> > > > .value = an ArrayBuffer() w/ the data? It seems like that would be better
> > > > than having everything optional.
> > > >
> > > That was our original idea, but when we presented it at the F2F last week
> > > Maciej pointed out that that would make it difficult for a UA to return a
> > > typed value for a key after shipping a version where it returned it as an
> > > ArrayBuffer.
> >
> > I guess I was under the assumption that .type would be different for the raw
> > unparsed vs parsed. Is the idea that it is better to roll out a single .type
> > that a UA can potentially backfill .value in a later revision instead of not
> > outputting a .type until the UA can support a parsed .value?
>
> .type identifies the .key namespace. It should not change depending on
> whether or not the UA is able to parse the cue value.
I think I misunderstood an earlier comment you made. Are you planning on creating a new DataCue for each ID3 frame in a single ID3 tag? I was assuming that you were grouping all the ID3 frames contained in an ID3 tag in a single DataCue. My assumption was that parsed vs unparsed frame values would be an internal detail of the ID3 mapping to DataCue.value and not leak out into the DataCue interface itself.
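The "one cue per ID3 tag" reading described in the comment above could look roughly like the following TypeScript sketch. The names `FrameEntry`, `TagCue`, and `tagToCue` are hypothetical, and the "org.id3" string is only one of the `.type` candidates floated earlier in this thread. Whether an individual frame is parsed or left as raw bytes stays an internal detail of each map entry.

```typescript
// Hypothetical: a frame's content is either a parsed value or raw bytes.
type FrameEntry = string | number | ArrayBuffer | { [field: string]: unknown };

// Hypothetical cue grouping every frame of one ID3 tag under its frame name.
interface TagCue {
  type: string; // identifies the key namespace, e.g. "org.id3"
  value: { [frameName: string]: FrameEntry };
}

// Builds a single cue for a whole tag, using literal frame names as map keys.
function tagToCue(frames: Array<{ key: string; content: FrameEntry }>): TagCue {
  const value: { [frameName: string]: FrameEntry } = {};
  for (const frame of frames) {
    value[frame.key] = frame.content; // a repeated key overwrites the earlier entry
  }
  return { type: "org.id3", value: value };
}
```

The alternative discussed in the thread, one cue per frame with `.key` and `.data` on `.value`, would instead yield one such object per entry.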
(In reply to Edward O'Connor from comment #0)
> Some media frameworks do not make raw metadata available, but only a parsed
> form (which we can expose on .value as proposed in bug 25353). Because UAs
> built on such media frameworks may not have access to the raw metadata, we
> should make .data optional.
Is the use case here indeed ID3 tags? I am not aware of ID3 being a timed metadata format - I always thought it's just providing file metadata at the beginning of the file. That doesn't map into the text tracks model of time-aligned cues IMHO.
(In reply to Silvia Pfeiffer from comment #12)
> (In reply to Edward O'Connor from comment #0)
> > Some media frameworks do not make raw metadata available, but only a parsed
> > form (which we can expose on .value as proposed in bug 25353). Because UAs
> > built on such media frameworks may not have access to the raw metadata, we
> > should make .data optional.
>
> Is the use case here indeed ID3 tags? I am not aware of ID3 being a timed
> metadata format - I always thought it's just providing file metadata at the
> beginning of the file. That doesn't map into the text tracks model of
> time-aligned cues IMHO.
If you consider ID3 in a live ShoutCAST/Icecast context it looks more like timed metadata because the artist & track information changes throughout the lifetime of the presentation. The tags definitely only apply to certain parts of the timeline in that context. Another way to look at this is to view it like metadata in chained Ogg files. The info at the beginning definitely does not apply to the whole file.
(In reply to Silvia Pfeiffer from comment #12)
> (In reply to Edward O'Connor from comment #0)
> > Some media frameworks do not make raw metadata available, but only a parsed
> > form (which we can expose on .value as proposed in bug 25353). Because UAs
> > built on such media frameworks may not have access to the raw metadata, we
> > should make .data optional.
>
> Is the use case here indeed ID3 tags? I am not aware of ID3 being a timed
> metadata format - I always thought it's just providing file metadata at the
> beginning of the file. That doesn't map into the text tracks model of
> time-aligned cues IMHO.
The proposal is meant to make it easier to consume any metadata format that a UA is able to parse. ID3 is one such format. An ID3 tag can be located anywhere in an MP3 bit stream, not just at the beginning or end. This means that ID3 tags can show up at arbitrary times in an MP3 stream, which would seem to map nicely to what we have been calling time-aligned metadata.
Apple's HTTP Live Streaming (HLS) streams can also include timed metadata within the audio/video stream (whether live or stored). HLS supports inclusion of metadata in ID3 format.
Shoutcast streams can also have timed metadata, although the metadata isn't necessarily ID3 tags; the metadata blocks can be simple text strings with name-value pairs.
Let's discuss ID3 and metadata in bug 25354.
I think there's still the problem of making .data on the DataCue constructor optional. I'm considering this in particular in the context of [1]. I can imagine DataCue objects being created e.g. from SSA/ASS cues (which are text) [2], but also from VOBSUB cues (which are binary data) [3].
I agree that we should get rid of one attribute. We could just drop the .text attribute and put everything in .data, including strings, and expect the JS dev to convert as necessary. Seeing as .text was always for convenience (to do this conversion for the JS dev), we have probably just introduced more trouble than it's worth.
[1] http://rawgit.com/silviapfeiffer/HTMLSourcingInbandTracks/master/index.html#webm
[2] http://matroska.org/technical/specs/subtitles/ssa.html
[3] http://matroska.org/technical/specs/subtitles/images.html
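To show what "expect the JS dev to convert as necessary" might involve, here is a minimal TypeScript sketch of converting between a `.data`-style ArrayBuffer and a string. It assumes single-byte text (ASCII or Latin-1) purely to stay self-contained; in a browser one would more likely reach for `TextDecoder` and `TextEncoder`, which also handle UTF-8. The function names `bytesToText` and `textToBytes` are hypothetical.

```typescript
// Assumption: the payload is single-byte text (ASCII or Latin-1), so each
// byte maps directly to one character code.
function bytesToText(data: ArrayBuffer): string {
  const bytes = new Uint8Array(data);
  let text = "";
  for (let i = 0; i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i]);
  }
  return text;
}

// Inverse conversion under the same single-byte assumption. Character codes
// above 0xFF are truncated to their low byte, so this is lossy for such text.
function textToBytes(text: string): ArrayBuffer {
  const buf = new ArrayBuffer(text.length);
  const bytes = new Uint8Array(buf);
  for (let i = 0; i < text.length; i++) {
    bytes[i] = text.charCodeAt(i) & 0xff;
  }
  return buf;
}
```

A text-based cue such as an SSA/ASS line could then be carried in `.data` and recovered with `bytesToText(cue.data)`, while binary payloads such as VOBSUB images would simply be used as bytes.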
> We could just drop the .text attribute and put everything in .data,
> including string, and expect the JS dev to convert as necessary. Seeing as
> .text was always for convenience (to do this conversion for the JS dev), we
> have probably just introduced more trouble than it's worth.
If we go with your JSONCue object, then text data could be exposed that way.
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the Editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the Tracker Issue; or you may create a Tracker Issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html
Status: Partially Accepted
Change Description: https://github.com/w3c/html/commit/d5ec35198fea01e56b0ec38eedce58a57a013138
Rationale: It is true that DataCue with both a .data and .text attribute is overdefined. DataCue is not defined for exposing parsed metadata such as ID3 tags, but for exposing arbitrary timed data that may well be in binary format. It would be useful if somebody started writing an ID3Cue extension spec for the parsed metadata case. For interoperability reasons it would likely make more sense to actually list valid metadata name-value pairs rather than use a generic JSONCue as proposed in bug 25353. Since this bug is about the DataCue, the committed patch resolves this bug.