Minutes of live session 3 –
Metadata, privacy, accessibility, and standardization roadmap

Also see the agenda, the minutes of session 1 and the minutes of session 2.

Present: Animesh Kumar (InVideo), Bruce Devlin (SMPTE), Chris Needham (BBC), Christoph Guttandin (Media Codings/InVideo/Source Elements) (he/him), Daniel Gómez (InVideo), David Nessen, Dominique Hazael-Massieux (W3C), Donovan McClelland, Dr. Rachel Yager (W3Chapter), Ed Gray (Avid), Emmanuel, Francois Daoust (W3C), Gerrit Wessendorf, He Zhi (China Mobile), James Pearce (Grass Valley), Jo Kent (BBC), Julian Fernandez-Campon (Tedial), Karen Myers (W3C), Kazuhiro Hoya (JBA), Kazuyuki Ashimura (W3C), Kelly (CaptionFirst), Lin Li (China Mobile), Louay Bassbouss (Fraunhofer FOKUS), Marie-Claire Forgue (W3C), Paul Macklin (Vimond), Paul Randall (Avid), Peter Salomonsen (WebAssembly Music), Pierre-Anthony Lemieux (Sandflow Consulting / MovieLabs), Sacha Guddoy (Grabyo), Sebastian Tatut, Simon Thompson (BBC), Song XU (China Mobile), Spencer Dawkins (Tencent), Steven Liu (OnVideo), Takio Yamaoka (Yahoo! JAPAN), Van (Vidbase), Wendy Seltzer (W3C), Wolfgang Heppner (Fraunhofer IIS), Xiaoqian Wu (W3C), Xueyuan Jia (W3C).

Table of contents

  1. Opening remarks
  2. Agenda bashing
  3. Metadata
  4. Accessibility
  5. Next steps

Opening remarks

See Opening remarks slides and transcript.

Agenda bashing

Chris Needham: In this session, we're proposing to cover metadata in the media production process, overall architecture issues, and accessibility, and we'll spend the last 30 minutes of the call summarizing and looking at next steps: where we should take these topics and where we go in the future.

Metadata

Preserving metadata in production workflows

Related GitHub issue: Issue #27.

Chris Needham: We would like to spend the next 20 minutes talking about metadata. This was a topic that Bruce introduced in his presentation. Bruce, if you would like to say a few words: you introduced an interesting diagram showing the different levels of information, whether it is textual or binary information, synchronous or asynchronous metadata. Can we dig into the detail around that, and think about what, in a browser API context, we should be looking at that's interesting?
... We talked previously about the metadata that's carried in the bitstream, so I don't know if there is anything there we want to expand on. Interested to hear your thoughts.

Bruce Devlin: Let me spend a minute outlining the problem as I see it. As we start to bring more and more devices into this Internet context, more and more of them emit metadata, in particular in the media production space. Metadata could be anything. It could be the focal length of a lens, which in itself is a difficult thing to define, and I'm not getting into that level of detail today. It may be the GPS location of a cameraman running along with a camera. It may also be the pan and tilt, the velocity of the gimbal upon which the camera is sitting. The person holding the Steadicam may be on the back of a truck with a GPS, but that truck may be towing a trailer, so you may have information about the truck, the trailer, and the cameraman pointing the camera at the most important point on set, the actor. There is all of this stuff going on.
... What's interesting is that when you shoot that scene, you know everything about it. When I look at metadata papers, I see people showing how clever it is to look at the video and figure out who is in front of the camera in that scene. We knew who was in front of the camera on set! We threw that metadata away. By the time you need it, going into the VFX pipeline or the distribution pipeline, you have to go and reinvent the metadata from scratch.
... That's the general context. There are a number of things that you may want to do with the metadata on set, or if you're backhauling the metadata to do stuff with it. Firstly, you may want to visualize it in a more friendly fashion. The metadata coming out of the devices typically falls into one of four categories, as you pointed out, Chris. It may be in text form or in binary form, so that's the first axis. That's important: it tells you how you'll process it, what sort of processing engine you need.
... The next axis is equally important: the time axis. Some metadata is synchronous: a clock is ticking somewhere; it may be the audio or video sample clock, or it may be a crystal oscillator on the gimbal on the truck that is also acting as a crash detector at the same time. So there's synchronous metadata versus non-synchronous, and textual versus non-textual. When you know which quadrant the metadata is in, you have a good grasp of what you have to wrap around it to put it into a transport: WebTransport, WebRTC, SMPTE ST 2110 or some other transport system to get it from A to B.
... The problem I'm trying to figure out is: do we just have to consider the transport that gets the metadata from where it originated to where it needs to be visualized, or is there something we should reconsider, such as the format of the metadata? If you try to transcode the metadata, generally you destroy it.
... The first problem that I wanted to throw on the table here for the combined brain trust of W3C and SMPTE: is there something we could do, a bit of extra work to make something generally useful to get the data from where it is made to where it needs to be used? At that point, I'll stop talking.

Chris Needham: Is there anybody on the call who would like to respond or share their own point of view? In your experience, do you have an idea of what the web platform could do to actually solve this?

Bruce Devlin: I think part of the thing that has to be done is visualization of the metadata while it is happening and there may be many layers of that data needed to make it useful. I have spent a lot of my career working with people who are doing simple things like shot logging to figure out who kicked the football, who caught the ball, who is doing what at the same time. Those things are all really custom bits of software glued onto great huge monolithic pieces of software around the backend and they tend to be database driven.
... Looking at the wonderful stuff that W3C is doing with the transport, the logging, the insertion and extraction that's more generic: with a generic web browser and generic tools, plus a metadata visualization plug-in with an API, you could make the generic logger something that's not just human assisted but possibly AI assisted, so you don't know whether it is a human assisting this or an AI... I feel there is something there, but I'm not quite expert enough to figure out the shape of the thing that I think needs to be built.
... Maybe starting from a pure technology layer: WebRTC, we have heard, is one way to do live transmission of audiovisual data. Does that support channels other than audio and video? Can you transmit text or arbitrary data, or is it limited to only the essence that's supported by codecs on the web platform?

Chris Needham: There is the data channel. You can imagine, on a conference call like this, we can share audio and video but we can also exchange messages in chat, so it supports that kind of information as well. As far as I'm aware, it does not provide any synchronization guarantees; it is a side channel to the audio and the video.
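
As a rough illustration of the kind of side channel described here, the sketch below sends application-defined metadata as timestamped JSON messages over an RTCDataChannel next to the audio and video tracks. The message shape and channel label are purely hypothetical; since the data channel gives no synchronization guarantee with the media, any re-alignment is left to the application.

```typescript
// Minimal sketch: metadata as a side channel next to audio/video.
// Assumes this RTCPeerConnection also carries the media tracks.
const pc = new RTCPeerConnection();

// Ordered, reliable channel for loosely timed (asynchronous) metadata.
const metadataChannel = pc.createDataChannel("production-metadata", { ordered: true });

// Hypothetical envelope: the application attaches its own capture timestamp
// so the receiver can re-align messages with the media after the fact.
interface MetadataMessage {
  captureTime: number; // e.g. performance.now() at the sender
  kind: string;        // "gps", "lens", "gimbal", ...
  payload: unknown;    // schema depends on `kind`
}

function sendMetadata(kind: string, payload: unknown): void {
  const message: MetadataMessage = { captureTime: performance.now(), kind, payload };
  if (metadataChannel.readyState === "open") {
    metadataChannel.send(JSON.stringify(message));
  }
}

// Receiver side: parse and buffer; re-synchronization with the video is up to
// the application, since the channel itself is not frame-locked.
metadataChannel.onmessage = (event) => {
  const message: MetadataMessage = JSON.parse(event.data);
  console.log(`metadata ${message.kind} captured at ${message.captureTime}`);
};
```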

Bruce Devlin: In this problem space that's one of the important things that hasn't been done before. This is why I'm clutching a bit at the solutions. I think there are certain classes of metadata that have to be absolutely hard locked to the originating audio and video. They're a first class citizen in terms of the synchronization. There are other bits of asynchronous metadata for which it is a bit more loose and floppy with less guarantee. It certainly needs to be resynchronized at a later date, not at the point of emission but definitely re-synchronized after some sort of a delay for subsequent downstream processing.

Chris Needham: This makes me think of a couple of things. With WebCodecs, we have the ability to encode and decode the audio and video. If, when doing an encode, you also need to insert metadata into the bitstream alongside the audio and video, then perhaps WebCodecs does not currently give you that capability.
... If the metadata goes inside your media container format with the bitstream, then perhaps that's something that a library would do rather than the WebCodecs API, and it raises the question of how you implement that. Part of what we do in building the web APIs is to think about the capabilities that really only the browser can provide, so that web developers can build on top of them.
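
As a rough sketch of that split, with WebCodecs handling the encode and a hypothetical application-level muxer library handling the container and the metadata, the example below keeps per-frame metadata next to each EncodedVideoChunk so that a library could later write both into a container. The muxer object and the metadata shape are assumptions for illustration, not an existing API.

```typescript
// Sketch: WebCodecs produces encoded chunks; a hypothetical application-level
// muxer library is responsible for the container and for embedding metadata.
interface FrameMetadata {
  timestamp: number; // microseconds, matching the VideoFrame timestamp
  payload: unknown;  // e.g. lens or GPS data captured for that frame
}

// Stand-in for a muxing library; a real one would write MP4/MKV/MXF samples.
const muxer = {
  addVideoChunk(
    chunk: EncodedVideoChunk,
    chunkMeta: EncodedVideoChunkMetadata | undefined,
    frameMeta: FrameMetadata | undefined,
  ): void {
    // Write the sample plus a timed-metadata entry into the container here.
  },
};

const pendingMetadata = new Map<number, FrameMetadata>();

const encoder = new VideoEncoder({
  output: (chunk, chunkMeta) => {
    // The browser hands back the encoded chunk; the application (not the
    // browser) pairs it with its own metadata and passes both to the muxer.
    const frameMeta = pendingMetadata.get(chunk.timestamp);
    pendingMetadata.delete(chunk.timestamp);
    muxer.addVideoChunk(chunk, chunkMeta, frameMeta);
  },
  error: (e) => console.error(e),
});

encoder.configure({ codec: "avc1.640028", width: 1920, height: 1080 });

function encodeFrame(frame: VideoFrame, payload: unknown): void {
  // Remember the metadata under the frame's timestamp before encoding.
  pendingMetadata.set(frame.timestamp, { timestamp: frame.timestamp, payload });
  encoder.encode(frame);
  frame.close();
}
```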

François Daoust: The document that Dominique shared in the chat is the use cases and requirements exercise that the WebRTC Working Group went through to discuss the next version of WebRTC.
... It includes some use cases about the synchronization issues that we mentioned actually in the past couple of live sessions as well. There are situations where you have data on the side and you need to synchronize it with some media stream. That includes captions, which we mentioned already, but may include other types of data, for instance for virtual reality environments.

Dominique Hazaël-Massieux: Just to add to this, this need to synchronize data and media has been brought up a few times in WebRTC. I guess a big part of making it happen would be to show strong demand. The use cases in the document being shown are mostly linked to VR/AR at this point, but if specific use cases for media production workflows emerge, I think it helps to get it higher up the list.

Bruce Devlin: I think I can probably provide some use cases from the random post-it notes scattered around my office: "if only there was a solution, we could do this". If we're trying to get a structured document, I can certainly provide the things that people have contacted me about.
... You mentioned a few use cases that would be worthwhile digging into a bit. Analyzing what's in the media to extract the metadata from it is interesting. Some things we know upfront, who will be in the scenes, so you don't need to analyze.
... I guess there are cases where the media gets run through machine learning algorithms to detect and extract extra information and so one of the things that perhaps the web platform could provide is other hooks to enable you to insert your own analysis, for example into a media processing pipeline.
... Just to mention one really important one, which actually the BBC is one of the leaders on along with the New York Times, the Washington Post and various other news sources: a thing called C2PA, an anti-fake-news identifier. It is really important that it is synchronous with the content; if it is not, you can't tell the provenance of the news being shown. That's one where it is really, really important that the synchronization is kept pretty much to frame level, so that you can prove not just which camera generated the content but also what editing stages took place between then and now, so that if the stuff was faked, at least you know who faked it.

Chris Needham: We have had actually in W3C an initial introduction from the C2PA group. We haven't yet really dug into what the requirements are from a browser integration perspective but that's definitely an area that we're aware of.

Bruce Devlin: I would certainly be interested to follow up. I will give a use case that's very browser centric. I was talking to a designer of C2PA and it came down to: what if somebody presses pause in the browser? You hit pause on a keyframe, you get the keyframe, that's what codecs tend to do. If those are the only frames manipulated so that they were faked, you have actually increased the statistical likelihood of somebody sharing a faked still from your video rather than the original. Hence the synchronicity of the information has to be very, very good.

Chris Needham: A thing that interests me: you talked about this difference between stuff that happens in sort of monolithic databases and stuff that happens at a high level. What I'm interested in is what use cases we have for mutability of data which changes on a per-frame basis, as opposed to data which has to be accurate but which is generally unlikely to change at a frame-by-frame interval, every so many milliseconds.
... Stuff that lives in databases and can be consumed by traditional APIs could still be frame accurate, but doesn't scale well to being mutable on a per-frame basis. A lot of the stuff that you described as being frame accurate: is that captured once and considered immutable from the point of capture onwards? It seems to me that there is still merit in considering these things as relatively distinct entities, unless there is a lot of overlap where stuff changes every single frame and is curated after initial creation, if that makes sense.

Bruce Devlin: It does. I'll give you one specific use case that I'm actually working on today, which is lens focal length. Lenses are complicated. With physical lenses, if you change the focal length, everything about the lens breathes, and the focal length reported by most lens metadata is wrong, because the secondary effects of the breathing of the lens internals move the reference plane slightly differently from where the lens thinks it has moved, and it is a per-lens characteristic that needs calibrating.
... If you're doing the effects work, pulling focus with the effects of other things around it, you actually want to match that second-order effect, not the first-order effect, because otherwise the difference is visible. There is a group that I'm working with, trying to figure out, first, how you capture it accurately enough, because the sampling is not done at the frame rate, it is done faster than that. What the crew would like to do is to get that 100 kilohertz lens data and get the trajectories of the lens movement absolutely right, including the second-order effects, especially for big lenses, and be able to put that into a VFX pipeline and resynthesize what the image would have been after this is applied. That's the pipeline I'm working on at the moment. That's a specialist use case which lenses can tell you about today.

Chris Needham: It would be interesting to have an offline discussion.
... I would like to come to Kaz and James to wrap up this part of the agenda today.

Kazuyuki Ashimura: The expected metadata for media production in general includes various data: hardware, location (not only of the hardware itself, but also of the cameraman and so on, that is, the users), and data about the codec and so on. Those various kinds of data should be clarified via a collection of the main use cases from studios.

Bruce Devlin: Absolutely. There are a couple of studios that have internal projects where they have lists of stuff, and there is not that much collaboration going on that I'm aware of. I'm trying to get the collaboration going so that we can have one very big list that we can then start choosing metadata items from. I would be mostly concerned about the transport and the visualization of this data; the people on set are concerned with capturing it without hurting anybody.

Kazuyuki Ashimura: I'm working on the Media & Entertainment Interest Group and I work with other groups as well. We have other work related to IoT, and that group is also working on metadata for IoT devices and broadcasting devices; production metadata may also be relevant to IoT.

Bruce Devlin: I think you're right. Good observation.

James Pearce: The metadata that's significant, its shape and what we're capturing, will vary wildly depending on the use cases. Is the idea to provide assistance with validating that data, making sure it is in the right shape, or are we suggesting just that we'll store opaque data and let the application deal with making sure that it conforms to some schema?

Bruce Devlin: I think it is a bit of both. At a high level, you know the children's toy with a square and a circle and a star and a rectangle: the way I have visualized it, most metadata I have found that people want to use fits into one of the four holes that I talked about. Those holes are text, binary, synchronous and asynchronous, but within each one of the holes the metadata itself would probably follow a schema depending on the sort of metadata it is.
... I have this idea that if we were able, for each one of the four holes, to provide a framework, a methodology, an API giving you the bare bones of being able to shunt stuff from A to B quickly, you may take more than 50% of the work out of the overall system and you are just left with a JSON or XML validation exercise at either end. It is a gross oversimplification, but I feel there is something there.
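
As a very rough sketch of what such a framework might look like at the API surface, purely illustrative and not a proposed standard, the envelope below classifies each metadata item along the two axes discussed here and leaves the payload schema to the application:

```typescript
// Illustrative only: a generic envelope for shunting metadata from A to B.
// The two axes follow the discussion: text vs binary, synchronous vs asynchronous.
type Encoding = "text" | "binary";
type Timing =
  | { kind: "synchronous"; mediaTimestamp: number }   // locked to a media clock
  | { kind: "asynchronous"; wallClock?: number };     // loosely timed, re-synced later

interface MetadataEnvelope {
  source: string;                // device or subsystem that emitted the item
  schema: string;                // identifier of the payload schema (e.g. an IPTC or vendor schema)
  encoding: Encoding;
  timing: Timing;
  payload: string | ArrayBuffer; // text payloads as string, binary as ArrayBuffer
}

// A transport adapter only needs to move envelopes; validating the payload
// against `schema` happens at either end, as suggested above.
interface MetadataTransport {
  send(envelope: MetadataEnvelope): Promise<void>;
  onReceive(handler: (envelope: MetadataEnvelope) => void): void;
}
```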

Processing workflow APIs

Related GitHub issues: issue #28, issue #29.

Chris Needham: Julian, I'm wondering if you have your own perspective that you would like to share on this.

Julian Fernandez-Campon: I think it is complementary to what Bruce is saying. You manage and process the data, whether it is binary or text metadata. What I see is that we need to standardize this metadata at some point to be able to abstract different systems that are processing that metadata coming from the different applications, as was mentioned about the tools and content.
... One possibility for implementation is using IPTC for all of the editorial metadata. If you're dealing with other metadata, the point is to standardize the metadata that you're using for the editorial side and for everything related to production.
... I think that ACES, SMPTE ST 2065, is also a good option.
... What I mean, it is something like this
... [presenting slides]
... Imagine that you have a workflow which is doing some things and getting some media. We have one stage that's the analysis and, in this case, we're using an analysis tool which is getting information about the codec, the frame rate and many other things; it can also be GPS information. For some reason, I want to change this one and, instead of using this tool, I use this AI tool, called in this case Julio. The use case that we're proposing here is that the workflow doesn't change. The information which is going to the next stage, which is the transcoder in this case, is the technical information and the other information which is part of this analysis. If we want to use something else instead, for example MediaInfo, or if we know we'll be receiving a comment card, it doesn't change.
... If we have standardized the metadata coming in, the editorial metadata or the technical metadata related to other things, we can do things like this: constructing the workflows with the transformations and the specific data model which generates the information. That's what I wanted to show you. You have the profile, and now I'm able to analyze this.
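
As a rough illustration of the kind of abstraction described here, with hypothetical interface and tool names rather than any vendor's actual implementation, a standardized metadata contract lets analysis tools be swapped without changing the rest of the workflow:

```typescript
// Illustrative only: a common contract so analysis tools are interchangeable.
interface TechnicalMetadata {
  codec?: string;
  frameRate?: number;
  gps?: { latitude: number; longitude: number };
  extra?: Record<string, unknown>; // tool-specific fields kept separately
}

interface AnalysisTool {
  name: string;
  analyze(mediaUrl: string): Promise<TechnicalMetadata>;
}

// Two interchangeable implementations behind the same contract.
const basicAnalysisTool: AnalysisTool = {
  name: "basic-analysis",
  analyze: async (mediaUrl) => ({ codec: "h264", frameRate: 25, extra: { mediaUrl } }),
};

const aiAnalysisTool: AnalysisTool = {
  name: "ai-analysis",
  analyze: async (mediaUrl) => ({
    codec: "h264",
    frameRate: 25,
    gps: { latitude: 0, longitude: 0 },
    extra: { mediaUrl },
  }),
};

// The workflow does not change when the tool is swapped: the transcoder stage
// always receives metadata in the same standardized shape.
async function runWorkflow(tool: AnalysisTool, mediaUrl: string): Promise<void> {
  const metadata = await tool.analyze(mediaUrl);
  await transcode(mediaUrl, metadata);
}

async function transcode(mediaUrl: string, metadata: TechnicalMetadata): Promise<void> {
  console.log(`transcoding ${mediaUrl} with`, metadata);
}
```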

Chris Needham: It is interesting. There are different groups that are interested in different aspects of this. I think that, in W3C, we tend to focus on the browser capabilities and so some of what you have spoken about in your talk was about having consistent APIs across different processing components.
... There are other places where this standardization happens and, like what Bruce said, how do we bring the collective expertise together so that we're not overlapping in terms of what each standardization group is doing but doing things that are complementary? There is a question in my mind about the role for W3C in terms of metadata standardization.

Julian Fernandez-Campon: This is an open question. As you say, there are many standards. W3C could, for example, standardize what I have been showing here. It will require, of course, more discussion. Again, this is one level up: you have the database and how it is represented; it is more at that level.
... The question is whether there is a way of standardizing this metadata. The other thing I want to show you, which was referred to earlier, is the mapping between the different standards; for example, here you have the one that we have adopted.
... You can see the correspondence with IPTC. Again, this is one of the questions: whether at some point W3C could propose or suggest a schema, which could be IPTC or could extend it as needed depending on the type of content. Then all of the applications, like this one I was showing (it is what we have; it doesn't have any implementation from another vendor), know that when they're getting metadata from somebody in the browser, they'll always have these fields. The only problem would be translating those fields into their internal data model or, as I was showing, translating them into the metadata which is supported by the other system.
... This is what I wanted to throw in here to frame the presentation, to see what can be done for the standardization with the metadata at the higher level if it is possible to have more work on this. I think it is a big question. We may not be able to answer this here and now. It is interesting to follow this up with those relevant groups.
... Thank you for the opportunity, also happy to share thoughts. If someone thinks that it doesn't really work, we have to think in another way so we're totally open, of course.

Chris Needham: This has been a great discussion. We are a little bit over time and perhaps we should have allowed more time for each of the topics.

Accessibility

Chris Needham: Given that we want to start our wrap-up conversations and leave 25 to 30 minutes at the end, what I would like to do is to change topic and look at accessibility. We had a great presentation submitted from Ed Gray and I wanted to follow up with you, Ed, and your talk.
... It gives a great motivation for looking at accessibility, and there are two aspects here. One is that we want to be able to provide tools to create content that's accessible to audiences and users in general. The other, it struck me, is that the accessibility of the tools themselves needs consideration as we move to more browser-based production workflows. What are the design considerations we should be taking into account as we build web-based production tools?

Ed Gray: I want to thank you for inviting me and giving me a seat at the table for accessibility; it is an important topic. For those that don't know, I'm blind myself, I'm Director of Accessibility for Avid and we're trying to drive accessibility into our creation tools: Pro Tools for audio, Media Composer for video, and all of our storage and media asset management products.
... It is a good question. I would be lying if I said I knew exactly where to wedge this discussion in. I would say, if we're talking about programming web interfaces for accessibility, the ideal is that anyone with any disability can walk up and use the same tool, that you don't have the blind version of this and the hearing-impaired version of that with the captions; you want to have one tool and build accessibility features into it.
... I think most people start with a combination for visually impaired users. These are a small portion of the population, probably 0.75% of the addressable user base, who can't see a screen well enough to do any work, so you need to do something: you want to announce gestures and actions, and caption elements of the screen, so the screen reader can interpret them. The way to start exploring that: there is a part of the W3C, the web content accessibility group, and they have guidelines for things that, in their own words, should and must be accounted for to make your tools accessible.
... There are things like contrast levels, the amount of space between lines, and supporting tab ordering so that you can, e.g., hop between headings, so that someone with no vision has a fighting chance of identifying what is on the screen when they launch it. I would say it is an ongoing effort. There is never a point where you're finished, because the platforms always evolve and people want to make their programs work with the latest platforms and OSes.
... The way to do this, if you want to do a good job of it: you need someone in the company who is a web accessibility specialist, who knows how to do the programming, driving the accessibility program as far upstream in the definition and planning process for the tools as possible.
... Depending on the country you're in, it becomes something ranging from a nice thing to do to something that is illegal not to do. I think more and more you will see pressure from a legal standpoint that we have to make these tools accessible, to not make them accessible is a form of discrimination basically.
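
As a very small illustration of the kind of screen-reader and keyboard support described above, and purely as a hypothetical custom control in a browser-based editing tool rather than any product's implementation, the sketch below labels a custom button and makes it keyboard operable:

```typescript
// Illustrative only: making a custom "mark in point" control in a web-based
// editor perceivable and operable without a mouse.
function createMarkInButton(timeline: HTMLElement): HTMLElement {
  const button = document.createElement("div");
  button.className = "mark-in";

  // Expose a role and an accessible name so screen readers announce the control.
  button.setAttribute("role", "button");
  button.setAttribute("aria-label", "Mark in point at current playhead position");
  button.tabIndex = 0; // reachable via the Tab key

  const activate = () => {
    // Hypothetical editing action on the timeline element.
    timeline.dispatchEvent(new CustomEvent("mark-in"));
  };

  button.addEventListener("click", activate);
  button.addEventListener("keydown", (event: KeyboardEvent) => {
    // Custom buttons must respond to Enter and Space for keyboard users.
    if (event.key === "Enter" || event.key === " ") {
      event.preventDefault();
      activate();
    }
  });

  return button;
}
```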

Chris Needham: As you mentioned the W3C's content accessibility guidelines, which are a very strong area of development within W3C obviously, it makes me wonder whether you have looked at the new version 3 guidelines that are being developed, and whether there are particular requirements that you see should be taken into account in that process?

Ed Gray: That's a take-home topic I could follow up on. I can't really distinguish what's new in WCAG 3 that affects people generally or immediately. I know that currently most people standardize on 2.3 and 2.4, and there are A, AA and AAA levels of accessibility that people standardize on.
... There are contrast levels that you can measure with a contrast checking tool, and this is to make the interface perceivable, understandable and robust, so that somebody can see it, work with the elements and know what's on the screen. It is a lot to do; it is practically an impossible burden to keep on top of all of those things all the time.
... If you're an organization that has a dedicated team, if you can show that accessibility is a first-class citizen in your development process, such that new versions get tested for accessibility with every release, you have a place to answer customers' questions about accessibility, and you can show a track record of accessibility improvement, you're doing pretty well.
... Even then, someone is going to come along and sue you. There are a lot of nuisance lawsuits and things that go on. It is good if you can show the track record of accessible development.

François Daoust: You mentioned WCAG. The accessibility group is also developing media accessibility user requirements, but I think it is more oriented towards media playback scenarios. Here we're looking at media processing. I'm wondering if there are specific accessibility requirements when you develop media authoring tools.

Ed Gray: I can send around some accessibility guidelines. You will see that the WCAG guidelines are extensive and very much cover the authoring side of the house. The other thing I want to mention is the VPAT, the Voluntary Product Accessibility Template, which is something a company can use to self-report on how accessible its products are. These things are not a walk in the park, they're very detailed documents for accessibility, and I would say that they're full of ideas that may intersect with authoring tools.
... With authoring tools, it is a 1,000-mile journey: you have to announce the gestures you make. You want to announce, when you connect a camera, that there is a certain type of data coming in, so that, at some point, when you're ready, you can hire visually impaired beta testers and, without sighted assistance, they can work on the authoring tools, because they hear what's going in and out and the types of data they're working with.
... The macOS and Windows platforms themselves provide for a lot of accessibility; I'm not sure if you have tools for that.
... An example is an Adobe Flash movie, where the text is essentially not accessible. You have to approach that a different way. Another way of saying it: the WCAG guidelines and the VPAT guidelines I'll share with the group have lots of ideas, and they're very applicable to all types of tools.

Christoph Guttandin: Sometimes I get asked why we need media applications on the web, for example why we need to draw on the web if there are already a couple of applications on the desktop, and I reply that one of the advantages of the web is that it is generally more accessible. Is that actually true?

Ed Gray: As far as the population developing accessibility tools and ideas goes, there is more work being done on the web than anywhere else. It is more reliably up to date than people's authoring tools. The irony of the web is that when you make the web great for sighted people, you make the accessibility programming many dimensions more challenging. You have a website, you have a transparent curtain you drag across the screen, all of the letters come cascading down, then this thing comes up, you have all of the animations, stuff like that; your accessibility tools don't know what hit them.
... It is preferable to have an accessible website with a clearly organized structure consistent across different pages. That's a challenge as new operating systems and new styles of programming come along. The web has these tools, and as people get more advanced in making the experience richer for sighted people, it gets harder for the accessibility programming to keep up.
... I want to point out that, although I am blind, this is not just the blind guy show. You have to think about captions for the hearing impaired, and also, emphatically, about motor challenges: generally, you make the tools controllable by a keyboard, with more adaptive surfaces and more adaptive processes for people with physical impairments. And then the other topic is cognitive impairment; that's a thing too. If you have someone with anxiety, schizophrenia or autism, you make interfaces readable for them. You make things last more than three seconds at a time so people don't have convulsions reading the interfaces. It is not just captioning elements so screen readers can use them.

Rachel Yager: I have two questions, or rather observations, and I'm trying to see if there are ideas or suggestions. First, I have had the honor to present together with a few people who have disabilities, either hearing or vision impaired. That is where my understanding of things that I do not see comes from. I'm not challenged myself, but even though I wear glasses, sometimes it is difficult for me to make out all of the tiny stuff and push all of the buttons on the screen, so I imagine it is much more difficult to present as someone with disabilities. The first question: you have the standards, but sometimes when I look at popular generated content, I prefer to look at that because it is so much easier to browse through. Is it possible, from the design point of view, to make it good for everybody, not just for people with whatever disabilities there are, but for everybody?

Ed Gray: From a design standpoint, if you make it useful for everybody, I think adoption is faster in a way, because it benefits everybody, not just specific disabilities. That's the first thing.

Rachel Yager: Is it possible to take a design point of view that makes it not just an accessibility standard for a certain group of people but for everybody, so that adoption would be faster and the decision to use it easier?
... The second question: when publishing, e.g. on YouTube, I was impressed with the authoring capabilities for media. But when I turned on closed captions on a video, I realized that they are AI generated, and with a music overlay I cannot get closed captioning automatically generated. Which leads me to ask, in our discussion, is there some design consideration around adoption in mass media, in terms of social media? YouTube is the popular one.

Ed Gray: I'll answer the second question first. I don't know the answer, I haven't investigated that. Let's look at that together offline. That's very interesting and Avid makes tools for processing media, we should know the answer to that.
... Now, on your first question: the way to make the tools most usable by the greatest number of people is to pursue universal design principles. That's something else. We talked about WCAG and all of the things that make tools accessible; that's one gift we can give ourselves. Another is to look at how people self-report on these things.
... The other area to look at, universal design. Universal design was originally invented I think in the 50s for physical buildings and they had 7 principles of universal design, physical space, you know, making things easily readable and accessible and not having a lot of physical effort to access things. Someone decided to apply this to web and software development.
... Now there are 7 principles of universal design and, emphatically, they're used in learning and education, so you have a classroom where people with different abilities enjoy the same learning experience. You could go to Wikipedia and see the seven principles, just to understand where they're coming from.
... That's the idea: accessibility is just usability for people with disabilities, and universal design says that you want one robust version of your authoring tool or application that everybody can walk up to and use. When Stevie Wonder talks about accessibility at the Grammys, he says he wants everything accessible for everyone all the time. People even have different levels of blindness; I have retinopathy and color blindness, so there are different levels and different types of disabilities. We want to design something and have a diverse community of beta testers kick the tires on it, so that things aren't designed by some empathetic group of designers at your company but by people that use the tools and point out real-world deficiencies.
... The quick answer is universal design. It is written about extensively, the idea of going from universal design for physical buildings to universal design for software and web development. That's where you start, full stop. We want to make a robust tool for everyone.

Rachel Yager: I just want to add, from my own experience of presenting together with visually impaired and hearing impaired people, that I didn't realize the challenges they have in presenting.
... I got a glimpse of that because I was rehearsing with them, and it is impossible for me to know all of the challenges. In fact, it helps to check, to have a checklist of things to do, a checklist for the content. I just want to share this from my own experience, and I think this accessibility checklist should not be at the bottom of the list. It should be at the top of the list.

Ed Gray: Let's talk more about it when you have a chance.

Kazuyuki Ashimura: Thank you very much. I simply provided supplemental information about WCAG 3.0, the latest working draft, and also ATAG 2.0, the Authoring Tool Accessibility Guidelines. You have the URLs in the chat, and there is information on accessibility guidance there; we should look into them. One more piece of information: there will be a workshop on smart assistants and voice interaction, where we will discuss accessibility for the web in general.

Ed Gray: Can I come to that?

Kazuyuki Ashimura: Sure. Please!

Next steps

Slides: Closing remarks (PDF version)

Slide 2 of 20

François Daoust: I don't know if it is worth digging into too many details as to how the different W3C groups are organized. We have different types of W3C groups, some are going to be more about incubating technical solutions and some about refining use cases and requirements and then you have the actual standardization group which is the Working Group.

Slide 3 of 20

François Daoust: We mentioned that we should come up with a list of groups. This is a list of W3C media groups, with additional ones that are useful to look at given the discussions we have had so far, and we should add another one, the accessibility group, as well.
... The list includes actual working groups developing standards, for instance the Media Working Group developing WebCodecs. In the final report of the workshop, we'll include plenty of links so that people can find their way, know where to look and where to go to find and file issues.

Slide 4 of 20

François Daoust: SMPTE is organized into technical committees, I think. I don't know if, Bruce, you can quickly detail how this works.

Pierre-Anthony Lemieux: Bruce had to step out; I can present really quickly. It is very similar to W3C and all standards organizations. They have different names for groups, but the work is organized in Technical Committees covering different areas, and when a new work item is brought in, it is either assigned to an existing Technical Committee or a new Technical Committee is created.
... It is rarer, but you also have the option of creating more flexible groups, like task forces for short-term investigations, and study groups. So really, just like W3C, there are several kinds of groups, and it is really straightforward to start new projects and collaborate with other organizations.

Slide 5 of 20

François Daoust: What do we want to get out of this? Of course, in discussing the topics, we know that we're not going to solve them directly in a small GitHub issue or within an hour. We want to understand what we should be doing next.
... One option is to do nothing. That's the easiest way, you have an issue, you don't think it is worth doing anything about it, or on the contrary, it is worth doing something about but it is already being taken care of by some group so you don't have to do anything. That's a possibility.
... Then you have a bunch of options which are really about providing input, trying to come up with a better understanding of what needs to be done. It can be providing input directly to the relevant Working Group, which means that you have a specific need for an API that's being developed by that group; the group is considering the API function that you need, but they first need to understand the need precisely, so they need some proof, evidence that it is actually going to be used and how, and some data quantifying what needs to be done, etc. That can be individual feedback that you provide through GitHub issues. If needed, we can gather in some sort of group, which would be the Media & Entertainment Interest Group, to provide coordinated feedback.
... Another possibility is to explore feasibility at the application level, and to explore the work a bit more before going to the group. There are a number of topics we have discussed in the first couple of sessions. What I heard from the browser vendors that were present for some of them is: we're providing low-level building blocks via the APIs, we could add more, but we believe that what you're looking at could be addressed directly at the application level. We realize that this may not be easy, may not be convenient, and there is a performance penalty that you have to pay if that needs to be done at the application layer. For us, there needs to be more investigation as to what could be done already with the existing low-level APIs and whether that is enough or whether something else needs to be added. We had that discussion about muxing and demuxing with WebCodecs, typically.
... There is something that could be done in the Interest Group, which could serve as a coordination point to develop perhaps some open source library, or to review code and existing libraries, and otherwise quantify the problems.
... Similarly, you may need to gather community support when you have a need and it is not clear that the need is shared across the media industry. The Interest Group can help to do that as well.

Slide 6 of 20

François Daoust: In some cases we discussed issues where it is not obvious that a spec change would be needed. It is more of an implementation thing, where you want to converge on the same things being implemented everywhere. In such cases, one way to improve the situation is to document the state of the art, document the needs, track implementation progress, and develop tests. Thinking about WebCodecs, for instance, and the codec support that we discussed: it is not clear that there is much that can be done from a spec perspective, but it is clear that there is a need for more codecs to be supported or exposed.
... Final options are of course to incubate the standard and develop the standards for real.

Slide 7 of 20

François Daoust: There is no way that a standard will be developed if people don't step up. Standards are not developed in a vacuum, we need people willing to drive the work, we need people to step up. None of the actions, none of the issues, the proposals that I have in the following tables are going to work if no one steps up. In other words, we're not offering to do the work for you. We need you as a community, to help shape the future of standards.

Slide 8 of 20

François Daoust: We reviewed the different topics and in these slides, we have the different topics listed. We need to go fast because we only have 10 minutes left. We talked a lot about WebCodecs. There are a bunch of specific topics that we discussed for WebCodecs. I won't go through each of them. As for API support in Workers, for me, the next step is do nothing, as it is already ongoing. It may take time but it is ongoing.
... We talked about muxing and demuxing, and also about exposing SEI metadata for instance and it seems more exploration of that space is needed. These were the topics for which we heard the "doable at the application layer" feedback. Browser vendors will explore that space themselves but note that this can be done without their help. Perhaps it is a good use of your time as a community to investigate the space and collect data so that you can inform implementors about what is and is not practically speaking achievable.

Slide 9 of 20

François Daoust: We mentioned supporting professional codecs and support for higher-fidelity audio. I don't know if there is any reaction in the audience now on some of this. In the report, the goal is to go through that list and propose these next steps. We will explain which group is responsible for which topic. Some work could be done in the Media Working Group, or in the Media & Entertainment Interest Group as a way to collect feedback and discuss refined needs.

Pierre-Anthony Lemieux: Before going into details, one of the things that comes to mind, and that maybe we ought to discuss, is that there are many, many threads; we talked about many different APIs across many different groups.
... How do we bring all of the threads together in a single place, as we have done in the past three days? Going forward, how do we keep that discussion going without having it split and scattered across so many different groups?

François Daoust: The practical immediate document is going to be the workshop report.
... But part of the answer is in the next steps. To explore the suggestions, the Media & Entertainment Interest Group seems to be the right place to crystallize future discussions, meaning to bring together discussions that span multiple groups, using the Interest Group as a coordination point.
... The Interest Group is restricted to W3C members. The repository of the workshop is going to stay on; issues are listed there, they can continue their own life and serve as a focal point for some of these discussions as well. Linking has proven useful in creating the web, and it can be used here as well to converge discussions between different groups, making sure that anyone can participate and nothing is lost in the process.

Slide 19 of 20

François Daoust: You can envision other settings. One of the open questions that I have in the list, on that final slide, is: what's the story between W3C and SMPTE moving forward? Does there need to be a collaboration between the two organizations? If so, how do we achieve that with the topics at hand?

Pierre-Anthony Lemieux: On that last slide: there is restricted membership, as you have pointed out. Let's say there is a group that starts to touch the web; I think saying "go attend the Interest Group" is a high bar.
... Should we look into setting up a joint group between the organizations to make exchange easier? Or do we do that ad hoc: any time there is a question, we just talk? That's something I would like to hear feedback on from folks in the room: is a common group between the two organizations something we should consider? Maybe that's a group we don't need.

François Daoust: If we do, we can think of different solutions. We have groups shared between different organizations; the Spatial Data on the Web Working Group comes to mind, a joint group between the consortium and the OGC. Different area, but an example of where we managed to do that.
... We have, of course, an ongoing liaison with SMPTE, which may be enough as well.

Kazuyuki Ashimura: We have a liaison, so we can simply invite them, or some of them, to the calls and have a collaborative discussion there. There is no problem.

François Daoust: In my experience with standardization, the final issue is not going to be membership fees, but more willingness and availability of people to work on the tasks at hand. People want results. Standardization unfortunately takes time and dedication to document needs, convince people, gather a community and so on.
... The actual place where this happens could be a Community Group, it could be the Media & Entertainment Interest Group, etc. The good thing about it being the Interest Group is that it already exists and already has good coverage in terms of companies being present.

Kazuyuki Ashimura: We should ask people who are very interested in these topics to continue to participate in the discussion. We can think about which group would be the best collaborator with them and we can start some initial discussions within the Media & Entertainment Interest Group with stakeholders, including people from various organizations here.

Chris Needham: The thing I would like to know from participants in the workshop, what are your views? Do you think we should continue these conversations? I think we can propose solutions around which group we use and all of this.
... I think what would be interesting for me to understand is how much appetite there is to dive into some of these follow up conversations.
... Summarizing what I have heard over these three sessions: some things are about browser implementations, some are about potential spec changes, some are about new areas to explore, like we have heard today about metadata, for example. Doing all of that requires participation and support from the people who are interested in those particular topics.

François Daoust: We're reaching the end of the session. Concretely speaking, what I suggest is to pass the tables in the slides around to the workshop participants and survey what you think the next steps should be for each of the topics and whether you would be able to help with them. The goal is to assess interest and willingness to work on some of these topics, first because we need that to prepare a useful report out of this workshop, and second because that is actually the first step in the list of next steps!
... Things can't progress if you don't participate. That's the key message.

Pierre-Anthony Lemieux: It is a good test actually.

Chris Needham: It is a good approach.

Sponsor

Adobe