13:55:54 RRSAgent has joined #me
13:55:58 logging to https://www.w3.org/2023/06/06-me-irc
13:55:58 Zakim has joined #me
13:56:08 Meeting: Media & Entertainment IG meeting
13:56:12 scribe+ cpn
13:56:46 Agenda: https://www.w3.org/events/meetings/184d4a81-f7e7-4984-ab2e-8f40e889d558
14:00:02 tidoust has joined #me
14:01:08 igarashi has joined #me
14:01:16 present+
14:01:24 scribenick: Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
14:01:27 present+ Hisayuki_Ohnata, Kinji_Matsumura, Ryo_Yasuoka, Tatsuya_Igarashi, Chris_Needham, Kazuyuki_Ashimura
14:01:30 ohmata has joined #me
14:01:34 chair: ChrisN, Igarashi
14:01:56 present+ Andreas_Tai
14:02:23 present+ Chris_Lorenzo
14:02:25 chair+ ChrisL
14:03:01 present+ Ewan_Roycroft
14:03:02 agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2023Jun/0000.html
14:03:07 rrsagent, make log public
14:03:11 rrsagent, draft minutes
14:03:12 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:04:29 scribenick: cpn
14:05:15 atai_ has joined #me
14:05:19 rrsagent, draft minutes
14:05:20 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:05:59 nigel has joined #me
14:06:24 Topic: Introduction
14:06:40 ChrisN: Main topic is DAPT
14:06:51 ... Some discussion on the group charter with the APA WG
14:07:22 ... expected collaboration on media requirements
14:07:33 ... discussion during TPAC in September
14:07:54 present+ Francois_Daoust
14:08:01 ... Charter should go to the AC soon, I hope
14:08:11 i/expected/scribenick: kaz/
14:08:12 Kaz: I'll bring it to the team strategy meeting next Tuesday
14:08:18 i/Charter should/scribenick: cpn/
14:08:44 scribenick- Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
14:08:47 rrsagent, draft minutes
14:08:48 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:08:54 present+ Francois_Daoust
14:09:28 present+ Nigel_Megitt
14:09:29 rrsagent, draft minutes
14:09:30 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:09:40 Topic: Dubbing and AD Profile of TTML2
14:09:56 Nigel: No slides, but here's the document
14:10:08 ... This is a requirements document
14:11:39 ... This is work on an exchange format for workflows for producing dubbing scripts and audio description scripts
14:11:58 ... It defines, in an exchangeable document, the mixing instructions to produce a version of the video with the AD mixed in
14:12:06 i|No slides|-> https://www.w3.org/TR/2023/WD-dapt-20230505/ Dubbing and Audio description Profiles of TTML2 WD
14:12:09 rrsagent, draft minutes
14:12:10 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:12:16 ... We'd wanted to get this work going for a number of years; we had a CG, but not enough momentum
14:12:31 ... Combining the AD and dubbing work created more momentum
14:13:06 ... This is a profile of TTML2. TTML2 provides tools for timed text documents; it has lots of features, e.g., styling for captions or subtitles
14:13:32 ... We don't particularly use those styling features, but the semantic basis of TTML makes the work of creating this spec a lot easier
14:13:44 ... The use cases are in the requirements document from 2022
14:14:08 ... The spec document is on the Rec track, in TTWG, intended to meet the requirements in the requirements doc
14:14:47 ... The diagram shows where it fits in. An audio programme with key times. Then you might be creating a description of images, a script that describes what's in the video at relevant times
14:14:58 ... Or you might transcribe then translate it, to create a dubbing script
14:15:09 ... In both cases, you have a pre-recording script
14:15:19 ... You might record it with actors, or use text-to-speech
14:15:25 ... Then create instructions for mixing
14:15:53 ... You end up with a manifest file that includes everything spoken and the times, with links to recorded audio and mixing instructions
14:15:59 q?
14:16:45 ChrisN: Is this mainly used in a production setting, or is it also for playback in a client?
14:17:08 Nigel: In dubbing workflows, what localisation teams do in the production domain is send the media to someone to do the dubbing
14:17:34 ... The script is useful when making edits. They also send it to a subtitling house, to create a subtitle file that would be presented to the audience
14:17:49 ... The language translation differs significantly between dubbing and subtitles
14:18:16 ... If the words don't match, it's terrible. Cyril Concolato, the co-editor, described having this experience
14:18:32 ... He couldn't have the subtitles with the dubbed audio; not a good experience
14:18:43 ... Once you have the translated timed text, you can send it to be turned into subtitles
14:19:01 ... Then it's about changing styling, showing the right number of words at a time, shot changes, etc.
14:19:26 ... Because they have a common source for translation, you don't have to pay to get that done twice
14:19:52 ... If you have the script and mixing instructions available, in cases where people can't see the image, you can have client-side rendering of the AD
14:20:08 ... That allows you to change the relative balance of AD and programme audio
14:20:23 ... If you have the text available, you don't have to render it as audio; it could go to a Braille display
14:20:40 ... They get the description of the video through the fingers, and dialog through their ears
14:20:41 s/Ohnata/Ohmata/
14:20:43 rrsagent, draft minutes
14:20:45 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:20:57 ... There's cognitive load using AD, to distinguish the description from the dialog
14:21:30 ... Some people can't hear at all, so having all the video description available as text would help people who can't see or hear
14:21:44 ... Then you get a reasonable description of the entire programme
14:22:28 Nigel: We have a working draft document, steadily being updated
14:22:55 ... DAPT is the spec document. The intent is that the user can understand how it works without being an expert in TTML
14:23:29 ... We use TTML for the underlying structures. We expect it to be useful for tool implementers and player implementers, rather than being created by people in a text editor, which would be hard work
14:24:01 ... It covers Transcripts of pre-existing media and Scripts for media to be created
14:24:10 ... Let's look at some examples
14:24:23 ... It's an XML document with metadata in the head, and a body with the main content
14:24:45 ... You could create it with an empty body. TTML has ideas in common with HTML, like head, body, p
14:25:19 ... For an AD, you have a p with start and end times, describing something in the video image
14:25:38 ... Care has been taken to be clear about language and the source of language. It's important to know what state we're in
14:26:16 ... It uses the xml:lang attribute and a content profile designator. Here's an example of a transcript of original-language audio in the programme
14:26:28 ... In the AD example, the source of the language is original, so it's not a translation
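[Editor's note: a minimal sketch of the kind of document being described, loosely modelled on the examples in the DAPT Working Draft. The daptm: attribute name and value, the profile designator URL, and the timings are approximations, not copied from the spec, and should be checked against the WD itself:]

    <tt xmlns="http://www.w3.org/ns/ttml"
        xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
        xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
        xml:lang="en"
        ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
        daptm:scriptType="originalTranscript">
      <head/>
      <body>
        <!-- one timed script event: a description of the video image,
             in the original language (not a translation) -->
        <div xml:id="a1" begin="25s" end="28s">
          <p>A woman climbs into a small sailing boat.</p>
        </div>
      </body>
    </tt>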
14:26:47 ... If I create an audio recording of this description, say clip3.wav, I can reference it with an audio element
14:27:11 ... So there's a paragraph with an animated gain value, going from 1 to 0.39
14:27:37 ... This is commonly used in AD, to lower the programme audio volume before the AD enters, then return the gain to 1
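[Editor's note: a rough sketch of the audio reference and gain animation just described, assuming xmlns:tta="http://www.w3.org/ns/ttml#audio" (TTML2 audio styling) is declared on the root tt element. The animate placement and the timings are illustrative, not copied from the spec:]

    <body>
      <!-- duck the programme audio from full gain to 0.39 before the AD enters -->
      <animate begin="23s" end="25s" tta:gain="1;0.39" fill="freeze"/>
      <div xml:id="a1" begin="25s" end="28s">
        <p>
          <span>A woman climbs into a small sailing boat.</span>
          <!-- the recorded rendering of this description -->
          <audio src="clip3.wav"/>
        </p>
      </div>
      <!-- return the programme audio to full gain afterwards -->
      <animate begin="28s" end="29s" tta:gain="0.39;1"/>
    </body>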
14:28:13 ... Another example: we use a tta:speak attribute to direct a player to use text-to-speech
14:28:42 ... You can also include the audio as base64 encoding. There's a challenge in identifying WAV audio, as it doesn't have a proper MIME type
14:28:51 ... You can send these to a player for playback
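[Editor's note: a sketch of the text-to-speech direction; tta:speak is a TTML2 audio style attribute, but the value shown and this inline use are illustrative rather than quoted from the spec:]

    <!-- no audio recording given: the player is directed to synthesise the text -->
    <p begin="25s" end="28s" tta:speak="normal">A woman climbs into a small sailing boat.</p>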
14:29:14 ... For dubbing, there's metadata for character names (and actor names), using existing TTML
14:29:46 ... Once translated, the script type changes in the metadata, to show it as a translation
14:30:08 ... The original can be kept, which is important for being able to refer back, or for multiple translations
14:30:44 ... A pivot language can be used to go from Finnish to English, then English to Hebrew, for example
14:31:18 ... If you get words translated strangely, you can go back and adjust
14:31:28 ... Get lip-sync timings right
14:31:43 ... The single document gets worked on, updated to reflect the status
14:32:07 ... As it's an open standard, as opposed to non-standard or proprietary, we hope to create a marketplace for interoperability between tools
14:32:17 ... That's the benefit of having a W3C spec
14:32:17 q?
14:33:22 Nigel: We have a data model (UML class diagram) in the spec
14:33:58 ... A DAPT script has metadata; it can contain characters and styles
14:34:23 ... We may remove styles applied to particular characters; there's debate about whether that needs to be in the document
14:36:14 ... Script events are timed entities. The main structure: the script contains script events, and script events contain text
14:36:14 ... You can apply mixing instructions, audio recordings, or text-to-speech
14:36:14 ... Those are the main entities. The rest of the spec describes them in detail
14:36:49 ... It explains how the data model maps to the TTML, e.g., an element with an xml:id
14:37:07 ... A p element represents text. You can have styles and metadata
14:37:38 ... You can have audio styles to trigger synthesised audio, and describe whether it's original language or a translation
14:38:55 ... The audio is designed to be implementable using Web Audio. Not the text-to-speech, though: the Web Speech API isn't a Rec, and it's problematic to use here, as typical implementations can't bring the speech into a Web Audio context
14:39:08 ... It goes out directly via the OS
14:39:28 ... But there are other ways to do it, e.g., online services where you post the text and they return audio
14:39:41 ... There's a conformance section that describes the formal material
14:39:43 q+
14:40:08 ... Because it's a TTML2 profile, there's formal material in the appendices on what's required and what's optional, and on extension features
14:40:54 ... TTML2 allows extensions, as "features" of specifications. This is helpful for designing CR exit criteria, as we know the features
14:41:35 ... So conformance criteria, and tests, are easy to generate
14:41:41 q?
14:42:10 -> https://www.w3.org/TR/dapt/ DAPT
14:42:18 -> https://www.w3.org/TR/dapt-reqs/ DAPT-REQS
14:42:49 Nigel: We're in Working Draft now, and the next stage is Candidate Recommendation
14:43:09 ... So getting wide review is important. It's the perfect time to review and give feedback
14:43:27 ... I think it's in a reasonable state to do that
14:43:47 ... If you have questions or comments, please open an issue in GitHub, or comment on any open issues
14:44:04 ... You can email the TTWG mailing list too, or contact me or Cyril directly
14:44:44 ... Next steps: we'll respond to all review comments, start horizontal review soon, and send liaisons to groups in the TTWG charter
14:45:49 ChrisN: Who are the external groups, and which companies?
14:46:20 Nigel: Some tool suppliers have been very positive about having an open standard format. They have a common problem in how to serialise their work
14:47:03 ... If an organisation like BBC or Netflix wants to specify an AD or dubbing script as a deliverable from a provider, they can require the standard format
14:47:18 ... The alternative, ingesting spreadsheets or CSVs, is painful
14:47:51 Kaz: Thank you, this is an interesting and useful mechanism
14:48:27 ... I'm wondering about the order of sentences. Japanese has a different ordering: subject, object, verb
14:49:00 ... So if we simply dub or translate English to Japanese, subtitles tend to split into smaller chunks, subject first
14:49:13 ... So how to deal with that kind of word order difference?
14:49:53 ... Second question: you could work on SSML as well as the current text spec, so a TTML engine can handle the text information well
14:50:19 Nigel: On the second point, there's a good advantage to using SSML. We have an issue (#121)
14:50:30 ... It's a challenge to mix it in with the audio content
14:50:55 ... Somebody else working on a similar problem, spoken presentation in HTML, has found the same issue
14:51:13 ... They considered different ways to proceed: multiple attributes or a single attribute
14:51:27 ... https://www.w3.org/TR/spoken-html/
14:51:37 ... I'd be interested to know how to do that well
14:52:06 Kaz: I need to organise a workshop or event on voice. This should be good input, so please let's work with the strategy team on that
14:52:13 Nigel: Yes, that would be good
14:52:33 ... On your first question, some things are covered by TTML, e.g., different writing directions and embedding
14:52:45 ... Japanese language requirements like ruby and text emphasis can be done
14:52:59 ... But your point is more about the structure of the language itself
14:53:21 ... We didn't feel the need to say anything, as it just contains whatever the translator generates
14:53:31 ... Is there more we need to do?
14:54:00 Kaz: I don't know; maybe you can use the begin and end attributes to specify the timing of the Japanese words to be played
14:54:05 Nigel: Yes, you could do that
14:54:32 Nigel: Timed text elements inside these objects are permitted. p elements can contain spans, and spans can have timing
14:54:49 ... That's important for adaptation
14:54:51 q?
14:54:55 ack k
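[Editor's note: a hypothetical illustration of the span-level timing just mentioned, for word-order adaptation; the text and timings are invented. In TTML's default par timing, the span begin/end values are relative to the start of the parent p:]

    <p begin="10s" end="16s">
      <!-- each phrase of the translated sentence is retimed independently -->
      <span begin="0s" end="2.5s">first phrase of the translation</span>
      <span begin="3s" end="5.5s">second phrase, reordered for the target language</span>
    </p>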
14:55:41 rrsagent, draft minutes
14:55:42 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:56:21 ChrisN: This is really good progress, now ready for review
14:56:52 q+
14:56:59 Nigel: I see people from Japan are here; if you have particular use cases, I'd be interested in your review feedback
14:57:10 ... Some things may not be obvious to me as a non-Japanese speaker
14:57:26 Kaz: We have people from NHK here today
14:58:03 ... NHK Open House was on Sunday; some are working on sign language synchronised with TV content, so they may be interested in this mechanism and modalities like sign language
14:58:53 Nigel: What may be attractive is that it's extensible; you could present translation text to a sign interpreter
15:00:48 ChrisN: It could also be interesting from an object-based media point of view
15:00:55 ... Anything to follow up?
15:01:16 q+
15:02:01 ack k
15:02:09 topic: Zoom call for the July meeting
15:02:33 kaz: We need to switch to Zoom, so I'll allocate one for the July meeting
15:02:35 [adjourned]
15:02:45 rrsagent, draft minutes
15:02:46 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
15:03:25 i/We need/scribenick: kaz/
15:03:25 rrsagent, draft minutes
15:03:26 I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
15:05:56 rrsagent, bye
15:05:56 I see no action items