IRC log of me on 2023-06-06

Timestamps are in UTC.

13:55:54 [RRSAgent]
RRSAgent has joined #me
13:55:58 [RRSAgent]
logging to https://www.w3.org/2023/06/06-me-irc
13:55:58 [Zakim]
Zakim has joined #me
13:56:08 [cpn]
Meeting: Media & Entertainment IG meeting
13:56:12 [cpn]
scribe+ cpn
13:56:46 [cpn]
Agenda: https://www.w3.org/events/meetings/184d4a81-f7e7-4984-ab2e-8f40e889d558
14:00:02 [tidoust]
tidoust has joined #me
14:01:08 [igarashi]
igarashi has joined #me
14:01:16 [igarashi]
present+
14:01:24 [kaz]
scribenick: Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
14:01:27 [cpn]
present+ Hisayuki_Ohmata, Kinji_Matsumura, Ryo_Yasuoka, Tatsuya_Igarashi, Chris_Needham, Kazuyuki_Ashimura
14:01:30 [ohmata]
ohmata has joined #me
14:01:34 [cpn]
chair: ChrisN, Igarashi
14:01:56 [cpn]
present+ Andreas_Tai
14:02:23 [cpn]
present+ Chris_Lorenzo
14:02:25 [cpn]
chair+ ChrisL
14:03:01 [cpn]
present+ Ewan_Roycroft
14:03:02 [kaz]
agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2023Jun/0000.html
14:03:07 [kaz]
rrsagent, make log public
14:03:11 [kaz]
rrsagent, draft minutes
14:03:12 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:04:29 [kaz]
scribenick: cpn
14:05:15 [atai_]
atai_ has joined #me
14:05:19 [kaz]
rrsagent, draft minutes
14:05:20 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:05:59 [nigel]
nigel has joined #me
14:06:24 [cpn]
Topic: Introduction
14:06:40 [cpn]
ChrisN: Main topic is DAPT
14:06:51 [cpn]
... Some discussion on the group charter with the APA WG
14:07:22 [kaz]
... expected collaboration on media requirements
14:07:33 [kaz]
... discussion during TPAC in September
14:07:54 [tidoust]
present+ Francois_Daoust
14:08:01 [cpn]
... Charter should go to the AC soon, I hope
14:08:11 [kaz]
i/expected/scribenick: kaz/
14:08:12 [cpn]
Kaz: I'll bring it to the team strategy meeting next Tuesday
14:08:18 [kaz]
i/Charter should/scribenick: cpn/
14:08:44 [kaz]
scribenick- Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
14:08:47 [kaz]
rrsagent, draft minutes
14:08:48 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:08:54 [cpn]
present+ Francois_Daoust
14:09:28 [kaz]
present+ Nigel_Megitt
14:09:29 [kaz]
rrsagent, draft minutes
14:09:30 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:09:40 [cpn]
Topic: Dubbing and AD Profile of TTML2
14:09:56 [cpn]
Nigel: No slides, but here's the document
14:10:08 [cpn]
... This is a requirements document
14:11:39 [cpn]
... This is work on an exchange format for workflows for producing dubbing scripts and audio description scripts
14:11:58 [cpn]
... It defines in an exchangeable document the mixing instructions to produce a version of the video with AD mixed in
14:12:06 [kaz]
i|No slides|-> https://www.w3.org/TR/2023/WD-dapt-20230505/ Dubbing and Audio description Profiles of TTML2 WD
14:12:09 [kaz]
rrsagent, draft minutes
14:12:10 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:12:16 [cpn]
... Wanted to get this work going for a number of years, we had a CG with not enough momentum
14:12:31 [cpn]
... Combining the AD and dubbing work created more momentum
14:13:06 [cpn]
... This is a profile of TTML2. TTML2 provides tools for timed text documents, it has lots of features, e.g., styling for captions or subtitles
14:13:32 [cpn]
... We don't use those styling features particularly, but we do use the semantic basis of TTML, which makes the work of creating this spec a lot easier
14:13:44 [cpn]
... The use cases are in the requirements document from 2022
14:14:08 [cpn]
... The spec document is on the Rec track, in TTWG, intended to meet the requirements in the requirements doc
14:14:47 [cpn]
... The diagram shows where it fits in. An audio program with key times. Then you might be creating a description of images, a script that describes what's in the video at relevant times
14:14:58 [cpn]
... Or you might transcribe then translate it, to create a dubbing script
14:15:09 [cpn]
... In both cases, you have a pre-recording script
14:15:19 [cpn]
... You might record it with actors, or use text-to-speech
14:15:25 [cpn]
... Then create instructions for mixing
14:15:53 [cpn]
... You end up with a manifest file that includes everything spoken and times, with links to recorded audio and mixing instructions
14:15:59 [cpn]
q?
14:16:45 [cpn]
ChrisN: Is this mainly used in a production setting, or also for playback in a client?
14:17:08 [cpn]
Nigel: In dubbing workflows, what localisation teams do in the production domain is send the media to someone to do the dubbing
14:17:34 [cpn]
... The script is useful when making edits. They also send it to a subtitling house, to create a subtitle file that would be presented to the audience
14:17:49 [cpn]
... The language translation differs significantly between dubbing and subtitles
14:18:16 [cpn]
... If the words don't match, it's terrible. Cyril Concolato, the co-editor, described having this experience
14:18:32 [cpn]
... He couldn't have the subtitles with the dubbed audio, not a good experience
14:18:43 [cpn]
... Once you have the translated timed text, you can send it to be turned into subtitles
14:19:01 [cpn]
... Then it's about changing styling, showing the right number of words at a time, shot changes, etc
14:19:26 [cpn]
... Because they have a common source for translation, you don't have to pay to get that done twice
14:19:52 [cpn]
... If you have the script and mixing instructions available, in cases where people can't see the image, you can have client side rendering of the AD
14:20:08 [cpn]
... That allows you to change the relative balance of AD and programme audio
14:20:23 [cpn]
... If you have the text available, you don't have to render as audio, it could be a Braille display
14:20:40 [cpn]
... They get the description of the video through their fingers, and the dialog through their ears
14:20:43 [kaz]
rrsagent, draft minutes
14:20:45 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:20:57 [cpn]
... There's cognitive load using AD, to distinguish the description from the dialog
14:21:30 [cpn]
... Some people can't hear at all, so having all the video description available as text would help people who can't see or hear
14:21:44 [cpn]
... Then you get a reasonable description of the entire program
14:22:28 [cpn]
Nigel: We have a working draft document, steadily being updated
14:22:55 [cpn]
... DAPT is the spec document. The intent is that a user can understand how it works without being an expert in TTML
14:23:29 [cpn]
... We use TTML for the underlying structures. We expect it to be useful for tool implementers and player implementers, rather than being created by people in a text editor, which would be hard work
14:24:01 [cpn]
... Transcripts of pre-existing media and Scripts for media to be created
14:24:10 [cpn]
... Let's look at some examples
14:24:23 [cpn]
... It's an XML document with metadata in the head, and body with main content
14:24:45 [cpn]
... You could create empty <div> elements in the body. TTML has ideas in common with HTML, like head, body, p
14:25:19 [cpn]
... From an AD, you have a p with start and end time, describing something in the video image
14:25:38 [cpn]
... Care has been taken to be clear about language and the source of language. Important to know what state we're in
14:26:16 [cpn]
... It uses the xml:lang attribute and a content profile designator. Example of a transcript of original language audio in the programme
14:26:28 [cpn]
... In the AD example, the source of the language is original, so it's not a translation
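[Example (illustrative sketch, not shown in the meeting): a minimal document of the kind described above, with one timed description. Element and attribute names follow TTML2 and the DAPT WD, but the daptm namespace URI, the profile designator, and all text and timings are assumptions to check against the spec.]

  <tt xmlns="http://www.w3.org/ns/ttml"
      xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
      xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
      xml:lang="en"
      ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
      daptm:scriptType="originalTranscript">
    <head/>
    <body>
      <!-- one script event: a timed description of the video image -->
      <div xml:id="event1" begin="9.7s" end="12.2s">
        <!-- langSrc matches xml:lang: original language, not a translation -->
        <p daptm:langSrc="en">A woman climbs into a small sailing boat.</p>
      </div>
    </body>
  </tt>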
14:26:47 [cpn]
... If I create an audio recording of this description, call it clip3.wav, I can reference it with an audio element
14:27:11 [cpn]
... So there's a paragraph with an animated gain value, going from 1 to 0.39
14:27:37 [cpn]
... This is commonly used in AD, to lower the programme audio volume before the AD enters, then return the gain to 1
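[Example (illustrative sketch): the audio reference and gain animation just described, using TTML2's <audio>, <animate>, and tta:gain (tta assumed bound to the TTML2 audio styling namespace). clip3.wav and the 0.39 gain come from the discussion; the timings, and how the mixing instruction attaches to the programme audio, are simplifications.]

  <div begin="25s" end="28s">
    <!-- duck the programme audio from full gain to 0.39, hold, then restore -->
    <animate begin="0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
    <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
    <p>
      <!-- the recorded description clip plays with this paragraph -->
      <audio src="clip3.wav"/>
      She opens the map and points at the island.
    </p>
  </div>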
14:28:13 [cpn]
... Another example, we use a tta:speak attribute to direct a player to use text to speech
14:28:42 [cpn]
... You can include the audio as base64 encoding. There's a challenge identifying WAV audio, as it doesn't have a proper MIME type
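[Example (illustrative sketch): the two options just mentioned. tta:speak is named in the discussion, though its value here is an assumption; the <source>/<data> embedding and the "audio/wave" type are assumptions based on TTML2's embedded-content support and the WAV MIME-type problem.]

  <!-- ask the player to synthesise this text -->
  <p begin="40s" end="43s" tta:speak="normal">A storm gathers on the horizon.</p>

  <!-- or carry the recording inline as base64 -->
  <p begin="40s" end="43s">
    <audio>
      <source>
        <!-- base64 payload truncated for the example -->
        <data type="audio/wave" encoding="base64">UklGRiQAAABXQVZF</data>
      </source>
    </audio>
    A storm gathers on the horizon.
  </p>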
14:28:51 [cpn]
... You can send them to a player for playback
14:29:14 [cpn]
... For dubbing, there's metadata for character names (and actor names), using existing TTML
14:29:46 [cpn]
... Once translated, the script type changes in the metadata, to show it as a translation
14:30:08 [cpn]
... The original can be kept, which is important to be able to refer back. Or for multiple translations
14:30:44 [cpn]
... A pivot language to go from Finnish to English, then English to Hebrew, for example
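[Example (illustrative sketch): dubbing-script metadata as just described: a character with an actor in the head, and a script event that keeps the original text alongside its translation. ttm:agent, ttm:name, and ttm:actor are TTML2 metadata; daptm:scriptType and daptm:langSrc follow the DAPT WD; all names, languages, and text are made up.]

  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">DETECTIVE</ttm:name>
        <ttm:actor agent="actor_1"/>
      </ttm:agent>
      <ttm:agent type="person" xml:id="actor_1">
        <ttm:name type="full">Jane Doe</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <!-- root carries daptm:scriptType="translatedTranscript" -->
  <body>
    <div xml:id="event12" begin="12s" end="14s">
      <!-- original Finnish, kept for reference -->
      <p xml:lang="fi" ttm:agent="character_1">Missä olit viime yönä?</p>
      <!-- English translation; langSrc records the source language -->
      <p xml:lang="en" daptm:langSrc="fi" ttm:agent="character_1">Where were you last night?</p>
    </div>
  </body>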
14:31:18 [cpn]
... If you get words translated strangely, you can go back and adjust
14:31:28 [cpn]
... Get lip-sync timings right
14:31:43 [cpn]
... The single document gets worked on, updated to reflect the status
14:32:07 [cpn]
... As it's an open standard, as opposed to non-standard or proprietary, we hope to create a marketplace for interoperability between tools
14:32:17 [cpn]
... That's the benefit to having a W3C spec
14:32:17 [cpn]
q?
14:33:22 [cpn]
Nigel: We have a data model (UML class diagram) in the spec
14:33:58 [cpn]
... A DAPT script has metadata, it can contain characters and styles
14:34:23 [cpn]
... We may remove styles applied to particular characters. There's debate about whether it needs to be in the document
14:36:14 [cpn]
... Script events contain timed entities. The three main things: the script, the script events it contains, and the text they contain
14:36:14 [cpn]
... You can apply mixing instructions, audio recording or text to speech
14:36:14 [cpn]
... Those are the main entities. The rest of the spec describes them in detail
14:36:49 [cpn]
... It explains how the data model maps to the TTML, e.g., a <div> element with an xml:id
14:37:07 [cpn]
... A p element represents text. You can have styles and metadata
14:37:38 [cpn]
... You can have audio styles to trigger synthesised audio, and describe whether it's the original language or a translation
14:38:55 [cpn]
... The audio is designed to be implementable using Web Audio. Not the text to speech: the Web Speech API isn't a Rec, and it's problematic to use here, as typical implementations can't bring the speech into a Web Audio context
14:39:08 [cpn]
... It goes out directly via the OS
14:39:28 [cpn]
... But there are other ways to do it, e.g., online services where you post the text and they return audio
14:39:41 [cpn]
... There's a conformance section that describes formal stuff
14:39:43 [kaz]
q+
14:40:08 [cpn]
... Because it's a TTML2 profile, there's formal stuff in the Appendices on what's required, and optional, and extension features
14:40:54 [cpn]
... TTML2 allows extensions, as "features" of specifications. This is helpful for designing CR exit criteria, as we know the features
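[Example (illustrative sketch): the TTML2 profile mechanism referred to here. A profile declares features as required or optional, which is what makes CR exit criteria and tests straightforward to enumerate. The designator URI and the two features below are illustrative, not DAPT's actual feature list.]

  <ttp:profile type="content"
               designator="http://example.org/profiles/my-dapt-extension">
    <ttp:features xml:base="http://www.w3.org/ns/ttml/feature/">
      <ttp:feature value="required">#timing</ttp:feature>
      <ttp:feature value="optional">#animation</ttp:feature>
    </ttp:features>
  </ttp:profile>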
14:41:35 [cpn]
... So conformance criteria, and tests, are easy to generate
14:41:41 [cpn]
q?
14:42:10 [nigel]
-> https://www.w3.org/TR/dapt/ DAPT
14:42:18 [nigel]
-> https://www.w3.org/TR/dapt-reqs/ DAPT-REQS
14:42:49 [cpn]
Nigel: We're in working draft now, and next stage is Candidate Recommendation
14:43:09 [cpn]
... Getting wide review is important, so it's the perfect time to review and give feedback
14:43:27 [cpn]
... I think it's in a reasonable state to do that
14:43:47 [cpn]
... If you have questions or comments, please open an issue in GitHub, or comment on any open issues
14:44:04 [cpn]
... You can email the TTWG mailing list too, or contact me or Cyril directly
14:44:44 [cpn]
... Next steps, we'll respond to all review comments, we'll be starting horizontal review soon, and sending liaisons to groups in the TTWG charter
14:45:49 [cpn]
ChrisN: Who are the external groups, and which companies?
14:46:20 [cpn]
Nigel: Some tool suppliers have been very positive about having an open standard format. They have a common problem in how to serialize their work
14:47:03 [cpn]
... If an organisation like BBC or Netflix wants to specify an AD or dubbing script as a deliverable from a provider, they can require the standard format
14:47:18 [cpn]
... The alternative, ingesting spreadsheets or CSVs, is painful
14:47:51 [cpn]
Kaz: Thank you, this is an interesting and useful mechanism
14:48:27 [cpn]
... I'm wondering about the order of sentences. Japanese has a different ordering from English: subject, object, verb
14:49:00 [cpn]
... So if we simply dub or translate English to Japanese, subtitles tend to split into smaller chunks, subject first
14:49:13 [cpn]
... So how to deal with that kind of word order difference?
14:49:53 [cpn]
... Second question: could this work use SSML as well as the current text spec, so a TTML engine can handle the text information well?
14:50:19 [cpn]
Nigel: On the second point, there's a good advantage to using SSML. We have an issue (#121)
14:50:30 [cpn]
... It's a challenge to mix in with the audio content
14:50:55 [cpn]
... Somebody else working on a similar problem, spoken presentation in HTML, has found the same issue
14:51:13 [cpn]
... They considered different ways to proceed; multiple attributes or a single attribute
14:51:27 [cpn]
... https://www.w3.org/TR/spoken-html/
14:51:37 [cpn]
... I'd be interested to know how to do that well
14:52:06 [cpn]
Kaz: I need to organise a workshop or event on voice. This should be good input, so let's work with the strategy team on that
14:52:13 [cpn]
Nigel: Yes, that would be good
14:52:33 [cpn]
... On your first question, some things are covered by TTML, e.g., different writing directions and embedding
14:52:45 [cpn]
... Japanese language requirements like rubies and text emphasis can be done
14:52:59 [cpn]
... But your point is more about the structure of the language itself
14:53:21 [cpn]
... We didn't feel the need to say anything, as it just contains whatever the translator generates
14:53:31 [cpn]
... Is there more we need to do?
14:54:00 [cpn]
Kaz: Don't know, maybe you can use the begin and end attributes to specify the timing of each Japanese word to be played
14:54:05 [cpn]
Nigel: Yes, you could do that
14:54:32 [cpn]
Nigel: Timed text elements inside these objects are permitted. p elements can contain spans, and spans can have timing
14:54:49 [cpn]
... That's important for adaptation
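[Example (illustrative sketch): the timed spans just mentioned, applied to Kaz's word-order point. Each span carries its own begin/end, relative to its parent, so individual Japanese words can be timed. The text and timings are made up.]

  <div begin="30s" end="34s">
    <p xml:lang="ja">
      <!-- "I was there last night", timed word by word -->
      <span begin="0.0s" end="0.8s">私は</span>
      <span begin="0.8s" end="1.6s">昨夜</span>
      <span begin="1.6s" end="3.2s">そこにいました</span>
    </p>
  </div>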
14:54:51 [cpn]
q?
14:54:55 [kaz]
ack k
14:55:41 [kaz]
rrsagent, draft minutes
14:55:42 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:56:21 [cpn]
ChrisN: This is really good progress, now ready for review
14:56:52 [kaz]
q+
14:56:59 [cpn]
Nigel: I see people from Japan are here, if you have particular use cases, I would be interested in your review feedback
14:57:10 [cpn]
... Some things may not be obvious to me as a non-Japanese speaker
14:57:26 [cpn]
Kaz: We have people from NHK here today
14:58:03 [cpn]
... NHK Open House was on Sunday; some are working on sign language synchronised with TV content, so they may be interested in this mechanism for other modalities like sign language
14:58:53 [cpn]
Nigel: What may be attractive is that it's extensible; you could present translation text to a sign interpreter
15:00:48 [cpn]
ChrisN: Also could be interesting from an object-based media point of view
15:00:55 [cpn]
... Anything to follow up?
15:01:16 [kaz]
q+
15:02:01 [kaz]
ack k
15:02:09 [kaz]
topic: Zoom call for the July meeting
15:02:33 [kaz]
kaz: We need to switch to Zoom, so I'll allocate one for the July meeting
15:02:35 [kaz]
[adjourned]
15:02:45 [kaz]
rrsagent, draft minutes
15:02:46 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
15:03:25 [kaz]
i/We need/scribenick: kaz/
15:03:25 [kaz]
rrsagent, draft minutes
15:03:26 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
15:05:56 [kaz]
rrsagent, bye
15:05:56 [RRSAgent]
I see no action items