IRC log of me on 2023-06-06
Timestamps are in UTC.
- 13:55:54 [RRSAgent]
- RRSAgent has joined #me
- 13:55:58 [RRSAgent]
- logging to https://www.w3.org/2023/06/06-me-irc
- 13:55:58 [Zakim]
- Zakim has joined #me
- 13:56:08 [cpn]
- Meeting: Media & Entertainment IG meeting
- 13:56:12 [cpn]
- scribe+ cpn
- 13:56:46 [cpn]
- Agenda: https://www.w3.org/events/meetings/184d4a81-f7e7-4984-ab2e-8f40e889d558
- 14:00:02 [tidoust]
- tidoust has joined #me
- 14:01:08 [igarashi]
- igarashi has joined #me
- 14:01:16 [igarashi]
- present+
- 14:01:24 [kaz]
- scribenick: Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
- 14:01:27 [cpn]
- present+ Hisayuki_Ohnata, Kinji_Matsumura, Ryo_Yasuoka, Tatsuya_Igarashi, Chris_Needham, Kazuyuki_Ashimura
- 14:01:30 [ohmata]
- ohmata has joined #me
- 14:01:34 [cpn]
- chair: ChrisN, Igarashi
- 14:01:56 [cpn]
- present+ Andreas_Tai
- 14:02:23 [cpn]
- present+ Chris_Lorenzo
- 14:02:25 [cpn]
- chair+ ChrisL
- 14:03:01 [cpn]
- present+ Ewan_Roycroft
- 14:03:02 [kaz]
- agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2023Jun/0000.html
- 14:03:07 [kaz]
- rrsagent, make log public
- 14:03:11 [kaz]
- rrsagent, draft minutes
- 14:03:12 [RRSAgent]
- I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
- 14:04:29 [kaz]
- scribenick: cpn
- 14:05:15 [atai_]
- atai_ has joined #me
- 14:05:59 [nigel]
- nigel has joined #me
- 14:06:24 [cpn]
- Topic: Introduction
- 14:06:40 [cpn]
- ChrisN: Main topic is DAPT
- 14:06:51 [cpn]
- ... Some discussion on the group charter with the APA WG
- 14:07:22 [kaz]
- ... expected collaboration on media requirements
- 14:07:33 [kaz]
- ... discussion during TPAC in September
- 14:07:54 [tidoust]
- present+ Francois_Daoust
- 14:08:01 [cpn]
- ... Charter should go to the AC soon, I hope
- 14:08:11 [kaz]
- i/expected/scribenick: kaz/
- 14:08:12 [cpn]
- Kaz: I'll bring it to the team strategy meeting next Tuesday
- 14:08:18 [kaz]
- i/Charter should/scribenick: cpn/
- 14:08:44 [kaz]
- scribenick- Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
- 14:08:54 [cpn]
- present+ Francois_Daoust
- 14:09:28 [kaz]
- present+ Nigel_Megitt
- 14:09:40 [cpn]
- Topic: Dubbing and AD Profile of TTML2
- 14:09:56 [cpn]
- Nigel: No slides, but here's the document
- 14:10:08 [cpn]
- ... This is a requirements document
- 14:11:39 [cpn]
- ... This is work on an exchange format for workflows for producing dubbing scripts and audio description scripts
- 14:11:58 [cpn]
- ... It defines in an exchangeable document the mixing instructions to produce a version of the video with AD mixed in
- 14:12:06 [kaz]
- i|No slides|-> https://www.w3.org/TR/2023/WD-dapt-20230505/ Dubbing and Audio description Profiles of TTML2 WD
- 14:12:16 [cpn]
- ... We've wanted to get this work going for a number of years; we had a CG, but it didn't have enough momentum
- 14:12:31 [cpn]
- ... Combining the AD and dubbing work created more momentum
- 14:13:06 [cpn]
- ... This is a profile of TTML2. TTML2 provides tools for timed text documents, it has lots of features, e.g., styling for captions or subtitles
- 14:13:32 [cpn]
- ... We don't particularly use those styling features, but we build on the semantic basis of TTML, which makes the work of creating this spec a lot easier
- 14:13:44 [cpn]
- ... The use cases are in the requirements document from 2022
- 14:14:08 [cpn]
- ... The spec document is on the Rec track, in TTWG, intended to meet the requirements in the requirements doc
- 14:14:47 [cpn]
- ... The diagram shows where it fits in. You start with an audio program with key times. Then you might create a description of images, a script that describes what's in the video at relevant times
- 14:14:58 [cpn]
- ... Or you might transcribe then translate it, to create a dubbing script
- 14:15:09 [cpn]
- ... In both cases, you have a pre-recording script
- 14:15:19 [cpn]
- ... You might record it with actors, or use text-to-speech
- 14:15:25 [cpn]
- ... Then create instructions for mixing
- 14:15:53 [cpn]
- ... You end up with a manifest file that includes everything spoken and times, with links to recorded audio and mixing instructions
- 14:15:59 [cpn]
- q?
- 14:16:45 [cpn]
- ChrisN: Is this mainly used in a production setting? What about playback in a client?
- 14:17:08 [cpn]
- Nigel: In dubbing workflows, what localisation teams do in the production domain is send the media to someone to do the dubbing
- 14:17:34 [cpn]
- ... The script is useful when making edits. They also send it to a subtitling house, to create a subtitle file that would be presented to the audience
- 14:17:49 [cpn]
- ... The language translation differs significantly between dubbing and subtitles
- 14:18:16 [cpn]
- ... If the words don't match, it's terrible. Cyril Concolato is the co-editor, he described having this experience
- 14:18:32 [cpn]
- ... He couldn't have the subtitles with the dubbed audio, not a good experience
- 14:18:43 [cpn]
- ... Once you have the translated timed text, you can send it to be turned into subtitles
- 14:19:01 [cpn]
- ... Then it's about changing styling, showing the right number of words at a time, shot changes, etc
- 14:19:26 [cpn]
- ... Because they have a common source for translation, you don't have to pay to get that done twice
- 14:19:52 [cpn]
- ... If you have the script and mixing instructions available, in cases where people can't see the image, you can have client side rendering of the AD
- 14:20:08 [cpn]
- ... That allows you to change the relative balance of AD and programme audio
- 14:20:23 [cpn]
- ... If you have the text available, you don't have to render as audio, it could be a Braille display
- 14:20:40 [cpn]
- ... They get the description of the video through their fingers, and the dialog through their ears
- 14:20:41 [kaz]
- s/Ohnata/Ohmata/
- 14:20:57 [cpn]
- ... There's cognitive load using AD, to distinguish the description from the dialog
- 14:21:30 [cpn]
- ... Some people can't hear at all, so having all the video description available as text would help people who can't see or hear
- 14:21:44 [cpn]
- ... Then you get a reasonable description of the entire program
- 14:22:28 [cpn]
- Nigel: We have a working draft document, steadily being updated
- 14:22:55 [cpn]
- ... The DAPT is the spec document. The intent is the user can understand how it works without being expert in TTML
- 14:23:29 [cpn]
- ... We use TTML for the underlying structures. We expect it to be useful for tool implementers and player implementers, rather than for people authoring documents in a text editor, which would be hard work
- 14:24:01 [cpn]
- ... It covers transcripts of pre-existing media and scripts for media to be created
- 14:24:10 [cpn]
- ... Let's look at some examples
- 14:24:23 [cpn]
- ... It's an XML document with metadata in the head, and body with main content
- 14:24:45 [cpn]
- ... You could create empty <div> elements in the body. TTML has ideas in common with HTML, like head, body, p
- 14:25:19 [cpn]
- ... From an AD, you have a p with start and end time, describing something in the video image
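The structure described so far (an XML document with a head for metadata and a body holding a div and a timed p) can be sketched as follows. This is only an illustration under stated assumptions, not an example from the DAPT spec: the helper name `build_description`, the `event1` id, the placeholder text and the timings are all hypothetical, and a real DAPT document carries further required metadata.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"               # TTML namespace
XML_NS = "http://www.w3.org/XML/1998/namespace"   # namespace of xml:lang and xml:id
ET.register_namespace("", TT_NS)                  # serialise TTML elements without a prefix


def build_description(text: str, begin: str, end: str, lang: str = "en") -> ET.Element:
    """Build a minimal TTML-shaped tree: tt > head, body > div > p (hypothetical helper)."""
    tt = ET.Element(f"{{{TT_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    ET.SubElement(tt, f"{{{TT_NS}}}head")         # metadata would go here
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    # "event1" is an illustrative identifier, not taken from the spec
    div = ET.SubElement(body, f"{{{TT_NS}}}div", {f"{{{XML_NS}}}id": "event1"})
    p = ET.SubElement(div, f"{{{TT_NS}}}p", {"begin": begin, "end": end})
    p.text = text
    return tt


if __name__ == "__main__":
    doc = build_description("Placeholder description of the image.", "25s", "28s")
    print(ET.tostring(doc, encoding="unicode"))
```

Building the tree with a library rather than by hand matches the point made above that the format is aimed at tools, not at people typing in a text editor.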
- 14:25:38 [cpn]
- ... Care has been taken to be clear about language and the source of language. Important to know what state we're in
- 14:26:16 [cpn]
- ... It uses the xml:lang attribute and a content profile designator. Example of a transcript of original language audio in the programme
- 14:26:28 [cpn]
- ... In the AD example, the source of the language is original, so it's not a translation
- 14:26:47 [cpn]
- ... If I create an audio recording of this description, call it clip3.wav, I can reference it with an audio element
- 14:27:11 [cpn]
- ... So there's a paragraph with an animated gain value, going from 1 to 0.39
- 14:27:37 [cpn]
- ... This is commonly used in AD, to lower the programme audio volume before the AD enters, then return the gain to 1
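The gain animation described here (programme audio lowered from 1 to 0.39 before the description, then returned to 1) can be sketched as a simple envelope. This is a minimal sketch, assuming linear interpolation and symmetric fades; the function name `ducked_gain` and the default fade length are hypothetical, and only the 0.39 value comes from the discussion.

```python
def ducked_gain(t: float, ad_start: float, ad_end: float,
                fade: float = 0.3, low: float = 0.39) -> float:
    """Programme-audio gain at time t (seconds) around one AD clip.

    The gain ramps from 1 down to `low` during the `fade` seconds before
    ad_start, holds at `low` while the description plays, then ramps back
    to 1 during the `fade` seconds after ad_end. Linear ramps are an
    assumption made for this sketch.
    """
    def lerp(a: float, b: float, x: float) -> float:
        return a + (b - a) * x

    if t <= ad_start - fade or t >= ad_end + fade:
        return 1.0                                   # outside the ducked region
    if t < ad_start:
        return lerp(1.0, low, (t - (ad_start - fade)) / fade)  # fading down
    if t <= ad_end:
        return low                                   # description is playing
    return lerp(low, 1.0, (t - ad_end) / fade)       # fading back up


if __name__ == "__main__":
    for t in (9.0, 9.85, 11.0, 12.15, 13.0):
        print(t, round(ducked_gain(t, ad_start=10.0, ad_end=12.0), 3))
```

A client-side player could evaluate such an envelope to set the programme gain, which is also what makes it possible to let the user change the balance between description and programme audio.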
- 14:28:13 [cpn]
- ... Another example, we use a tta:speak attribute to direct a player to use text to speech
- 14:28:42 [cpn]
- ... You can include the audio inline, base64 encoded. There's a challenge in identifying WAV audio, as it doesn't have a proper MIME type
- 14:28:51 [cpn]
- ... You can send them to a player for playback
- 14:29:14 [cpn]
- ... For dubbing, there's metadata for character names (and actor names), using existing TTML
- 14:29:46 [cpn]
- ... Once translated, the script type changes in the metadata, to show it as a translation
- 14:30:08 [cpn]
- ... The original can be kept, which is important to be able to refer back. Or for multiple translations
- 14:30:44 [cpn]
- ... A pivot language can be used to go from Finnish to English, then English to Hebrew, for example
- 14:31:18 [cpn]
- ... If you get words translated strangely, you can go back and adjust
- 14:31:28 [cpn]
- ... Get lip-sync timings right
- 14:31:43 [cpn]
- ... The single document gets worked on, updated to reflect the status
- 14:32:07 [cpn]
- ... As it's an open standard, as opposed to non-standard or proprietary, we hope to create a marketplace for interoperability between tools
- 14:32:17 [cpn]
- ... That's the benefit to having a W3C spec
- 14:32:17 [cpn]
- q?
- 14:33:22 [cpn]
- Nigel: We have a data model (UML class diagram) in the spec
- 14:33:58 [cpn]
- ... A DAPT script has metadata, it can contain characters and styles
- 14:34:23 [cpn]
- ... We may remove styles applied to particular characters. Debate whether it needs to be in the document
- 14:36:14 [cpn]
- ... script events contain timed entities. The three main things are: script contains text, contains events
- 14:36:14 [cpn]
- ... You can apply mixing instructions, audio recording or text to speech
- 14:36:14 [cpn]
- ... Those are the main entities. The rest of the spec describes them in detail
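The entities mentioned in this part of the discussion (a script with metadata and characters, timed script events, text, and audio that is either recorded or synthesised) could be modelled roughly as below. This is a loose sketch, not the UML class diagram from the spec: every class and field name here is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Character:
    name: str
    actor: Optional[str] = None        # actor name is optional metadata


@dataclass
class Audio:
    source: Optional[str] = None       # link to a recorded clip, if any
    synthesize: bool = False           # True to request text to speech instead


@dataclass
class Text:
    content: str
    lang: str
    is_translation: bool = False       # original language vs. translation
    audio: Optional[Audio] = None


@dataclass
class ScriptEvent:
    begin: float                       # seconds
    end: float
    texts: List[Text] = field(default_factory=list)
    characters: List[Character] = field(default_factory=list)


@dataclass
class Script:
    script_type: str                   # e.g. an original or translated script
    characters: List[Character] = field(default_factory=list)
    events: List[ScriptEvent] = field(default_factory=list)

    def languages(self) -> List[str]:
        """Distinct languages used across all events, in first-seen order."""
        seen: List[str] = []
        for event in self.events:
            for text in event.texts:
                if text.lang not in seen:
                    seen.append(text.lang)
        return seen


if __name__ == "__main__":
    event = ScriptEvent(25.0, 28.0, texts=[
        Text("Original line", "fi"),
        Text("Translated line", "en", is_translation=True, audio=Audio(synthesize=True)),
    ])
    print(Script("translated", events=[event]).languages())
```

Keeping the original text alongside its translation in the same event reflects the point made earlier that the original can be kept so that translators can refer back to it.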
- 14:36:49 [cpn]
- ... It explains how the data model maps to the TTML, e.g., a <div> element with an xml:id
- 14:37:07 [cpn]
- ... A p element represents text. You can have styles and metadata
- 14:37:38 [cpn]
- ... You can have audio styles to trigger synthesised audio, describe if original language or translation
- 14:38:55 [cpn]
- ... The audio is designed to be implementable using Web Audio, but not the text to speech: the Web Speech API isn't a Rec, and it's problematic to use here, as typical implementations can't bring the speech into a Web Audio context
- 14:39:08 [cpn]
- ... It goes out directly via the OS
- 14:39:28 [cpn]
- ... But there are other ways to do it, e.g., online services where you post the text and they return audio
- 14:39:41 [cpn]
- ... There's a conformance section that describes formal stuff
- 14:39:43 [kaz]
- q+
- 14:40:08 [cpn]
- ... Because it's a TTML2 profile, there's formal stuff in the Appendices on what's required, and optional, and extension features
- 14:40:54 [cpn]
- ... TTML2 allows extensions, as "features" of specifications. This is helpful for designing CR exit criteria, as we know the features
- 14:41:35 [cpn]
- ... So conformance criteria are easy to generate, and so are tests
- 14:41:41 [cpn]
- q?
- 14:42:10 [nigel]
- -> https://www.w3.org/TR/dapt/ DAPT
- 14:42:18 [nigel]
- -> https://www.w3.org/TR/dapt-reqs/ DAPT-REQS
- 14:42:49 [cpn]
- Nigel: We're in working draft now, and next stage is Candidate Recommendation
- 14:43:09 [cpn]
- ... So getting wide review is important. So it's the perfect time to review and give feedback
- 14:43:27 [cpn]
- ... I think it's in a reasonable state to do that
- 14:43:47 [cpn]
- ... If you have questions or comments, please open an issue in GitHub, or comment on any open issues
- 14:44:04 [cpn]
- ... You can email the TTWG mailing list too, or contact me or Cyril directly
- 14:44:44 [cpn]
- ... Next steps, we'll respond to all review comments, we'll be starting horizontal review soon, and sending liaisons to groups in the TTWG charter
- 14:45:49 [cpn]
- ChrisN: Who are the external groups, and which companies?
- 14:46:20 [cpn]
- Nigel: Some tool suppliers have been very positive about having an open standard format. They have a common problem in how to serialize their work
- 14:47:03 [cpn]
- ... If an organisation like BBC or Netflix wants to specify an AD or dubbing script as a deliverable from a provider, they can require the standard format
- 14:47:18 [cpn]
- ... The alternative, ingesting spreadsheets or CSVs, is painful
- 14:47:51 [cpn]
- Kaz: Thank you, this is an interesting and useful mechanism
- 14:48:27 [cpn]
- ... I'm wondering about the order of sentences. Japanese has a different ordering from English: subject, object, verb rather than subject, verb, object
- 14:49:00 [cpn]
- ... So if we simply dub or translate English to Japanese, subtitles tend to split into smaller chunks, subject first
- 14:49:13 [cpn]
- ... So how to deal with that kind of word order difference?
- 14:49:53 [cpn]
- ... Second question: could this work with SSML as well as the current text spec, so a TTML engine can handle the text information well?
- 14:50:19 [cpn]
- Nigel: On the second point, there's a good advantage to using SSML. We have an issue (#121)
- 14:50:30 [cpn]
- ... It's a challenge to mix in with the audio content
- 14:50:55 [cpn]
- ... Somebody else working on a similar problem, spoken presentation in HTML, has found the same issue
- 14:51:13 [cpn]
- ... They considered different ways to proceed; multiple attributes or a single attribute
- 14:51:27 [cpn]
- ... https://www.w3.org/TR/spoken-html/
- 14:51:37 [cpn]
- ... I'd be interested to know how to do that well
- 14:52:06 [cpn]
- Kaz: I need to organise a workshop or event on voice. This should be good input, so please let's work with the strategy team on that
- 14:52:13 [cpn]
- Nigel: Yes, that would be good
- 14:52:33 [cpn]
- ... On your first question, some things are covered by TTML, e.g., different writing directions and embedding
- 14:52:45 [cpn]
- ... Japanese language requirements like rubies and text emphasis can be done
- 14:52:59 [cpn]
- ... But your point is more about the structure of the language itself
- 14:53:21 [cpn]
- ... We didn't feel the need to say anything, as it just contains whatever the translator generates
- 14:53:31 [cpn]
- ... Is there more we need to do?
- 14:54:00 [cpn]
- Kaz: Don't know, maybe you can use the begin and end attributes to specify the timing of the Japanese word to be played
- 14:54:05 [cpn]
- Nigel: Yes, you could do that
- 14:54:32 [cpn]
- Nigel: Timed text elements inside these objects are permitted. p elements can contain spans, and spans can have timing
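The idea of timing individual spans inside a p can be sketched by reading the begin and end attributes back out of a small fragment. This is a rough illustration, assuming span times are offsets from the enclosing p's begin and only handling the simple seconds form such as `1.5s`; the sample fragment, its timings and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

TT = "{http://www.w3.org/ns/ttml}"

# Hypothetical fragment: one p split into two separately timed chunks
SAMPLE = """<p xmlns="http://www.w3.org/ns/ttml" begin="10s" end="14s">
  <span begin="0s" end="1.5s">first chunk</span>
  <span begin="1.5s" end="4s">second chunk</span>
</p>"""


def seconds(value: str) -> float:
    """Parse only the simple offset form used in this sketch, e.g. '1.5s'."""
    if not value.endswith("s"):
        raise ValueError(f"unsupported time expression: {value!r}")
    return float(value[:-1])


def span_schedule(p_xml: str) -> List[Tuple[float, float, Optional[str]]]:
    """Return (absolute begin, absolute end, text) for each timed span in a p."""
    p = ET.fromstring(p_xml)
    base = seconds(p.get("begin", "0s"))     # span times are treated as offsets from this
    return [
        (base + seconds(span.get("begin")), base + seconds(span.get("end")), span.text)
        for span in p.iter(f"{TT}span")
    ]


if __name__ == "__main__":
    for begin, end, text in span_schedule(SAMPLE):
        print(begin, end, text)
```

A translator could use this kind of per-chunk timing to place reordered words where they need to land, which is the adaptation case raised in the question about Japanese word order.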
- 14:54:49 [cpn]
- ... That's important for adaptation
- 14:54:51 [cpn]
- q?
- 14:54:55 [kaz]
- ack k
- 14:56:21 [cpn]
- ChrisN: This is really good progress, now ready for review
- 14:56:52 [kaz]
- q+
- 14:56:59 [cpn]
- Nigel: I see people from Japan are here, if you have particular use cases, I would be interested in your review feedback
- 14:57:10 [cpn]
- ... Some things may not be obvious to me as a non-Japanese speaker
- 14:57:26 [cpn]
- Kaz: We have people from NHK here today
- 14:58:03 [cpn]
- ... NHK Open House was on Sunday, some are working on sign language synchronised with TV content, they may be interested in this mechanism and modalities like sign language
- 14:58:53 [cpn]
- Nigel: What may be attractive is that it's extensible: you could present translation text to a sign interpreter
- 15:00:48 [cpn]
- ChrisN: Also could be interesting from an object based media point of view
- 15:00:55 [cpn]
- ... Anything to follow up?
- 15:01:16 [kaz]
- q+
- 15:02:01 [kaz]
- ack k
- 15:02:09 [kaz]
- topic: Zoom call for the July meeting
- 15:02:33 [kaz]
- kaz: We need to switch to Zoom, so I'll allocate one for the July meeting
- 15:02:35 [kaz]
- [adjourned]
- 15:03:25 [kaz]
- i/We need/scribenick: kaz/
- 15:05:56 [kaz]
- rrsagent, bye
- 15:05:56 [RRSAgent]
- I see no action items