IRC log of me on 2023-06-06

Timestamps are in UTC.

13:55:54 [RRSAgent]
RRSAgent has joined #me
13:55:58 [RRSAgent]
logging to https://www.w3.org/2023/06/06-me-irc
13:55:58 [Zakim]
Zakim has joined #me
13:56:08 [cpn]
Meeting: Media & Entertainment IG meeting
13:56:12 [cpn]
scribe+ cpn
13:56:46 [cpn]
Agenda: https://www.w3.org/events/meetings/184d4a81-f7e7-4984-ab2e-8f40e889d558
14:00:02 [tidoust]
tidoust has joined #me
14:01:08 [igarashi]
igarashi has joined #me
14:01:16 [igarashi]
present+
14:01:24 [kaz]
scribenick: Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
14:01:27 [cpn]
present+ Hisayuki_Ohmata, Kinji_Matsumura, Ryo_Yasuoka, Tatsuya_Igarashi, Chris_Needham, Kazuyuki_Ashimura
14:01:30 [ohmata]
ohmata has joined #me
14:01:34 [cpn]
chair: ChrisN, Igarashi
14:01:56 [cpn]
present+ Andreas_Tai
14:02:23 [cpn]
present+ Chris_Lorenzo
14:02:25 [cpn]
chair+ ChrisL
14:03:01 [cpn]
present+ Ewan_Roycroft
14:03:02 [kaz]
agenda: https://lists.w3.org/Archives/Public/public-web-and-tv/2023Jun/0000.html
14:03:07 [kaz]
rrsagent, make log public
14:03:11 [kaz]
rrsagent, draft minutes
14:03:12 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:04:29 [kaz]
scribenick: cpn
14:05:15 [atai_]
atai_ has joined #me
14:05:19 [kaz]
rrsagent, draft minutes
14:05:20 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:05:59 [nigel]
nigel has joined #me
14:06:24 [cpn]
Topic: Introduction
14:06:40 [cpn]
ChrisN: Main topic is DAPT
14:06:51 [cpn]
... Some discussion on the group charter with the APA WG
14:07:22 [kaz]
... expected collaboration on media requirements
14:07:33 [kaz]
... discussion during TPAC in September
14:07:54 [tidoust]
present+ Francois_Daoust
14:08:01 [cpn]
... Charter should go to the AC soon, I hope
14:08:11 [kaz]
i/expected/scribenick: kaz/
14:08:12 [cpn]
Kaz: I'll bring it to the team strategy meeting next Tuesday
14:08:18 [kaz]
i/Charter should/scribenick: cpn/
14:08:44 [kaz]
scribenick- Kaz_Ashimura, Chris_Needham, Hisayuki_Ohmata, Tatsuya_Igarashi, Kinji_Matsumura, Ryo_Yasuoka
14:08:47 [kaz]
rrsagent, draft minutes
14:08:48 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:08:54 [cpn]
present+ Francois_Daoust
14:09:28 [kaz]
present+ Nigel_Megitt
14:09:29 [kaz]
rrsagent, draft minutes
14:09:30 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:09:40 [cpn]
Topic: Dubbing and AD Profile of TTML2
14:09:56 [cpn]
Nigel: No slides, but here's the document
14:10:08 [cpn]
... This is a requirements document
14:11:39 [cpn]
... This is work on an exchange format for workflows for producing dubbing scripts and audio description scripts
14:11:58 [cpn]
... It defines in an exchangeable document the mixing instructions to produce a version of the video with AD mixed in
14:12:06 [kaz]
i|No slides|-> https://www.w3.org/TR/2023/WD-dapt-20230505/ Dubbing and Audio description Profiles of TTML2 WD
14:12:09 [kaz]
rrsagent, draft minutes
14:12:10 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:12:16 [cpn]
... Wanted to get this work going for a number of years, we had a CG with not enough momentum
14:12:31 [cpn]
... Combining the AD and dubbing work created more momentum
14:13:06 [cpn]
... This is a profile of TTML2. TTML2 provides tools for timed text documents, it has lots of features, e.g., styling for captions or subtitles
14:13:32 [cpn]
... We don't use those styling features particularly, but we do use the semantic basis of TTML, which makes the work of creating this spec a lot easier
14:13:44 [cpn]
... The use cases are in the requirements document from 2022
14:14:08 [cpn]
... The spec document is on the Rec track, in TTWG, intended to meet the requirements in the requirements doc
14:14:47 [cpn]
... The diagram shows where it fits in. An audio program with key times. Then you might be creating a description of images, a script that describes what's in the video at relevant times
14:14:58 [cpn]
... Or you might transcribe then translate it, to create a dubbing script
14:15:09 [cpn]
... In both cases, you have a pre-recording script
14:15:19 [cpn]
... You might record it with actors, or use text-to-speech
14:15:25 [cpn]
... Then create instructions for mixing
14:15:53 [cpn]
... You end up with a manifest file that includes everything spoken and times, with links to recorded audio and mixing instructions
14:15:59 [cpn]
q?
14:16:45 [cpn]
ChrisN: Is this mainly used in a production setting, or also for playback in a client?
14:17:08 [cpn]
Nigel: In dubbing workflows, what localisation teams do in the production domain is send the media to someone to do the dubbing
14:17:34 [cpn]
... The script is useful when making edits. They also send it to a subtitling house, to create a subtitle file that would be presented to the audience
14:17:49 [cpn]
... The language translation differs significantly between dubbing and subtitles
14:18:16 [cpn]
... If the words don't match, it's terrible. Cyril Concolato, the co-editor, described having this experience
14:18:32 [cpn]
... He couldn't have the subtitles with the dubbed audio, not a good experience
14:18:43 [cpn]
... Once you have the translated timed text, you can send it to be turned into subtitles
14:19:01 [cpn]
... Then it's about changing styling, showing the right number of words at a time, shot changes, etc
14:19:26 [cpn]
... Because they have a common source for translation, you don't have to pay to get that done twice
14:19:52 [cpn]
... If you have the script and mixing instructions available, in cases where people can't see the image, you can have client side rendering of the AD
14:20:08 [cpn]
... That allows you to change the relative balance of AD and programme audio
14:20:23 [cpn]
... If you have the text available, you don't have to render as audio, it could be a Braille display
14:20:40 [cpn]
... They get the description of the video through their fingers, and the dialog through their ears
14:20:43 [kaz]
rrsagent, draft minutes
14:20:45 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:20:57 [cpn]
... There's cognitive load using AD, to distinguish the description from the dialog
14:21:30 [cpn]
... Some people can't hear at all, so having all the video description available as text would help people who can't see or hear
14:21:44 [cpn]
... Then you get a reasonable description of the entire program
14:22:28 [cpn]
Nigel: We have a working draft document, steadily being updated
14:22:55 [cpn]
... DAPT is the spec document. The intent is that a user can understand how it works without being an expert in TTML
14:23:29 [cpn]
... We use TTML for the underlying structures. We expect it to be useful for tool implementers and player implementers, rather than being created by people in a text editor, which would be hard work
14:24:01 [cpn]
... Transcripts of pre-existing media and Scripts for media to be created
14:24:10 [cpn]
... Let's look at some examples
14:24:23 [cpn]
... It's an XML document with metadata in the head, and body with main content
14:24:45 [cpn]
... You could create empty <div> elements in the body. TTML has ideas in common with HTML, like head, body, p
14:25:19 [cpn]
... From an AD, you have a p with start and end time, describing something in the video image
14:25:38 [cpn]
... Care has been taken to be clear about language and the source of language. Important to know what state we're in
14:26:16 [cpn]
... It uses the xml:lang attribute and a content profile designator. Example of a transcript of original language audio in the programme
14:26:28 [cpn]
... In the AD example, the source of the language is original, so it's not a translation
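[Example (illustrative sketch, not shown in the meeting): a minimal document of the kind described above, with one timed description. Element and attribute names follow TTML2 and the DAPT WD, but the daptm namespace URI, the profile designator, and all text and timings are assumptions to check against the spec.]

  <tt xmlns="http://www.w3.org/ns/ttml"
      xmlns:ttp="http://www.w3.org/ns/ttml#parameter"
      xmlns:daptm="http://www.w3.org/ns/ttml/profile/dapt#metadata"
      xml:lang="en"
      ttp:contentProfiles="http://www.w3.org/ns/ttml/profile/dapt1.0/content"
      daptm:scriptType="originalTranscript">
    <head/>
    <body>
      <!-- one script event: a timed description of the video image -->
      <div xml:id="event1" begin="9.7s" end="12.2s">
        <!-- langSrc matches xml:lang: original language, not a translation -->
        <p daptm:langSrc="en">A woman climbs into a small sailing boat.</p>
      </div>
    </body>
  </tt>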
14:26:47 [cpn]
... If I create an audio recording of this description, call it clip3.wav, I can reference it with an audio element
14:27:11 [cpn]
... So there's a paragraph with an animated gain value, going from 1 to 0.39
14:27:37 [cpn]
... This is commonly used in AD, to lower the programme audio volume before the AD enters, then return the gain to 1
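[Example (illustrative sketch): the audio reference and gain animation just described, using TTML2's <audio>, <animate>, and tta:gain (tta assumed bound to the TTML2 audio styling namespace). clip3.wav and the 0.39 gain come from the discussion; the timings, and how the mixing instruction attaches to the programme audio, are simplifications.]

  <div begin="25s" end="28s">
    <!-- duck the programme audio from full gain to 0.39, hold, then restore -->
    <animate begin="0s" end="0.3s" tta:gain="1;0.39" fill="freeze"/>
    <animate begin="2.7s" end="3s" tta:gain="0.39;1"/>
    <p>
      <!-- the recorded description clip plays with this paragraph -->
      <audio src="clip3.wav"/>
      She opens the map and points at the island.
    </p>
  </div>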
14:28:13 [cpn]
... Another example, we use a tta:speak attribute to direct a player to use text to speech
14:28:42 [cpn]
... You can include the audio as base64 encoding. There's a challenge identifying WAV audio, as it doesn't have a proper MIME type
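[Example (illustrative sketch): the two options just mentioned. tta:speak is named in the discussion, though its value here is an assumption; the <source>/<data> embedding and the "audio/wave" type are assumptions based on TTML2's embedded-content support and the WAV MIME-type problem.]

  <!-- ask the player to synthesise this text -->
  <p begin="40s" end="43s" tta:speak="normal">A storm gathers on the horizon.</p>

  <!-- or carry the recording inline as base64 -->
  <p begin="40s" end="43s">
    <audio>
      <source>
        <!-- base64 payload truncated for the example -->
        <data type="audio/wave" encoding="base64">UklGRiQAAABXQVZF</data>
      </source>
    </audio>
    A storm gathers on the horizon.
  </p>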
14:28:51 [cpn]
... You can send them to a player for playback
14:29:14 [cpn]
... For dubbing, there's metadata for character names (and actor names), using existing TTML
14:29:46 [cpn]
... Once translated, the script type changes in the metadata, to show it as a translation
14:30:08 [cpn]
... The original can be kept, which is important to be able to refer back. Or for multiple translations
14:30:44 [cpn]
... A pivot language to go from Finnish to English, then English to Hebrew, for example
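[Example (illustrative sketch): dubbing-script metadata as just described: a character with an actor in the head, and a script event that keeps the original text alongside its translation. ttm:agent, ttm:name, and ttm:actor are TTML2 metadata; daptm:scriptType and daptm:langSrc follow the DAPT WD; all names, languages, and text are made up.]

  <head>
    <metadata>
      <ttm:agent type="character" xml:id="character_1">
        <ttm:name type="alias">DETECTIVE</ttm:name>
        <ttm:actor agent="actor_1"/>
      </ttm:agent>
      <ttm:agent type="person" xml:id="actor_1">
        <ttm:name type="full">Jane Doe</ttm:name>
      </ttm:agent>
    </metadata>
  </head>
  <!-- root carries daptm:scriptType="translatedTranscript" -->
  <body>
    <div xml:id="event12" begin="12s" end="14s">
      <!-- original Finnish, kept for reference -->
      <p xml:lang="fi" ttm:agent="character_1">Missä olit viime yönä?</p>
      <!-- English translation; langSrc records the source language -->
      <p xml:lang="en" daptm:langSrc="fi" ttm:agent="character_1">Where were you last night?</p>
    </div>
  </body>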
14:31:18 [cpn]
... If you get words translated strangely, you can go back and adjust
14:31:28 [cpn]
... Get lip-sync timings right
14:31:43 [cpn]
... The single document gets worked on, updated to reflect the status
14:32:07 [cpn]
... As it's an open standard, as opposed to non-standard or proprietary, we hope to create a marketplace for interoperability between tools
14:32:17 [cpn]
... That's the benefit to having a W3C spec
14:32:17 [cpn]
q?
14:33:22 [cpn]
Nigel: We have a data model (UML class diagram) in the spec
14:33:58 [cpn]
... A DAPT script has metadata, it can contain characters and styles
14:34:23 [cpn]
... We may remove styles applied to particular characters. There's debate about whether it needs to be in the document
14:36:14 [cpn]
... Script events contain timed entities. The three main things: the script, the script events it contains, and the text they contain
14:36:14 [cpn]
... You can apply mixing instructions, audio recording or text to speech
14:36:14 [cpn]
... Those are the main entities. The rest of the spec describes them in detail
14:36:49 [cpn]
... It explains how the data model maps to the TTML, e.g., a <div> element with an xml:id
14:37:07 [cpn]
... A p element represents text. You can have styles and metadata
14:37:38 [cpn]
... You can have audio styles to trigger synthesised audio, and describe whether it's the original language or a translation
14:38:55 [cpn]
... The audio is designed to be implementable using Web Audio. Not the text to speech: the Web Speech API isn't a Rec, and it's problematic to use here, as typical implementations can't bring the speech into a Web Audio context
14:39:08 [cpn]
... It goes out directly via the OS
14:39:28 [cpn]
... But there are other ways to do it, e.g., online services where you post the text and they return audio
14:39:41 [cpn]
... There's a conformance section that describes formal stuff
14:39:43 [kaz]
q+
14:40:08 [cpn]
... Because it's a TTML2 profile, there's formal stuff in the Appendices on what's required, and optional, and extension features
14:40:54 [cpn]
... TTML2 allows extensions, as "features" of specifications. This is helpful for designing CR exit criteria, as we know the features
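[Example (illustrative sketch): the TTML2 profile mechanism referred to here. A profile declares features as required or optional, which is what makes CR exit criteria and tests straightforward to enumerate. The designator URI and the two features below are illustrative, not DAPT's actual feature list.]

  <ttp:profile type="content"
               designator="http://example.org/profiles/my-dapt-extension">
    <ttp:features xml:base="http://www.w3.org/ns/ttml/feature/">
      <ttp:feature value="required">#timing</ttp:feature>
      <ttp:feature value="optional">#animation</ttp:feature>
    </ttp:features>
  </ttp:profile>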
14:41:35 [cpn]
... So conformance criteria, and tests, are easy to generate
14:41:41 [cpn]
q?
14:42:10 [nigel]
-> https://www.w3.org/TR/dapt/ DAPT
14:42:18 [nigel]
-> https://www.w3.org/TR/dapt-reqs/ DAPT-REQS
14:42:49 [cpn]
Nigel: We're in working draft now, and next stage is Candidate Recommendation
14:43:09 [cpn]
... Getting wide review is important, so it's the perfect time to review and give feedback
14:43:27 [cpn]
... I think it's in a reasonable state to do that
14:43:47 [cpn]
... If you have questions or comments, please open an issue in GitHub, or comment on any open issues
14:44:04 [cpn]
... You can email the TTWG mailing list too, or contact me or Cyril directly
14:44:44 [cpn]
... Next steps, we'll respond to all review comments, we'll be starting horizontal review soon, and sending liaisons to groups in the TTWG charter
14:45:49 [cpn]
ChrisN: Who are the external groups, and which companies?
14:46:20 [cpn]
Nigel: Some tool suppliers have been very positive about having an open standard format. They have a common problem in how to serialize their work
14:47:03 [cpn]
... If an organisation like BBC or Netflix wants to specify an AD or dubbing script as a deliverable from a provider, they can require the standard format
14:47:18 [cpn]
... The alternative, ingesting spreadsheets or CSVs, is painful
14:47:51 [cpn]
Kaz: Thank you, this is an interesting and useful mechanism
14:48:27 [cpn]
... I'm wondering about the order of sentences. Japanese has a different ordering from English: subject, object, verb
14:49:00 [cpn]
... So if we simply dub or translate English to Japanese, subtitles tend to split into smaller chunks, subject first
14:49:13 [cpn]
... So how to deal with that kind of word order difference?
14:49:53 [cpn]
... Second question: could this work use SSML as well as the current text spec, so a TTML engine can handle the text information well?
14:50:19 [cpn]
Nigel: On the second point, there's a good advantage to using SSML. We have an issue (#121)
14:50:30 [cpn]
... It's a challenge to mix in with the audio content
14:50:55 [cpn]
... Somebody else working on a similar problem, spoken presentation in HTML, has found the same issue
14:51:13 [cpn]
... They considered different ways to proceed; multiple attributes or a single attribute
14:51:27 [cpn]
... https://www.w3.org/TR/spoken-html/
14:51:37 [cpn]
... I'd be interested to know how to do that well
14:52:06 [cpn]
Kaz: I need to organise a workshop or event on voice. This should be good input, so let's work with the strategy team on that
14:52:13 [cpn]
Nigel: Yes, that would be good
14:52:33 [cpn]
... On your first question, some things are covered by TTML, e.g., different writing directions and embedding
14:52:45 [cpn]
... Japanese language requirements like rubies and text emphasis can be done
14:52:59 [cpn]
... But your point is more about the structure of the language itself
14:53:21 [cpn]
... We didn't feel the need to say anything, as it just contains whatever the translator generates
14:53:31 [cpn]
... Is there more we need to do?
14:54:00 [cpn]
Kaz: Don't know, maybe you can use the begin and end attributes to specify the timing of each Japanese word to be played
14:54:05 [cpn]
Nigel: Yes, you could do that
14:54:32 [cpn]
Nigel: Timed text elements inside these objects are permitted. p elements can contain spans, and spans can have timing
14:54:49 [cpn]
... That's important for adaptation
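[Example (illustrative sketch): the timed spans just mentioned, applied to Kaz's word-order point. Each span carries its own begin/end, relative to its parent, so individual Japanese words can be timed. The text and timings are made up.]

  <div begin="30s" end="34s">
    <p xml:lang="ja">
      <!-- "I was there last night", timed word by word -->
      <span begin="0.0s" end="0.8s">私は</span>
      <span begin="0.8s" end="1.6s">昨夜</span>
      <span begin="1.6s" end="3.2s">そこにいました</span>
    </p>
  </div>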
14:54:51 [cpn]
q?
14:54:55 [kaz]
ack k
14:55:41 [kaz]
rrsagent, draft minutes
14:55:42 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
14:56:21 [cpn]
ChrisN: This is really good progress, now ready for review
14:56:52 [kaz]
q+
14:56:59 [cpn]
Nigel: I see people from Japan are here, if you have particular use cases, I would be interested in your review feedback
14:57:10 [cpn]
... Some things may not be obvious to me as a non-Japanese speaker
14:57:26 [cpn]
Kaz: We have people from NHK here today
14:58:03 [cpn]
... NHK Open House was on Sunday; some are working on sign language synchronised with TV content, so they may be interested in this mechanism for other modalities like sign language
14:58:53 [cpn]
Nigel: What may be attractive is that it's extensible; you could present translation text to a sign interpreter
15:00:48 [cpn]
ChrisN: Also could be interesting from an object-based media point of view
15:00:55 [cpn]
... Anything to follow up?
15:01:16 [kaz]
q+
15:02:01 [kaz]
ack k
15:02:09 [kaz]
topic: Zoom call for the July meeting
15:02:33 [kaz]
kaz: We need to switch to Zoom, so I'll allocate one for the July meeting
15:02:35 [kaz]
[adjourned]
15:02:45 [kaz]
rrsagent, draft minutes
15:02:46 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
15:03:25 [kaz]
i/We need/scribenick: kaz/
15:03:25 [kaz]
rrsagent, draft minutes
15:03:26 [RRSAgent]
I have made the request to generate https://www.w3.org/2023/06/06-me-minutes.html kaz
15:05:56 [kaz]
rrsagent, bye
15:05:56 [RRSAgent]
I see no action items