Meeting minutes
<wendyreid> date: 2024-02-05
wendyreid: I'm working on the final revision of the main document
<wendyreid> w3c/
wendyreid: I would like to speak about the issue Hadrien opened last week
Koko: I work at Penguin Random House
wendyreid: while discussing TTS, we realized that there is no standard on top of which we can develop TTS engines
Hadrien: the issue I created covers only some of the issues in creating a TTS engine
… I left out highlighting, virtual highlighting, switching voices between languages, etc.
… I think we need to document what we see in real files
wendyreid: when I looked at the Speech API, I saw that it is a CG Note, not a REC, so it's not official
… I think there are issues with the resources needed to create audio with good voices
Hadrien: in my experience it's widely supported
… but it is quite inconsistent
… it's difficult to get feedback from the engines
… actually you can use native APIs if you're developing native apps
… I don't think the status of the document is an issue for us
… because our work should be on what to do before passing data to Speech API
wendyreid: I prefer to work on stable specs
… Hadrien is right, this work is about "pre-API" work
gpellegrino: So, as I understand the issue, the idea is to define a way to extract text from EPUBs into an abstract layer that you can then pass to the TTS
… or display to the end user as simple text
… meaning for FXL, we can have a reflow version
Hadrien: It's more than text
… it's also semantics: language + context
… may also intercept hidden elements (like page breaks)
… the output should be an object with the text plus metadata
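
A minimal sketch of what such an object might look like, assuming one record per extracted unit of content. The class name `ExtractedUnit`, its fields, and the helper `languages_used` are hypothetical and only illustrate "text plus metadata"; they are not taken from Readium or from any specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExtractedUnit:
    """One unit of extracted content (hypothetical shape, not an existing API)."""
    text: str                     # the text to speak or display
    lang: Optional[str] = None    # language tag, e.g. "en" or "fr"
    role: Optional[str] = None    # semantic context, e.g. "heading", "doc-pagebreak"
    hidden: bool = False          # True for content that is not visibly rendered


def languages_used(units: List[ExtractedUnit]) -> List[str]:
    """Return the distinct languages in reading order, e.g. to pick voices up front."""
    seen: List[str] = []
    for unit in units:
        if unit.lang and unit.lang not in seen:
            seen.append(unit.lang)
    return seen


# Illustrative data only.
units = [
    ExtractedUnit("Chapter 1", lang="en", role="heading"),
    ExtractedUnit("Bonjour", lang="fr", role="paragraph"),
    ExtractedUnit("12", lang="en", role="doc-pagebreak", hidden=True),
]
```

Keeping language and role next to the text would let a TTS layer switch voices or decide how to announce hidden elements without going back to the markup.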
wendyreid: a lot of TTS engines in reading apps do not read alt-text for images
Hadrien: in Readium alt-text is supported (no way to skip it, for the moment)
wendyreid: the foundation on which we can build this engine is good semantics in the text
… I'm researching how browsers handle reading mode
… I've found an article about bugs in the browser reading mode
https://
Hadrien: I'm less concerned about that. I think the biggest issue we may have with FXL is similar to liquid mode with a well-known fixed format for digital publications
… even if the reading mode is correct, we may still have problems with text split across multiple spans
… I think we should reconstruct the content
… problems with semantics may arise with content split across multiple pages
wendyreid: I think that we may have a pause across different pages (in the middle of a paragraph)
Hadrien: if we create an object for the whole publication, and we reconstruct the content using the ICU tokenizer, we may be able to recreate sentences
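
A rough sketch of the reconstruction idea, under stated assumptions: text fragments from consecutive pages are joined first and only then split into sentences. A naive regular expression stands in for the ICU tokenizer here, and the function name `rebuild_sentences` and the sample data are hypothetical.

```python
import re

# Naive sentence boundary: ".", "!" or "?" followed by whitespace.
# This is a stand-in for an ICU sentence break iterator, which handles
# abbreviations, quotation marks and non-Latin scripts far better.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def rebuild_sentences(pages):
    """Join fragments from consecutive pages, then split the result into sentences.

    `pages` is a list of pages, each a list of text fragments (e.g. one per span).
    Assumes fragments are split at word boundaries.
    """
    fragments = [frag.strip() for page in pages for frag in page if frag.strip()]
    text = " ".join(fragments)
    return [sentence for sentence in SENTENCE_END.split(text) if sentence]


pages = [
    ["The cat sat", "on the mat. The dog"],  # page 1 ends mid-sentence
    ["barked at", "the moon."],              # page 2 continues it
]
sentences = rebuild_sentences(pages)
```

Because the split happens after the join, a sentence that starts on one page and ends on the next comes out as a single unit.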
wendyreid: we may have performance issues
Hadrien: it depends, we may have different approaches
wendyreid: In FXL we may use the TOC to reconstruct part of the content
Hadrien: it's not that simple, since you have things split across multiple pages
wendyreid: maybe we can benefit from using DPUB ARIA roles
Hadrien: sure, maybe also having something to say where a sentence ends
… I think documenting what developers are already doing
… is a good starting point for creating this document
wendyreid: which questions should we ask?
Hadrien: I think we can start by asking how they break up the content (based on HTML, etc.)
… and how they handle non-textual content with semantics
… I think this system may also be useful for the remediation process
gpellegrino: It might be worth asking about MathML
… TTS breaks on MathML, might also be interesting
… reader mode and TTS, we might need multiple approaches to MathML
CharlesL: will we have FXL ebooks with MathML?
gpellegrino: I think the scope of this document also covers reflowable content
wendyreid: I think we can mention MathML without speaking about the format
CharlesL: I think there should be an option for reading "invisible" things like alt-text
Hadrien: sure, maybe we should then discuss skippability and escapability
wendyreid: I think another important element is personalization
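
A small sketch of how skippability could be tied to personalization, assuming each extracted unit carries a role and the user chooses which roles to skip. The role names, the `select_for_reading` helper and the sample data are hypothetical; escapability is not covered.

```python
def select_for_reading(units, skip_roles):
    """Keep only the units whose role the user has not chosen to skip."""
    return [unit for unit in units if unit["role"] not in skip_roles]


# Illustrative data only.
units = [
    {"role": "paragraph", "text": "The chart shows sales by year."},
    {"role": "alt-text", "text": "Bar chart with five bars."},
    {"role": "doc-pagebreak", "text": "Page 12"},
]

# Hypothetical user preference: read alt-text aloud, skip page break announcements.
to_read = select_for_reading(units, skip_roles={"doc-pagebreak"})
```

With this shape, reading "invisible" content such as alt-text is a per-user setting instead of a fixed behavior of the reading system.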
Hadrien: we have a Readium community call this Wednesday, I may add an agenda item about this
… for example Readium mobile and Readium desktop have two different approaches and we may start documenting them
wendyreid: AOB?
Hadrien: the more I look at it, the more useful it seems to me for different use cases