This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
Please provide an example of overlapping tokens, for instance a German compound word such as Donaudampfschifffahrtskapitaensmuetze.
The Task Force has agreed to provide such an example. In Section 4.1, Tokenization, immediately prior to Section 4.1.1, we will insert a paragraph that reads:

For some languages, some tokenizers may identify overlapping tokens. For example, the German word "Donaudampfschifffahrtskapitaensmuetzen" might be tokenized into the following tokens: Donaudampfschifffahrtskapitaensmuetzen, Donau, dampf, schiff, dampfschiff, kapitaen, muetzen, kapitaensmuetzen, schifffahrt, dampfschifffahrt, and perhaps others.
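To illustrate what "overlapping tokens" means in the proposed paragraph, here is a minimal Python sketch. It is not taken from the specification and does not describe how any particular tokenizer works: the function name `overlapping_tokens` and the tiny hand-written `LEXICON` are assumptions made purely for illustration. The sketch reports the whole word plus every lexicon entry found inside it, with character offsets, so that the overlaps are visible.

```python
# Hypothetical lexicon of known sub-words (an assumption for this sketch,
# not part of any specification or real tokenizer).
LEXICON = [
    "donau", "dampf", "schiff", "dampfschiff", "kapitaen", "muetzen",
    "kapitaensmuetzen", "schifffahrt", "dampfschifffahrt",
]

WORD = "Donaudampfschifffahrtskapitaensmuetzen"


def overlapping_tokens(word, lexicon):
    """Return (start, end, text) for the whole word and every lexicon
    entry occurring inside it. Spans may overlap or nest."""
    lowered = word.lower()
    spans = {(0, len(word), word)}  # the full word is itself a token
    for entry in lexicon:
        start = lowered.find(entry)
        while start != -1:
            end = start + len(entry)
            # Keep the surface form as it appears in the input word.
            spans.add((start, end, word[start:end]))
            start = lowered.find(entry, start + 1)
    # Order by start offset, longest span first at each position.
    return sorted(spans, key=lambda s: (s[0], -s[1]))


tokens = overlapping_tokens(WORD, LEXICON)
for start, end, text in tokens:
    print(f"{start:2d}-{end:2d}  {text}")
```

With this lexicon, "dampfschiff" (offsets 5 to 16) and "schifffahrt" (offsets 10 to 21) share the characters of "schiff", which is the kind of overlap the paragraph describes. A real tokenizer would use linguistic analysis rather than plain substring matching.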
Done.
Because you participated in the TF when this bug was resolved, we presume that your concerns are addressed appropriately. We are therefore marking this bug as CLOSED.