This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
As decided at meeting 147, a sentence has been added (in the editor's draft) to section 1.1:

    In the absence of an implementation-defined way to differentiate, all markup creates token boundaries.

However, XML's definition of "markup" <http://www.w3.org/TR/2006/REC-xml-20060816/#syntax> is perhaps broader than what we had in mind when we said "all markup". For instance, it seems unlikely that we meant for a character reference to create a token boundary. Similarly for entity references and perhaps CDATA section delimiters.

We could be more specific about which kinds of markup we mean, but instead, maybe we shouldn't be relying on the idea of markup. Full-Text operates on instances of the XQuery/XPath Data Model, where markup doesn't exist. So, for example, we might say:

    In the absence of an implementation-defined indication otherwise, a token must not contain characters from more than one node.

(although we might have to make that more precise).
The FTTF, at its meeting today, decided to modify the sentence in question, replacing "all markup" with "element markup (start-tags, end-tags, and empty-element tags)". The FTTF believes this action resolves the issue.
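To illustrate the distinction, here is a minimal sketch in Python. It is not the tokenization algorithm of the Full-Text specification: the function names `tokens_per_text_node` and `tokens_ignoring_elements` and the `\w+` token pattern are hypothetical, and the standard library's `xml.etree.ElementTree` only approximates the XQuery/XPath Data Model. The parser resolves character references and CDATA sections before any text is seen, so they cannot split a token, whereas element tags separate the text into distinct strings.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical token pattern: a run of word characters.
# A real implementation defines its own tokenization.
TOKEN = re.compile(r"\w+")


def tokens_per_text_node(xml_source):
    """Tokenize each text string separately, so that element tags
    (start-tags, end-tags, empty-element tags) act as token boundaries."""
    root = ET.fromstring(xml_source)
    tokens = []
    for text in root.itertext():
        tokens.extend(TOKEN.findall(text))
    return tokens


def tokens_ignoring_elements(xml_source):
    """Tokenize the concatenated string value, so that element tags
    do not act as token boundaries (the implementation-defined alternative)."""
    root = ET.fromstring(xml_source)
    return TOKEN.findall("".join(root.itertext()))


# A character reference (&#233;), an inline element (<b>), and a CDATA section.
doc = "<p>caf&#233;s serve <b>big</b>ger c<![CDATA[up]]>s</p>"

print(tokens_per_text_node(doc))      # ['cafés', 'serve', 'big', 'ger', 'cups']
print(tokens_ignoring_elements(doc))  # ['cafés', 'serve', 'bigger', 'cups']
```

In both cases "cafés" and "cups" remain single tokens, because the character reference and the CDATA delimiters have already disappeared by the time the text is tokenized. Only the `<b>` tags change the result, splitting "bigger" into "big" and "ger" under the default rule.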
There have been no objections raised to the solution identified in http://www.w3.org/Bugs/Public/show_bug.cgi?id=4946#c1, so I am marking this bug CLOSED.