This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
From an implementation point of view there are two types of match options, and the type influences how an option can practically be applied: i) match options that control a simple query rewrite step (without regard to what is actually contained in an index) and ii) match options that affect the lexical lookup of tokens in the index. Given this distinction, certain orders of match option application are more natural, because a kind i) match option is less complex to implement when it is applied in a query rewrite step prior to lexical lookup.

The thesaurus option is the typical kind i) match option. Stop words (when considered as query expansions, as our spec does) are kind i) as well. Stemming is in between the two kinds, as it typically involves a query rewrite step but also affects lexical lookup. Wildcard, case and diacritics, on the other hand, are typically of kind ii).

We defined the match option application order as:

1. ftlanguage
2. ftwildcard
3. ftthesaurus
4. ftstem
5. ftcase
6. ftdiacritics
7. ftstopword

This order is in conflict with the semantics of FTStopword and FTThesaurus as we have defined them in 4.6.2, where stop word filtering and thesaurus expansion are done as query rewrite steps and hence precede all other options except language. The current semantics assumes the order:

1. ftlanguage
2. ftthesaurus
3. ftstopword
4. ftstem, ftcase, ftdiacritics, ftwildcard

The order among the last four would be implementation-defined. (Actually, I would assume that ftcase, ftdiacritics and ftwildcard are commutative, hence there is no need to define an order between them.)

I can accept a partial order like the above, but would opt for even more flexibility: implementations should also be able to choose the order of ftthesaurus vs. ftstopword and of ftstem vs. ftwildcard.

Best,
/Jochen
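To make the two kinds of match options concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the toy thesaurus, the toy index and the function names (expand_thesaurus, lookup, search) are hypothetical and do not come from the spec or from any implementation. The thesaurus step rewrites the query without consulting the index (kind i), while the case option changes how tokens are compared during index lookup (kind ii).

```python
# Hypothetical toy data; none of these names come from the Full Text spec.
THESAURUS = {"car": ["automobile"]}
INDEX = {"Car": [1], "automobile": [2], "CAR": [3], "cart": [4]}  # token -> doc ids


def expand_thesaurus(query_tokens):
    """Kind i): rewrite the query without looking at the index.

    Each query token becomes a list of alternatives (itself plus synonyms).
    """
    return [[token] + THESAURUS.get(token, []) for token in query_tokens]


def lookup(token, case_insensitive=True):
    """Kind ii): the option changes how tokens are matched in the index."""
    hits = set()
    for indexed, docs in INDEX.items():
        if case_insensitive:
            same = indexed.lower() == token.lower()
        else:
            same = indexed == token
        if same:
            hits.update(docs)
    return hits


def search(query_tokens, case_insensitive=True):
    """Apply the rewrite step first, then the lexical lookup."""
    rewritten = expand_thesaurus(query_tokens)
    return [
        set().union(*(lookup(alt, case_insensitive) for alt in alts))
        for alts in rewritten
    ]


if __name__ == "__main__":
    print(search(["car"]))
    print(search(["car"], case_insensitive=False))
```

In this sketch the rewrite step never touches INDEX, which is why applying it before lexical lookup is the simpler arrangement.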
WG agreed at F2F to make the order implementation-defined, subject to the following constraints: (1) ftlanguage must come first; (2) ftstem must come before ftcase and ftdiacritics.
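The two constraints can be checked mechanically. The following Python sketch is an illustration, not spec text: the function name satisfies_wg_constraints is hypothetical, and it assumes that a candidate order lists each of the seven options named in this bug exactly once.

```python
# The seven match options named in this bug.
OPTIONS = [
    "ftlanguage", "ftwildcard", "ftthesaurus", "ftstem",
    "ftcase", "ftdiacritics", "ftstopword",
]


def satisfies_wg_constraints(order):
    """Return True if `order` meets both constraints agreed at the F2F.

    Assumption (not from the resolution): `order` must be a permutation
    of OPTIONS, i.e. every option appears exactly once.
    """
    if sorted(order) != sorted(OPTIONS):
        return False
    position = {name: i for i, name in enumerate(order)}
    # (1) ftlanguage must come first.
    language_first = position["ftlanguage"] == 0
    # (2) ftstem must come before ftcase and ftdiacritics.
    stem_early = (
        position["ftstem"] < position["ftcase"]
        and position["ftstem"] < position["ftdiacritics"]
    )
    return language_first and stem_early


if __name__ == "__main__":
    originally_defined = list(OPTIONS)
    per_current_semantics = [
        "ftlanguage", "ftthesaurus", "ftstopword", "ftstem",
        "ftcase", "ftdiacritics", "ftwildcard",
    ]
    print(satisfies_wg_constraints(originally_defined))
    print(satisfies_wg_constraints(per_current_semantics))
```

Both the originally defined order and one linearization of the order assumed by the current semantics pass this check, whereas an order that places ftcase before ftstem does not.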