This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
Hi again,

I noticed that evaluating a combination of several match options together with the thesaurus may lead to different interpretations. My main question is whether other match options influence the way the thesaurus works. An example:

"improving" ftcontains "improve" with stemming

This query should return true. If we add a thesaurus here:

"improving" ftcontains "optimizing" with stemming with thesaurus..

...and if the thesaurus resolves "optimize" to "improve", I am wondering whether this query will return true, as the thesaurus entries would have to be stemmed as well.

The same question arises for the default match options, e.g.: are diacritics to be removed from the thesaurus entries?

As a thesaurus can get pretty large, similar to index structures, I would recommend applying all match options while building the thesaurus and BEFORE querying it; otherwise, thesaurus requests could get pretty expensive. This is why I would propose extending section 3.4 of the specification:

1. The Language Option must be applied first.
2. The Stemming Option must be applied before the Case Option and the Diacritics Option.
-> 3. The Thesaurus Option must be applied after all other options.

This also makes sense because the thesaurus might not need to be accessed at all if the query term and the document term are already equal:

"A" ftcontains "A" with thesaurus...
-> should yield true without even checking the thesaurus.

I just discovered the following sentence in the first section of the spec: "The WGs particularly solicit feedback regarding how thesauri are to be used in combination." So I hope that my discussion here contributes a little to this issue.

Christian
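A minimal Python sketch of the proposed ordering, under stated assumptions: it compares single terms rather than performing full `ftcontains` tokenization, uses a hypothetical toy suffix-stripping stemmer instead of a real one, and does not model the Language Option (English only). The names `MatchOptions`, `normalize`, `NormalizedThesaurus` and `term_matches` are illustrative only, not part of the specification or any implementation. Thesaurus entries are normalized once, at build time, with the same options as the query; the thesaurus is consulted last and skipped entirely when the normalized terms are already equal.

```python
from __future__ import annotations

import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchOptions:
    stemming: bool = False
    case_sensitive: bool = False        # default: case insensitive
    diacritics_sensitive: bool = False  # default: diacritics insensitive


# Hypothetical toy stemmer; a real system would use a proper stemmer.
_SUFFIXES = ("ing", "ed", "es", "e", "s")


def toy_stem(term: str) -> str:
    for suffix in _SUFFIXES:
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def strip_diacritics(term: str) -> str:
    # Decompose, then drop combining marks (e.g. "é" -> "e").
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize(term: str, opts: MatchOptions) -> str:
    # Proposed order: stemming, then case, then diacritics.
    # The thesaurus is deliberately NOT applied here (it comes last).
    if opts.stemming:
        term = toy_stem(term)
    if not opts.case_sensitive:
        term = term.casefold()
    if not opts.diacritics_sensitive:
        term = strip_diacritics(term)
    return term


class NormalizedThesaurus:
    """Thesaurus whose entries were normalized with the query's match
    options at build time, so lookups need no further normalization."""

    def __init__(self, raw: dict[str, set[str]], opts: MatchOptions):
        self.lookups = 0
        self.entries: dict[str, set[str]] = {}
        for term, synonyms in raw.items():
            key = normalize(term, opts)
            self.entries.setdefault(key, set()).update(
                normalize(s, opts) for s in synonyms
            )

    def synonyms(self, normalized_term: str) -> set[str]:
        self.lookups += 1
        return self.entries.get(normalized_term, set())


def term_matches(doc_term: str, query_term: str, opts: MatchOptions,
                 thesaurus: NormalizedThesaurus | None = None) -> bool:
    d = normalize(doc_term, opts)
    q = normalize(query_term, opts)
    if d == q:
        return True  # short-circuit: thesaurus never consulted
    if thesaurus is None:
        return False
    return d in thesaurus.synonyms(q)


if __name__ == "__main__":
    stem = MatchOptions(stemming=True)
    th = NormalizedThesaurus({"optimize": {"improve"}}, stem)
    print(term_matches("A", "A", stem, th), th.lookups)  # True 0
    print(term_matches("improving", "optimizing", stem, th), th.lookups)  # True 1
    # Thesaurus built WITHOUT stemming: the stemmed query term is not found.
    raw_th = NormalizedThesaurus({"optimize": {"improve"}}, MatchOptions())
    print(term_matches("improving", "optimizing", stem, raw_th))  # False
```

The last call shows the ambiguity raised above: if thesaurus entries are not normalized with the same options as the query, the stemmed query term "optimiz" finds no entry and the match fails.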
Hi Mary,

"full-text-composability-queries-results-q3.xq" is a test suite example that looks ambiguous to me:

... "quote.{0,5}" with wildcards with thesaurus ...

The implementation could either:

a) yield all thesaurus entries that match the wildcard expression "quote.{0,5}", which implies that the thesaurus itself must be able to handle wildcards and be aware of the other match options, or
b) look up only the literal string "quote.{0,5}" in the thesaurus.

What do you think?

Christian
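The two readings of the q3 query can be contrasted in a small Python sketch. It assumes a thesaurus modelled as a plain dict from headword to synonyms, with hypothetical entries, and translates only a subset of the Full Text wildcard syntax ('.', '.?', '.*', '.+', '.{n,m}', backslash escapes) into a Python regex. All function names are illustrative, not taken from the test suite or any implementation.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a subset of Full Text wildcard syntax into a Python regex."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped literal
            i += 2
        elif ch == ".":
            # Optional quantifier directly after the period.
            q = re.match(r"[?*+]|\{\d+,\d+\}", pattern[i + 1:])
            quant = q.group(0) if q else ""
            parts.append("." + quant)
            i += 1 + len(quant)
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts), re.DOTALL)


def expand_interpretation_a(pattern: str, thesaurus: dict) -> set:
    """(a) Wildcard-aware thesaurus: every headword matching the pattern
    contributes its synonyms."""
    rx = wildcard_to_regex(pattern)
    expanded = set()
    for headword, synonyms in thesaurus.items():
        if rx.fullmatch(headword):
            expanded |= synonyms
    return expanded


def expand_interpretation_b(pattern: str, thesaurus: dict) -> set:
    """(b) Literal lookup: the unexpanded pattern string is the key."""
    return set(thesaurus.get(pattern, set()))


def matches(doc_term: str, pattern: str, thesaurus: dict, expand) -> bool:
    # A document term matches if the wildcard matches it directly,
    # or if it is among the thesaurus expansions.
    if wildcard_to_regex(pattern).fullmatch(doc_term):
        return True
    return doc_term in expand(pattern, thesaurus)


if __name__ == "__main__":
    th = {"quote": {"cite"}, "quoted": {"cited"}, "quotes": {"excerpts"}}
    print(matches("excerpts", "quote.{0,5}", th, expand_interpretation_a))  # True
    print(matches("excerpts", "quote.{0,5}", th, expand_interpretation_b))  # False
```

Under reading (b) the literal pattern is almost never a thesaurus headword, so only direct wildcard matches survive; under (a) the thesaurus must understand wildcards. The two readings can therefore legitimately produce different result sets for the same query.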
Christian,

I have corrected full-text-composability-queries-results-q3 and q3b by adding a second, empty result to each, to cover implementations that process the thesaurus match option first. I believe we decided that these were the only test cases where the order of processing the thesaurus and wildcard match options was an issue, so I am marking this bug fixed. If you agree with my fix, would you mark this bug closed?

Pat
Thanks, Pat. I've tested the latest changes and closed this bug.

Christian