This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
I originally sent this as an ordinary (non-bug) e-mail to the public-qt-comments@w3.org list hoping for some kind of comment. I prefer not to write bugs willy-nilly, but to get at least one other person who agrees with me before filing a bug. However, in this case, after having gotten no comments, I'm submitting this as a bug anyway because I think it's warranted. That said....
As far as I can tell, nowhere in the spec does it say anything specifically about case sensitivity and the FTStopWordOption. Does the value of FTCaseOption affect stop-word comparisons? E.g.:
    let $x := <p>BEST OF TIMES</p>
    return $x contains text "BEST ANY TIMES"
      using stop words ("any")
      using case sensitive
Should that query return true or false? The spec should say explicitly what the interaction between those two match options is supposed to be, or at the very least explicitly state that it's implementation-defined.
This matter was discussed during the joint teleconference of the XML Query WG and the XSL WG on 2010-06-15.
The result of your example query hinges on whether the query token "ANY" is a stop word. This in turn depends on whether "ANY" is in the list of stop words defined by the stop word option
    using stop words ("any")
i.e., whether that involves a case-insensitive comparison.
Interaction with the 'case' option is not the issue here, because it (like all match options) only affects the matching between query tokens and tokens *in the text being searched*, and the stop word "any" is not in the text being searched.
Instead, the WGs decided that this is an implementation-defined matter. Specifically:
    An implementation-defined comparison is used to determine whether a query token appears in the collection of stop words defined by the applicable stop word option.
I was directed to make the necessary changes to the Full Text spec, which I have done. Therefore, I am marking this issue resolved-FIXED. If you are satisfied with this outcome, please mark it CLOSED.
Even though this bug has been "resolved" by making the answer "implementation dependent," the issue, despite Mr. Dyck's statement to the contrary, really does have to do with the query tokens. So, for the record....
From the spec, section 3.4.7:
> Stop words are tokens in the *query* that match any token in the text being searched.
> Note the asymmetry in the stop word semantics: the property of being a stop word is only relevant to query terms, not to document terms.
If my query were instead:
    let $x := <p>BEST OF TIMES</p>
    return $x contains text "best any times"
      using stop words ("any")
then the query term would effectively become:
    "best .* times" using wildcards
which matches "BEST OF TIMES" because:
> The "stop words" option specifies that if a token is within the specified collection of stop words, it is removed from the search and any token may be substituted for it.
Using .* as a replacement for each stop word satisfies the semantics of "any token may be substituted for it."
Now, if we return to my original query: if "using case sensitive" were to apply to stop-word determination, then "ANY" would not be found in the list of stop-words of "any"; hence, "ANY" would not be considered a stop-word and therefore it would not be "removed from the search and [allow] any token [to] be substituted for it." So "BEST ANY TIMES" would not match "BEST OF TIMES" and the query would return false.
If "using case sensitive" were not to be considered during stop-word determination, then "ANY" would be found in the list of stop-words of "any"; hence "ANY" would be considered a stop-word and therefore would be "removed from the search and [allow] any token [to] be substituted for it." So "BEST .* TIMES" would match "BEST OF TIMES" and the query would return true.
Also, and very importantly, it's intentional and entirely the point that "any" is *not* in the text being searched.
If the query were instead:
    let $x := <p>BEST ANY TIMES</p>
    return $x contains text "BEST ANY TIMES"
      using stop words ("any")
      using case sensitive
then it would be equivalent to:
    let $x := <p>BEST OF TIMES</p>
    return $x contains text "BEST ANY TIMES"
since the query text matches the search context tokens exactly whether "ANY" is considered a stop-word or not.
It seems a typo crept in. My last example query should have been:
    let $x := <p>BEST ANY TIMES</p>
    return $x contains text "BEST ANY TIMES"
Sorry. Regardless, my points still stand.
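Editorial illustration: the two outcomes argued above can be sketched in a few lines of Python. This is only a toy model of the semantics quoted from section 3.4.7, not the spec's formal definition. The helper names `is_stop_word` and `phrase_matches` are hypothetical, tokenization is assumed to be plain whitespace splitting, and a stop word is modeled as a placeholder that exactly one text token may fill (rather than a literal `.*`). Query tokens are compared to text tokens exactly, standing in for "using case sensitive".

```python
def is_stop_word(token, stop_words, case_sensitive):
    # Hypothetical helper: is this query token in the stop word list?
    # The comparison mode is a parameter because the thread's whole
    # question is which mode applies.
    if case_sensitive:
        return token in stop_words
    return token.casefold() in {w.casefold() for w in stop_words}


def phrase_matches(text, query, stop_words, stop_case_sensitive):
    # Assumption: whitespace tokenization, which is enough for the examples.
    text_tokens = text.split()
    query_tokens = query.split()
    # None marks a stop word: any single text token may be substituted for it.
    pattern = [
        None if is_stop_word(t, stop_words, stop_case_sensitive) else t
        for t in query_tokens
    ]
    n = len(pattern)
    for start in range(len(text_tokens) - n + 1):
        window = text_tokens[start:start + n]
        # Non-stop tokens are compared exactly ("using case sensitive").
        if all(p is None or p == w for p, w in zip(pattern, window)):
            return True
    return False


# Original query: "ANY" is a stop word only under a case-insensitive lookup.
print(phrase_matches("BEST OF TIMES", "BEST ANY TIMES", ["any"], True))    # False
print(phrase_matches("BEST OF TIMES", "BEST ANY TIMES", ["any"], False))   # True
# Corrected last example: the text contains "ANY", so both readings agree.
print(phrase_matches("BEST ANY TIMES", "BEST ANY TIMES", ["any"], True))   # True
print(phrase_matches("BEST ANY TIMES", "BEST ANY TIMES", ["any"], False))  # True
```

In this model only the first pair of calls distinguishes the two readings, which is why the original example deliberately keeps "any" out of the text being searched.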
(In reply to comment #2)
> Even though this bug has been "resolved" by making the answer "implementation
> dependent,"
Implementation-defined, actually; "implementation-dependent" means something else.
> the issue, despite Mr. Dyck's statement to the contrary, really
> does have to do with the query tokens.
Indeed it does. If you think I said it didn't, then it seems you misunderstood me.
> [...]
> If my query were instead:
>
>     let $x := <p>BEST OF TIMES</p>
>     return $x contains text "best any times"
>       using stop words ("any")
>
> then the query term would effectively become:
>
>     "best .* times" using wildcards
>
> which matches "BEST OF TIMES" [...]
Agreed.
> Now, if we return to my original query: if "using case sensitive" were to
> apply to stop-word determination, then "ANY" would not be found in the list
> of stop-words of "any"; hence, "ANY" would not be considered a stop-word and
> therefore it would not be "removed from the search and [allow] any token [to]
> be substituted for it." So "BEST ANY TIMES" would not match "BEST OF TIMES"
> and the query would return false.
Agreed.
> If "using case sensitive" were not to be considered during stop-word
> determination, then "ANY" would be found in the list of stop-words of "any";
> hence "ANY" would be considered a stop-word and therefore would be "removed
> from the search and [allow] any token [to] be substituted for it." So
> "BEST .* TIMES" would match "BEST OF TIMES" and the query would return true.
Agreed, more or less.
In the second paragraph of my comment #1, I summarized what I saw to be the point of your example, and I believe it's consistent with what you've said above.
My subsequent point was that, although the matter certainly hinges on whether a particular comparison is case-[in]sensitive, it's incorrect to bring the case option into the discussion, because the case option is not defined to govern comparisons of the two things being compared here. Specifically, the case option governs the matching of a query token vs. a token in the text being searched, not the comparison of a query token vs. a stop word in the collection of stop words defined by a stop word option.
> Also, and very importantly, it's intentional and entirely the point that "any"
> is *not* in the text being searched.
Agreed. I think I see the problem. When I said:
    and the stop word "any" is not in the text being searched.
I did *not* mean:
    and the token "any" does not occur in the text being searched.
Rather, I meant something more like:
    and, in the stop word option
        using stop words ("any")
    that "any" is a StringLiteral in an FTStopWordOption, not a token in the text being searched (and so, is not something that the case option is defined to deal with).
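Editorial illustration: the distinction drawn in this comment can be modeled as two separate comparisons with separate controls. The Python below is a hedged sketch, not an implementation of the spec. All names (`in_stop_list`, `tokens_match`, `contains_text`, `exact`) are hypothetical, tokenization is assumed to be whitespace splitting, and the stop-word comparison is passed in as a callable to stand in for "implementation-defined". Only `tokens_match` looks at the case option.

```python
def exact(a, b):
    # One possible implementation-defined stop-word comparison (an assumption).
    return a == b


def in_stop_list(query_token, stop_words, compare):
    # Query token vs. a StringLiteral in the stop word option.
    # The case option is deliberately not a parameter here.
    return any(compare(query_token, w) for w in stop_words)


def tokens_match(query_token, text_token, case_sensitive):
    # Query token vs. a token in the text being searched:
    # this is the comparison the case option governs.
    if case_sensitive:
        return query_token == text_token
    return query_token.casefold() == text_token.casefold()


def contains_text(text, query, stop_words, compare, case_sensitive):
    text_tokens = text.split()    # assumption: whitespace tokenization
    query_tokens = query.split()
    n = len(query_tokens)
    for start in range(len(text_tokens) - n + 1):
        window = text_tokens[start:start + n]
        # A stop word accepts any one text token; other tokens must match.
        if all(
            in_stop_list(q, stop_words, compare)
            or tokens_match(q, t, case_sensitive)
            for q, t in zip(query_tokens, window)
        ):
            return True
    return False


# "any" is a stop word regardless of the case option; flipping the case
# option changes only whether "best"/"times" match "BEST"/"TIMES".
print(contains_text("BEST OF TIMES", "best any times", ["any"], exact, False))  # True
print(contains_text("BEST OF TIMES", "best any times", ["any"], exact, True))   # False
```

Under this model, whether "ANY" counts as a stop word for the list ("any") is decided entirely by the `compare` argument, so the same case option setting can yield either result for the original query depending on the implementation.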
I'd be happy to mark this as CLOSED, but I can't see the updated specification from the "outside".
Verified as fixed.