This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The FTDistance functions rely on computing word distance, sentence distance, or paragraph distance, which are implemented in the functions wordDistance, sentenceDistance, and paraDistance respectively. These functions do not return the absolute value of the distance, and this leads to some "funny" semantics in the presence of exclusions. For example, in function fts:ApplyFTWordDistanceAtMost, we say that for each stringExclude, there has to be at least one stringInclude from which it is not more than a certain word distance apart:

    for $stringExcl in $match/fts:stringExclude
    where some $stringIncl in $match/fts:stringInclude
          satisfies fts:wordDistance(
                        $stringIncl/fts:tokenInfo,
                        $stringExcl/fts:tokenInfo
                    ) <= $n
    return $stringExcl

But, since the distance returned by wordDistance is not absolute, the result can be different depending on whether the stringExclude occurs "before" and "after" a stringInclude. Intuitively, this does not make sense.
Minor error in the last paragraph. Here is the corrected paragraph: But, since the distance returned by wordDistance is not absolute, the result can be different depending on whether the stringExclude occurs "before" or "after" a stringInclude. Intuitively, this does not make sense.
Intuitive or not, this is a deliberate decision. In the face of overlapping tokens, the absolute value is not particularly more intuitive, and the absolute value gives the wrong answer. We order by token positions to produce determinate results, so I propose we close this bug with no action.
Avoiding the term "absolute value", the problem is that, depending on the order in which you pass two args to fts:wordDistance(), it will (in general) return two different results, only one of which is correct. The onus is on the caller to pass the args in the order that delivers the correct result. But it does not always do so, as pointed out in the original comment.
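To make the order dependence concrete, here is a minimal sketch in Python rather than XQuery. It assumes the pre-fix function simply computed "start of second argument minus end of first argument minus 1"; the thread does not quote the original body, so that formula, the TokenInfo class, and the positions used are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    # Hypothetical stand-in for an fts:tokenInfo element.
    start_pos: int
    end_pos: int


def word_distance_unsorted(first: TokenInfo, second: TokenInfo) -> int:
    # ASSUMPTION: the pre-fix formula, applied to the arguments as given,
    # with no attempt to put them in document order first.
    return second.start_pos - first.end_pos - 1


# A stringInclude token at position 3 and a stringExclude token at position 7.
include = TokenInfo(start_pos=3, end_pos=3)
exclude = TokenInfo(start_pos=7, end_pos=7)

forward = word_distance_unsorted(include, exclude)   # include comes first
backward = word_distance_unsorted(exclude, include)  # arguments swapped

print(forward, backward)
```

Under this assumption the swapped call yields a negative number, which would satisfy a "<= $n" test for any non-negative $n, regardless of how far apart the two tokens really are.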
Thanks for your comments, Mary! As I mentioned in the bug description, and as elaborated on by Michael Dyck, we seem to have an issue when handling a mix of stringIncludes and stringExcludes. So, until there is a resolution, I don't think the bug can be closed.
The resolution is to modify functions xxDistance (xx=word, para, or sentence) to sort their inputs:

    declare function fts:wordDistance (
        $tokenInfo1 as element(fts:tokenInfo),
        $tokenInfo2 as element(fts:tokenInfo) )
      as xs:integer
    {
      (: Ensure tokens are in order :)
      let $sorted :=
        for $ti in ($tokenInfo1, $tokenInfo2)
        order by $ti/@startPos ascending, $ti/@endPos ascending
        return $ti
      return
        (: -1 because we count starting at 0 :)
        $sorted[2]/@startPos - $sorted[1]/@endPos - 1
    };

    declare function fts:paraDistance (
        $tokenInfo1 as element(fts:tokenInfo),
        $tokenInfo2 as element(fts:tokenInfo) )
      as xs:integer
    {
      (: Ensure tokens are in order :)
      let $sorted :=
        for $ti in ($tokenInfo1, $tokenInfo2)
        order by $ti/@startPos ascending, $ti/@endPos ascending
        return $ti
      return
        (: -1 because we count starting at 0 :)
        $sorted[2]/@startPara - $sorted[1]/@endPara - 1
    };

    declare function fts:sentenceDistance (
        $tokenInfo1 as element(fts:tokenInfo),
        $tokenInfo2 as element(fts:tokenInfo) )
      as xs:integer
    {
      (: Ensure tokens are in order :)
      let $sorted :=
        for $ti in ($tokenInfo1, $tokenInfo2)
        order by $ti/@startPos ascending, $ti/@endPos ascending
        return $ti
      return
        (: -1 because we count starting at 0 :)
        $sorted[2]/@startSent - $sorted[1]/@endSent - 1
    };
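The effect of sorting the inputs can be checked with a small Python sketch that mirrors the revised fts:wordDistance. The TokenInfo class and the sample positions are hypothetical stand-ins for fts:tokenInfo elements; only the sort keys and the subtraction follow the function given in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    # Hypothetical stand-in for an fts:tokenInfo element.
    start_pos: int
    end_pos: int


def word_distance_sorted(a: TokenInfo, b: TokenInfo) -> int:
    # Order the two tokens by (startPos, endPos), as the revised XQuery does,
    # so the result no longer depends on the argument order.
    first, second = sorted((a, b), key=lambda t: (t.start_pos, t.end_pos))
    # -1 because the count of intervening words starts at 0.
    return second.start_pos - first.end_pos - 1


include = TokenInfo(start_pos=3, end_pos=3)
exclude = TokenInfo(start_pos=7, end_pos=7)

same_order = word_distance_sorted(include, exclude)
swapped = word_distance_sorted(exclude, include)

print(same_order, swapped)
```

Both calls return the same value here, so a caller such as fts:ApplyFTWordDistanceAtMost would get the same answer whether the stringExclude precedes or follows the stringInclude.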
The changes to the functions resolve the issue. So, closing the bug.