5633 – [FT] INCORRECT DISTANCE COMPUTATION IN FTDISTANCE

This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Status:	CLOSED FIXED

Alias:	None

Product:	XPath / XQuery / XSLT
Classification:	Unclassified
Component:	Full Text 1.0
Version:	Working drafts
Hardware:	PC Windows XP

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jim Melton
QA Contact:	Mailing list for public feedback on specs from XSL and XML Query WGs

Reported:	2008-04-07 22:53 UTC by Thomas Baby
Modified:	2008-04-15 02:46 UTC
CC List:	0 users

Description Thomas Baby 2008-04-07 22:53:42 UTC

The FTDistance functions rely on computing word distance, sentence distance, or paragraph distance, which are implemented in the functions wordDistance, sentenceDistance, and paraDistance, respectively. These functions do not return the absolute value of the distance, and this leads to some "funny" semantics in the presence of exclusions.

For example, in function fts:ApplyFTWordDistanceAtMost, we say that for each stringExclude, there has to be at least one stringInclude from which it is not more than a certain word distance apart. 

for $stringExcl in $match/fts:stringExclude
where some $stringIncl in $match/fts:stringInclude
      satisfies fts:wordDistance(
                    $stringIncl/fts:tokenInfo,
                    $stringExcl/fts:tokenInfo
                ) <= $n
return $stringExcl

But, since the distance returned by wordDistance is not absolute, the result can be different depending on whether the stringExclude occurs "before" and "after" a stringInclude. Intuitively, this does not make sense.
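
The asymmetry described above can be reproduced with a small Python sketch. Note the assumptions: the pre-fix body of fts:wordDistance is not quoted in this report, so word_distance_unsorted below is a guess at its shape (second argument's start minus first argument's end, minus one), and TokenInfo and excludes_within are hypothetical stand-ins for fts:tokenInfo and the FLWOR filter shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    """Hypothetical stand-in for an fts:tokenInfo element."""
    start_pos: int
    end_pos: int


def word_distance_unsorted(t1: TokenInfo, t2: TokenInfo) -> int:
    # ASSUMPTION: a guess at the pre-fix formula; it trusts the caller
    # to pass the earlier token first.
    return t2.start_pos - t1.end_pos - 1


def excludes_within(includes, excludes, n):
    # Mirrors the filter above: keep each exclude that has at least one
    # include at a word distance of at most n.
    return [
        ex for ex in excludes
        if any(word_distance_unsorted(inc, ex) <= n for inc in includes)
    ]


include = TokenInfo(10, 10)
exclude_before = TokenInfo(5, 5)    # four words lie between it and the include
exclude_after = TokenInfo(15, 15)   # four words lie between it and the include

kept_before = excludes_within([include], [exclude_before], 2)
kept_after = excludes_within([include], [exclude_after], 2)
```

Under these assumptions, the exclude that precedes the include passes the "at most 2" check (the computed distance is negative), while the exclude that follows it at the same separation does not.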

Comment 1 Thomas Baby 2008-04-07 23:09:14 UTC

Minor error in the last paragraph. Here is the corrected paragraph:

But, since the distance returned by wordDistance is not absolute, the result can be
different depending on whether the stringExclude occurs "before" or "after" a
stringInclude. Intuitively, this does not make sense.

Comment 2 Mary Holstege 2008-04-10 15:09:03 UTC

Intuitive or not, this is a deliberate decision. In the face of overlapping tokens, the absolute value is not particularly more intuitive, and the absolute value gives the wrong answer. We order by token positions to produce determinate results, so I propose we close this bug with no action.
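
One possible reading of the overlapping-token point, sketched in Python. This is an illustration under assumptions, not a statement of what the specification did at the time: raw_distance is a hypothetical order-sensitive formula, TokenInfo is a stand-in for fts:tokenInfo, and the two tokens are made-up values where one token lies inside the other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    """Hypothetical stand-in for an fts:tokenInfo element."""
    start_pos: int
    end_pos: int


def raw_distance(t1: TokenInfo, t2: TokenInfo) -> int:
    # ASSUMPTION: order-sensitive formula used only for illustration.
    return t2.start_pos - t1.end_pos - 1


wide = TokenInfo(1, 5)     # spans positions 1..5
narrow = TokenInfo(2, 3)   # lies entirely inside `wide`

# Taking the absolute value still depends on argument order, and it
# reports a positive gap even though the tokens overlap.
abs_forward = abs(raw_distance(wide, narrow))
abs_backward = abs(raw_distance(narrow, wide))

# Ordering by (start, end) position first gives a single answer, and it
# stays negative, which signals the overlap.
first, second = sorted((wide, narrow), key=lambda t: (t.start_pos, t.end_pos))
ordered = raw_distance(first, second)
```

With these values the two absolute results disagree with each other, whereas the position-ordered result is determinate.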

Comment 3 Michael Dyck 2008-04-10 17:43:58 UTC

Avoiding the term "absolute value", the problem is that, depending on the order in which you pass two args to fts:wordDistance(), it will (in general) return two different results, only one of which is correct. The onus is on the caller to pass the args in the order that delivers the correct result. But it does not always do so, as pointed out in the original comment.

Comment 4 Thomas Baby 2008-04-10 21:39:12 UTC

Thanks for your comments, Mary!

As I mentioned in the bug description, and as elaborated on by Michael Dyck, we seem to have an issue when handling a mix of stringIncludes and stringExcludes. So, until there is a resolution, I don't think the bug can be closed.

Comment 5 Thomas Baby 2008-04-15 02:40:04 UTC

The resolution is to modify functions xxDistance (xx=word, para, or sentence) to sort their inputs:

declare function fts:wordDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted := 
     for $ti in ($tokenInfo1, $tokenInfo2) 
     order by $ti/@startPos ascending, $ti/@endPos ascending
     return $ti
   return
     (: -1 because we count starting at 0 :)
     $sorted[2]/@startPos - $sorted[1]/@endPos - 1
};
            

declare function fts:paraDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer 
{
   (: Ensure tokens are in order :)
   let $sorted := 
     for $ti in ($tokenInfo1, $tokenInfo2) 
     order by $ti/@startPos ascending, $ti/@endPos ascending
     return $ti
   return
     (: -1 because we count starting at 0 :)
     $sorted[2]/@startPara - $sorted[1]/@endPara - 1
};
            

declare function fts:sentenceDistance (
             $tokenInfo1 as element(fts:tokenInfo),
             $tokenInfo2 as element(fts:tokenInfo) )
   as xs:integer
{
   (: Ensure tokens are in order :)
   let $sorted := 
     for $ti in ($tokenInfo1, $tokenInfo2) 
     order by $ti/@startPos ascending, $ti/@endPos ascending
     return $ti
   return
     (: -1 because we count starting at 0 :)
     $sorted[2]/@startSent - $sorted[1]/@endSent - 1
};
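
For readers who want to check the effect of the sort, here is a rough Python port of the revised fts:wordDistance. It is a sketch, not the normative definition: TokenInfo is a hypothetical stand-in for fts:tokenInfo carrying only the two position attributes, and the sample positions are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenInfo:
    """Hypothetical stand-in for an fts:tokenInfo element."""
    start_pos: int
    end_pos: int


def word_distance(t1: TokenInfo, t2: TokenInfo) -> int:
    # Ensure tokens are in order: ascending startPos, then ascending endPos.
    first, second = sorted((t1, t2), key=lambda t: (t.start_pos, t.end_pos))
    # -1 because we count starting at 0
    return second.start_pos - first.end_pos - 1


earlier = TokenInfo(5, 5)
later = TokenInfo(10, 10)

d_forward = word_distance(earlier, later)
d_backward = word_distance(later, earlier)
```

Both calls yield the same value regardless of argument order. The paragraph and sentence variants above follow the same pattern: they sort on startPos and endPos, then subtract the paragraph or sentence attributes instead.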

Comment 6 Thomas Baby 2008-04-15 02:43:57 UTC

The changes to the functions resolve the issue. So, closing the bug.