This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Dear all,

I had a closer look at the test suite and the XQFT Use Cases and tried to answer them with our BaseX full-text implementation. That's when I came across some possible inconsistencies.

=== XQFT Use Cases, 16 May 2008 ===

- 2.2.7: ....[. ftcontains "improv.* the .... testing" entire content]
  The query contains the token "improv.*", but the wildcard option is not specified. If I add "with wildcards", I get the expected results.

- 4.2.1: ....ftcontains "improve" ftand "web" ftand "usability" with stemming....
  If I read the grammar correctly, the stemming option applies only to the last token in this example. I get the correct results if I parenthesize all the search tokens:
  ....ftcontains ("improve" ftand "web" ftand "usability") with stemming....

- 5.2.1: Getting pedantic: the "s" in the "solution in XQuery" header is written in lower case (I wrote a simple XQuery script to extract the queries, and this one was left out).

- 16.2.9: ...for $cont := $book/content...
  "for" should probably be replaced with "let" (or ":=" with "in").

- 17.2.4: ...filter( $e/node.... / ....return filter($book....
  I can parse this one if I precede the function call with the "local:" prefix.

=== XQFT Test Suite ===

Here I mainly stumbled across some minor serialization issues:

- As far as I know, new lines inside attribute values are removed while parsing XML documents, so I expected the attribute
  ....url="http://www.useit.com/papers/heuristic /heuristic_list.html">Ten Usability....
  to yield
  ....url="http://www.useit.com/papers/heuristic /heuristic_list.html">....
  The same applies to two other attributes:
  ....url="http://usability.gov .... /guidelines/index.html"....
  ....shortTitle="Usabilityguy Manuscript .... Guide">....

- Next, another bagatelle: the attribute
  ....normalize= "1990/1999"....
  spans two lines, whereas Saxon, Qizx, or BaseX keep it on one line:
  ....<componentDate normalize="1990/1999">1990-1999....
Last but not least, the two test cases element-queries-results-q7.xq and element-queries-results-q7b.xq use the wildcard and "entire content" options from the above-mentioned use case query (2.2.7).

I've noticed another possible inconsistency in the Test Suite queries and Use Cases: many examples, especially the XPath examples, use the count() function to check whether an ftcontains operator yields results. As ftcontains returns a boolean value, I assume that the count function will always return 1:

  count( 'abc' ftcontains 'def' ) > 0 -> true

That's all I found for now. Thanks for listening.

Regards,
Christian
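The count() observation above can be illustrated with a rough Python analogy. This is only a sketch under stated assumptions: an XQuery sequence is modelled as a Python list, and the helper names `ftcontains` and `count` are hypothetical stand-ins, not part of any XQuery processor or library.

```python
def ftcontains(text, token):
    """Hypothetical stand-in for ftcontains: whole-word, case-insensitive match.

    Returns a one-item list, modelling the single boolean XQuery would return.
    """
    words = text.lower().split()
    return [token.lower() in words]


def count(sequence):
    """Model of XQuery count(): the number of items in the sequence."""
    return len(sequence)


result = ftcontains("abc", "def")
print(result)             # the match itself fails
print(count(result) > 0)  # but the count-based check still passes
```

Because the boolean is itself one item, counting it says nothing about whether the match succeeded; testing the boolean directly avoids the problem.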
Thanks for entering this into Bugzilla, Christian. We will take a close look and get back to you as soon as possible. Pat
Christian,

The following changes are being made to the Full Text Use Cases. I am treating them as editorial changes, so I am not waiting for Full Text Task Force approval, but I will ask for their review once the changes are published internally.

Corrected 2.2.7 Q7 Entire Element Content Query
Added "with wildcards" to the XQuery and XPath solutions.

Corrected 4.2.1 Q1 Query on Attribute
Moved "with stemming" to after "improve" and added parentheses around ("improve" with stemming ftand "web" ftand "usability") to make the distance operator applicable to all 3 operands in the XQuery and XPath solutions.

Corrected 5.2.1 Q1 One Character Suffix Wildcard Query
Capitalized the "S" in "Solution in XQuery".

Corrected 16.2.9 Q9 Query Using an XQuery Expression to Determine the Number of Words Allowed in a Window
Changed the 2nd "for" to "let" in the XQuery solution.

Corrected 17.2.4 Q4 Query Combining Score and XML Structure with a Conditional Return
Added the "local" prefix to 2 filter function calls in the XQuery solution.

The changes will appear internally to W3C Members after the next build of the Use Cases. They will appear to the public in the next public release.

Thank you so much for pointing out these errors. We will address the other issues (> 0 and text case issues) separately.

Pat Case, Member XQuery Full Text Task Force
Christian, We have eliminated unwanted whitespace in the start tags in the expected results files for the test cases in the FT Test Suite (ELEMENT through WILDCARD). Thanks for pointing this out as well. The count > 0 issue remains. Pat Case, Member XQuery Full Text Task Force
Pat,

thank you for the quick response. I've had another look at the test cases, and this is what I noticed:

[1] Concerning the attribute serialization, I still get different results. I must apologize, as some whitespace seems to have got lost in my bug report. This is what is found in several XQFTTS results:

  <title shortTitle="Usabilityguy Manuscript Guide">John

and this is what I expect/get:

  <title shortTitle="Usabilityguy Manuscript Guide">John ...

The same observation applies to the attributes containing "heuristic_list.html" and "/guidelines":

  <citation url="http://www.useit.com/papers/heuristic /heuristic_list.html">Ten Usability
  <citation url="http://usability.gov /guidelines/index.html"> Research-Based

This special case is due to the shredding of attribute nodes. The following document/XQuery

  <a b='c d'/>

is supposed to return

  <a b="c d"/>

The best approach to fix this trivial one might be to modify the source file and remove the newline and indentation from the attributes.

[2] In the XQFTTSCatalog.xml file, "Textsource" occurrences should be replaced with "Testsource".

[3] /UseCase-OTHER/other-queries-results-q1.xq still contains the old XQFT Use Case example.

Feel free to ask for more. Thank you,
Christian, BaseX Team http://www.basex.org
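Item [1] above turns on how an XML parser treats a line break inside an attribute value. Under XML 1.0 attribute-value normalization, each line break or tab in the value is replaced by a single space, and for untyped (CDATA) attributes the remaining spaces are not collapsed. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the input string is a hypothetical example, not taken from the test suite sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: an attribute value broken across two lines,
# with three spaces of indentation on the continuation line.
source = "<a b='c\n   d'/>"

element = ET.fromstring(source)
value = element.get("b")

# The line break has become a space; the indentation spaces are kept,
# so the parsed value contains four spaces between "c" and "d".
print(repr(value))

# Serializing the parsed element therefore yields a single-line attribute.
print(ET.tostring(element, encoding="unicode"))
```

This is why removing the newline and indentation from the source attributes, as suggested above, is the simplest way to make every processor produce the same serialized result.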
...a last one for today.. [4] The files in "ExpectedTestResults/Examples" are to be expected in "ExpectedTestResults/Examples/2.2.2" (or, alternatively, XQFTTSCatalog.xml should be fixed the other way round) Regards, Christian, BaseX Team http://www.basex.org
Christian,

New items 1 & 3 are done.

1. Eliminated whitespace in the attributes in the use cases source file, as we had done previously for the expected results files.

3. Updated /UseCase-OTHER/other-queries-results-q1.xq to the corrected XQFT Use Case query.

Count > 0 and new items 2 & 4 are still awaiting action.

Pat Case, Member XQuery Full Text Task Force
Christian,

New items 2 & 4 are done.

[2] In the XQFTTSCatalog.xml file, "Textsource" occurrences should be replaced with "Testsource"
--We have made the correction to "Testsource".

[4] The files in "ExpectedTestResults/Examples" are to be expected in "ExpectedTestResults/Examples/2.2.2" (or, alternatively, XQFTTSCatalog.xml should be fixed the other way round)
--We have inserted a 2.2.2 sub-directory under "ExpectedTestResults/Examples" and moved the 3 existing files into it.

count > 0
--We have decided to remove count > 0 from the vast majority of XQuery and XPath solutions. We had added it as a filter, but it is no longer needed.
--Please track our progress in Bug 5829.

Old item 4.2.1
--We have recorrected 4.2.1 Q1 Query on Attribute by removing the added parentheses:
  ("improve" with stemming ftand "web" ftand "usability")
is now
  "improve" with stemming ftand "web" ftand "usability"
--By default the distance operator is applicable to any number of FTANDs, so the parentheses are superfluous, and we try to use parentheses in the use cases only where they are significant.

We think we have addressed all your concerns. If and when you agree, please close this bug. (It will take about 2 weeks to get the new solutions into the test cases, so you might want to wait to see them.)

Pat Case, Library of Congress and Member FTTF
Pat, I've finally closed this bug. One last thing I noticed (which is actually based on a little typo of my own): "Testsources" (in XQFTTSCatalog.xml) must be changed to "TestSources" to support tests on Linux systems, where file names are case-sensitive. Thanks, Christian, BaseX Team http://www.basex.org