This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The following test cases from testSet 'MS-Regex2006-07-15' have the wrong result when using the latest Unicode database:

group='reS17' test='reS17.v' (፩ not in \p{Nd})
group='reS38' test='reS38.v' (፱ not in \p{Nd})
group='reS51' test='reS51.i' (௦ in \p{Nd})
group='reT17' test='reT17.i' (፩ not in \p{Nd})
group='reT38' test='reT38.i' (፱ not in \p{Nd})
group='reT51' test='reT51.v' (௦ in \p{Nd})
group='reU6' test='reU6.i' (ȿ not in \p{Cn})
group='reZ004v' test='reZ004v.v' (፩-፱ not in \p{Nd})

XSD 1.1 allows implementors to use later versions of the Unicode database. This leads to different results for the test cases above when using the latest version of the Unicode database.

When using Unicode database 3.1 (as referenced from the XSD 1.0 spec), the following test case has the wrong result:

group='reZ003v' test='reZ003v.v' (Ƞ in \p{Cn})
Decided: mark tests as depending on a specific Unicode version where required.
Created attachment 1037: Comparison of character categories between Unicode 4.0.0 and 6.0.0
I have confirmed these can be accounted for by differences between Unicode 4.0.0 and 6.0.0. I attach a file that summarises the differences between character categories in these two Unicode versions.
I can further confirm that after converting Saxon to use the Unicode 6.0.0 character categories, the tests that fail in the Microsoft regex test set are exactly those listed.
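The kind of category drift described in this report can be reproduced with a small sketch. It assumes Python's standard `unicodedata` module, which bundles the current Unicode database plus a frozen 3.2.0 snapshot (`unicodedata.ucd_3_2_0`). Neither is the 4.0.0 or 6.0.0 data compared in the attachment, so this only illustrates the effect; the helper name `category_changes` and the sample list are made up for the example.

```python
import unicodedata

# Frozen Unicode 3.2.0 snapshot shipped with Python (an assumption of this
# sketch: it stands in for "an older database", not for 3.1 or 4.0.0).
old = unicodedata.ucd_3_2_0

# Characters taken from the failing tests: Ethiopic digits one and nine,
# Tamil digit zero, and Latin small letter s with swash tail.
SAMPLES = ["\u1369", "\u1371", "\u0be6", "\u023f"]


def category_changes(chars):
    """Return {char: (old_category, new_category)} where the two differ."""
    changes = {}
    for ch in chars:
        before = old.category(ch)
        after = unicodedata.category(ch)
        if before != after:
            changes[ch] = (before, after)
    return changes


if __name__ == "__main__":
    print("old:", old.unidata_version, "new:", unicodedata.unidata_version)
    for ch, (before, after) in category_changes(SAMPLES).items():
        print(f"U+{ord(ch):04X} {before} -> {after}")
```

A regex engine that builds `\p{Nd}` or `\p{Cn}` from one database or the other will therefore accept or reject these characters differently, which is what the listed tests observe.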
Fixed by making expected results conditional on Unicode version. Schema for metadata updated to accommodate this.
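As a rough illustration of what a version-conditional expected result could look like to a test harness, here is a minimal sketch. The data layout, the `EXPECTED` table, and the `expected_outcome` function are hypothetical and are not the actual test-suite metadata schema; the two outcomes for 'reS17.v' follow from the report above (valid under 4.0.0, no longer valid under 6.0.0).

```python
# Hypothetical in-memory form of version-conditional expectations.
# Keys are test names; values map a Unicode version to the expected outcome.
EXPECTED = {
    "reS17.v": {"4.0.0": "valid", "6.0.0": "invalid"},
    "reS51.i": {"4.0.0": "invalid", "6.0.0": "valid"},
}


def expected_outcome(test_name, unicode_version):
    """Return the expected outcome for a test under a given Unicode version.

    Returns None when the table has no entry for that test or version,
    so the harness can skip the test instead of guessing.
    """
    return EXPECTED.get(test_name, {}).get(unicode_version)


if __name__ == "__main__":
    for version in ("4.0.0", "6.0.0", "3.1.0"):
        print(version, expected_outcome("reS17.v", version))
```

Returning `None` for an unlisted version is one possible design choice: a processor using a database the table does not cover reports the test as not applicable rather than as a failure.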