This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The rules for conversion of a float or double to a string do not give an unambiguous answer. The rule as stated in 17.1.2 is:

"Beyond the one required digit after the decimal point in the mantissa, there must be as many, but only as many, additional digits as are needed to uniquely distinguish the value from all other values for the datatype after rounding the final digit."

Most of this comes straight from XPath 1.0, except for the phrase "after rounding the final digit". I've no idea what that phrase is supposed to mean. Final digit of what, pray? Having dealt with that, what does the rest mean?

The particular test case that gave rise to this bug report is casthcds14. Consider, for example, the float whose internal IEEE representation is x58901723. The table below shows that this can be produced by parsing any string in the range 1.26743230E15 to 1.26743243E15. What is the correct string representation of this float value? The expected test results are stated as 1.26743233E15. Saxon (and Java) produce 1.26743237E15. The only way I can read the spec, however, suggests that the result should be 1.2674323E15.

And even if we can decide what the rules really mean, is there a known efficient algorithm for implementing them?

The Java rule is clear and unambiguous (where m is the magnitude, that is, the absolute value):

"Let n be the unique integer such that 10^n <= m < 10^(n+1); then let a be the mathematically exact quotient of m and 10^n so that 1 <= a < 10. The magnitude is then represented as the integer part of a, as a single decimal digit, followed by '.' ('\u002E'), followed by decimal digits representing the fractional part of a."

I would propose that we adopt the Java rules.

Table:
Column 1: a string.
Column 2: the internal (hexadecimal) representation of the IEEE float produced by parsing this string as a float.
Column 3: the Java string representation of the float in column 2.
As it happens, in this sequence 7 digits after the decimal point are always enough to distinguish the value. But will this always be the case, and how does one know?

1.26743200E15 = 58901720 = 1.26743196E15
1.26743201E15 = 58901720 = 1.26743196E15
1.26743202E15 = 58901720 = 1.26743196E15
1.26743203E15 = 58901720 = 1.26743196E15
1.26743204E15 = 58901721 = 1.2674321E15
1.26743205E15 = 58901721 = 1.2674321E15
1.26743206E15 = 58901721 = 1.2674321E15
1.26743207E15 = 58901721 = 1.2674321E15
1.26743208E15 = 58901721 = 1.2674321E15
1.26743209E15 = 58901721 = 1.2674321E15
1.26743210E15 = 58901721 = 1.2674321E15
1.26743211E15 = 58901721 = 1.2674321E15
1.26743212E15 = 58901721 = 1.2674321E15
1.26743213E15 = 58901721 = 1.2674321E15
1.26743214E15 = 58901721 = 1.2674321E15
1.26743215E15 = 58901721 = 1.2674321E15
1.26743216E15 = 58901721 = 1.2674321E15
1.26743217E15 = 58901722 = 1.26743223E15
1.26743218E15 = 58901722 = 1.26743223E15
1.26743219E15 = 58901722 = 1.26743223E15
1.26743220E15 = 58901722 = 1.26743223E15
1.26743221E15 = 58901722 = 1.26743223E15
1.26743222E15 = 58901722 = 1.26743223E15
1.26743223E15 = 58901722 = 1.26743223E15
1.26743224E15 = 58901722 = 1.26743223E15
1.26743225E15 = 58901722 = 1.26743223E15
1.26743226E15 = 58901722 = 1.26743223E15
1.26743227E15 = 58901722 = 1.26743223E15
1.26743228E15 = 58901722 = 1.26743223E15
1.26743229E15 = 58901722 = 1.26743223E15
1.26743230E15 = 58901723 = 1.26743237E15
1.26743231E15 = 58901723 = 1.26743237E15
1.26743232E15 = 58901723 = 1.26743237E15
1.26743233E15 = 58901723 = 1.26743237E15
1.26743234E15 = 58901723 = 1.26743237E15
1.26743235E15 = 58901723 = 1.26743237E15
1.26743236E15 = 58901723 = 1.26743237E15
1.26743237E15 = 58901723 = 1.26743237E15
1.26743238E15 = 58901723 = 1.26743237E15
1.26743239E15 = 58901723 = 1.26743237E15
1.26743240E15 = 58901723 = 1.26743237E15
1.26743241E15 = 58901723 = 1.26743237E15
1.26743242E15 = 58901723 = 1.26743237E15
1.26743243E15 = 58901723 = 1.26743237E15
1.26743244E15 = 58901724 = 1.2674325E15
1.26743245E15 = 58901724 = 1.2674325E15
1.26743246E15 = 58901724 = 1.2674325E15
1.26743247E15 = 58901724 = 1.2674325E15
1.26743248E15 = 58901724 = 1.2674325E15
1.26743249E15 = 58901724 = 1.2674325E15
1.26743250E15 = 58901724 = 1.2674325E15
1.26743251E15 = 58901724 = 1.2674325E15
1.26743252E15 = 58901724 = 1.2674325E15
1.26743253E15 = 58901724 = 1.2674325E15
1.26743254E15 = 58901724 = 1.2674325E15
1.26743255E15 = 58901724 = 1.2674325E15
1.26743256E15 = 58901724 = 1.2674325E15
1.26743257E15 = 58901725 = 1.26743264E15
1.26743258E15 = 58901725 = 1.26743264E15
1.26743259E15 = 58901725 = 1.26743264E15
1.26743260E15 = 58901725 = 1.26743264E15
1.26743261E15 = 58901725 = 1.26743264E15
1.26743262E15 = 58901725 = 1.26743264E15
1.26743263E15 = 58901725 = 1.26743264E15
1.26743264E15 = 58901725 = 1.26743264E15
1.26743265E15 = 58901725 = 1.26743264E15
1.26743266E15 = 58901725 = 1.26743264E15
1.26743267E15 = 58901725 = 1.26743264E15
1.26743268E15 = 58901725 = 1.26743264E15
1.26743269E15 = 58901725 = 1.26743264E15
1.26743270E15 = 58901725 = 1.26743264E15
1.26743271E15 = 58901726 = 1.26743277E15
1.26743272E15 = 58901726 = 1.26743277E15
1.26743273E15 = 58901726 = 1.26743277E15
1.26743274E15 = 58901726 = 1.26743277E15
1.26743275E15 = 58901726 = 1.26743277E15
1.26743276E15 = 58901726 = 1.26743277E15
1.26743277E15 = 58901726 = 1.26743277E15
1.26743278E15 = 58901726 = 1.26743277E15
1.26743279E15 = 58901726 = 1.26743277E15
1.26743280E15 = 58901726 = 1.26743277E15
1.26743281E15 = 58901726 = 1.26743277E15
1.26743282E15 = 58901726 = 1.26743277E15
1.26743283E15 = 58901726 = 1.26743277E15
1.26743284E15 = 58901727 = 1.2674329E15
1.26743285E15 = 58901727 = 1.2674329E15
1.26743286E15 = 58901727 = 1.2674329E15
1.26743287E15 = 58901727 = 1.2674329E15
1.26743288E15 = 58901727 = 1.2674329E15
1.26743289E15 = 58901727 = 1.2674329E15
1.26743290E15 = 58901727 = 1.2674329E15
1.26743291E15 = 58901727 = 1.2674329E15
1.26743292E15 = 58901727 = 1.2674329E15
1.26743293E15 = 58901727 = 1.2674329E15
1.26743294E15 = 58901727 = 1.2674329E15
1.26743295E15 = 58901727 = 1.2674329E15
1.26743296E15 = 58901727 = 1.2674329E15
1.26743297E15 = 58901727 = 1.2674329E15
1.26743298E15 = 58901728 = 1.26743304E15
1.26743299E15 = 58901728 = 1.26743304E15
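For anyone wanting to check the first two columns of the table independently, here is a minimal Python sketch. The helper names `float32_bits` and `float32_exact` are invented for this illustration, and the parse goes through a double before rounding to single precision, which is an assumption that happens to be harmless for these inputs. It does not reproduce column 3, since that depends on the Java runtime.

```python
import struct
from decimal import Decimal


def float32_bits(text: str) -> str:
    """Parse a decimal string and return the IEEE single-precision bits as hex.

    Assumption: the string is parsed to a double first and then rounded to
    single precision by struct. That is exact for the table's inputs (they
    are integers below 2**53), but it can differ from a direct, correctly
    rounded string-to-float parse in rare near-halfway cases.
    """
    return struct.pack(">f", float(text)).hex()


def float32_exact(bits_hex: str) -> Decimal:
    """Return the exact decimal value of the float with the given hex bits."""
    (value,) = struct.unpack(">f", bytes.fromhex(bits_hex))
    return Decimal(value)  # Decimal(float) converts without rounding


if __name__ == "__main__":
    # The two ends of the range for x58901723, plus one string on either side.
    for text in ("1.26743229E15", "1.26743230E15",
                 "1.26743243E15", "1.26743244E15"):
        print(text, "=", float32_bits(text))
    print("exact value of x58901723:", float32_exact("58901723"))
```

The exact value printed for x58901723 is 1267432366800896, which is why a nine-digit rendering ends in ...237 while any string from 1.26743230E15 to 1.26743243E15 still parses back to the same float.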
After doing some further work on this, I'm going to propose three options.

Option A: don't prescribe the rounding rules, except that the value must be a lexical representation of the original float or double (to allow round-tripping without loss of precision).

Option B: specify that the value must be an exact decimal representation of the original binary value: no rounding or truncation of significant digits is allowed even if the resulting value would round-trip to the original value. (This appears to be what Java does, as distinct from what it says it does.)

Option C: specify the rounding rules. Here is text that does that:

In 17.1.2, delete the text

"Besides these special values, the general form of the canonical form for xs:float and xs:double is a mantissa, .......... the value from all other values for the datatype after rounding the final digit."

replacing it with:

"For other values, the canonical representation consists of a minus sign '-' (x2D) if the value is negative, followed by the magnitude m (absolute value), represented as follows. Let n be the unique integer such that 10^n <= m < 10^(n+1); then let a be the mathematically exact quotient of m and 10^n so that 1 <= a < 10. The magnitude is then represented as the integer part of a, as a single decimal digit, followed by '.' (x2E), followed by decimal digits representing the fractional part of a, followed by the letter 'E' (x45), followed by a representation of n as a decimal integer, as produced by the rules for converting xs:integer to xs:string. Suppose that the string of decimal digits that exactly represents the fractional part of a is S. So long as the length of S is at least one, if rounding of the last digit in S results in a string that is a lexical representation of the original xs:float or xs:double value, then such rounding is performed; and this process is repeated. The rounding is done up or down according to the rules of the fn:round-half-to-even() function."
Note: I haven't reproduced the Java words exactly here, because I don't actually think the Java words are precise. In any case, Java doesn't seem to do what its own specification says it should. (They actually use the same phrase "as many, but only as many, more digits as are needed to uniquely distinguish" that appears in XPath 1.0, and then embellish it with a further explanation, but the explanation doesn't seem to explain the results actually produced.) I couldn't find a description of the rules for C#, but the actual behaviour of C# appears to lose precision: the result of round-tripping a float to a string and back is not necessarily the original float, whereas lossless round-tripping is one of our objectives (but I don't know the language well and might have missed something). My aim here is to reflect the intent of the current words, and simply give a precise interpretation of what I think they were probably intended to mean, which is not necessarily the only interpretation possible.
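To make the Option C wording concrete, here is a rough Python sketch of one possible reading of it: start from the exact decimal digits, then repeatedly drop the last digit with half-to-even rounding for as long as the shorter string still parses back to the original value and one fractional digit remains. The function names (`option_c_string`, `round_last_digit`, `format_sci`, `parse_float32`) are invented for this illustration, the float parse goes through a double first (an approximation), and zero, NaN and the infinities are deliberately left out.

```python
import math
import struct
from decimal import Decimal


def parse_float32(text: str) -> float:
    """Stand-in for an xs:float parse: round via double, then to single."""
    try:
        return struct.unpack(">f", struct.pack(">f", float(text)))[0]
    except OverflowError:  # candidate is beyond the largest finite single
        return math.inf


def round_last_digit(digits: str, n: int):
    """Drop the final digit of d.ddd x 10^n, rounding half to even."""
    head, last = int(digits[:-1]), int(digits[-1])
    if last > 5 or (last == 5 and head % 2 == 1):
        head += 1
    rounded = str(head)
    if len(rounded) == len(digits):  # carry out of the leading digit: 9.9 -> 10
        rounded, n = rounded[:-1], n + 1
    return rounded, n


def format_sci(sign: str, digits: str, n: int) -> str:
    """Format as one digit, '.', at least one fractional digit, 'E', exponent."""
    return f"{sign}{digits[0]}.{digits[1:] or '0'}E{n}"


def option_c_string(value: float, parse=float) -> str:
    """One reading of Option C; `parse` maps a string back to the datatype."""
    if value == 0 or not math.isfinite(value):
        raise ValueError("sketch covers non-zero finite values only")
    sign_bit, digit_tuple, exp10 = Decimal(value).as_tuple()  # exact digits
    sign = "-" if sign_bit else ""
    digits = "".join(map(str, digit_tuple))
    n = len(digits) - 1 + exp10  # so that 10^n <= |value| < 10^(n+1)
    while len(digits) > 2:  # always keep one digit after the point
        shorter, shorter_n = round_last_digit(digits, n)
        if parse(format_sci(sign, shorter, shorter_n)) != value:
            break
        digits, n = shorter, shorter_n
    return format_sci(sign, digits, n)


if __name__ == "__main__":
    sv = parse_float32("1.26743233E15")  # the float x58901723
    print(option_c_string(sv, parse_float32))
```

Under this reading the float x58901723 comes out as 1.2674324E15, because the successive roundings carry upward from ...237 and the result still parses back to the same float. Other readings of "rounding of the last digit" are possible and could give a different string.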
The meeting on 24/1/2006 decided on option A, and I was asked to propose concrete wording to implement this. Here is the proposed wording.

In F+O, section 17.1.2, bullet 4 subbullet 3 add a new subbullet 1

* TV will be a string in the lexical space of xs:double or xs:float that when converted to an xs:double or xs:float under the rules of section 17.1.1 produces a value that is equal to SV. In addition, TV must satisfy the constraints in the following sub-bullets.

In the current subsubbullet3, replace the current text including the Note that follows it by:

* If SV is NaN, TV is the string "NaN".
* If SV is positive or negative infinity, TV is the string "INF" or "-INF" respectively.
* In other cases, the result consists of a mantissa, which has the lexical form of an xs:decimal, followed by the letter "E", followed by an exponent which has the lexical form of an xs:integer. Leading zeroes and "+" signs are prohibited in the exponent. For the mantissa, there must be a decimal point, and there must be exactly one digit before the decimal point, which must be non-zero. The "+" sign is prohibited. There must be at least one digit after the decimal point. Apart from this mandatory digit, trailing zero digits are prohibited.

Note: The above rules allow more than one representation of the same value. For example the xs:float value whose exact decimal representation is 1.26743223E15 might be represented by any of the strings "1.26743223E15", "1.26743222E15" or "1.26743224E15" (inter alia). It is implementation-dependent which of these representations is chosen.
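The constraints in the third bullet, together with the round-trip requirement, can be sketched as a small acceptance check in Python. This is only an illustration: the names `OPTION_A_FORM`, `has_option_a_form`, `parse_float32` and `is_acceptable` are invented here, rejecting an exponent of "-0" is an assumption that the proposed wording does not settle, and the float parse goes through a double first.

```python
import re
import struct

# Mantissa: optional "-", exactly one non-zero digit, ".", then either a
# single digit or a longer run that does not end in zero.
# Exponent: no "+" sign and no leading zeros.
# Assumption: "-0" is rejected as an exponent; the wording does not say.
OPTION_A_FORM = re.compile(
    r"-?[1-9]\.(?:[0-9]*[1-9]|0)E(?:0|-?[1-9][0-9]*)"
)


def has_option_a_form(tv: str) -> bool:
    """Check only the lexical constraints for finite, non-special values."""
    return OPTION_A_FORM.fullmatch(tv) is not None


def parse_float32(text: str) -> float:
    """Stand-in for an xs:float parse: round via double, then to single."""
    return struct.unpack(">f", struct.pack(">f", float(text)))[0]


def is_acceptable(tv: str, sv: float, parse=parse_float32) -> bool:
    """TV must have the required form and convert back to a value equal to SV."""
    return has_option_a_form(tv) and parse(tv) == sv


if __name__ == "__main__":
    sv = parse_float32("1.26743223E15")  # the float x58901722
    for tv in ("1.26743223E15", "1.26743222E15", "1.26743224E15",
               "1.26743220E15", "1.2674323E15"):
        print(tv, is_acceptable(tv, sv))
```

The three strings from the Note are all accepted for the same float, while "1.26743220E15" fails on the trailing zero and "1.2674323E15" fails because it parses to the neighbouring float.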
There's a minor oversight in the proposed new first subbullet: change

TV will be a string in the lexical space of xs:double or xs:float that when converted to an xs:double or xs:float under the rules of section 17.1.1 produces a value that is equal to SV.

to read

TV will be a string in the lexical space of xs:double or xs:float that when converted to an xs:double or xs:float under the rules of section 17.1.1 produces a value that is equal to SV, or is NaN if SV is NaN.
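The reason the extra clause is needed shows up directly in IEEE comparison semantics: NaN is not equal to anything, itself included, so a plain "equal to SV" test can never succeed for NaN. A small Python sketch follows, with the hypothetical helper `converts_back_to` and Python's built-in `float()` standing in for the section 17.1.1 conversion (it accepts more strings than the xs:double lexical space does).

```python
import math


def converts_back_to(sv: float, tv: str, parse=float) -> bool:
    """True if parsing TV yields a value equal to SV, or NaN when SV is NaN."""
    result = parse(tv)
    if math.isnan(sv):
        return math.isnan(result)  # the amended clause
    return result == sv            # the original clause


if __name__ == "__main__":
    nan = float("nan")
    print(nan == nan)                              # plain equality fails for NaN
    print(converts_back_to(nan, "NaN"))
    print(converts_back_to(float("inf"), "INF"))
    print(converts_back_to(float("-inf"), "-INF"))
```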
Closing bug because commenter has not objected to the resolution posted and more than two weeks have passed.