Impact of precisionDecimal on XPath and XQuery

Don Chamberlin, IBM

16 May 2006

XML Schema 1.1 [1] introduces a new primitive datatype called xs:precisionDecimal.

The new type is independent of xs:decimal in the sense that neither is derived from the other. A value of type xs:precisionDecimal includes both a numeric value and a precision (the number of significant digits to the right of the decimal point). For example, the precision of 0.12 might be 2 while the precision of 0.1200 might be 4. The precision of 5000 might be 0 if all its digits are significant, or might be -3 if only its first digit is significant.

The new xs:precisionDecimal type is defined to be aligned with the floating-point decimal type expected to be defined in a forthcoming revision of IEEE 754 called "754r" [2]. Like xs:float and xs:double, the new xs:precisionDecimal type includes the special values negative zero, +INF, -INF, and NaN.

XML Schema 1.1 defines two lexical representations for xs:precisionDecimal called "decimalPtNumeral" (example: -0.0054) and "scientificNotationNumeral" (example: -5.4E-3). XML Schema 1.1 defines a ‘lexical mapping’ that maps a lexical representation in either decimalPtNumeral or scientificNotationNumeral form into the value space of xs:precisionDecimal (inferring the precision from the number of digits after the decimal point). XML Schema 1.1 also defines a ‘canonical mapping’ that maps a precisionDecimal value into its canonical lexical representation, which may be either a decimalPtNumeral or a scientificNotationNumeral (generally, scientific notation is used for very small or very large values; otherwise the decimalPtNumeral format is used).
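As a rough illustration of how precision can be inferred from a lexical form, the following Python sketch uses the standard decimal module, which keeps an explicit exponent for every value. Treating the precision as the negated exponent is an assumption made for illustration, the helper name precision_of is hypothetical, and Python's Decimal is only a stand-in for xs:precisionDecimal, not an implementation of it.

```python
from decimal import Decimal


def precision_of(lexical: str) -> int:
    """Hypothetical helper: number of digits to the right of the decimal point.

    Works for finite values only; NaN and infinities have no numeric exponent.
    """
    exponent = Decimal(lexical).as_tuple().exponent
    return -exponent


print(precision_of("0.12"))     # 2
print(precision_of("0.1200"))   # 4
print(precision_of("5000"))     # 0  (all digits significant)
print(precision_of("5E3"))      # -3 (only the first digit significant)
print(precision_of("-0.0054"))  # 4
print(precision_of("-5.4E-3"))  # 4  (same value and precision as above)
```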

More information about the precisionDecimal type, including motivation for why this type was introduced, can be found in [3].

This memo explores the impact of the new xs:precisionDecimal type on XPath and XQuery. It identifies some issues that will need to be resolved and explores some alternative solutions. The following conventions are used:

The xs: prefix is generally omitted from the names of types (xs:decimal etc.)
"precisionDecimal" is abbreviated as "pDecimal"
"We" refers to the joint Query and XSLT working groups.

(1) We will need to define where pDecimal fits into the numeric promotion hierarchy. Proposal: put pDecimal between decimal and float. In other words, decimal is promotable to pDecimal, and pDecimal is promotable to float. This has the following implications:

(a) Any function that expects a pDecimal parameter can be called with a decimal or integer argument.

(b) Any function that expects a decimal parameter cannot be called with a pDecimal argument. Currently, there are no such functions in the XPath/XQuery built-in function library.
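The proposed ordering can be sketched as a simple lookup. This is only an illustration of the proposal in item (1): the names PROMOTION_CHAIN, SUBTYPE_OF, and accepts are hypothetical, and the sketch assumes that promotion is transitive along the chain.

```python
# Proposed promotion chain from item (1): each type is promotable to the
# types that follow it. Transitivity is an assumption of this sketch.
PROMOTION_CHAIN = ["decimal", "precisionDecimal", "float", "double"]

# integer is derived from decimal, so it is accepted wherever decimal is.
SUBTYPE_OF = {"integer": "decimal"}


def accepts(parameter_type: str, argument_type: str) -> bool:
    """Return True if an argument of argument_type may be passed to a
    function parameter declared as parameter_type (hypothetical helper)."""
    if argument_type == parameter_type:
        return True
    if parameter_type not in PROMOTION_CHAIN:
        return False
    base = SUBTYPE_OF.get(argument_type, argument_type)
    if base not in PROMOTION_CHAIN:
        return False
    return PROMOTION_CHAIN.index(base) <= PROMOTION_CHAIN.index(parameter_type)


print(accepts("precisionDecimal", "decimal"))  # True, implication (a)
print(accepts("precisionDecimal", "integer"))  # True, implication (a)
print(accepts("decimal", "precisionDecimal"))  # False, implication (b)
```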

(2) We will need to define the rules for how decimal and integer values are promoted to pDecimal values. One possible way to do this is to convert the input value to its canonical lexical form and then apply the lexical mapping defined by XML Schema 1.1 to convert the lexical form into a pDecimal value. Another possible way is to assume a fixed, implementation-defined precision for each of the existing types, and adopt this as the precision of the resulting pDecimal.
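The two alternatives can be compared with a small Python sketch. Both function names are hypothetical. Stripping trailing zeros stands in for the canonical lexical form of decimal, which is a simplification, and the fixed precision of 6 is an arbitrary placeholder for an implementation-defined value.

```python
from decimal import Decimal


def promote_via_canonical_form(value: Decimal) -> Decimal:
    """Alternative 1: go through a canonical string, then re-parse it.

    Assumption: the canonical form has no trailing fractional zeros.
    """
    canonical = format(value.normalize(), "f")
    return Decimal(canonical)


def promote_with_fixed_precision(value: Decimal, precision: int = 6) -> Decimal:
    """Alternative 2: give every promoted value the same fixed precision."""
    quantum = Decimal(1).scaleb(-precision)  # e.g. 0.000001 for precision 6
    return value.quantize(quantum)


print(promote_via_canonical_form(Decimal("1.500")))   # 1.5      (precision 1)
print(promote_via_canonical_form(Decimal("5000")))    # 5000     (precision 0)
print(promote_with_fixed_precision(Decimal("1.500"))) # 1.500000 (precision 6)
```

Under the first alternative the resulting precision depends on the value; under the second it depends only on the source type.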

(3) In F&O, we will need to define the semantics of the following arithmetic functions on the pDecimal type:

op:numeric-add

op:numeric-subtract

op:numeric-multiply

op:numeric-divide

op:numeric-mod

op:numeric-unary-plus

op:numeric-unary-minus

Part of this definition will involve specifying the precision of the result, based on the precision of the operands. Presumably this can be done by reference to the forthcoming IEEE 754r standard. In general, the precision of the result of a decimal operation is at least as great as the greater of the precisions of its operands. For example, if (1000 with precision -3) is added to (0.001 with precision 3), the result is (1000.001 with precision 3). Each operator (addition, multiplication, etc.) defines its own precision rules. It is also necessary to deal with many special cases (for example, what is the result of -INF modulo negative zero?). Again, this can be done by reference to IEEE 754r.
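To give a feel for operator-specific precision rules, the sketch below uses Python's decimal module as a stand-in for pDecimal arithmetic. The helper name precision is hypothetical, and the results show Python's behavior, which may or may not match what F&O eventually specifies.

```python
from decimal import Decimal, InvalidOperation, localcontext


def precision(value: Decimal) -> int:
    """Hypothetical helper: digits to the right of the decimal point."""
    return -value.as_tuple().exponent


# Addition: the result carries the greater of the two operand precisions.
total = Decimal("1E+3") + Decimal("0.001")
print(total, precision(total))      # 1000.001 3

# Multiplication follows a different rule: the operand precisions add up.
product = Decimal("1.20") * Decimal("2.0")
print(product, precision(product))  # 2.400 3

# A special case: -INF modulo negative zero. With the invalid-operation
# trap switched off, Python returns NaN instead of raising an exception.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    special = Decimal("-Infinity") % Decimal("-0")
print(special)                      # NaN
```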

One important case deals with arithmetic between a pDecimal value and a value of some other type. For example, suppose that a pDecimal value is added to a decimal value (whose precision is not explicit). Should the result get its precision from the pDecimal operand? Or should the decimal operand be promoted to a pDecimal value (with somewhat arbitrary precision) and then the precision of the result computed based on pDecimal rules?
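The difference between the two alternatives shows up even in a small case. In the sketch below, both function names are hypothetical, a Python Decimal stands in for both the pDecimal and the decimal operand, and the assumed precision of 4 for the promoted decimal is arbitrary.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def add_keep_pdecimal_precision(p: Decimal, d: Decimal) -> Decimal:
    """Alternative 1: the result takes its precision from the pDecimal operand."""
    quantum = Decimal(1).scaleb(p.as_tuple().exponent)
    return (p + d).quantize(quantum, rounding=ROUND_HALF_EVEN)


def add_after_promotion(p: Decimal, d: Decimal, assumed_precision: int) -> Decimal:
    """Alternative 2: promote the decimal operand first, then add normally."""
    promoted = d.quantize(Decimal(1).scaleb(-assumed_precision))
    return p + promoted


p = Decimal("1.5")   # pDecimal operand with precision 1
d = Decimal("0.25")  # decimal operand, precision not explicit

print(add_keep_pdecimal_precision(p, d))  # 1.8    (rounded to precision 1)
print(add_after_promotion(p, d, 4))       # 1.7500 (precision 4)
```

The first alternative can lose digits of the decimal operand; the second makes the result precision depend on the promotion rule chosen in item (2).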

An expert on XQuery Functions and Operators should read the draft 754r standard carefully. It contains some surprises, which apply to both binary and decimal formats, such as the following:

In 754r Section 5.10 we read "totalorder(-NaN, number) is true where -NaN represents a NaN with negative sign bit". (But there is no negative NaN value in XML Schema.)

In 754r Section 6.2 we read "Two different kinds of NaN, signaling and quiet, shall be supported in all operations." (This concept also does not exist in XML Schema. Possibly the distinction between signaling and quiet NaNs can be ignored in XPath/XQuery.)
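Both surprises can be observed in Python's decimal module, which supports signed NaNs and distinguishes signaling from quiet NaNs. The sketch below only demonstrates that these concepts exist in one decimal arithmetic implementation; it does not claim that Python's total ordering is identical to the one in the 754r draft.

```python
from decimal import Decimal

negative_nan = Decimal("-NaN")
signaling_nan = Decimal("sNaN")
quiet_nan = Decimal("NaN")

# A NaN can carry a sign, unlike the single NaN of XML Schema.
print(negative_nan.is_signed())                          # True

# In Python's total ordering, a negative NaN sorts below every number,
# including negative infinity. compare_total returns -1, 0, or 1.
print(negative_nan.compare_total(Decimal("-Infinity")))  # -1

# Signaling and quiet NaNs are distinct values.
print(signaling_nan.is_snan(), quiet_nan.is_qnan())      # True True
```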

(4) In F&O, we will need to define the behavior of the following comparison functions on the pDecimal type:

op:numeric-equal

op:numeric-less-than

op:numeric-greater-than

These semantics can be based on the ordering rules for pDecimal given in Schema 1.1. Basically, two pDecimal values are compared based on their numerical values, disregarding precision (for example, 5 is equal to 5.0000). Two zeros with different signs are equal. INF is equal to itself and greater than all other values except NaN. -INF is equal to itself and less than all other values except NaN. NaN is not comparable to any other value, including itself.
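These ordering rules happen to match the behavior of Python's decimal module for equality and for comparisons of non-NaN values, so they can be checked with a short sketch. Decimal is again only a stand-in for pDecimal, and the variable names are illustrative.

```python
from decimal import Decimal

# Precision is disregarded: 5 equals 5.0000.
same_value = Decimal("5") == Decimal("5.0000")

# Zeros with different signs are equal.
zeros_equal = Decimal("0") == Decimal("-0")

# INF equals itself and exceeds every other non-NaN value.
inf = Decimal("Infinity")
inf_rules = inf == inf and inf > Decimal("1E+6000")

# -INF equals itself and is below every other non-NaN value.
neg_inf = Decimal("-Infinity")
neg_inf_rules = neg_inf == neg_inf and neg_inf < Decimal("-1E+6000")

# NaN is not equal to anything, including itself.
nan = Decimal("NaN")
nan_equals_itself = nan == nan

print(same_value, zeros_equal, inf_rules, neg_inf_rules)  # True True True True
print(nan_equals_itself)                                  # False
print(nan.compare(Decimal("1")))                          # NaN (not comparable)
```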

(5) We will need to define how pDecimal values participate in "order by" operations. Presumably, when ordering two values of dissimilar type, the values are promoted to their "greatest common type." For this purpose, is pDecimal INF considered equal to float and double INF? Are pDecimal NaN values considered equivalent to float and double NaN values? Does a pDecimal NaN have a sign for ordering purposes? (I think 754r says Yes and Schema 1.1 says No.) In considering these questions, we should probably investigate how the same questions are being handled in the ISO SQL Standard.

(6) We will need to define how to represent a literal pDecimal value in a query. In the simplest approach, we would not define a new literal format, but simply rely on the constructor function named xs:precisionDecimal() that accepts any valid lexical representation and applies the lexical mapping defined by XML Schema 1.1 (example: xs:precisionDecimal("5.000") maps into a pDecimal value with a precision of 3). We could consider additional measures such as defining a new literal format, similar to our current double literals but using a different letter for the exponent, such as "1.23D-5".

(7) We will need to define the semantics of the following numeric functions for pDecimal operands, including the precision of the result:

fn:ceiling()
fn:floor()
fn:round()
fn:round-half-to-even()
fn:avg()
fn:max()
fn:min()
fn:sum()

Again, the IEEE 754r draft has something to say about these functions, which should be considered carefully. We might also consider adding a new function that implements the rounding mode called "Round to Nearest, Ties Away from Zero" (defined in IEEE 754r). For aggregating functions such as fn:max and fn:avg, we will need to give special attention to the case where pDecimal values are mixed with values of other types.
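The rounding modes in question can be compared side by side using Python's decimal module. The helper name round_to_integer is hypothetical. Python's ROUND_HALF_UP rounds ties away from zero, so it is used here to illustrate the proposed new rounding function; the results show Python's behavior and are not a definition of the F&O functions.

```python
from decimal import (Decimal, ROUND_CEILING, ROUND_FLOOR,
                     ROUND_HALF_EVEN, ROUND_HALF_UP)

ONE = Decimal("1")  # quantum with exponent 0, i.e. round to an integer


def round_to_integer(value: Decimal, mode: str) -> Decimal:
    """Hypothetical helper: round value to precision 0 using the given mode."""
    return value.quantize(ONE, rounding=mode)


print(round_to_integer(Decimal("2.1"), ROUND_CEILING))     # 3
print(round_to_integer(Decimal("-2.1"), ROUND_FLOOR))      # -3
print(round_to_integer(Decimal("2.5"), ROUND_HALF_EVEN))   # 2
print(round_to_integer(Decimal("2.5"), ROUND_HALF_UP))     # 3
print(round_to_integer(Decimal("-2.5"), ROUND_HALF_UP))    # -3 (away from zero)

# Aggregation: the sum keeps the greatest precision among its inputs.
print(sum([Decimal("1.10"), Decimal("2.2")]))              # 3.30
```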

(8) We will need to define general rules for overflow and underflow in pDecimal arithmetic, including intermediate results. These rules might be specified to be consistent with the current rules for decimal or the current rules for double (these sets of rules are not the same). The pDecimal type probably resembles double more than decimal because of the presence of INF and -INF. The IEEE 754r draft also talks about overflow and underflow semantics, which are influenced by a concept called "rounding mode" that does not currently exist in XPath/XQuery.
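The influence of rounding mode on overflow can be seen in the following sketch, which uses Python's decimal module. The context parameters (34 digits, exponent range -6143 to 6144) are assumptions chosen to resemble a 128-bit decimal format, and the helper name make_context is hypothetical. Traps are disabled so that overflow produces a result and sets a flag instead of raising an exception.

```python
from decimal import Context, Decimal, Overflow, ROUND_DOWN, ROUND_HALF_EVEN


def make_context(rounding: str) -> Context:
    """Hypothetical helper: a bounded decimal context with no traps enabled."""
    return Context(prec=34, Emax=6144, Emin=-6143, rounding=rounding, traps=[])


big = Decimal("9E+6144")  # close to the largest representable magnitude

# Round-to-nearest: overflow yields infinity.
nearest = make_context(ROUND_HALF_EVEN)
result_nearest = nearest.multiply(big, Decimal(10))
print(result_nearest, nearest.flags[Overflow])  # Infinity True

# Round-toward-zero: overflow yields the largest finite value instead.
toward_zero = make_context(ROUND_DOWN)
result_toward_zero = toward_zero.multiply(big, Decimal(10))
print(result_toward_zero.is_infinite(), toward_zero.flags[Overflow])  # False True
```

Since XPath/XQuery has no notion of rounding mode, a rule for pDecimal overflow would have to fix one of these outcomes or leave the choice implementation-defined.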

(9) We will need to define the casting rules for pDecimal. What types can be cast into pDecimal and into what types can it be cast? I expect that the rules will be similar to those for the existing decimal type. On casting other types into pDecimal, we need to specify how the precision of the result is determined. On casting pDecimal into other types, we need to specify what happens to special values like negative zero, INF, and NaN. Casting pDecimal into string should probably be based on the canonical mapping defined by XML Schema 1.1. This mapping sometimes produces "decimalPtNumeral" notation (example: -0.0054) and sometimes produces "scientificNotationNumeral" notation (example: -5.4E-3). We may choose to define additional specialized cast-to-string functions that force one or the other of these notations (but note that forcing decimalPtNumeral notation may result in very long strings).
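The two specialized cast-to-string functions might behave like the following sketch, which relies on the format specifiers of Python's decimal module. The function names are hypothetical and the output is Python's formatting, which is not guaranteed to agree with the canonical mapping of XML Schema 1.1 in every detail.

```python
from decimal import Decimal


def to_decimal_point_string(value: Decimal) -> str:
    """Hypothetical cast that forces decimalPtNumeral-style notation."""
    return format(value, "f")


def to_scientific_string(value: Decimal) -> str:
    """Hypothetical cast that forces scientificNotationNumeral-style notation."""
    return format(value, "E")


value = Decimal("-0.0054")
print(to_decimal_point_string(value))  # -0.0054
print(to_scientific_string(value))     # -5.4E-3

# Forcing decimal-point notation on a very small value gives a long string.
tiny = Decimal("1E-30")
print(to_decimal_point_string(tiny))       # 0.000000000000000000000000000001
print(len(to_decimal_point_string(tiny)))  # 32
```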

(10) Note that the XQuery rules for SequenceType Matching respect subtypes (for example, an integer value matches the decimal type) but not promotion (for example, a decimal value does not match the double type). Essentially, decimal and pDecimal are different types in the same sense that float and double are different types. This is not a problem, but we should be aware of the consequences, including the following:

(a) If a variable is declared to have type pDecimal, it cannot be assigned a decimal value, and vice versa.

(b) "instance of pDecimal" returns false for decimal values and vice versa.

(c) "treat as pDecimal" will not accept decimal values at run-time, and vice versa.

(d) In a path expression, the node-test element(*, xs:decimal) will not match an element of type pDecimal and vice versa. Similarly, the node-test schema-element(IQ) will not match an element named IQ of type xs:decimal if the schema definition of IQ calls for pDecimal, and vice versa.

(11) We will need to define rules for the minimum pDecimal precision required of conforming implementations. (XML Schema 1.1 has some rules about this, which we could choose to copy.)

References

[1] World Wide Web Consortium (W3C). 2006. XML Schema 1.1 Part 2: Datatypes, ed. David Peterson, Paul V. Biron, Ashok Malhotra, and C. M. Sperberg-McQueen. W3C Working Draft 17 February 2006 [Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium. <URL:http://www.w3.org/TR/xmlschema11-2/>

[2] IEEE (Institute of Electrical and Electronics Engineers). 2001-2007. Draft Standard for Floating-Point Arithmetic P754. Various drafts, 2001-2007. At the time of publication, the most recent drafts, reflecting the resolution of ballot comments, are not publicly available. See <URL:http://754r.ucbtest.org/drafts/archive/> for an archive of older committee drafts.

[3] Cowlishaw, Michael. 2007. Decimal Arithmetic FAQ (Frequently Asked Questions). 21 April 2007. On the Web at <URL:http://www2.hursley.ibm.com/decimal/decifaq.html>.