Impact of precisionDecimal on XPath and XQuery
Don Chamberlin, IBM
16 May 2006
XML Schema 1.1
[1] introduces a new primitive datatype called
xs:precisionDecimal.
The new type is independent of
xs:decimal in the sense that neither is derived from the
other. A value of type
xs:precisionDecimal includes both
a numeric value and a precision (number of significant digits to the
right of the decimal point). For example, the precision of 0.12 might
be 2 while the precision of 0.1200 might be 4. The precision of 5000
might be 0 if all its digits are significant, or might be -3 if only
its first digit is significant. The new
xs:precisionDecimal type is defined to be aligned with
the floating-point decimal type expected to be defined in a
forthcoming revision of IEEE 754 called "754r"
[2]. Like
xs:float and
xs:double, the new
xs:precisionDecimal type
includes the special values negative zero, +INF, -INF, and NaN. XML
Schema 1.1 defines two lexical representations for
xs:precisionDecimal called
"decimalPtNumeral" (example: -0.0054) and
"scientificNotationNumeral" (example: -5.4E-3). XML
Schema 1.1 defines a ‘lexical mapping’ that maps a
lexical representation in either
decimalPtNumeral or
scientificNotationNumeral form into the value space of
xs:precisionDecimal (inferring the precision from the
number of digits after the decimal point). XML Schema 1.1 also defines
a ‘canonical mapping’ that maps a precisionDecimal
value into its canonical lexical representation, which may be either a
decimalPtNumeral or a
scientificNotationNumeral (generally scientific
notation is used for very small or very large values; otherwise the
decimalPtNumeral format is used).
More information about the
precisionDecimal type,
including motivation for why this type was introduced, can be found in
[3].
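As a rough illustration of how a lexical mapping can infer precision
from the digits written, the sketch below uses Python's decimal module,
which follows the General Decimal Arithmetic Specification and keeps
trailing zeros. It is not an implementation of the XML Schema 1.1
lexical mapping. The helper name precision_of is hypothetical, and
reading the negated exponent as the precision is an assumption made for
this example.

```python
from decimal import Decimal


def precision_of(lexical: str) -> int:
    """Hypothetical helper: digits to the right of the decimal point.

    Works for finite values only; INF and NaN have no numeric exponent.
    """
    # Decimal preserves trailing zeros, so the exponent records the precision.
    return -Decimal(lexical).as_tuple().exponent


print(precision_of("0.12"))     # 2
print(precision_of("0.1200"))   # 4
print(precision_of("5000"))     # 0
print(precision_of("5E3"))      # -3
print(precision_of("-0.0054"))  # 4
print(precision_of("-5.4E-3"))  # 4
```

Both lexical forms of the same number (-0.0054 and -5.4E-3) yield the
same precision in this model.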
This memo explores the impact of the new
xs:precisionDecimal type on XPath and XQuery. It
identifies some issues that will need to be resolved and explores some
alternative solutions. The following conventions are used:
- The xs: prefix is generally omitted from the names
of types (for example, xs:decimal is written as decimal).
- "precisionDecimal" is abbreviated as
"pDecimal"
- "We" refers to the joint Query and XSLT working groups.
(1) We will need to define where pDecimal fits into the
numeric promotion hierarchy. Proposal: put pDecimal
between decimal and float. In other words,
decimal is promotable to pDecimal, and
pDecimal is promotable to float. This has
the following implications:
(a) Any function that expects a pDecimal parameter can be
called with a decimal or integer argument.
(b) Any function that expects a decimal parameter cannot
be called with a pDecimal argument. Currently, there are
no such functions in the XPath/XQuery built-in function library.
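The proposed ordering can be sketched as a simple chain. The following
is only a toy model of function-call argument matching, assuming the
existing float-to-double promotion stays as it is; the names
PROMOTION_CHAIN, BASE_TYPE, and accepts are hypothetical.

```python
# Proposed promotion order from item (1); double follows float as it does today.
PROMOTION_CHAIN = ["decimal", "pDecimal", "float", "double"]

# integer is derived from decimal, so it is accepted wherever decimal is.
BASE_TYPE = {"integer": "decimal"}


def accepts(parameter: str, argument: str) -> bool:
    """True if a function parameter of type `parameter` accepts `argument`."""
    if parameter == argument:
        return True
    # Subtype substitution: treat a derived type as its base type.
    argument = BASE_TYPE.get(argument, argument)
    if parameter not in PROMOTION_CHAIN or argument not in PROMOTION_CHAIN:
        return False
    # Promotion only goes "up" the chain, never down.
    return PROMOTION_CHAIN.index(argument) <= PROMOTION_CHAIN.index(parameter)


print(accepts("pDecimal", "decimal"))  # True,  implication (a)
print(accepts("pDecimal", "integer"))  # True,  implication (a)
print(accepts("decimal", "pDecimal"))  # False, implication (b)
print(accepts("float", "pDecimal"))    # True
```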
(2) We will need to define the rules for how decimal and
integer values are promoted to pDecimal
values. One possible way to do this is to convert the input value to
its canonical lexical form and then apply the lexical mapping defined
by XML Schema 1.1 to convert the lexical form into a
pDecimal value. Another possible way is to assume a
fixed, implementation-defined precision for each of the existing
types, and adopt this as the precision of the resulting
pDecimal.
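The two alternatives in item (2) might look as follows. This is a
sketch using Python's decimal module as a stand-in for pDecimal; the
function names are hypothetical, and the fixed precision of 6 is an
arbitrary choice for the example, not a proposal.

```python
from decimal import Decimal


def promote_via_lexical(canonical_lexical: str) -> Decimal:
    """Alternative 1: reuse the lexical mapping.

    The precision of the result comes from the digits in the lexical form.
    """
    return Decimal(canonical_lexical)


def promote_with_fixed_precision(value: Decimal, precision: int) -> Decimal:
    """Alternative 2: force a fixed number of digits after the decimal point."""
    # Decimal(1).scaleb(-6) is 1E-6, so quantize() yields six fractional digits.
    return value.quantize(Decimal(1).scaleb(-precision))


# "2.5" is assumed here to be the canonical lexical form of the decimal value.
print(promote_via_lexical("2.5"))                       # 2.5
print(promote_with_fixed_precision(Decimal("2.5"), 6))  # 2.500000
```

The two results are numerically equal but carry different precisions,
which is exactly the choice that needs to be made.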
(3) In F&O, we will need to define the semantics of the following
arithmetic functions on the
pDecimal type:
op:numeric-add
op:numeric-subtract
op:numeric-multiply
op:numeric-divide
op:numeric-mod
op:numeric-unary-plus
op:numeric-unary-minus
Part of this definition will involve specifying the precision of the
result, based on the precision of the operands. Presumably this can be
done by reference to the forthcoming IEEE 754r standard. In general,
the precision of the result of a decimal operation is at least as
great as the greater of the precisions of its operands. For example,
if (1000 with precision -3) is added to (.001 with precision 3), the
result is (1000.001 with precision 3). Each operator (addition,
multiplication, etc.) defines its own precision rules. It is also
necessary to deal with many special cases (such as the result of -INF
modulus negative zero). Again, this can be done by reference to
IEEE 754r.
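The addition example above can be reproduced with Python's decimal
module, which follows the General Decimal Arithmetic Specification.
This is offered only as an illustration of per-operator precision
rules, not as the 754r definition; the helper name precision is
hypothetical.

```python
from decimal import Decimal


def precision(d: Decimal) -> int:
    """Hypothetical helper: digits to the right of the decimal point."""
    return -d.as_tuple().exponent


a = Decimal("1E+3")   # 1000 with precision -3
b = Decimal("0.001")  # .001 with precision 3
total = a + b
print(total, precision(total))      # 1000.001 3

# Multiplication follows a different rule in this model: exponents are added.
product = Decimal("1.20") * Decimal("1.5")
print(product, precision(product))  # 1.800 3
```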
One important case deals with arithmetic between a
pDecimal value and a value of some other type. For
example, suppose that a pDecimal value is added to a
decimal value (whose precision is not explicit). Should
the result get its precision from the pDecimal operand?
Or should the decimal operand be promoted to a
pDecimal value (with somewhat arbitrary precision) and
then the precision of the result computed based on
pDecimal rules?
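To make the two alternatives concrete, the sketch below adds a
pDecimal stand-in with precision 2 to a decimal operand written as
0.125. It uses Python's decimal module; the round-half-even choice in
alternative A and the lexical promotion in alternative B are
assumptions for the example, since the memo leaves both open.

```python
from decimal import Decimal, ROUND_HALF_EVEN

p = Decimal("1.50")   # stands in for a pDecimal value with precision 2
d_lexical = "0.125"   # stands in for a decimal operand

# Alternative A: the result takes its precision from the pDecimal operand.
exact = p + Decimal(d_lexical)
alt_a = exact.quantize(p, rounding=ROUND_HALF_EVEN)

# Alternative B: promote the decimal operand (here via its lexical form),
# then let the ordinary addition rule decide the precision.
alt_b = p + Decimal(d_lexical)

print(alt_a)  # 1.62
print(alt_b)  # 1.625
```

Under alternative A the sum is rounded and information is lost; under
alternative B the result precision depends on how the decimal operand
was promoted.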
An expert on XQuery Functions and Operators should read the draft 754r
standard carefully. It contains some surprises, which apply to both
binary and decimal formats, such as the following:
In 754r Section 5.10 we read "totalorder(-NaN, number) is true where
-NaN represents a NaN with negative sign bit". (But there is no
negative NaN value in XML Schema.)
In 754r Section 6.2 we read "Two different kinds of NaN, signaling and
quiet, shall be supported in all operations." (This concept also does
not exist in XML Schema. Possibly the distinction between signaling
and quiet NaNs can be ignored in XPath/XQuery.)
(4) In F&O, we will need to define the behavior of the following
comparison functions on the
pDecimal type:
op:numeric-equal
op:numeric-less-than
op:numeric-greater-than
These semantics can be based on the ordering rules for
pDecimal given in Schema 1.1. Basically, two
pDecimal values are compared based on their numerical
values, disregarding precision (for example, 5 is equal to 5.0000).
Two zeros with different signs are equal. INF is equal to itself and
greater than all other values except NaN. -INF is equal to itself and
less than all other values except NaN. NaN is not comparable to any
other value, including itself.
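The comparison rules just described can be sketched as follows, using
Python's Decimal as a stand-in for pDecimal. The function names mirror
the F&O operators but are ordinary Python functions written for this
example. NaN is tested explicitly because Python raises an exception
when a NaN is used in an ordering comparison.

```python
from decimal import Decimal


def numeric_equal(a: Decimal, b: Decimal) -> bool:
    """Equality on numeric value, disregarding precision."""
    if a.is_nan() or b.is_nan():
        return False  # NaN is not comparable to anything, including itself
    return a == b


def numeric_less_than(a: Decimal, b: Decimal) -> bool:
    if a.is_nan() or b.is_nan():
        return False
    return a < b


def numeric_greater_than(a: Decimal, b: Decimal) -> bool:
    return numeric_less_than(b, a)


INF, NAN = Decimal("Infinity"), Decimal("NaN")
print(numeric_equal(Decimal("5"), Decimal("5.0000")))  # True
print(numeric_equal(Decimal("0"), Decimal("-0")))      # True
print(numeric_equal(INF, INF))                         # True
print(numeric_greater_than(INF, Decimal("1E+100")))    # True
print(numeric_equal(NAN, NAN))                         # False
```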
(5) We will need to define how pDecimal values
participate in "order by" operations. Presumably, when ordering two
values of dissimilar type, the values are promoted to their "greatest
common type." For this purpose, is pDecimal INF
considered equal to float and double INF?
Are pDecimal NaN values considered equivalent to
float and double NaN values? Does a
pDecimal NaN have a sign for ordering purposes? (I think
754r says Yes and Schema 1.1 says No.) In considering these questions,
we should probably investigate how the same questions are being
handled in the ISO SQL Standard.
(6) We will need to define how to represent a literal
pDecimal value in a query. In the simplest approach, we
would not define a new literal format, but simply rely on the
constructor function named xs:precisionDecimal() that
accepts any valid lexical representation and applies the lexical
mapping defined by XML Schema 1.1 (example:
xs:precisionDecimal("5.000") maps into a
pDecimal value with a precision of 3). We could consider
additional measures such as defining a new literal format, similar to
our current double literals but using a different letter
for the exponent, such as "1.23D-5".
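A literal format of the kind suggested in item (6) would be easy to
recognize. The sketch below is one possible reading of it: the grammar
(digits with an optional fraction, the letter D, then a signed
exponent) and the function name parse_pdecimal_literal are
hypothetical, and Python's Decimal stands in for pDecimal.

```python
import re
from decimal import Decimal

# Hypothetical literal syntax modeled on double literals, with D for the exponent.
_PDECIMAL_LITERAL = re.compile(r"(\d+\.\d*|\.\d+|\d+)[dD]([+-]?\d+)")


def parse_pdecimal_literal(text: str) -> Decimal:
    """Map a literal such as 1.23D-5 to a decimal value, keeping its digits."""
    match = _PDECIMAL_LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a pDecimal literal: {text!r}")
    mantissa, exponent = match.groups()
    # Reuse the scientific-notation lexical form by swapping D for E.
    return Decimal(f"{mantissa}E{exponent}")


print(parse_pdecimal_literal("1.23D-5"))      # 0.0000123
print(Decimal("5.000").as_tuple().exponent)  # -3, i.e. precision 3
```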
(7) We will need to define the semantics of the following numeric
functions for
pDecimal operands, including the precision
of the result:
- fn:ceiling()
- fn:floor()
- fn:round()
- fn:round-half-to-even()
- fn:avg()
- fn:max()
- fn:min()
- fn:sum()
Again, the IEEE 754r draft has something to say about these functions,
which should be considered carefully. We might also consider adding a
new function that implements the rounding mode called "Round to
Nearest, Ties Away from Zero" (defined in IEEE 754r). For aggregate
functions such as fn:max and fn:avg, we will need to give special
attention to the case where pDecimal values are mixed
with values of other types.
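The difference between the rounding modes shows up only on ties. The
sketch below compares ties-away-from-zero with ties-to-even using
Python's decimal module, where ROUND_HALF_UP rounds ties away from
zero. The helper name round_to_integer is hypothetical. For
comparison, the existing fn:round rounds ties toward positive infinity,
so fn:round(-2.5) is -2.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP


def round_to_integer(value: Decimal, mode: str) -> Decimal:
    """Round to precision 0 using the given decimal rounding mode."""
    return value.quantize(Decimal(1), rounding=mode)


for text in ("2.5", "-2.5", "3.5"):
    v = Decimal(text)
    away = round_to_integer(v, ROUND_HALF_UP)    # ties away from zero
    even = round_to_integer(v, ROUND_HALF_EVEN)  # ties to even
    print(text, away, even)
# 2.5 3 2
# -2.5 -3 -2
# 3.5 4 4
```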
(8) We will need to define general rules for overflow and underflow in
pDecimal arithmetic, including intermediate results.
These rules might be specified to be consistent with the current rules
for decimal or the current rules for double
(these sets of rules are not the same). The pDecimal type
probably resembles double more than decimal
because of the presence of INF and -INF. The IEEE 754r draft also
talks about overflow and underflow semantics, which are influenced by
a concept called "rounding mode" that does not currently exist in
XPath/XQuery.
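The two styles of overflow handling can be contrasted in a small
sketch: one returns INF and records a flag, the other raises an error.
Python's decimal contexts are used here with a deliberately tiny
exponent range so that overflow is easy to trigger; the limits are
arbitrary and have nothing to do with the limits in XML Schema 1.1 or
754r.

```python
from decimal import Context, Decimal, Overflow, ROUND_HALF_EVEN

big = Decimal("9E9")

# Style 1: overflow quietly produces an infinity and sets a flag.
quiet = Context(prec=5, Emax=9, Emin=-9, rounding=ROUND_HALF_EVEN, traps=[])
result = quiet.multiply(big, Decimal(10))
flagged = bool(quiet.flags[Overflow])
print(result, flagged)  # Infinity True

# Style 2: overflow is an error.
strict = Context(prec=5, Emax=9, Emin=-9, rounding=ROUND_HALF_EVEN,
                 traps=[Overflow])
try:
    strict.multiply(big, Decimal(10))
    raised = False
except Overflow:
    raised = True
print(raised)           # True
```

The result in the first style depends on the rounding mode, which is
why the rounding-mode concept matters for overflow semantics.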
(9) We will need to define the casting rules for
pDecimal. What types can be cast into
pDecimal and into what types can it be cast? I expect
that the rules will be similar to those for the existing
decimal type. On casting other types into
pDecimal, we need to specify how the precision of the
result is determined. On casting pDecimal into other
types, we need to specify what happens to special values like negative
zero, INF, and NaN. Casting pDecimal into string should
probably be based on the canonical mapping defined by XML Schema 1.1.
This mapping sometimes produces "decimalPtNumeral"
notation (example: -0.0054) and sometimes produces
"scientificNotationNumeral" notation (example:
-5.4E-3). We may choose to define additional specialized
cast-to-string functions that force one or the other of these
notations (but note that forcing decimalPtNumeral
notation may result in very long strings).
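The effect of a canonical mapping, and of functions that force one
notation, can be seen with Python's decimal module. Its thresholds for
switching to scientific notation are its own and are not claimed to
match the canonical mapping in XML Schema 1.1; the sketch only shows
the shape of the problem.

```python
from decimal import Decimal

moderate = Decimal("-0.0054")
small = Decimal("-5.4E-9")

# Default string conversion picks the notation based on magnitude.
print(str(moderate))          # -0.0054
print(str(small))             # -5.4E-9

# Forcing one notation or the other.
print(format(moderate, "E"))  # -5.4E-3
print(format(small, "f"))     # -0.0000000054

# Forcing decimal-point notation on a large value yields a long string.
print(len(format(Decimal("1E+30"), "f")))  # 31
```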
(10) Note that the XQuery rules for SequenceType Matching respect
subtypes (for example, an integer value matches the
decimal type) but not promotion (for example, a
decimal value does not match the double
type). Essentially, decimal and pDecimal are
different types in the same sense that float and
double are different types. This is not a problem, but we
should be aware of the consequences, including the following:
(a) If a variable is declared to have type pDecimal, it
cannot be assigned a decimal value, and vice versa.
(b) "instance of pDecimal" returns false for
decimal values and vice versa.
(c) typeswitch expressions need separate branches for
decimal and pDecimal.
(d) "treat as pDecimal" will not accept
decimal values at run-time, and vice versa.
(e) In a path expression, the node-test element(*,
xs:decimal) will not match an element of type
pDecimal and vice versa. Similarly, the node-test
schema-element(IQ) will not match an element named IQ of type
xs:decimal if the schema definition of IQ calls for
pDecimal, and vice versa.
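The distinction between subtype matching and promotion can be modeled
in a few lines. This is a toy model of SequenceType Matching for atomic
types only; the table BASE_TYPE and the function instance_of are
hypothetical names, and only the derivation link mentioned in the text
(integer from decimal) is included.

```python
# Derivation (subtype) links only. Promotion links are deliberately absent,
# and pDecimal is a primitive type with no base among the numeric types.
BASE_TYPE = {"integer": "decimal"}


def instance_of(value_type: str, sequence_type: str) -> bool:
    """True if a value of `value_type` matches `sequence_type` by derivation."""
    current = value_type
    while current is not None:
        if current == sequence_type:
            return True
        current = BASE_TYPE.get(current)  # walk up to the base type, if any
    return False


print(instance_of("integer", "decimal"))   # True:  subtype
print(instance_of("decimal", "double"))    # False: promotion is ignored
print(instance_of("decimal", "pDecimal"))  # False, consequence (b)
print(instance_of("pDecimal", "decimal"))  # False, consequence (b)
```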
(11) We will need to define rules for the minimum
pDecimal precision required of conforming implementations
(XML Schema 1.1 has some rules about this, which we could choose to
copy).
[1] World Wide Web Consortium (W3C). 2006. XML Schema 1.1 Part 2:
Datatypes, ed. David Peterson, Paul V. Biron, Ashok Malhotra, and
C. M. Sperberg-McQueen. W3C Working Draft 17 February 2006.
[Cambridge, Sophia-Antipolis, and Tokyo]: World Wide Web Consortium.
<URL:http://www.w3.org/TR/xmlschema11-2/>
[2] IEEE (Institute of Electrical and Electronics Engineers).
2001-2007. Draft Standard for Floating-Point Arithmetic P754.
Various drafts, 2001-2007. At the time of publication, the most
recent drafts, reflecting the resolution of ballot comments, are not
publicly available. See <URL:http://754r.ucbtest.org/drafts/archive/>
for an archive of older committee drafts.
[3] Cowlishaw, Michael. 2007. Decimal Arithmetic FAQ (Frequently
Asked Questions). 21 April 2007. On the Web at
<URL:http://www2.hursley.ibm.com/decimal/decifaq.html>.