A Query Language for XML
Authors
One important application of XML is electronic data interchange
(EDI) between multiple data sources on the Web. We expect that in the near
future, organizations will export data in XML
to facilitate cooperation and exchange with other organizations. For example,
businesses may publish detailed technical data about their products and
services for potential customers,
or business partners may exchange
internal operational data over secure channels.
New opportunities will arise
for third-party information brokers,
either to integrate, clean, and aggregate
public data from multiple sources, or
to clean and transform
data so that partners can exchange it more easily.
Realizing this vision requires
the right tools for managing XML data.
Such tools must support:
-
Data conversion, e.g., bidirectionally between relational or object-oriented databases and XML;
-
Extraction of data from large XML documents;
-
Data transformation, e.g., between XML data sources with different DTDs; and
-
Integration of data from multiple XML sources.
Considerable experience in building such tools exists in the context
of relational and object-oriented data.
At their core, these tools rely on
a standard query language, either relational (SQL) or object-oriented
(OQL).
We argue that XML likewise needs a standard query language.
Given the historical influence of SQL and OQL, our recent research
on semistructured data, and XML's own characteristics, we argue that the
language must have the following properties:
-
It must be declarative;
-
It must be "relationally complete", in particular, it must express joins;
-
It must be simple enough that existing
database techniques, such as query optimization, cost estimation, and query
rewriting, can be extended to the new language;
-
It must be able to extract data from existing XML documents and
construct new XML documents; and
-
It must support both ordered documents and unordered XML data.
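To make the join and construction requirements concrete, the following is a minimal sketch written with Python's standard `xml.etree.ElementTree`, not in the query language proposed here. The two source documents, their element names (`catalog`, `product`, `prices`, `price`, `offers`), and the function `join_catalog_prices` are hypothetical. The code extracts data from two XML sources, joins them on a product identifier, and constructs a new XML document; this is the kind of imperative code that a declarative XML query language would let users avoid writing by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical source documents: a product catalog and a separate price list.
CATALOG = """<catalog>
  <product id="p1"><name>Widget</name></product>
  <product id="p2"><name>Gadget</name></product>
</catalog>"""

PRICES = """<prices>
  <price product="p2">9.50</price>
  <price product="p1">4.25</price>
</prices>"""


def join_catalog_prices(catalog_xml: str, prices_xml: str) -> ET.Element:
    """Join products to prices on the product id and build a new <offers> document."""
    catalog = ET.fromstring(catalog_xml)
    prices = ET.fromstring(prices_xml)
    # Index the price list by its join key (the "product" attribute).
    price_by_id = {p.get("product"): p.text for p in prices.findall("price")}
    result = ET.Element("offers")
    # Walk the catalog in document order, so the output follows the catalog's order.
    for product in catalog.findall("product"):
        pid = product.get("id")
        if pid in price_by_id:  # inner join: skip products without a price
            offer = ET.SubElement(result, "offer")
            ET.SubElement(offer, "name").text = product.findtext("name")
            ET.SubElement(offer, "price").text = price_by_id[pid]
    return result


if __name__ == "__main__":
    print(ET.tostring(join_catalog_prices(CATALOG, PRICES), encoding="unicode"))
```

Because the loop follows the catalog's document order, the result is ordered even though the price list is not; a query language that supports both ordered and unordered data must make this kind of choice explicit.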