Copyright © 2004 W3C® ( MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, and document use rules apply.
This document specifies use cases, requirements, and objectives for an RDF query language and data access protocol. It suggests how an RDF query language and data access protocol could be used in the construction of novel, useful Semantic Web applications in areas like web publishing, personal information management, transportation, and tourism.
This is a second Public Working Draft of the Data Access Use Cases and Requirements for review by W3C Members and other interested parties. An HTML diff shows the differences between this document and the previous version. Please send comments to public-rdf-dawg-comments@w3.org, a mailing list with a public archive.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document has been produced by the RDF Data Access Working Group as part of the Semantic Web Activity in the W3C Technology & Society Domain. It reflects the best effort of the editor to incorporate input from various members of the WG, but is not yet endorsed by the WG as a whole. In particular, the design objectives are in development. The status of each design objective indicates whether it has been adopted by the WG. The requirements have all been accepted by the working group.
This document was produced under the 5 February 2004 W3C Patent Policy. The Working Group maintains a public list of patent disclosures relevant to this document; that page also includes instructions for disclosing [and excluding] a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) with respect to this specification should disclose the information in accordance with section 6 of the W3C Patent Policy.
Per section 4 of the W3C Patent Policy, Working Group participants have 150 days from the title page date of this document to exclude essential claims from the W3C RF licensing requirements with respect to this document series. Exclusions are with respect to the exclusion reference document, defined by the W3C Patent Policy to be the latest version of a document in this series that is published no later than 90 days after the title page date of this document.
The W3C's Semantic Web Activity is based on RDF's flexibility as a means of representing data. While there are several standards covering RDF itself, there has not yet been any work done to create standards for querying or accessing RDF data. There is no formal, publicly standardized language for querying RDF information. Likewise, there is no formal, publicly standardized data access protocol for interacting with remote or local RDF storage servers.
Despite the lack of standards, developers in commercial and in open source projects have created many query languages for RDF data. But these languages lack both a common syntax and a common semantics. In fact, the extant query languages cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. The existing languages also exhibit a range of extensibility features and built-in capabilities, including inferencing and distributed query.
Further, there may be as many different methods of accessing remote RDF storage servers as there are distinct RDF storage server projects. Even where the basic access protocol is standardized in some sense—HTTP, SOAP, or XML-RPC—there is little common ground upon which to develop generic client support to access a wide variety of such servers.
The following use cases characterize some of the most important and most common motivations behind the development of existing RDF query languages and access protocols. The use cases, in turn, inform decisions about requirements, that is, the critical features that a standard RDF query language and data access protocol require, as well as design objectives that aren't on the critical path.
Each use case describes a user-oriented context in which the RDF query language or protocol or both are used to solve a real problem. However, it is not necessarily the case that the query language or data access protocol will directly address all of these use cases. (Some of the use cases contain illustrative RDF in Notation 3 form; consult Primer: Getting into the semantic web and RDF using N3 or Notation3: A Rough Guide to N3 for more details about N3.)
George wants to send email to a person named "Johnny Lee Outlaw". George's personal address book, which includes contact information for a "Johnny Lee Outlaw", is stored in RDF using the FOAF Vocabulary Specification.
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

[] foaf:name "Johnny Lee Outlaw" ;
   foaf:mbox <mailto:jlow@example.com> .

George's email client queries his local address book service and, since there is only one match, uses the query's result to populate the To: field.
Motivates: RDF Graph Pattern Matching, Variable Binding Results.
Endeavour, a dealer specializing in British motorcycles, maintains a database that describes spare and replacement parts, including their properties and relationships. Ev, a repair person who specializes in Triumph bikes, is working on an ailing Speed Triple motorcycle when a diagnostic tool produces a report identifying a defect in the fuel management system.
@prefix triumph: <http://triumph.example/schema/#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://triumph.example/part/0d92ie433> rdf:type triumph:part ;
    rdfs:label "Accelerator Cable MK3" ;
    triumph:depends-on <http://triumph.example/part/329i2dk39> ;
    triumph:part-for <http://triumph.example/2004/SpeedTriple> ;
    triumph:part-number "LCD 100-04BSPT" .

<http://triumph.example/part/329i2dk39> rdfs:label "Mounting Bracket" ;
    triumph:requires [
        triumph:has-number "4" ;
        triumph:part-number "149028ab-MT" ;
        triumph:type triumph:screwx
    ] .
Ev uses a query interface to the parts database to ask about the defective part. In response to her query, Ev receives a human-readable description of the part, which provides enough information to obtain a replacement part and tells her about other, dependent parts that must be replaced at the same time.
Motivates: Subgraph Results, Optional Match, Human-friendly Syntax.
Smiley works for a multinational media conglomerate. As part of his job as an editor of foreign market compilations, he needs to be notified whenever the conglomerate's knowledge bases contain information about new media objects—books, movies, and pop music—matching various properties: title, author, and price point.
@prefix baf: <http://big-accounting-firm.example/scheme/1.0/#> .
@prefix bmc: <http://big-media-conglomerate.example/ontology/#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

[] baf:dollarPrice "29.99" ;
   bmc:objectName "J to the LO" ;
   dc:author <http://big-media.example/author/1929/> .
Smiley uses his web browser to create a query that will be executed regularly against the conglomerate's knowledge bases. Whenever there are new matches for Smiley's query, he receives an email with URIs to resources about the new matches; and Smiley's personal RSS feed is also updated with the new matches, since he uses an RSS aggregator to gather news every day.
Since Smiley's query will operate over knowledge bases structured by at least four different ontologies (the result of his conglomerate's rapid expansion), Karla, the staff programmer for Smiley's group, makes sure that the knowledge bases in question contain appropriate rdfs:subPropertyOf assertions. For example, Smiley's query uses the predicate media:ObjectName, which will also find properties like dc:title, doi:title, and mods:titleInfo.
Motivates: Human-friendly Syntax, Aggregation Graphs, Aggregate Query, Additional Semantic Information.
Kate wants to see all the television programs that feature information about the Japanese baseball player Ichiro. She wants her personal digital recorder (PDR) to record every television show about Japanese baseball automatically using the Electronic Program Guides (EPGs). She also wants an index page for each week's recorded items.
Her RDF-enabled PDR periodically executes a query against the RDF version of its EPGs, and continues to execute the query every day for new items to record.
Motivates: Result Limits, Aggregate Query, Addressable Query Results.
Niel has to drive every day from home to his office during heavy rush hour traffic in Atlanta, GA, in his new car, which has Bluetooth and wireless Internet access. Using his cell phone, Niel requests that his car query public RDF storage servers on the Web for a description of current Atlanta road construction projects, traffic jams, and roads affected by inclement weather.
Based on the information retrieved efficiently from the public RDF servers, Niel uses the mapping program in his cell phone to plan a different route to work, cutting his commute time by 10%.
Motivates: Bandwidth-efficient Protocol, Result Limits.
Abelard, an independent publisher of web publications, wants to query RSS feed aggregators in order to track RDF assertions people make about articles and stories in his publications. Abelard's client software includes support for three different RDF query languages.
Heloise manages one of the servers that Abelard wants to query. Her server publishes a machine readable description of its capabilities, including the query languages it supports, in RDF. Abelard's client asks Heloise's server whether it supports his preferred query language. Abelard's client software also negotiates with the other servers and uses a common transport protocol to retrieve the results of his queries.
Motivates: Human-friendly Syntax, Aggregation Graphs, Aggregate Query, Yes-No Queries.
José knows that the U.S. Census Bureau provides interesting geographic data in its public domain TIGER database. José attends a conference in Washington, DC, at the new convention center, and he stays in a hotel nearby. José wants to find out the latitude, longitude, name, and type of everything within one mile of the convention center, as well as all events occurring during his stay, so that he can plan his meals and sightseeing time accordingly.
Rather than working with the TIGER database files directly, José sends a query to the Census Bureau's new RDF storage server and requests that his client pass the query results to an XSLT transformation service so that he can print the resulting XHTML.
Motivates: Extensible Value Testing, Limited Datatype Support, Human-friendly Syntax.
Frannie and Zoe, old college friends, live in different countries and keep in daily contact via IRC. Zoe wrote an IRC bot that they use to make assertions—which the bot stores as RDF—about photographs of their family, friends, and vacations. Frannie wants to be able to republish some of these assertions in a human readable form on her weblog. Zoe tells her about a server that accepts and agrees to host documents that describe what they say about web resources, and their IRC bot sends those documents periodically to the server.
Frannie programs her weblog software to query the server that hosts their annotations for vacation images that co-depict her family members with Zoe's family members, as well as for things Zoe and Frannie have said about those images. Frannie uses the XSLT processor built into her weblog software to transform the query results into XHTML for display in her weblog.
Motivates: Variable Binding Results, Non-existent Triples, Aggregate Query.
Nada, a Semantic Web developer, has a bug report from a valued user indicating that a software tool is incorrectly emitting the N3 representation of some of the RDF core test cases. Nada wants to create a list of input and output documents for each of the approved test cases, filtering only for those which have an "approved" status, from the RDF core test suite. The list of tests resides in a single file.
Nada can programmatically process the RDF core manifest file to produce one line per input/output pair, so that a script can easily be written for the next stage: reading the input document, writing it, and checking it.
Motivates: RDF Graph Pattern Matching, Variable Results, Local Queries.
Erasmus Jones, a professor, wants to find some learning materials for his seminar on Renaissance humanism. He is using a recommended web site that provides descriptions of learning materials; he performs a search at the site, choosing the general subject area and student learning level and providing some keywords. The results include materials returned from multiple learning repositories, where the subject and learning levels have been matched across multiple educational metadata vocabularies, including predicates from the Dublin Core Metadata Element Set and the UK Learning Object Metadata Framework specifications.
Motivates: Aggregation Graphs, Aggregate Query.
Esther, a programmer for a new social networking site based on FOAF, has written an RDF crawler which follows foaf:knows links to determine the publicly available properties of new people it will invite into the network. While processing a new FOAF resource, it finds an rdf:Property referring to a URI that it has not seen before. The crawler queries an ontology server to see if the property's domain(s) and range(s) are ones that it has already encountered, so that it can track where it first discovered this property and use the property in future searches.
Motivates: Aggregation Graphs, Additional Semantic Information.
Peter is developing a medical knowledge base using OWL/RDF in collaboration with medical domain experts. The knowledge base is used within electronic patient records. To facilitate collaboration and avoid duplication, the team is using a federated ebXML Registry to store the knowledge base they are building.
When adding a new concept to the knowledge base, Peter uses a registry browser application to search the ebXML Registry for similar or related concepts. The registry browser allows Peter to choose a parameterized query from a set of preconfigured parameterized queries and offers a form that Peter uses to enter the query parameters.
Peter enters a few parameters and issues the query. The ebXML Registry returns a large number of matching results. Peter narrows his search by reissuing the query with additional parameters until he finds the concepts that are most relevant to his concept. Peter then drills down and browses these concepts, as well as their related concepts and metadata, to determine whether to add his new concept.
Motivates: RDF Graph Pattern Matching, Variable Binding Results, Streaming Results.
Lyndie works for a firm that creates market research reports for corporations that have contracts with the US federal government. She has access to an RDF repository, which contains information about accounting firms, corporations, and their customers:
@prefix baf: <http://big-accounting-firm.example/scheme/1.0/#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.

<http://www.pwc.com/> baf:hasName "PriceWaterhouseCoopers"^^xsd:string.
<http://www.boeing.com/> baf:hasName "Boeing"^^xsd:string.
<http://www.labor.gov/> baf:hasName "US Department of Labor"^^xsd:string.
<http://www.pwc.com/> baf:accountsFor <http://www.boeing.com/>.
<http://www.boeing.com/> baf:hasCustomer <http://www.labor.gov/> .
Lyndie wants to query this RDF repository in order to find the names of accounting firms that do accounts for suppliers of the Department of Labor or that do accounts for the Department of Labor itself.
Motivates: RDF Graph Pattern Matching—Disjunction.
Marty wants to learn which of the ten biggest grossing Hollywood movies of all time also had soundtracks among the ten biggest grossing film soundtracks of all time. Imagine that some future version of the IMDB site exposes its information about movies as RDF. Further imagine that the CDDB site does the same for its information about music. Marty then writes a query to find the titles of the ten biggest grossing films. He uses the results of that query to query CDDB in order to filter out the films that did not have top 10 soundtracks.
Motivates: Querying Multiple Sources.
Technical requirements are features or characteristics of either the query language or data access protocol (or, in some cases, of both) that are expected to be in the specification.
The query language must include the capability to restrict matches on a queried graph by providing a graph pattern, which consists of one or more RDF triple patterns, to be satisfied in a query.
Status: Accepted 2004-05-11.
It must be possible for queries to return zero or more bindings of variables. Each set of bindings is one way that the query can be satisfied by the queried graph.
Status: Accepted 2004-05-11.
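The two requirements above can be illustrated with a minimal sketch, which is not a proposed design. It assumes that triples are plain Python tuples, that a string beginning with "?" stands for a variable, and that the blank node from George's address book is written as the label "_:b1"; the helper names match_triple and match_pattern are hypothetical.

```python
# Illustrative only: triples are (subject, predicate, object) tuples and
# strings starting with "?" are variables. Both conventions are assumptions.
FOAF = "http://xmlns.com/foaf/0.1/"

GRAPH = {
    ("_:b1", FOAF + "name", "Johnny Lee Outlaw"),
    ("_:b1", FOAF + "mbox", "mailto:jlow@example.com"),
}


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def match_triple(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable must keep the same value everywhere it appears.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def match_pattern(graph, patterns):
    """Return every set of bindings that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in graph
            for extended in [match_triple(pattern, triple, binding)]
            if extended is not None
        ]
    return bindings


results = match_pattern(GRAPH, [
    ("?person", FOAF + "name", "Johnny Lee Outlaw"),
    ("?person", FOAF + "mbox", "?mbox"),
])
no_results = match_pattern(GRAPH, [("?person", FOAF + "name", "Nobody")])
```

Each dictionary in results is one way the graph satisfies the pattern; a pattern with no match yields an empty list, which corresponds to zero bindings.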
The query language must make it possible—whether through function calls, namespaces, or in some other way—to calculate and test values extensibly.
Many application domains have specific value testing requirements; for example: the concept of "distance" in geospatial data or calculating the gravitational attraction of two masses, given their mass and the distance between them. Value testing may be more efficient when domain specific functions are available for use.
Status: Accepted 2004-05-04.
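One way such extensibility might look is a registry of named test functions that a query processor consults when filtering bindings. The sketch below is an assumption throughout: the registry, the name "geo:withinMiles", the coordinates, and the place names are all hypothetical, and the great-circle distance uses the standard haversine formula with an approximate Earth radius.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean radius

VALUE_TESTS = {}  # hypothetical registry: test name -> callable


def value_test(name):
    """Register a domain-specific test under a name usable in queries."""
    def register(func):
        VALUE_TESTS[name] = func
        return func
    return register


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


@value_test("geo:withinMiles")
def within_miles(point, center, miles):
    return distance_miles(*point, *center) <= miles


def filter_bindings(bindings, test_name, variable, *args):
    """Keep the bindings whose value for `variable` passes the named test."""
    test = VALUE_TESTS[test_name]
    return [b for b in bindings if test(b[variable], *args)]


# Illustrative coordinates only; not taken from TIGER data.
CENTER = (38.905, -77.023)
BINDINGS = [
    {"?name": "Cafe", "?pos": (38.906, -77.024)},
    {"?name": "Airport", "?pos": (38.852, -77.037)},
]
nearby = filter_bindings(BINDINGS, "geo:withinMiles", "?pos", CENTER, 1.0)
```

New tests can be added by registering another function, without changing filter_bindings itself.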
It must be possible for query results to be returned as a subgraph of the original queried graph.
Status: Accepted 2004-06-15.
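As a rough illustration of a subgraph result, the sketch below collects every triple about a part and about the parts it depends on, using a few triples from the parts database example. The tuple representation and the describe function are assumptions made for illustration; the result is always a subset of the queried graph.

```python
TRIUMPH = "http://triumph.example/schema/#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CABLE = "http://triumph.example/part/0d92ie433"
BRACKET = "http://triumph.example/part/329i2dk39"

GRAPH = {
    (CABLE, RDFS_LABEL, "Accelerator Cable MK3"),
    (CABLE, TRIUMPH + "depends-on", BRACKET),
    (CABLE, TRIUMPH + "part-number", "LCD 100-04BSPT"),
    (BRACKET, RDFS_LABEL, "Mounting Bracket"),
}


def describe(graph, subject, follow):
    """Return the triples about `subject` and about every resource
    reachable from it through the `follow` predicate."""
    subgraph, todo, seen = set(), [subject], set()
    while todo:
        node = todo.pop()
        if node in seen:
            continue
        seen.add(node)
        for triple in graph:
            if triple[0] == node:
                subgraph.add(triple)
                if triple[1] == follow:
                    todo.append(triple[2])
    return subgraph


cable_description = describe(GRAPH, CABLE, TRIUMPH + "depends-on")
bracket_description = describe(GRAPH, BRACKET, TRIUMPH + "depends-on")
```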
The query language must be suitable for use in accessing local RDF data—that is, from the same machine or same system process.
Status: Accepted 2004-05-04.
It must be possible to express a query that does not fail when some specified part of the query fails to match. Any such triples matched by this optional part, or variable bindings caused by this optional part, can be returned in the results, if requested.
Status: Accepted 2004-07-15.
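A minimal sketch of an optional match, assuming triples as plain tuples and a hypothetical function labels_with_optional_number: every part with a label produces a result, and the part number is filled in only where the graph supplies one. The data is a subset of the parts database example, in which the mounting bracket has a label but no part number of its own.

```python
TRIUMPH = "http://triumph.example/schema/#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
PART_NUMBER = TRIUMPH + "part-number"
CABLE = "http://triumph.example/part/0d92ie433"
BRACKET = "http://triumph.example/part/329i2dk39"

GRAPH = {
    (CABLE, RDFS_LABEL, "Accelerator Cable MK3"),
    (CABLE, PART_NUMBER, "LCD 100-04BSPT"),
    (BRACKET, RDFS_LABEL, "Mounting Bracket"),
}


def labels_with_optional_number(graph):
    """The label is required; the part number is optional."""
    rows = []
    for subject, predicate, label in sorted(graph):
        if predicate != RDFS_LABEL:
            continue
        numbers = sorted(o for s, p, o in graph
                         if s == subject and p == PART_NUMBER)
        # The row is kept even when the optional part does not match.
        for number in numbers or [None]:
            rows.append({"part": subject, "label": label, "number": number})
    return rows


rows = labels_with_optional_number(GRAPH)
```

The bracket still appears in rows, with its optional value left unbound (None here).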
The query language must include support for a subset of W3C XML Schema datatypes and operations on those datatypes.
Status: Accepted 2004-05-11.
It must be possible to specify an upper bound on the number of query results returned.
(Note: The Working Group has discussed and is aware of the connection between result limits and result sorting, as well as the implementation costs of sorting and the tradeoffs between client and server computing power per user.)
Status: Accepted 2004-07-15.
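An upper bound on results might be applied as in the following sketch, which assumes results are produced lazily so that matching can stop once the limit is reached. The EPG vocabulary, the show URIs, and the function names are hypothetical.

```python
from itertools import islice

EPG = "http://epg.example/schema#"  # hypothetical vocabulary

GRAPH = [
    ("http://epg.example/show/%d" % n, EPG + "topic", "Japanese baseball")
    for n in range(1, 6)
]


def shows_about(graph, topic):
    """Lazily yield the shows whose topic matches."""
    for show, predicate, value in graph:
        if predicate == EPG + "topic" and value == topic:
            yield show


def limited(results, limit):
    """Return at most `limit` results, consuming no more than needed."""
    return list(islice(results, limit))


first_two = limited(shows_about(GRAPH, "Japanese baseball"), 2)
everything = limited(shows_about(GRAPH, "Japanese baseball"), 10)
```

Without a defined ordering, which results fall inside the limit depends on the order in which the processor happens to find them; this is the connection to sorting noted above.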
It must be possible, when returning multiple unordered results, for the client to request that results be streamed. When the client requests streaming results, all the data in one result must be available to the client before all the data for the next result.
Status: Accepted 2004-06-29.
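One possible reading of this requirement is sketched below: the server emits each result as a complete, newline-terminated unit, and the client hands each result on as soon as its terminator arrives. The use of JSON lines as the wire format and both function names are assumptions, not part of any agreed protocol.

```python
import json


def stream_results(bindings):
    """Server side: emit each result as one complete line of JSON."""
    for binding in bindings:
        yield json.dumps(binding, sort_keys=True) + "\n"


def read_results(chunks):
    """Client side: yield each result as soon as its line is complete,
    regardless of how the transport splits the data into chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield json.loads(line)


sent = [{"?name": "a"}, {"?name": "b"}]
received = list(read_results(stream_results(sent)))
split_transport = list(read_results(['{"a": 1}\n{"b"', ': 2}\n']))
```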
The query language must include the capability to restrict matches on a queried graph based on a disjunction of graph patterns, at least one of which must be satisfied.
Status: Accepted 2004-07-16.
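Lyndie's query can be sketched as a disjunction of two patterns over the repository shown in her use case. The sketch assumes triples as plain tuples, drops the xsd:string datatype for brevity, and uses a hypothetical function name.

```python
BAF = "http://big-accounting-firm.example/scheme/1.0/#"
PWC = "http://www.pwc.com/"
BOEING = "http://www.boeing.com/"
LABOR = "http://www.labor.gov/"

GRAPH = {
    (PWC, BAF + "hasName", "PriceWaterhouseCoopers"),
    (BOEING, BAF + "hasName", "Boeing"),
    (LABOR, BAF + "hasName", "US Department of Labor"),
    (PWC, BAF + "accountsFor", BOEING),
    (BOEING, BAF + "hasCustomer", LABOR),
}


def accountants_for(graph, customer):
    """Names of firms that account for `customer` itself OR for one of
    its suppliers; satisfying either pattern is enough."""
    names = set()
    for firm, predicate, client in graph:
        if predicate != BAF + "accountsFor":
            continue
        direct = client == customer
        via_supplier = (client, BAF + "hasCustomer", customer) in graph
        if direct or via_supplier:
            names.update(o for s, p, o in graph
                         if s == firm and p == BAF + "hasName")
    return names


firms = accountants_for(GRAPH, LABOR)
```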
Design objectives, which may be features or characteristics of the eventual design, differ from requirements in that the specification may be complete if none, some, or all of them are achieved.
There must be a text-based form of the query language which can be read and written easily by users of the language.
Status: Accepted 2004-07-15.
RDF can be used for data integration and aggregation. RDF repositories are built by merging RDF triples from several other RDF repositories or from non-RDF sources converted to RDF. Such aggregations can be real or virtual.
It must be possible for the query language and protocol to allow an RDF repository to expose the source from which a query server collected a triple or subgraph.
Status: Pending.
It must be possible to query for the non-existence of one or more triples or triple patterns in the queried graph.
Status: Accepted 2004-07-15.
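A query for non-existence could be sketched as a set difference: subjects that have one property but for which no triple with another property exists. The second address book entry and the function name are hypothetical additions for illustration.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

GRAPH = {
    ("_:b1", FOAF + "name", "Johnny Lee Outlaw"),
    ("_:b1", FOAF + "mbox", "mailto:jlow@example.com"),
    ("_:b2", FOAF + "name", "Zoe"),  # hypothetical entry with no mailbox
}


def subjects_without(graph, has_predicate, lacks_predicate):
    """Subjects with a `has_predicate` triple and no `lacks_predicate` triple."""
    present = {s for s, p, o in graph if p == has_predicate}
    excluded = {s for s, p, o in graph if p == lacks_predicate}
    return present - excluded


no_mailbox = subjects_without(GRAPH, FOAF + "name", FOAF + "mbox")
```

Note that this only reports what is absent from the queried graph; it says nothing about whether the missing triples are true elsewhere.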
It should be possible to specify two or more RDF graphs against which a query shall be executed; that is, the result of an aggregate query is the merge of the results of executing the query on each of two or more graphs.
Status: Pending.
It should be possible for a query to specify which of the available RDF graphs it is to be executed against. If more than one RDF graph is specified, the result is as if the query had been executed against the merge of the specified RDF graphs. Query processors with a single available RDF graph trivially satisfy this objective.
Status: Pending.
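The two wordings of this objective can give different answers, which the following sketch tries to make concrete. It splits two triples from Lyndie's repository across two hypothetical graphs and assumes that, because no blank nodes are involved, the merge of the graphs is simply their set union.

```python
BAF = "http://big-accounting-firm.example/scheme/1.0/#"
ACCOUNTS_FOR = BAF + "accountsFor"
HAS_CUSTOMER = BAF + "hasCustomer"
PWC = "http://www.pwc.com/"
BOEING = "http://www.boeing.com/"
LABOR = "http://www.labor.gov/"

# Hypothetical split of the repository across two sources.
GRAPH_A = {(PWC, ACCOUNTS_FOR, BOEING)}
GRAPH_B = {(BOEING, HAS_CUSTOMER, LABOR)}


def firms_for_suppliers_of(graph, customer):
    """Firms that account for some supplier of `customer`."""
    return {
        firm
        for firm, predicate, client in graph
        if predicate == ACCOUNTS_FOR
        and (client, HAS_CUSTOMER, customer) in graph
    }


# Reading 1: run the query on each graph, then merge the results.
merge_of_results = (firms_for_suppliers_of(GRAPH_A, LABOR)
                    | firms_for_suppliers_of(GRAPH_B, LABOR))

# Reading 2: merge the graphs (plain union, no blank nodes), then query.
query_on_merge = firms_for_suppliers_of(GRAPH_A | GRAPH_B, LABOR)
```

A pattern that spans both sources matches only under the second reading, so the choice between the two wordings is observable to users.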
It should be possible for knowledge encoded in other semantic languages—for example: RDFS, OWL, and SWRL—to affect the results of queries executed against RDF graphs.
It should be possible for a query to indicate that the answers should take into account knowledge encoded in RDF semantic extensions such as RDFS, OWL, etc.
Status: Pending.
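For the rdfs:subPropertyOf case from Smiley's use case, a sketch of how such knowledge could affect results follows. The media: and doi: namespace URIs, the item URIs, and the second title are hypothetical, since the source does not give them; only the Dublin Core and RDFS URIs are real.

```python
SUBPROPERTY_OF = "http://www.w3.org/2000/01/rdf-schema#subPropertyOf"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
MEDIA_NAME = "http://media.example/ns#ObjectName"  # hypothetical URI
DOI_TITLE = "http://doi.example/ns#title"          # hypothetical URI

GRAPH = {
    (DC_TITLE, SUBPROPERTY_OF, MEDIA_NAME),
    (DOI_TITLE, SUBPROPERTY_OF, DC_TITLE),
    ("http://big-media.example/item/1", DC_TITLE, "J to the LO"),
    ("http://big-media.example/item/2", DOI_TITLE, "Another Title"),
}


def sub_properties(graph, prop):
    """`prop` plus every property declared, directly or transitively,
    to be a sub-property of it."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if p == SUBPROPERTY_OF and o == current and s not in found:
                found.add(s)
                todo.append(s)
    return found


def values_of(graph, prop, infer=True):
    """(subject, value) pairs for `prop`, optionally using sub-properties."""
    props = sub_properties(graph, prop) if infer else {prop}
    return {(s, o) for s, p, o in graph if p in props}


with_inference = values_of(GRAPH, MEDIA_NAME)
without_inference = values_of(GRAPH, MEDIA_NAME, infer=False)
```

The infer flag stands in for the second wording above, in which the query itself indicates whether semantic extensions should be taken into account.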
The access protocol design shall address bandwidth utilization issues; that is, it shall allow for at least one result format that does not make excessive use of network bandwidth for a given collection of results.
Status: Accepted.
It should be possible for a query to perform substring searches of RDF string literals.
Status: Accepted 2004-07-16.
It must be possible in the query language to express yes-no questions straightforwardly.
Status: Accepted 2004-07-15.
A common pattern of access is to send a query, which is like a question, to a remote service which evaluates it and returns the results. This access pattern fits naturally into the architecture of the Web by making query results addressable resources.
It must be possible for query results to be addressed in URI space.
Status: Accepted 2004-07-16.
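One way query results might be given an address is to encode the query into the URI of the result, so that the URI can be bookmarked, linked to, or fetched again. The endpoint, the parameter name "query", and the query text below are all hypothetical; no query syntax has been chosen.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def result_uri(endpoint, query_text):
    """Build a URI that identifies the results of `query_text`.
    The "query" parameter name is an assumption."""
    return endpoint + "?" + urlencode({"query": query_text})


def query_from_uri(uri):
    """Recover the query text from a result URI built by result_uri."""
    return parse_qs(urlsplit(uri).query)["query"][0]


QUERY = "shows about 'Japanese baseball' & recorded this week"
uri = result_uri("http://pdr.example/results", QUERY)
```

Kate's weekly index page could then link to such a URI instead of embedding a copy of the results.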
See the survey of existing RDF query language implementations: "RDF Query Survey", as well as the "RDF Query and Rules Framework".
The editor acknowledges all of the members of the Data Access Working Group for aid and assistance in preparing the present document, especially Andy Seaborne, Yoshio Fukushige, Bryan Thompson, Howard Katz, Dave Beckett, Dan Connolly, and Eric Prud'hommeaux. The editor also acknowledges the support of his University of Maryland MIND Lab colleagues, especially Bijan Parsia and James Hendler.