Six years after their publication, a critical look at the core RDF specifications and the entire RDF stack is definitely justified. Nevertheless, a conservative approach should be taken. RDF adoption has gained momentum and this is not the time to make RDF a moving target. The main opportunities are in codifying practices that have emerged outside of W3C working groups, most importantly the Turtle syntax, the Named Graphs data model, and the follow-your-nose interpretation of URIs in RDF graphs. Furthermore, there is an opportunity for making some underused features of the RDF model more useful by improving their support throughout the stack.
A position paper submitted to the W3C Workshop RDF Next Steps
More than six years have passed since RDF became a standard. Much has happened since: SPARQL has made programmatic access to large RDF databases practical. The W3C's Technical Architecture Group has settled the httpRange-14 issue, answering the question of how RDF should be deployed on the Web, and paving the road towards the rise of Linked Data. RDF has seen increased use in new areas, from social networks and library catalogs to e-commerce and official government data. It has moved from the Artificial Intelligence and Computer Science departments of universities to the business world, and has taken hold in multinational companies as well as startups. Certainly, the RDF community has learned a lot in these six years. So, should RDF evolve? Is it time to fix the blunders of the past, to embrace change and to design a better RDF 2.0 for the future?
The answer is a complex one, since it has to strike a balance between fixing what's broken in the stack and not derailing a moving train that has gained momentum.
This document proposes a set of boundary conditions that any successful update to the RDF core specifications should meet; states some high-level goals for such an update; and lists some specific issues that could be addressed. Nothing particularly novel is mentioned in those lists, because it is the author's opinion that W3C can best contribute by standardizing one approach where there are already multiple non-interoperable approaches to the same problem, or where there already is a single de-facto standard way of doing things that could benefit from approval as a W3C Recommendation.
Before we can get into detail, it is worth thinking through some boundary conditions that must be met by any effort at revisiting the foundational specifications of the RDF stack.
What goals should an effort to update the core RDF standards have? If the boundary conditions listed above are to be met, many goals that might be attractive in theory are clearly not feasible. As an extreme example, designing a new RDF 2.0 from scratch, incorporating everything that has been learned since 2004, is clearly not desirable as it violates conditions 1, 2 and 3. Similarly, no new logic foundation for RDF should be standardized, as it fails conditions 3 and 4.
Two general goals seem desirable and achievable.
Before talking more about things that should be done, it is worth noting some flaws of the RDF stack that, despite frequent expressions of pain from the practitioners' community, are better left untouched.
Should we allow literal subjects? No. Literals are not allowed in the subject position for rather accidental and historical reasons. There is no solid design argument for not allowing them as subjects. The SPARQL specification has taken first steps towards relaxing the restriction. But this restriction is not a problem in practice. The constraint may be unnecessary, but it doesn't preclude any major usage scenarios, and the situations in which it causes pain are limited. A change to the model would ripple through every syntax and every implementation. This cost is not justified.
Should we fix RDF/XML? No. Experience has shown that RDF/XML is not a good format. It is complex, verbose, and exhibits rather arbitrary restrictions in the graphs that can be serialized. But it has a redeeming feature: After all these years, there are reliable and interoperable RDF/XML parsers for most major computing platforms. Modifying RDF/XML would negate this benefit. The community has to accept that we are stuck with a poor XML syntax, and focus energies on friendlier syntaxes. With Turtle and RDFa, good alternatives are now readily available.
Should we abolish blank nodes? No. They are much reviled, but they are occasionally useful, and people can be taught not to use them.
Turtle: a friendly RDF syntax. Much has been said about the harm that has been done to RDF adoption by the RDF/XML syntax. The solution is not to fix RDF/XML; the solution is to put a better syntax on equal footing with RDF/XML. This syntax is Turtle. Unlike a few years ago, Turtle implementations are now almost as readily available as RDF/XML implementations. The main obstacle to wider use of Turtle is its lack of W3C Recommendation status. Turtle is already a W3C Team Submission. Rubber-stamping it as a Recommendation would also straighten the path towards updating core RDF documents, such as the RDF Primer, with versions that use Turtle examples throughout the document.
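To illustrate what "friendly" means here, the following minimal Python sketch writes the same two triples once in N-Triples style and once in Turtle style. It is a toy under stated assumptions, not a conformant serializer: terms are plain strings, anything starting with a double quote is treated as a literal, and the example.org URIs and the helper names term, to_ntriples and to_turtle are hypothetical.

```python
def term(value, prefixes):
    """Render one term: literals as-is, URIs as prefixed names when possible."""
    if value.startswith('"'):
        return value
    for prefix, namespace in prefixes.items():
        if value.startswith(namespace):
            return f"{prefix}:{value[len(namespace):]}"
    return f"<{value}>"


def to_ntriples(triples):
    """One full triple per line, every URI spelled out."""
    return "\n".join(
        f"{term(s, {})} {term(p, {})} {term(o, {})} ." for s, p, o in triples
    )


def to_turtle(triples, prefixes):
    """Prefix declarations first, then triples grouped by subject with ';'."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in prefixes.items()]
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, []).append((p, o))
    for s, pairs in by_subject.items():
        body = " ;\n    ".join(
            f"{term(p, prefixes)} {term(o, prefixes)}" for p, o in pairs
        )
        lines.append(f"{term(s, prefixes)} {body} .")
    return "\n".join(lines)


# Hypothetical example data.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/", "ex": "http://example.org/"}
TRIPLES = [
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/name", '"Alice"'),
    ("http://example.org/alice", "http://xmlns.com/foaf/0.1/knows", "http://example.org/bob"),
]
print(to_ntriples(TRIPLES))
print(to_turtle(TRIPLES, PREFIXES))
```

The Turtle output declares each namespace once and names the subject once, which is most of what makes the syntax readable by hand.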
Named Graphs. Managing context, provenance and graph updates is extremely important in almost any RDF application. The solution is the Named Graphs data model. It is already part of the SPARQL Recommendation, is widely implemented outside of SPARQL as well, and is generally well understood. It should be elevated to a separate Recommendation. Besides codifying existing practice, this would be a welcome support for those practitioners who are trying to improve the general state of provenance tracking and metadata on the RDF-based Web and who are currently fighting a somewhat uphill battle because of Named Graphs' relative obscurity. Furthermore, a Named Graphs standard could galvanize research on the upper layers of the Semantic Web stack, where the availability of rich context information, along with a standard model for its representation, is a key requirement.
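To make the data model concrete, here is a minimal Python sketch of a named-graph store, assuming nothing more than an in-memory dictionary from graph names to sets of triples. The class name QuadStore, its method names, and the example.org URIs are hypothetical and not taken from any specification.

```python
from collections import defaultdict


class QuadStore:
    """Toy named-graph store: each graph name maps to a set of triples."""

    def __init__(self):
        self._graphs = defaultdict(set)

    def add(self, graph, subject, predicate, obj):
        self._graphs[graph].add((subject, predicate, obj))

    def replace_graph(self, graph, triples):
        """Graph update: swap out everything previously loaded under one name."""
        self._graphs[graph] = set(triples)

    def sources_of(self, subject, predicate, obj):
        """Provenance: names of all graphs that assert the given triple."""
        triple = (subject, predicate, obj)
        return sorted(
            name for name, triples in self._graphs.items() if triple in triples
        )


# Hypothetical example data.
store = QuadStore()
ALICE = "http://example.org/alice"
NAME = "http://xmlns.com/foaf/0.1/name"
GRAPH_A = "http://example.org/graphs/a"
GRAPH_B = "http://example.org/graphs/b"
store.add(GRAPH_A, ALICE, NAME, '"Alice"')
store.add(GRAPH_B, ALICE, NAME, '"Alice"')
store.add(GRAPH_B, ALICE, NAME, '"Alicia"')

print(store.sources_of(ALICE, NAME, '"Alice"'))
# Re-fetching source b replaces its graph without touching graph a.
store.replace_graph(GRAPH_B, [(ALICE, NAME, '"Alicia"')])
print(store.sources_of(ALICE, NAME, '"Alice"'))
```

Keeping the graph name next to every triple is what lets the store answer "who said this?" and refresh one source in isolation, the two needs named in the paragraph above.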
Codifying follow-your-nose. RDF statements are assertions about the world. But to understand what a statement means, one has to know what the URIs refer to. One has to know what they name. Despite the centrality of URIs in the RDF data model, the RDF specifications have nothing to say about how a URI actually receives its meaning. This needs fixing. It is possible to get a coherent picture of the process by referring to a number of other documents, in particular the httpRange-14 TAG Finding, the Architecture of the World Wide Web document, the Cool URIs for the Semantic Web Note, and a number of documents published by enthusiasts outside of W3C. Further progress towards codification was made by the TAG's AWWSW task force. Finally completing this job is an important companion to the standardization of Named Graphs; both together allow for a solid account of Web document metadata and thus context information for RDF data published on the Web.
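The lookup rules scattered across those documents can be sketched roughly as follows. This is an illustrative reading rather than a normative algorithm: the function name interpret_lookup, the labels it returns, and the example.org URIs are hypothetical, and the HTTP request itself is left out, so the function only classifies a status code the caller has already obtained for the URI with its fragment removed.

```python
from urllib.parse import urldefrag


def interpret_lookup(uri, status, location=None):
    """Return (kind of resource, URI of a describing document).

    `status` is the HTTP status received when dereferencing the URI
    with its fragment removed; `location` is the redirect target, if any.
    """
    document_uri, fragment = urldefrag(uri)
    if 200 <= status < 300:
        # A 2xx answer for a fragment-less URI means the URI names a document.
        # A hash URI may name anything; the document describes it.
        kind = "any resource" if fragment else "information resource"
        return kind, document_uri
    if status == 303 and location:
        # "See Other": the URI may name anything; look for a description there.
        return "any resource", location
    # 4xx, 5xx and everything else: nothing can be concluded.
    return "unknown", None


print(interpret_lookup("http://example.org/doc", 200))
print(interpret_lookup("http://example.org/id/alice", 303, "http://example.org/doc/alice"))
print(interpret_lookup("http://example.org/people#alice", 200))
```

In each case the second element of the result is the document a client would read next to learn what the URI means, which is the "follow your nose" step.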
Among the RDF stack's many features, there are some that are rarely liked, rarely used, sometimes mis-used, and generally in poor shape in terms of actual deployment and tool support.
The existence of underused features is not a major problem. At worst, it increases the cost of conformant implementation, and it might lure newbies down a wrong road. Nevertheless, it is worth exploring the reasons for the lack of love for those features. In some cases, they may have been made redundant by newer developments. In other cases it could simply be a lack of proper support in some other layer of the stack. SPARQL in particular makes it very hard to use some RDF features successfully, because it is opinionated about which parts of the RDF model's full richness it supports. In such cases, there might be an opportunity for making the overall stack better and richer by extending SPARQL's coverage. If, on the other hand, a convincing argument can be made against adding support for these features in SPARQL and elsewhere, then perhaps the features ought to be deprecated in the base RDF model. Notable examples:
The RDF Containers rdf:Alt, rdf:Bag and rdf:Seq are rarely used, and if it weren't for a strong reliance on rdf:Seq in RSS 1.0, they would probably be forgotten. They suffer from a lack of clear semantics, from a lack of purpose, and from redundancy with newer features such as rdf:List.
RDF Lists are much more widely used, and especially the RDF syntax of OWL relies on them heavily. But they are poorly supported throughout the stack. SPARQL has no syntax for querying them. Their representation in RDFa and in N-Triples is horrible. The implementation record in RDF APIs and in RDF visualizers is spotty.
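The cost of the list encoding becomes visible when it is written out triple by triple. The sketch below, in Python, expands a list into the rdf:first/rdf:rest chain and walks it back; the blank node labels and the helper names list_to_triples and triples_to_list are hypothetical, and terms are plain strings for simplicity.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def list_to_triples(items):
    """Expand items into an rdf:first/rdf:rest chain; return (head, triples)."""
    triples = []
    head = RDF + "nil"  # the empty list
    # Build from the back so each node can point at the rest of the list.
    for index in range(len(items) - 1, -1, -1):
        node = f"_:b{index}"
        triples.append((node, RDF + "first", items[index]))
        triples.append((node, RDF + "rest", head))
        head = node
    return head, triples


def triples_to_list(head, triples):
    """Walk the chain from head to rdf:nil and collect the members in order."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    items = []
    node = head
    while node != RDF + "nil":
        items.append(first[node])
        node = rest[node]
    return items


head, triples = list_to_triples(['"a"', '"b"', '"c"'])
print(len(triples))  # 6: two triples and one blank node per member
print(triples_to_list(head, triples))
```

A triple-at-a-time syntax has to spell out all of these triples, and a client without list support has to perform the walk in triples_to_list itself, one hop per member.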
Reification is a controversial feature. The facts are that it is very rarely used in published RDF and that it has no formal semantics. This author's opinion is that it is misdesigned and that Named Graphs are a superior approach for dealing with the context of a statement in the typical case. If Named Graphs are accepted as a W3C Recommendation, then it would be worth exploring whether reification can be handled as a special case of single-statement Named Graphs.
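The two approaches can be compared side by side in a small Python sketch. It assumes triples and quads are plain tuples of strings; the helper names reify and as_named_graph, the graph name and the example.org URIs are hypothetical, and the "single-statement Named Graph" shown here is only one possible reading of the idea floated above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(statement_node, triple):
    """Describe a triple with the four-triple RDF reification vocabulary."""
    s, p, o = triple
    return [
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", s),
        (statement_node, RDF + "predicate", p),
        (statement_node, RDF + "object", o),
    ]


def as_named_graph(graph_name, triple):
    """Put the triple into a graph of its own: one quad (s, p, o, graph)."""
    return [triple + (graph_name,)]


# Hypothetical example data.
claim = (
    "http://example.org/alice",
    "http://xmlns.com/foaf/0.1/knows",
    "http://example.org/bob",
)
SOURCE = "http://purl.org/dc/terms/source"
DOC = "http://example.org/doc/alice"
GRAPH = "http://example.org/graphs/g1"

# Reification: four describing triples plus the annotation.
reified = reify("_:s1", claim) + [("_:s1", SOURCE, DOC)]
# Named graph: the statement itself as one quad, plus one triple about the graph.
quads = as_named_graph(GRAPH, claim)
graph_metadata = [(GRAPH, SOURCE, DOC)]
print(len(reified), len(quads) + len(graph_metadata))  # 5 2
```

Note that the reified form describes the statement without containing it, whereas the quad is the statement itself, with the graph name serving as the handle for annotations.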
Custom datatypes, beyond the XML Schema datatypes such as xsd:int and xsd:date, are very rarely used. Finding examples where they are used in a sensible way is rather hard, and most uses of non-XSD datatypes on the public Web can be classified as either mistakes, or redundant re-definitions of XSD types, or questionable modelling (such as using datatypes for currencies and units of measurement). A main reason might be the lack of a well-documented method of associating a definition with a custom datatype URI.
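The gap can be illustrated with a toy literal parser in Python. For the XSD types a consumer can hard-wire a mapping from lexical form to value, while for a custom datatype URI it has nowhere to look the mapping up. The table below is a deliberately simplified assumption (no range checks, no time zones), and the custom datatype URI and the names LEXICAL_TO_VALUE and literal_value are hypothetical.

```python
from datetime import date

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hard-wired lexical-to-value mappings for two XSD datatypes (simplified).
LEXICAL_TO_VALUE = {
    XSD + "int": int,
    XSD + "date": date.fromisoformat,
}


def literal_value(lexical, datatype):
    """Return the value of a typed literal, or None if the datatype is unknown."""
    parse = LEXICAL_TO_VALUE.get(datatype)
    if parse is None:
        # No agreed way to discover a definition from the datatype URI.
        return None
    return parse(lexical)


print(literal_value("42", XSD + "int"))
print(literal_value("2010-06-26", XSD + "date"))
print(literal_value("12.50", "http://example.org/datatypes/usd"))
```

A documented way of publishing such a mapping at the datatype URI is the missing piece referred to above.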
This document will not express an opinion on what ought to be done about each of these particular features, but general options include:
Either way, a first step would be to understand why these features are not widely supported and deployed.
Innovation in the RDF community is ongoing and healthy, and after six years it is time to revisit the older layers of the RDF stack. Nevertheless, this document recommends a conservative approach. Any updates to the stack must fulfill a number of boundary conditions. Most of all, they must increase interoperability, and they must not require sweeping changes that affect all or most existing RDF tools, libraries and applications. Furthermore, changes should benefit the areas where RDF is most successful—data integration and data exchange via web protocols—because otherwise the changes are unlikely to deliver benefits that offset the cost of disruption.
A number of possible areas of work have been identified. These tasks could be tackled by an “RDF Maintenance” or “RDF Housekeeping” working group with a narrowly defined charter.