Towards Semantic Web Document Engineering
Jacco van Ossenbruggen
Centrum voor Wiskunde en Informatica (CWI), Amsterdam
Jacco.van.Ossenbruggen@cwi.nl
Abstract:
Web publishing systems have to take into account a plethora of Web-enabled
devices, user preferences and abilities. Technologies generating these
presentations will need to be explicitly aware of the context in which the
information is being presented. Semantic Web technology can be a fundamental
part of the solution to this problem by explicitly modeling the knowledge
needed to adapt presentations to a specific delivery context. We propose the
development of a Smart Style layer which is able to use metadata to
improve the presentation of content to human users. We discuss different
uses of metadata and suggest extensions to current Web technology.
As the Web continues to grow not only in size but also in complexity, the
increasingly varying needs of the intended audience mark the end of the
``one size fits all'' era. Delivery contexts [1] can be characterized in terms of
specific user preferences and abilities, capabilities of the access device
and available network resources. Given this heterogeneity, any single
message needs to be adapted to a particular set of circumstances. As a
minimum requirement, the author's intended message needs to be conveyed to
the user given the constraints imposed by the access device. In addition,
the generated presentation should conform as much as possible to the
preferences of the user and the author [2]. These two types of adaptation may
lead to an explosion of potential delivery contexts with which current
stylesheet technology is unable to deal.
Our prototype system Cuypers [3] generates multimedia presentations
adapted to the constraints of a specific delivery context. We claim that the
particular solutions deployed within Cuypers realize a level of adaptivity
that should become generally available on the Web. This introduces new
challenges since the solutions need to be embedded within the current Web
infrastructure. In this paper, we introduce the concept of Smart
Style: an intelligent presentation adaptation layer for the Web that
builds upon two fundamental technologies:
- Web document engineering technology, including delivery formats such as
HTML [4], SMIL [5], SVG [6] and XSL [7], and style and transformation
languages such as CSS [8] and
XSLT [9].
- Semantic Web knowledge representation and metadata technology,
including RDF [10], RDF Schema [11], DAML+OIL [12] and CC/PP [13].
Currently, Semantic Web technology is primarily deployed to improve
Web-based information gathering and brokerage. Our vision is,
however, that the Semantic Web infrastructure should also play a key role in
presenting information in the most appropriate way to each individual reader.
On the other hand, document engineering technology is developing relatively
independently from the Semantic Web. We argue that device independent Web
content engineering requires a large amount of knowledge that needs to be, and
could be, made explicit by employing Semantic Web technology. Our proposed Smart
Style layer would deploy Semantic Web technology to improve the
presentation's adaptation, aiming for an optimized design of the presentation
that suits the specific requirements of the user's delivery context.
Ingredients for a Smart Style Layer
To build a Smart Style layer on top of the existing Web infrastructure,
four ingredients are needed: ways of specifying delivery contexts, support
for content descriptions, and processing support for both delivery contexts
and content descriptions.
Assuming that at least a part of the adaptation will need to take place on
the server, it is essential to standardize the communication of
delivery contexts: clients need to be able to send the information in a way
that the server understands. A machine-readable description of a delivery
context that can be sent to the server is often called a profile.
CC/PP [13] provides an
RDF-based framework for defining the vocabularies that are needed to define
profiles. In addition, it also provides a small vocabulary that can be
reused across different profiles. The WAP Forum [14] provides a commonly agreed upon
mechanism to communicate the (technical) capabilities of mobile phones to
servers and proxies. The CC/PP framework, however, is sufficiently flexible
to allow the definition of profiles that focus on more user-centered aspects
of a delivery context, such as language preference or media preference.
Clients need to be able to communicate delivery contexts, but in itself this
is insufficient. Many design decisions will also depend on information that
is only available at the server. Even when this information is not intended
to be published on the Web, having commonly used and standardized solutions
for describing and processing it will greatly reduce the development effort
needed to implement a smart, adaptive Web site.
Intelligent adaptation systems will need some knowledge of the function of
the content they are adapting. To make this type of knowledge explicit,
appropriate use of metadata will be of key importance. Within and
outside W3C, a large amount of work on metadata standardization is currently
in progress, and in most of this work RDF, RDF Schema and DAML+OIL (and the
language being specified within WebOnt) play a central role.
For example, suppose an online museum site has developed an RDF Schema (footnote 1)
for the metadata (footnote 2) used to annotate its Web
site. Also suppose the site features an HTML page describing a work by the
painter Rembrandt van Rijn, focusing on the use of chiaroscuro (the
painting technique that uses strong contrasts of light and dark in paintings).
Figure 1 shows an example fragment
of the page.
Figure 1: Example XHTML 1.0 fragment from a page about a Rembrandt painting.
<div id="allegory">
  <h1>Musical Allegory</h1>
  <img src="allegory.jpg"/>
  <p>This is hardly just an ordinary group of musicians.
  The figures are too exotically dressed in oriental
  ...
</div>
From an XML/HTML markup perspective, all we know is that we have a
fragment with a first level heading, an image and a text paragraph. The
underlying semantics, however, could be explicitly added by the use of RDF
metadata, as shown in figure 2.
Figure 2: RDF metadata of XHTML 1.0 fragment.
<museum:Painter rdf:ID="Rembrandt">
  <museum:fname>Rembrandt</museum:fname>
  <museum:lname>Harmenszoon van Rijn</museum:lname>
  <museum:painted rdf:resource="#allegory" />
</museum:Painter>
<museum:Painting rdf:about="#allegory">
  <museum:title>Musical Allegory</museum:title>
  <museum:technique>Chiaroscuro</museum:technique>
</museum:Painting>
This explicitly states that our HTML fragment is an instance of a class
Painting, with a title property ``Musical Allegory'', and
that there is a Painter instance that has a painted
relation with this painting. The question is: can we exploit the knowledge
provided by the metadata to improve our style sheets and other adaptation
technology?
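To make concrete what the metadata of Figure 2 states, the following minimal Python sketch reads it into subject/predicate/object triples. It rests on several assumptions that are not in the figure: the fragment is wrapped in an rdf:RDF root element, the museum prefix is bound to the schema URI that appears in Figure 6, and only the simple typed-node serialization used in Figure 2 is handled. It is an illustration, not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MUSEUM = "http://www.museum.com/schema.rdf#"  # assumed namespace URI

# Figure 2, wrapped in an rdf:RDF root with namespace declarations (assumed).
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:museum="{MUSEUM}">
  <museum:Painter rdf:ID="Rembrandt">
    <museum:fname>Rembrandt</museum:fname>
    <museum:lname>Harmenszoon van Rijn</museum:lname>
    <museum:painted rdf:resource="#allegory"/>
  </museum:Painter>
  <museum:Painting rdf:about="#allegory">
    <museum:title>Musical Allegory</museum:title>
    <museum:technique>Chiaroscuro</museum:technique>
  </museum:Painting>
</rdf:RDF>"""


def uri(tag):
    """Turn ElementTree's '{namespace}local' tag into a plain URI."""
    namespace, local = tag[1:].split("}")
    return namespace + local


def triples(xml_text):
    """Read only the simple typed-node serialization used in Figure 2."""
    result = []
    for node in ET.fromstring(xml_text):
        # A node is identified by rdf:about, or by rdf:ID as a local fragment.
        subject = node.get(f"{{{RDF}}}about") or "#" + node.get(f"{{{RDF}}}ID")
        result.append((subject, RDF + "type", uri(node.tag)))
        for prop in node:
            # A property value is either a resource reference or literal text.
            value = prop.get(f"{{{RDF}}}resource") or (prop.text or "").strip()
            result.append((subject, uri(prop.tag), value))
    return result


for triple in triples(DOC):
    print(triple)
```

The triples, rather than the XML elements that encode them, are the level at which an adaptation process would ideally reason.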
While the current focus of this type of Semantic Web technology is on the
use of metadata to achieve a more intelligent model for Web-based information
retrieval (e.g. improving search engines), the use of metadata in our Cuypers
system shows that there is also a huge potential in applying this type of
technology for improving the adaptation and presentation process. Through
the use of metadata to make the intended semantics and function of the
content explicit, adaptation systems should be able to make informed
decisions during the design process. This requires an adaptation process
that is also able to take into account presentation-related metadata.
Based on our experience with Cuypers, we found that most metadata is geared
towards information retrieval, not towards information presentation.
Presentation-related metadata provides information about the properties of
the content in the context of its presentation to the user. Examples include
information about the intended audience (e.g. suitability for presentation to
children), the role of the content (e.g. suitability for a specific
presentation role, as introductory material or in-depth explanation), and the
transformations allowed (e.g. to what extent images may be scaled in terms of
minimum/maximum scaling and aspect ratios, or to what extent images can be
displayed in grayscale while still communicating the intended message).
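As an illustration of how such presentation-related metadata could drive a design decision, the sketch below checks whether an image may be shown on a given display. The ImageMetadata fields and the plan_image function are hypothetical names introduced for this example; they do not correspond to an existing vocabulary or to the Cuypers implementation.

```python
from dataclasses import dataclass


@dataclass
class ImageMetadata:
    """Hypothetical presentation-related metadata for one image."""
    width: int           # intrinsic width in pixels
    min_scale: float     # smallest scale factor that keeps the message intact
    max_scale: float     # largest scale factor the author allows
    grayscale_ok: bool   # may the image be shown without colour?


def plan_image(meta, screen_width, color_display):
    """Return an allowed scale factor, or None if the image cannot be used."""
    if not color_display and not meta.grayscale_ok:
        return None
    # Scale down to fit the screen, but never beyond the allowed maximum.
    scale = min(meta.max_scale, screen_width / meta.width)
    if scale < meta.min_scale:
        return None
    return scale


painting = ImageMetadata(width=800, min_scale=0.5, max_scale=1.0,
                         grayscale_ok=False)
print(plan_image(painting, screen_width=1024, color_display=True))
print(plan_image(painting, screen_width=320, color_display=True))
```

When the function returns None, an adaptation system would have to fall back on alternative content, for example a textual description.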
Assuming that the information upon which we base our design decisions will be
available from the Web through the use of standard Semantic Web technologies
such as CC/PP and RDF, the next ingredient needed for building a Smart Style
layer is a set of efficient tools that are able to take this type of information into
account during the adaptation process. A first step is to make the current
generation of presentation-oriented Web technology interoperable with the
next-generation Semantic Web technology. For example, CSS stylesheets are
currently not able to take CC/PP profiles into account. CSS has, however, a
feature that is closely related to CC/PP, and allows the specification of
device dependent style rules: the @media rule. Figure 3 shows an example (footnote 3) of a stylesheet that uses bigger fonts on
computer screens than on paper printouts of the same document.
Figure 3: Device dependent style
rules as already supported in CSS2.
@media print {
body { font-size: 10pt }
}
@media screen {
body { font-size: 12pt }
}
A first step towards a CSS syntax that allows more detailed queries is
suggested in [17]. In this
syntax, queries to specific device features are allowed. For example, the
CSS media rule for screen display above could be further refined by adding
constraints on the minimum width of the screen, as shown in Figure 4. Using such constraints,
stylesheets could take into account the information provided by profiles.
Figure 4: Detailed media queries
using a CSS3 extension (work in progress).
@media screen and (min-width: 640px) {
body { font-size: 14pt }
}
@media screen and (min-width: 800px) {
body { font-size: 16pt }
}
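The selection logic implied by Figure 4 can be sketched in a few lines of Python. The rule table mirrors the two rules of the figure; the function name and the idea that the media type and width would be read from a profile sent by the client are assumptions made for illustration, and the sketch does not parse real CSS.

```python
# (media type, minimum width in pixels, body font size), in stylesheet order.
RULES = [
    ("screen", 640, "14pt"),
    ("screen", 800, "16pt"),
]


def body_font_size(media, width_px, rules=RULES, default=None):
    """Return the font size of the last rule whose media query matches."""
    size = default
    for rule_media, min_width, font_size in rules:
        if media == rule_media and width_px >= min_width:
            size = font_size  # later matching rules override earlier ones
    return size


print(body_font_size("screen", 1024))
print(body_font_size("screen", 700))
```

A device that matches neither rule keeps the default value, which is the behaviour one would expect from the cascade.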
Even from this extended CSS syntax, however, it is still a long way to
fully CC/PP aware style engines. CC/PP features that will affect style
application include the ability to define new profile vocabularies,
inheritance mechanisms for specifying default values and the description of
the capabilities of transcoding proxies. Style engines need to be able to
deal with these features in order to take full advantage of the information
specified in CC/PP delivery contexts.
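One of these features, default values that a profile can override, can be illustrated with a minimal sketch. The component and attribute names below are illustrative assumptions rather than a normative CC/PP vocabulary, and the profile is modeled as plain Python dictionaries instead of RDF.

```python
# Default attribute values, as a device vendor might publish them (assumed).
HARDWARE_DEFAULTS = {
    "displayWidth": 320,
    "displayHeight": 200,
    "colorCapable": False,
}

# A profile component refers to its defaults and overrides one attribute.
PROFILE = {
    "TerminalHardware": {"defaults": HARDWARE_DEFAULTS, "displayWidth": 640},
}


def resolve(profile):
    """Merge each component's defaults with its explicitly given attributes."""
    resolved = {}
    for component, attrs in profile.items():
        values = dict(attrs.get("defaults", {}))
        # Explicit attributes take precedence over inherited defaults.
        values.update({k: v for k, v in attrs.items() if k != "defaults"})
        resolved[component] = values
    return resolved


print(resolve(PROFILE))
```

A style engine that ignored this resolution step would see only the overridden width and miss the inherited height and colour capability.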
Note that the need to take CC/PP information into account also applies to
XSLT transformation engines. While the full details of how this could affect
future versions of XSLT are beyond the scope of this paper, one could
imagine an extension (footnote 4) of XSLT's mode concept in which
transformation rules are selected in a way similar to that of the media
rules in CSS. In such a hypothetical extension (see Figure 5) one could, for instance, define a
rule for creating a two column layout only if the output medium is print and
the paper is wider than 17cm.
Figure 5: Device dependent rules
by extending XSLT modes (tentative syntax).
<xsl:template match="body"
mode="print and (min-width: 17cm)">
...
<fo:region-body column-count="2"/>
...
</xsl:template>
In addition to taking information about delivery contexts into account,
stylesheets also need to take into account the semantic information that is
contained in the metadata associated with the content. Currently, style
selector mechanisms only match on the syntactic properties of the
underlying (XML) document hierarchy. This applies both to the selector
mechanism used by CSS and to the XPath [18] selectors used by XSLT.
In all examples above, the rules were intended to match on the <body>
element of an HTML document. Similar rules could
be written to match on the syntactic properties of metadata, i.e. on the XML
element and attribute names that are used to encode the RDF statements of
Figure 2. Using the current
generation CSS and XSLT engines to process general metadata it is, however,
not practical to match on the semantic properties of metadata: for CSS
and XSLT processors, RDF is just XML. As a result, it is very hard to write,
for example, a rule that matches on all alternative XML serializations that
are allowed for RDF. A more serious problem, however, is that it is
impossible to write CSS or XSLT rules that make use of the structural
relations of RDF and RDF Schema, for instance a style rule that applies to
all objects that are instances of a specific RDFS (sub)class. Neither is it
possible to write rules for all objects that have a certain DAML+OIL-defined
ontological relation, etc.
Future, Semantic Web-aware, selector mechanisms could allow specification
of style rules in terms of the RDF semantics expressed in the metadata. This
would extend the currently used CSS and XPath selectors, that are based on
the XML syntax encoding the semantics. Consider the extended XSLT example
rule in figure 6, which uses the
RDF-aware query language RQL [15] for
its selector, instead of XPath.
Figure 6: Semantic matching of
XSLT rules using RQL selectors (tentative syntax).
<xsl:template match=
"RQL(http://www.museum.com/schema.rdf#Artifact)">
...
</xsl:template>
It matches on all resources that are instances of (subclasses of) the RDF
class Artifact. Given the fact that our RDF Schema would define
Painting as a subclass of Artifact, the rule would also
match on the HTML fragment of Figure 1. Such rules that employ the semantic
relations defined in the metadata are currently impossible to write in
XSLT.
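The kind of matching such a selector would need can be sketched as follows. The triples are a hand-written stand-in for the metadata of Figure 2 plus the subclass statement mentioned above; the prefixed names and both functions are assumptions made for illustration and do not reproduce RQL syntax or semantics.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

TRIPLES = [
    ("museum:Painting", SUBCLASS_OF, "museum:Artifact"),
    ("#allegory", RDF_TYPE, "museum:Painting"),
    ("#Rembrandt", RDF_TYPE, "museum:Painter"),
]


def subclasses(cls, triples):
    """Return cls together with all of its direct and indirect subclasses."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == SUBCLASS_OF and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_of(cls, triples):
    """Return all resources typed as cls or as one of its subclasses."""
    classes = subclasses(cls, triples)
    return {s for s, p, o in triples if p == RDF_TYPE and o in classes}


print(instances_of("museum:Artifact", TRIPLES))
```

The painting is selected although nothing in its XML encoding mentions Artifact, which is exactly what a purely syntactic selector cannot express.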
This paper sketches the requirements for an ambitious goal: automatic
adaptation of dynamic text and multimedia content to the requirements of an
individual user's delivery context, while respecting the integrity of the
semantics of the content. If we reduce our ambition levels, however, and
``only'' aim for taking into account processing context information, this
alone would still have major consequences. To prevent CC/PP from becoming a
stand-alone W3C recommendation that can only be processed with proprietary
tools, we need to clearly define how other recommendations, including CSS,
XSLT, XHTML, SMIL and SVG, operate in the context of CC/PP. From CC/PP-aware
Web transformations, another step is required towards Semantic Web-aware
transformations that also take metadata semantics into account. Given the
amount of knowledge that needs to be taken into account when adapting Web
resources, we need to integrate the document engineering layers of the Web
with the knowledge engineering layers of the Semantic Web. This will require
tools that can abstract from the underlying XML syntax and operate directly
on the semantics of languages such as RDF, RDFS and DAML+OIL.
Realizing such a level of interoperability among W3C Recommendations will
be a huge effort. It should be clear that the examples given in this paper
serve only to illustrate the discussion, and should by no means be regarded
as readily applicable syntactical solutions to achieve the required
interoperability. Making the current Web infrastructure interoperate
seamlessly with the upcoming Semantic Web will be a huge challenge and a long
term effort.
References
[1] W3C, ``Device Independence Principles.'' Work in progress. W3C
Working Drafts are available at http://www.w3.org/TR, 18 September 2001.
Edited by Roger Gimson, co-edited by Shlomit Ritz Finkelstein, Stéphane
Maes and Lalitha Suryanarayana.
[2] D. Bulterman, L. Rutledge, L. Hardman, and J. van Ossenbruggen,
``Supporting Adaptive and Adaptable Hypermedia Presentation
Semantics,'' in The 8th IFIP 2.6 Working Conference on Database
Semantics (DS-8): Semantic Issues in Multimedia Systems, (Rotorua,
New Zealand, 5-8 January 1999), 1999.
[3] J. van Ossenbruggen, J. Geurts, F. Cornelissen, L. Rutledge, and
L. Hardman, ``Towards Second and Third Generation Web-Based
Multimedia,'' in The Tenth International World Wide Web
Conference, (Hong Kong), pp. 479-488, IW3C2, May 1-5, 2001.
[4] W3C, ``XHTML 1.1 - Module-based XHTML.'' W3C Recommendations are
available at http://www.w3.org/TR/, May 31, 2001.
Edited by Murray Altheim and Shane McCarron.
[5] W3C, ``Synchronized Multimedia Integration Language (SMIL 2.0)
Specification.'' W3C Recommendations are available at
http://www.w3.org/TR/, August 7, 2001.
Edited by Aaron Cohen.
[6] J. Ferraiolo, ``Scalable Vector Graphics (SVG) 1.0 Specification.''
W3C Recommendations are available at http://www.w3.org/TR/, 4 September 2001.
[7] W3C, ``Extensible Stylesheet Language (XSL) Version 1.0.'' W3C
Recommendations are available at http://www.w3.org/TR/, 15 October 2001.
[8] B. Bos, H. W. Lie, C. Lilley, and I. Jacobs, ``Cascading Style
Sheets, level 2 CSS2 Specification.'' W3C Recommendations are available
at http://www.w3.org/TR, May 12, 1998.
[9] J. Clark, ``XSL Transformations (XSLT) Version 1.0.'' W3C
Recommendations are available at http://www.w3.org/TR/, 16 November 1999.
[10] W3C, ``Resource Description Framework (RDF) Model and Syntax
Specification.'' W3C Recommendations are available at
http://www.w3.org/TR, February 22, 1999.
Edited by Ora Lassila and Ralph R. Swick.
[11] W3C, ``Resource Description Framework (RDF) Schema Specification
1.0.'' W3C Candidate Recommendations are available at
http://www.w3.org/TR, 27 March 2000.
Edited by Dan Brickley and R.V. Guha.
[12] F. van Harmelen, P. F. Patel-Schneider, and I. Horrocks, ``Reference
description of the DAML+OIL (March 2001) ontology markup language.''
http://www.daml.org/2001/03/reference.html.
Contributors: Tim Berners-Lee, Dan Brickley, Dan Connolly, Mike Dean,
Stefan Decker, Pat Hayes, Jeff Heflin, Jim Hendler, Ora Lassila, Deb
McGuinness, Lynn Andrea Stein, ...
[13] W3C, ``Composite Capability/Preference Profiles (CC/PP): Structure
and Vocabularies.'' Work in progress. W3C Working Drafts are available
at http://www.w3.org/TR, 15 March 2001.
Edited by Graham Klyne, Franklin Reynolds, Chris Woodrow and Hidetaka
Ohto.
[14] Wireless Application Group, ``WAP-174: WAG UAPROF User Agent Profile
Specification,'' 1999.
[15] G. Karvounarakis, V. Christophides, D. Plexousakis, and S. Alexaki,
``Querying Community Web Portals.''
http://www.ics.forth.gr/proj/isst/RDF/RQL/rql.html.
[16] J. van Ossenbruggen, L. Hardman, and L. Rutledge, ``Hypermedia and
the Semantic web: A research agenda,'' Tech. Rep. INS-R0105, CWI, 2001.
[17] H. W. Lie and T. Celik, ``Media queries.'' Work in progress. W3C
Working Drafts are available at http://www.w3.org/TR, 17 March 2001.
[18] J. Clark and S. DeRose, ``XML Path Language (XPath) Version 1.0.''
W3C Recommendations are available at http://www.w3.org/TR/, 16 November 1999.
Footnotes
1. Museum schema example adapted from [15].
2. Metadata example adapted from [16].
3. Example taken from the CSS2 Specification [8].
4. We are not advocating a specific syntax, but are only claiming that
future XSLT transformations need to be able to take CC/PP-like
information into account.