RDF Content Labels

NOTE: this document is published here as a work-in-progress within the EU-funded Quatro project. It uses W3C technology and explores the application of W3C's RDF/XML format to content labelling in the PICS tradition. It is not currently a work item of any W3C Working Group. I am exploring the possibilities for bringing this work into a chartered W3C Group, e.g. as an Interest Group note through the Semantic Web Interest Group. --Dan Brickley


Introduction

This document provides a method through which content labels expressed in RDF can be applied to any number of resources. That is, a generic description can be created and multiple resources linked to it either directly or through the application of a simple rule set.

The system has been designed for use by trust mark schemes, child advocacy groups, educational institutes etc. It is offered as a successor to the PICS system.

The namespace for the schema defined below is http://www.w3.org/2004/12/q/contentlabel#

Use cases and test data are also available.

Elements of a labelling scheme

A labelling scheme consists of one or more categories which group together related content descriptors and zero or more modifiers which provide further context for a label.

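To create a trivial example, a labelling organisation might define "Appearance" as a category within which there were descriptors for colour (c) and transparency (t). In the examples that follow, the colour values 0, 1 and 2 denote black, red and green respectively, and transparency is given as a percentage.

The labelling organisation further defines "Matt" (m) and "Shiny" (s) as optional modifiers.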

In order to create RDF content labels, we define a small set of classes and properties that are the basis for defining labelling schemes. A particular labelling scheme is created by defining instances of these classes and using the properties to define the relationships between those instances.

Assuming relevant namespace declarations have been made, an example of a content label written in RDF/XML might then be:

<label:ContentLabel rdf:ID="label_1">
  <ex:c>2</ex:c>
  <ex:t>20</ex:t>
  <label:hasModifier><ex:s /></label:hasModifier>
</label:ContentLabel>

Example 1: The core of a basic content label

This simply means that, according to the example labelling scheme, the labelled resource is green, 20% transparent and shiny. Note that the context modifier has no associated value - it is either present or absent.

To extend the example, a complete RDF/XML instance might be created as shown in Example 2 and made available at http://www.fabricexamples.com/labels.rdf.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:ex="http://www.example.org/vocabulary#">

  <label:ContentLabel rdf:ID="label_1">
    <ex:c>2</ex:c>
    <ex:t>20</ex:t>
    <label:hasModifier><ex:s /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_2">
    <ex:c>1</ex:c>
    <ex:t>10</ex:t>
    <label:hasModifier><ex:m /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_3">
    <ex:c>0</ex:c>
    <ex:t>0</ex:t>
  </label:ContentLabel>

</rdf:RDF>

Example 2: An RDF/XML instance containing 3 content labels

Resources can now be created that link to specific content labels. Any number of resources that appear shiny and green with 20% transparency can include the link tag below (or its HTTP Response header equivalent):

<link rel="meta" href="http://www.resources.com/labels.rdf#label_1" type="application/rdf+xml" />

Similar tags can link to label_2 for a resource that is red, 10% transparent and matt, or to label_3 for one that is black and 0% transparent.

Restricting a content label

Any resource, anywhere on the web can link to a content label using the mechanism described above. There are circumstances where this is a useful facility. Equally, however, a content provider or labelling organisation may wish to restrict the scope of their labels. This is achieved by using the hostRestriction property as shown in example 3.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:ex="http://www.example.org/vocabulary#">

  <label:Ruleset>
    <label:hostRestriction>resources.co.uk</label:hostRestriction>
    <label:hostRestriction>resources.com</label:hostRestriction>
  </label:Ruleset>

  <label:ContentLabel rdf:ID="label_1">
    <ex:c>2</ex:c>
    <ex:t>20</ex:t>
    <label:hasModifier><ex:s /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_2">
    <ex:c>1</ex:c>
    <ex:t>10</ex:t>
    <label:hasModifier><ex:m /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_3">
    <ex:c>0</ex:c>
    <ex:t>0</ex:t>
  </label:ContentLabel>

</rdf:RDF>

Example 3: A repeat of example 2 with host restrictions applied

Example 3 declares the same content labels as example 2. However, an agent SHOULD consider the labels applicable only to URIs on the resources.com or resources.co.uk hosts. If a resource on another host links to the content label, an agent SHOULD disregard the description.

Content labels may be applied to subdomains of the listed host restrictions. In our examples with declared host restrictions of resources.com and resources.co.uk, content labels would be applicable to www.resources.com, support.resources.co.uk etc.
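The host check described above can be sketched as follows. This is a minimal illustration, not part of the schema: the function name host_allowed is hypothetical, and the comparison is assumed to be case-insensitive and to work on whole host name components.

```python
from urllib.parse import urlsplit


def host_allowed(uri, host_restrictions):
    """Return True if the URI's host equals, or is a subdomain of, a listed host."""
    host = (urlsplit(uri).hostname or "").lower()
    for restriction in host_restrictions:
        restriction = restriction.lower()
        # "www.resources.com" is accepted for "resources.com";
        # "notresources.com" is not, because the match is on whole components.
        if host == restriction or host.endswith("." + restriction):
            return True
    # Assumption: an empty restriction list yields False here; the text
    # leaves the unrestricted case to the agent's discretion.
    return False


restrictions = ["resources.co.uk", "resources.com"]
print(host_allowed("http://www.resources.com/page.html", restrictions))
print(host_allowed("http://support.resources.co.uk/", restrictions))
print(host_allowed("http://www.example.org/", restrictions))
```

Matching on a leading dot rather than a plain suffix avoids treating an unrelated host whose name merely ends in the same characters as a subdomain.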

Rules for identifying a label

In examples 2 and 3, a resource was linked to a specific content label. A content provider will need to include a specific link to the correct label in each resource. This is shown diagrammatically in figure 1, where, for example, Resource A will include the link tag: <link rel="meta" href="labels.rdf#label_3" ...

Figure 1. Each resource includes a link to a specific label within the RDF instance at labels.rdf.

This will be a convenient approach for some providers. However, a simple set of application rules allows all resources to be linked to a single RDF instance containing multiple labels. An agent can process those rules to select the correct label for a given resource, based on its URI. This is shown diagrammatically in figure 2.

Figure 2. A simple rule set allows all content to link to the same RDF instance and the correct label to be identified.

The advantage of this system is that a content management system or suite of servers can be configured to include the same link tag, for example:

<link rel="meta" href="http://www.resources.com/labels.rdf" type="application/rdf+xml" />

The labels for an entire site, no matter its size, can be managed by editing a single file. Example 4 shows how such rules can be encoded.

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:ex="http://www.example.org/vocabulary#">

  <label:Ruleset>
    <label:hostRestriction>fabricexamples.co.uk</label:hostRestriction>
    <label:hostRestriction>fabricexamples.com</label:hostRestriction>
    <label:rules rdf:parseType="Collection">
      <label:Matches>
        <label:value>evening</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_1"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>
  
      <label:Matches>
        <label:value>morning</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_2"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>

      <label:Matches>
        <label:value>.*</label:value>
        <label:confers>
          <label:PropertySet>
            <label:hasLabel rdf:resource="#label_3"/>
          </label:PropertySet>
        </label:confers>
      </label:Matches>
    </label:rules>
  </label:Ruleset>

  <label:ContentLabel rdf:ID="label_1">
    <ex:c>2</ex:c>
    <ex:t>20</ex:t>
    <label:hasModifier><ex:s /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_2">
    <ex:c>1</ex:c>
    <ex:t>10</ex:t>
    <label:hasModifier><ex:m /></label:hasModifier>
  </label:ContentLabel>

  <label:ContentLabel rdf:ID="label_3">
    <ex:c>0</ex:c>
    <ex:t>0</ex:t>
  </label:ContentLabel>

</rdf:RDF>

Example 4. A similar listing to previous examples with added application rules.

The rules are processed in turn by matching the URI of the resource in question with a Perl 5 regular expression. The first match leads to the correct label. So, if a URI matches the string "evening", label 1 is correct. If there is no match against "evening" and there is a match against "morning" then label 2 is used. If there is no match with either "evening" or "morning" then label 3 is used since the value of the regular expression given is .* (which matches anything).
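The first-match behaviour described above can be illustrated with a short sketch. The rule list mirrors the three rules as the text describes them; the function name select_label and the sample URIs are hypothetical. Python's re module is used as a stand-in for a Perl 5 regular expression engine, which is adequate for these simple patterns but is an assumption, not something the schema specifies.

```python
import re

# (pattern, label) pairs in document order, following the description above.
RULES = [
    ("evening", "#label_1"),
    ("morning", "#label_2"),
    (".*", "#label_3"),  # catch-all: matches any URI
]


def select_label(uri, rules=RULES):
    """Return the label of the first rule whose pattern is found in the URI."""
    for pattern, label in rules:
        # re.search looks for the pattern anywhere in the URI string.
        if re.search(pattern, uri):
            return label
    return None


print(select_label("http://www.fabricexamples.com/evening/dress.html"))
print(select_label("http://www.fabricexamples.com/morning/suit.html"))
print(select_label("http://www.fabricexamples.com/about.html"))
```

Because the rules are tried in order, the catch-all pattern must come last; placed first, it would shadow every other rule.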

Labelling movies, games etc.

Movies, games and other forms of moving content often require different labels at different times. The notion of a movie containing no sex or violence for most of its duration but having "occasional scenes of peril" or "a single scene with nudity" is used by film classifiers the world over. The schema for RDF Content Labels supports these ideas.

As described above, a label is associated with a resource via the hasLabel property. In the context of labelling movies and games, this label should be taken to apply throughout the running time of the resource. In addition, a resource may also have labels that describe content that occurs for limited periods during the running time of the resource. These labels are associated with the resource by properties that indicate the frequency with which they occur.

Figure 3: Movie labelling example

In figure 3, the movie's content is described by "label A." That is, the label applies throughout. However, the movie also contains frequent scenes described by label B and a single scene described by label C.

The full list of frequency identifiers included in the schema is: frequentScenes, severalScenes, occasionalScenes and singleScene.

Labelling scheme operators are, of course, free to provide more precise definitions and synonyms for these terms if so desired.

Provenance of a label

The nature and expected use of content labels is such that the question of who generated and, perhaps, who has since checked the label is of particular importance. An end user may wish to know not only that the labelled resource is green, 20% transparent and shiny, but who says so and to what extent the description can be trusted.

This cannot be done by looking at the data alone. However, a variety of methods may be available, either from the labelling organisation itself or third parties. For this reason, an RDF instance containing content labels may give details of who created it. This is done using the Dublin Core Creator property. It is expected that the creator will be expressed as a URL at which information is made available about how a user may increase their trust in the assertions made in the label. The homepage of the labelling organisation is likely to be a common value for this.

<rdf:Description rdf:about="">
  <dc:creator rdf:resource="http://www.example.org" />
  <label:authorityFor>http://www.example.org/vocabulary#</label:authorityFor>
</rdf:Description>

Example 5: The creator of the RDF instance is declared, along with the namespace about which information is available from that authority.

As a single RDF instance may contain labels from any number of labelling organisations, the schema provides a facility to declare which descriptions (i.e. which namespaces) a given label creator provides information about. In example 5, it is expected that information will be available at http://www.example.org about how a user or an agent might be able to gain trust in assertions made about things like the colour and transparency of a resource. If labels from a different namespace are provided, no information will be available at example.org about their trustworthiness.

It is expected that RDF/XML instances will be subject to processes such as digital signing etc. for the purposes of user-trust.


Schema Description

The namespaces and QNames used in this document are:

rdf    http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs   http://www.w3.org/2000/01/rdf-schema#
label  http://www.w3.org/2004/12/q/contentlabel#
xsd    http://www.w3.org/2001/XMLSchema
dc     http://purl.org/dc/elements/1.1/

The following section describes the label schema.

Class: ContentLabel

An instance of this class is a single descriptive label for content which may be applied to one or more web resources.

Properties. The following properties may be specified for a ContentLabel instance:

Class: Category

A category is a grouping of related content descriptors. These groupings may be thematic but this is not a constraint on category instances in general.

Properties

Property: descriptor

A descriptor defines a single form of content which may or may not be present in a resource. When labelling web resources, a descriptor is used as a property of the content label that it applies to. This means that a descriptor has a range of allowed values. The range of allowed values is not constrained.

Property: hasDescriptor

This property connects a category to the descriptors that make up that category. It can be used by applications to quickly list what all the possible descriptors for a category are.

Class: Modifier

A modifier provides context for a content label as a whole. Each content labelling scheme may define its own set of modifiers.

Property: hasModifier

This property connects an instance of the Modifier class to the ContentLabel that it modifies.

Property: hasLabel

This is a property that links a resource to the ContentLabel that labels that resource.

Property: frequentScenes

This is a property that links a resource to the ContentLabel that labels that resource. It indicates that the resource, typically a movie or game, has frequent scenes of the type described by the content label; however, it is not a complete description.

Property: severalScenes

This is a property that links a resource to the ContentLabel that labels that resource. It indicates that the resource, typically a movie or game, has several scenes of the type described by the content label; however, it is not a complete description.

Property: occasionalScenes

This is a property that links a resource to the ContentLabel that labels that resource. It indicates that the resource, typically a movie or game, has occasional scenes of the type described by the content label; however, it is not a complete description.

Property: singleScene

This is a property that links a resource to the ContentLabel that labels that resource. It indicates that the resource, typically a movie or game, has a single scene of the type described by the content label; however, it is not a complete description.

Class: Ruleset

This is a specialisation of rdf:List which allows only label:Rule instances as list members.

Class: Rule

The class label:Rule is the base class for resource matching rules.

Class: AllOf

The class AllOf is derived from both rdf:Bag and label:Rule. This rule matches a resource if all of the label:Rule instances contained in the rdf:Bag match the resource.

When processing the contained rules for matching purposes, any label:confers property on contained label:Rule instances must be ignored.

Class: OneOf

The class OneOf is derived from both rdf:Bag and label:Rule. This rule matches a resource if at least one of the label:Rule instances contained in the rdf:Bag matches the resource.

When processing the contained rules for matching purposes, any label:confers property on contained label:Rule instances must be ignored.

Class: PropertySet

An instance of the class PropertySet acts as a "placeholder" resource for any resource which matches a given label:Rule.

Property: hostRestriction

This property has a domain of label:Ruleset and a range of rdfs:Literal. The value of this property specifies the host to which the contained rules are restricted. The host of an input URI must match at least one of the values of the label:hostRestriction properties of a label:Ruleset. Subdomains are included, so, for example, if the hostRestriction is example.org then the Ruleset is also valid for subdomain.example.org.

Property: rules

This property has a domain of label:Ruleset and a range of label:Rule. An absolute ordering for rules may be specified using either the rdf:first and rdf:rest properties or by using the rdf:parseType="Collection" attribute.

Property: confers

This property has a domain of label:Rule and a range of label:PropertySet. It specifies that when a URI matches the label:Rule instance, then all of the properties of the label:PropertySet instance must be conferred on the matching URI.

Note
The label:AllOf and label:OneOf classes allow label:Rule instances to be nested within one another. However, for the purposes of conferring properties, only the label:confers property of the outermost label:Rule instance will be considered in the property conferral algorithm.
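The nesting behaviour noted above can be sketched with three small classes. This is an illustrative model only: the Python class and field names are hypothetical stand-ins for the schema classes, a plain string stands in for the conferred label:PropertySet, and Python's re module is assumed as a substitute for Perl 5 regular expressions.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Matches:
    value: str                      # the regular expression (label:value)
    confers: Optional[str] = None   # stand-in for a conferred PropertySet

    def matches(self, uri: str) -> bool:
        return re.search(self.value, uri) is not None


@dataclass
class AllOf:
    members: List[object] = field(default_factory=list)
    confers: Optional[str] = None

    def matches(self, uri: str) -> bool:
        # Every contained rule must match; their own confers values are ignored.
        return all(m.matches(uri) for m in self.members)


@dataclass
class OneOf:
    members: List[object] = field(default_factory=list)
    confers: Optional[str] = None

    def matches(self, uri: str) -> bool:
        # At least one contained rule must match.
        return any(m.matches(uri) for m in self.members)


def conferred(rule, uri):
    """Return only the outermost rule's confers value, and only on a match."""
    return rule.confers if rule.matches(uri) else None


rule = AllOf(
    members=[
        Matches(r"example\.com", confers="#ignored"),
        OneOf(members=[Matches("/films/"), Matches("/games/")]),
    ],
    confers="#label_1",
)
print(conferred(rule, "http://www.example.com/films/a.html"))  # #label_1
print(conferred(rule, "http://www.example.com/news/"))         # None
```

The inner Matches rule carries its own confers value, yet it never appears in the result, which is the behaviour the note requires.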

Class: Matches

This rule matches a resource if the resource URI string matches the Perl5 regular expression that is the value of the label:value property of the rule.

Property: value

This property has a domain of label:Matches and a range of xsd:string. The value of the property provides the argument for the label:Matches rule that is the property subject.

Property: authorityFor

This property stands apart from the others. It has a domain of rdfs:Resource and a range of xsd:string. The value of the property is a labelling scheme's namespace. It is used as a property of a description of the RDF instance itself that SHOULD include a URI, given as the object of a dc:creator predicate (or similar), from which information can be gained about how a user or agent may "test" the content labels. The authorityFor property allows organisations to make statements about the veracity of an RDF instance with respect to labels using the given namespace without making any comment on labels from other namespaces. Although the value will be a URI, it is a string, not an RDF resource, since it does not form part of the descriptive graph.


Creating a Labelling Scheme

This section describes how to create a specific labelling scheme.

Identifying, Naming and Describing Scheme Components

The schema makes use of basic RDF functionality for identifying, naming and describing the components that make up a labelling scheme.

Each component of the scheme is assigned an ID. This ID, when combined with the base URL of the RDF resource that describes the scheme, gives a unique URI identifier for the component.
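As a small illustration of how an ID combines with the base URL, the sketch below follows the usual RDF/XML convention of appending the ID as a fragment identifier. The function name component_uri is hypothetical, and the base URL is taken from the example vocabulary namespace used in this document.

```python
from urllib.parse import urldefrag


def component_uri(base_url, component_id):
    """Join a scheme's base URL and a component ID into a fragment URI."""
    # Drop any fragment already present on the base before appending the ID.
    base, _ = urldefrag(base_url)
    return base + "#" + component_id


print(component_uri("http://www.example.org/vocabulary", "ax"))
# http://www.example.org/vocabulary#ax
```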

Each component should always be assigned a short name. This should be a name suitable for display in a user interface and should be consumer-oriented in nature. A good short name would be "Appearance" or "Colour"; a bad short name would be "ax" or "ca". RDF provides a mechanism for these short names by using the rdfs:label property. A component can have any number of rdfs:label property values, although it is STRONGLY recommended that they should be distinguished from each other using an xml:lang attribute and that there should be only one label per language.

<label:Category rdf:ID="ax">
  <rdfs:label xml:lang="en">Appearance</rdfs:label>
  ...
</label:Category>

Example 6. An example of a short name

A component may also be assigned a longer description that might be displayed to a user as pop-up help text. For this description, use the RDF-defined rdfs:comment property. Again, multiple rdfs:comment labels may be provided, but should be distinguished by language using the xml:lang attribute.

<label:Category rdf:ID="ax">
  <rdfs:label xml:lang="en">Appearance</rdfs:label>
  <rdfs:comment xml:lang="en">Colour, Transparency</rdfs:comment>
</label:Category>

Example 7. An example of a short description

Finally, a component may also contain a link to another web resource that provides a much more detailed description. For this link, use the RDF-defined rdfs:seeAlso property. The value of this property MUST be an RDF resource URI.

<label:Category rdf:ID="ax">
  <rdfs:label xml:lang="en">Appearance</rdfs:label>
  <rdfs:comment xml:lang="en">Colour, Transparency</rdfs:comment>
  <rdfs:seeAlso rdf:resource="http://www.example.org/vocabulary/definitions#ax"/>
</label:Category>

Example 8. An example of a reference to a longer description

Define Categories

Each category in a labelling scheme has the identifier, name and descriptions described above, and a list of the descriptors that are part of that category. The descriptors are linked to the category using the label:hasDescriptor property. As there is a list of descriptors, and we want the list to be closed (i.e. no more can be added to the list without modifying our vocabulary file), we specify the hasDescriptor property value as a collection.

Each descriptor must be defined as being a subPropertyOf the descriptor property.

  <!-- Appearance category -->
  <label:Category rdf:ID="ax">
    <rdfs:label xml:lang="en">Appearance</rdfs:label>
    <rdfs:comment xml:lang="en">Colour, Transparency</rdfs:comment>
    <rdfs:seeAlso rdf:resource="http://www.example.org/vocabulary/definitions#ax"/>
    <label:hasDescriptor rdf:parseType="Collection">
      <rdf:Property rdf:ID="c">
        <rdfs:label xml:lang="en">Colour</rdfs:label>
        <rdfs:subPropertyOf rdf:resource="#exampleDescriptor"/>
      </rdf:Property>
      <rdf:Property rdf:ID="t">
        <rdfs:label xml:lang="en">Transparency</rdfs:label>
        <rdfs:subPropertyOf rdf:resource="#exampleDescriptor"/>
      </rdf:Property>
    </label:hasDescriptor>
  </label:Category>

Example 9. Example of a Category Definition

Define Modifiers

Each modifier is simply defined as an instance of the label:Modifier class. Modifiers should be defined with names and descriptions as described above, but there is no need to define any other properties for a modifier.

  <label:Modifier rdf:ID="m">
    <rdfs:label xml:lang="en">This fabric has a matt finish</rdfs:label>
    <rdfs:seeAlso rdf:resource="http://www.example.org/vocabulary/definitions#m"/>
  </label:Modifier>

Example 10. Example of a Modifier definition

Processing a Ruleset

This section describes the algorithm for processing a given URI against a label:Ruleset to confer properties on the resource identified by the URI.

Let the input to the algorithm consist of an RDF graph I, and an input URI U. The goal is to produce an output graph O which contains a set of RDF statements about the resource U.

For each resource of type label:Ruleset (RS) in I:

Step 1
For each label:hostRestriction property HR of RS, if the host of U is the same as or a subdomain of the host specified as the value of HR, go to step 3.
Step 2
The host of U does not match any of the specified host restrictions. Return an empty output graph.
Step 3
For each rule R which is the value of a label:rules property of RS, if the resource U matches R, then for each label:PropertySet resource, PS, which is the value of a label:confers property of R, add to O every statement that has PS as its subject, with PS replaced by U.

Example of processing domain match

Given the rule set:

<label:Ruleset
  xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <label:hostRestriction>example.com</label:hostRestriction>
  <label:rules rdf:parseType="Collection">
    <label:Matches>
      <label:value>http://www.example.com/.*</label:value>
      <label:confers>
        <label:PropertySet>
          <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#all-ages"/>
        </label:PropertySet>
      </label:confers>
    </label:Matches>
    <label:Matches>
      <label:value>.*</label:value>
      <label:confers>
        <label:PropertySet>
          <label:hasContentLabel>
            <label:ContentLabel rdf:about="http://www.example.com/labels.rdf#default"/>
          </label:hasContentLabel>
        </label:PropertySet>
      </label:confers>
    </label:Matches>
  </label:rules>
</label:Ruleset>

The input URI "http://www.example.com/foo.html" would be processed as follows:

The host of the URL, www.example.com, is a valid subdomain of the host restriction for this rule set, so processing may continue.

The URL matches the first label:Matches rule by matching the regular expression http://www.example.com/.*, so a subgraph is extracted from the value of the label:confers property. In the extracted graph, the label:PropertySet resource is replaced by the resource URL passed in to the evaluation process, giving the result graph:

<rdf:Description rdf:about="http://www.example.com/foo.html">
  <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#all-ages"/>
</rdf:Description>

The input URI "http://adult.example.com/" would be processed as follows:

The host of the URL, adult.example.com, is a valid subdomain of the host restriction for this rule set, so processing may continue.

The URL does not match the first label:Matches rule.

The URL matches the second label:Matches rule by matching the regular expression ".*", so the following result graph is generated:

<rdf:Description rdf:about="http://adult.example.com/">
  <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#default"/>
</rdf:Description>

The input URI "http://www.anotherdomain.com" would be processed as follows:

The host of the URL, www.anotherdomain.com, is neither the restricted host nor a subdomain of it, so an empty result graph is returned.
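The three worked cases above can be reproduced with a short sketch. It is a simplification under several stated assumptions: it reads the XML structure directly with ElementTree rather than using an RDF parser, it writes both conferred labels with rdf:resource, it stops at the first matching rule (following the earlier statement that the first match leads to the correct label), and it uses Python's re module in place of Perl 5 regular expressions. The function name process and the tuple representation of statements are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

LABEL = "http://www.w3.org/2004/12/q/contentlabel#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"label": LABEL, "rdf": RDF}

RULESET_XML = """
<label:Ruleset xmlns:label="http://www.w3.org/2004/12/q/contentlabel#"
               xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <label:hostRestriction>example.com</label:hostRestriction>
  <label:rules rdf:parseType="Collection">
    <label:Matches>
      <label:value>http://www.example.com/.*</label:value>
      <label:confers>
        <label:PropertySet>
          <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#all-ages"/>
        </label:PropertySet>
      </label:confers>
    </label:Matches>
    <label:Matches>
      <label:value>.*</label:value>
      <label:confers>
        <label:PropertySet>
          <label:hasContentLabel rdf:resource="http://www.example.com/labels.rdf#default"/>
        </label:PropertySet>
      </label:confers>
    </label:Matches>
  </label:rules>
</label:Ruleset>
"""


def process(ruleset_xml, uri):
    """Return (subject, predicate, object) statements conferred on uri."""
    ruleset = ET.fromstring(ruleset_xml)
    host = (urlsplit(uri).hostname or "").lower()
    allowed = [hr.text.strip().lower()
               for hr in ruleset.findall("label:hostRestriction", NS)]
    # Steps 1 and 2: the host must equal, or be a subdomain of, a listed host.
    if not any(host == h or host.endswith("." + h) for h in allowed):
        return []
    # Step 3: try each rule in document order and stop at the first match.
    for rule in ruleset.findall("label:rules/label:Matches", NS):
        pattern = rule.findtext("label:value", namespaces=NS)
        if re.search(pattern, uri):
            # Replace the PropertySet placeholder with the input URI.
            return [
                (uri,
                 prop.tag.replace("{", "").replace("}", ""),
                 prop.get("{%s}resource" % RDF))
                for prop in rule.findall("label:confers/label:PropertySet/*", NS)
            ]
    return []


for u in ("http://www.example.com/foo.html",
          "http://adult.example.com/",
          "http://www.anotherdomain.com"):
    print(u, process(RULESET_XML, u))
```

Each returned tuple corresponds to one statement in the result graph, with the input URI standing where the label:PropertySet resource appeared in the rule set.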

Specific links

A resource may link directly to a specific label as discussed above. In this case no processing is required to find a match. However, the ruleset, if present, SHOULD be processed to check for host restrictions. If no host restriction is declared, the agent MAY still apply the content label to the resource but equally may ignore it.