W3C

SKOS Use Cases and Requirements

W3C Working Draft 16 May 2007

This version:
http://www.w3.org/TR/2007/WD-skos-ucr-20070516/
Latest version:
http://www.w3.org/TR/skos-ucr
Previous version:
This is the first public Working Draft
Editors:
Antoine Isaac, Vrije Universiteit Amsterdam, aisaac@few.vu.nl
Jon Phipps, Cornell University, jphipps@madcreek.com
Daniel Rubin, Stanford Medical Informatics, dlrubin@stanford.edu

Abstract

Knowledge organisation systems, such as taxonomies, thesauri or subject heading lists, play a fundamental role in information structuring and access. The Semantic Web Deployment Working Group aims at providing a model for representing such vocabularies on the Semantic Web: SKOS (Simple Knowledge Organisation System).

This document presents the preparatory work for a future version of SKOS. It lists representative use cases, which were obtained after a dedicated questionnaire was sent to a wide audience. It also features a set of fundamental or secondary requirements derived from these use cases, which will be used to guide the design of SKOS.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document is the first public Working Draft of the "SKOS (Simple Knowledge Organization System) Use Cases and Requirements", developed by the W3C Semantic Web Deployment Working Group [SWD]. The SWD Working Group is chartered to advance the November 2005 SKOS Core Vocabulary Specification Working Draft and the SKOS Core Guide Working Draft to W3C Recommendation.

The Use Cases detailed in this document have been selected as representative of the use cases submitted in response to a "Call for Use Cases" published in December 2006. These use cases as well as Issues identified by the working group have resulted in draft Requirements that will guide the design of the future SKOS Recommendation. Early feedback is therefore most useful. Feedback on use cases that can help to resolve open issues is especially important. Note also that any feature listed under Candidate Requirements should be considered as "at risk" without further feedback.

Comments on this Working Draft are encouraged and may be sent to public-swd-wg@w3.org; please include the text "[SKOS] UCR comment" in the subject line. All messages received at this address are viewable in a public archive. Commenters may wish to review the list of open issues before generating a new comment.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1 Introduction

Knowledge organisation systems play a fundamental role in information structuring and access, e.g. for asset description or web site organisation. Such vocabularies, coming in the form of thesauri, classification schemes, subject heading lists, taxonomies or even folksonomies, are developed and used worldwide, by institutions as well as individuals. However, these very important knowledge resources are still mostly isolated from the outside world, and not widely used in implementing systems.

The development of new information technologies and infrastructures, such as the World Wide Web, calls for new ways to create, manage, publish and use these knowledge organisation systems. It is especially expected that conceptual schemes will benefit from greater shareability, e.g. by being published via web services. In the meantime, the documentary systems which use them will turn to advanced information retrieval techniques to construct most of their semantic structure and lexical content.

SKOS (Simple Knowledge Organisation System) [SWBP-SKOS-CORE-GUIDE] provides a model to represent and use vocabularies and ontologies in the framework of the Semantic Web. A first version has been produced by the Semantic Web Best Practices and Deployment working group [SWBPD], and is already used in some research projects. The Semantic Web Deployment Working Group [SWD] has been chartered to continue this work, and to "produce guidelines and an RDF vocabulary (SKOS) for transforming an existing vocabulary representation into an RDF/OWL representation" [SWD-Charter].

In order to delimit the scope and elicit the required features for SKOS, the SWD working group has issued a call for use cases, asking for descriptions of existing or planned SKOS applications, according to a specific questionnaire. Following the gathering of these use cases, the Working Group has elicited a number of requirements for SKOS which are motivated by the previous work on SKOS, or by contributions received after the call for use cases.

This document gives an account of this process. First, section 2 presents summaries of selected contributions, and pointers to the complete set of cases which were sent to the Working Group. Second, section 3 lists the requirements the Working Group has elicited so far.

2 Use Cases

2.1 Use Case #1 — An integrated view to medieval illuminated manuscripts

(Contributed by Antoine Isaac.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucManuscriptsDetailed and at http://www.w3.org/2006/07/SWD/wiki/EucIconclassDetailed)

The purpose of this application is to provide the user with access to two collections of illuminated manuscripts from the Dutch and French national libraries, Medieval Illuminated Manuscripts and Mandragore (accessible online at http://www.kb.nl/manuscripts and http://mandragore.bnf.fr). The descriptions of images from these two collections follow different metadata schemes, and contain values from different controlled vocabularies for subject indexing. The user should however be able to search for items from the two collections using his preferred point of view, either using vocabulary from collection 1 or vocabulary from collection 2.

The main feature of the application is collection browsing, which uses hierarchical links in vocabularies: if a concept matching a query has subconcepts, the documents indexed against these subconcepts should be returned. The application also uses mapping links between concepts from the two vocabularies. For example, if an equivalence link is found between a query concept from one vocabulary and another concept from the second one, documents indexed by this other concept shall also be included in the query results.

Requires: R-ConceptualRelations, R-IndexingRelationship
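
The browsing behaviour described above can be sketched as a query expansion over hierarchical and equivalence links. The following is only a minimal illustration in Python: the identifiers, the shape of the toy hierarchy and the document ids are hypothetical, and the actual application may work quite differently.

```python
from collections import deque

# Hypothetical toy data: identifiers and hierarchy shape are illustrative only.
NARROWER = {
    "ic:25F": ["ic:25F7"],
    "ic:25F7": ["ic:25F72", "ic:25F711"],
}
# Equivalence links are stored in one direction here for brevity;
# a real store would treat them as symmetric.
EQUIVALENT = {
    "ic:25F72": ["mg:mollusques"],
}
# Indexing relationship: concept -> documents indexed against it.
INDEX = {
    "ic:25F711": ["kb:img-1"],
    "ic:25F72": ["kb:img-2"],
    "mg:mollusques": ["bnf:img-9"],
}


def expand(concept, narrower, equivalent):
    """Return the concept plus all concepts reachable via narrower or equivalence links."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        for nxt in narrower.get(current, []) + equivalent.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def search(concept, narrower, equivalent, index):
    """Collect the documents indexed against any concept of the expanded set."""
    docs = set()
    for c in expand(concept, narrower, equivalent):
        docs.update(index.get(c, []))
    return sorted(docs)
```

With this toy data, a query on "ic:25F" returns documents from both collections, because the expansion follows the hierarchy down to "ic:25F72" and then crosses the equivalence link to the second vocabulary.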

Additionally, the application enables search based on free text queries over the collection metadata: documents can be retrieved based on free-text querying of the different fields used to describe the documents (creator, place, subject, etc.). For subject indexing, if a text query matches the label of a controlled vocabulary concept, the documents indexed against this concept will be returned.

The two collections use respectively the Iconclass and Mandragore analysis vocabularies.

Iconclass (http://www.iconclass.nl) contains 28000 items used to describe the subjects of an image (persons, events, abstract ideas). Complete versions are available for English, German, French, Italian, and partial translations for Finnish and Norwegian.

Requires: R-MultilingualLexicalInformation

The main building blocks of Iconclass are subjects, used to describe the subjects of images. An Iconclass subject consists of a notation (an alphanumeric identifier used for annotation) and a textual correlate (e.g. “25F9 mis-shapen animals; monsters”). Subjects are organized in hierarchical trees, as in the following extract:

2 Nature
25 earth, world as celestial body

25F animals

25F(+) KEY

25F1 groups of animals


25F9 mis-shapen animals; monsters

25FF fabulous animals (sometimes wrongly called 'grotesques'); 'Mostri' (Ripa)

Subjects can have associative cross-reference links between them (systematic references) and are linked to keywords that are used to search for them in Iconclass tools. Keywords form a network of their own, featuring see links (from one non-preferred keyword, not attached to any subject, to a preferred one), see also links (between keywords that are semantically or iconographically related) and translation links (between keywords in different languages).

Requires: R-LabelRepresentation, R-RelationshipsBetweenLabels

Iconclass additionally provides auxiliary mechanisms for subject specialization at indexing time. These actually allow for collection-specific vocabulary extension:

Requires: R-ConceptSchemeExtension, R-SkosSpecialization, R-IndexingAndNonIndexingConcepts, R-ConceptCoordination

Maintenance of the vocabulary is done via manual editing of semi-structured source files. As a general rule, the standard version will only be changed in a conservative way, not modifying the existing subjects.

Mandragore contains 16000 subjects. 15800 are descriptors, which are used to describe the illuminations and form a flat list. Additional structure is given by 200 abstract topic classes which form a hierarchy organizing the descriptors according to general domains, but cannot themselves be used to describe documents:

ZOOLOGIE

.zoologie (généralités)

.mollusques

.mammifères

cochon [mammifère ongulé]

girafe [mammifère ongulé]

A descriptor is specified by a French label (“cochon”, for pig), optional rejected forms (“porc”), an optional definition (“mammifère ongulé”, hoofed mammal) and a reference to one or more topic classes (“.mammifères”, mammals). A note can sometimes be found as a complementary definition.

To enable integrated browsing, elements from Mandragore and Iconclass vocabularies must be linked together using equivalence or specialization links as in the following:

25F72 molluscs (Iconclass) is equivalent to mollusques (Mandragore)

25F711 insects (Iconclass) is more specific than autres invertébrés (vers,arachnides,insectes...) ("other invertebrates (worms, arachnida, insects...)", Mandragore)

11U4 Mary and John the Baptist together with (e.g. kneeling before) the judging Christ, 'Deesis' ~ Last Judgement (Iconclass) is equivalent to the combination of subjects s.marie, s.jean.baptiste, christ and jugement.dernier (Mandragore)

25F(+441) herd, group of animals (Iconclass) is equivalent to troupeau (Mandragore)

Requires: R-ConceptualMappingLinks
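
One possible way to hold such links in memory, including the case where a single concept is equivalent to a combination of subjects, is sketched below in Python. The class, the relation names, the "iconclass:" and "mandragore:" prefixes and the toy catalogue are all assumptions made for illustration; they are not part of either vocabulary or of SKOS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    source: str         # concept in the first vocabulary
    relation: str       # "equivalent" or "more_specific_than" (hypothetical names)
    targets: frozenset  # one target, or several that must all apply (combination)


MAPPINGS = [
    Mapping("iconclass:25F72", "equivalent", frozenset({"mandragore:mollusques"})),
    Mapping("iconclass:25F711", "more_specific_than",
            frozenset({"mandragore:autres invertébrés"})),
    Mapping("iconclass:11U4", "equivalent",
            frozenset({"mandragore:s.marie", "mandragore:s.jean.baptiste",
                       "mandragore:christ", "mandragore:jugement.dernier"})),
]

# Hypothetical catalogue of the second collection: document id -> subjects.
CATALOGUE = {
    "bnf:doc-1": ["mandragore:s.marie", "mandragore:christ"],
    "bnf:doc-2": ["mandragore:s.marie", "mandragore:s.jean.baptiste",
                  "mandragore:christ", "mandragore:jugement.dernier"],
    "bnf:doc-3": ["mandragore:mollusques"],
}


def documents_for(source, mappings, catalogue, relations=("equivalent",)):
    """Return documents carrying every target subject of a matching mapping."""
    hits = []
    for m in mappings:
        if m.source != source or m.relation not in relations:
            continue
        for doc_id, subjects in catalogue.items():
            if m.targets <= set(subjects) and doc_id not in hits:
                hits.append(doc_id)
    return hits
```

In this sketch only equivalence links are followed by default: following the "more specific than" link for a query on insects would return every document indexed with the broader subject, which is a design choice left open here.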

2.2 Use Case #2 — Bio-zen ontology framework for representing scientific discourse in life science

(Contributed by Matthias Samwald, Medizinische Universität Wien.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucBiozenDetailed)

Bio-zen (http://neuroscientific.net/index.php?id=43) allows the description of biological systems and the representation of scientific discourse on the web in a highly distributed manner. It is intended to be used by researchers and developers in the life sciences.

SKOS is used in bio-zen for the representation of many existing life sciences vocabularies, taxonomies and ontologies coming from the "Open Biomedical Ontologies" (OBO) collection (http://www.fruitfly.org/~cjm/obo-download/). The size of all converted taxonomies taken together is on the order of millions of concepts. Typical examples are the Gene Ontology or Medical Subject Headings (MeSH), an entry of which is displayed here:

id MESH:A.01.047.025
name abdominal_cavity
def "The region in the abdomen extending from the thoracic DIAPHRAGM to the plane of the superior pelvic aperture (pelvic inlet). The abdominal cavity contains the PERITONEUM and abdominal VISCERA\, as well as the extraperitoneal space which includes the RETROPERITONEAL SPACE." [MESH:A.01.047.025]
synonym abdominal_cavity
synonym cavitas_abdominis
is_a MESH:A.01.047 ! abdomen

To represent such vocabulary elements as well as other types of information, the existing SKOS model has been integrated into a single OWL ontology, together with the DOLCE foundational ontology and the Dublin Core metadata model. In the process, the SKOS model has been extended with special types of concepts, e.g. biozen:sequence-concept. To enable efficient reasoning with the available dataset, it is important to note that existing constructs have been made compatible with the OWL-DL language.

Requires: R-CompatibilityWithOWL-DL
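
As an illustration of the kind of conversion involved, the sketch below reads an entry in the layout of the extract shown above and produces SKOS-style properties. The correspondence between tags and SKOS properties is an assumption made for this example, not the mapping actually used by bio-zen, and the definition text is shortened.

```python
# Entry in the layout of the extract above; the definition is shortened here.
ENTRY = '''id MESH:A.01.047.025
name abdominal_cavity
def "The region in the abdomen ..." [MESH:A.01.047.025]
synonym abdominal_cavity
synonym cavitas_abdominis
is_a MESH:A.01.047 ! abdomen'''


def parse_entry(text):
    """Turn one entry into a dict keyed by (assumed) SKOS property names."""
    concept = {
        "id": None,
        "skos:prefLabel": None,
        "skos:altLabel": [],
        "skos:definition": None,
        "skos:broader": [],
    }
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        tag, _, value = line.partition(" ")
        value = value.strip()
        if tag == "id":
            concept["id"] = value
        elif tag == "name":
            concept["skos:prefLabel"] = value
        elif tag == "synonym":
            concept["skos:altLabel"].append(value)
        elif tag == "def":
            # Keep the quoted text and drop the trailing [source] reference.
            end = value.rfind('"')
            if value.startswith('"') and end > 0:
                value = value[1:end]
            concept["skos:definition"] = value
        elif tag == "is_a":
            # Everything after "!" is a human-readable comment.
            concept["skos:broader"].append(value.split("!")[0].strip())
    # A synonym identical to the name adds nothing as an alternative label.
    concept["skos:altLabel"] = [
        s for s in concept["skos:altLabel"] if s != concept["skos:prefLabel"]
    ]
    return concept
```

Dropping the synonym that repeats the name keeps preferred and alternative labels distinct, which seems a reasonable convention for a converted vocabulary.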

The bio-zen framework will consist of several applications, especially Semantic Wikis. The bio-zen ontology incorporates constructs to make statements about digital information resources, that is, to create "concept tags". This concept-tagging is an important feature of bio-zen, because it eases the integration of information from different sources.

Requires: R-IndexingRelationship

2.3 Use Case #3 — Semantic search service across mapped multilingual thesauri in the agriculture domain

(Contributed by Margherita Sini and Johannes Keizer, Food and Agriculture Organization.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucAimsDetailed)

This application coming from the AIMS project (http://www.fao.org/aims) is a semantic search service that makes use of mapped agriculture thesauri. It allows users to search any available terminology in any of the languages in which the thesauri are provided and retrieve information from resources which may have been indexed by one of the mapped vocabularies. Typical functions are navigating resources, helping to build boolean searches via concept identification, or expanding given searches by extra languages or synonyms.

Requires: R-IndexingRelationship

The service builds on several agriculture vocabularies: the Agrovoc Thesaurus (http://www.fao.org/aims/ag_intro.htm), the Agris/Caris Classification Scheme (ASC), the FAO Technical Knowledge Classification Scheme (TKCS), the subjects from the FAOTERM vocabulary, etc.

Agrovoc contains 35000 terms in 12 languages (not all of the languages feature the same translated terms, however), while ASC, TKCS and FAOTERM range between 100 and 200 categories available in the 5 official FAO languages. Agrovoc terms consist of one or more words and always represent a single concept. Terms are divided into descriptors and non-descriptors; currently only descriptors are used for indexing. For each descriptor, a word block is displayed showing the relation to other terms: BT (broader term), NT (narrower term), RT (related term), UF (non-descriptor). There are also scope notes, used to clarify the meaning of both descriptors and non-descriptors.

Term code 1939
Term label EN : Cows, FR : Vache, ES : Vaca, AR : بقرات , ZH : 母牛 , PT : Vaca, CS : krávy, JA : 雌牛 , TH : แม่โค , SK : kravy, DE : KUH
BT Cattle (code 1391)
NT Suckler cows, Dairy cows (26767, 36875)
RT Heifers, Cow milk, Milk yielding animals, Females (3535, 4833, 15969, 16080)
SNR Females (15969)
Scope Note Use only for cattle and zebu cattle; for other species use "Females" (15969) plus the descriptor for the species

Requires: R-ConceptualRelations, R-LabelRepresentation, R-TextualDescriptionsForConcepts, R-MultilingualLexicalInformation
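
To make the requirements above more concrete, the following Python sketch turns part of the word block shown above into SKOS-style statements. The base URI is a placeholder, the choice of properties is an assumption made for this example, and only three of the labels are included; the sketch is not the conversion actually used for Agrovoc.

```python
# Part of the "Cows" word block, restricted to codes and a few labels.
COWS = {
    "code": "1939",
    "labels": {"en": "Cows", "fr": "Vache", "es": "Vaca"},
    "BT": ["1391"],
    "NT": ["26767", "36875"],
    "RT": ["3535", "4833", "15969", "16080"],
    "scope_note": 'Use only for cattle and zebu cattle; for other species '
                  'use "Females" (15969) plus the descriptor for the species',
}

# Assumed correspondence between thesaurus relations and SKOS properties.
RELATION_TO_SKOS = (
    ("BT", "skos:broader"),
    ("NT", "skos:narrower"),
    ("RT", "skos:related"),
)


def literal(text, lang):
    """Format a language-tagged literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def term_to_triples(term, base="http://example.org/agrovoc/"):
    """Return (subject, predicate, object) tuples for one descriptor."""
    subject = f"<{base}{term['code']}>"
    triples = [(subject, "rdf:type", "skos:Concept")]
    for lang, label in term["labels"].items():
        triples.append((subject, "skos:prefLabel", literal(label, lang)))
    for key, prop in RELATION_TO_SKOS:
        for code in term.get(key, []):
            triples.append((subject, prop, f"<{base}{code}>"))
    if term.get("scope_note"):
        triples.append((subject, "skos:scopeNote", literal(term["scope_note"], "en")))
    return triples
```

Each tuple can then be written out as one statement, for example with `" ".join(triple) + " ."`, provided the `rdf:` and `skos:` prefixes are declared in the output file.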

Actually, the AIMS project includes some more specific links, presented in http://www.fao.org/aims/cs_relationships.htm: Concept-to-Concept relationships (subclass of; caused by; member of; part of), Term-to-Term relationships (related term; synonym; translation) and String-to-String relationships (spelling variant; acronym).

Examples of such links are:

synonym bucket pail
abbreviation_of Corp. Corporation
acronym Food and Agriculture Organization FAO
spelling_variant organisation organization
translation vache cow
scientific_taxonomic_name African violet Saintpaulia

Requires: R-SkosSpecialization, R-RelationshipsBetweenLabels

Currently the Agrovoc management system lacks distributed maintenance, but it is expected that a new system will soon solve this problem, which is crucial since changes are made by experts from all over the world.

For AIMS, Agrovoc has been converted into SKOS (ftp://ftp.fao.org/gi/gil/gilws/aims/kos/agrovoc_formats/skos/2006) and is being mapped to two other vocabularies: the Chinese Agricultural Thesaurus (CAT) and the National Agricultural Library thesaurus (NAL). This mapping uses links inspired by the SKOS mapping vocabulary [SWBP-SKOS-MAPPING], as below:

CAT-ID CAT-EN Map AG-ID AG-EN AG-ID AG-EN
30854 Senta flammea Exact 9748 Cheena
50008 Mayetola destructor Exact-OR 24260 Triticale (gramineae) 7949 Triticales (product)
1160 Two-shear sheep NT1 3662 Hordeum vulgare

Requires: R-ConceptualMappingLinks

2.4 Use Case #4 — Supporting product life cycle

(Contributed by Sean Barker, BAE Systems.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucProductLifeCycleSupportDetailed)

The problem of the Product Life Cycle Support (PLCS) application is to integrate a network of interconnected supply chains, with multiple, large customers buying a wide range of products (from shoes to aircraft) each dictating their own standards, and with every supplier being part of multiple supply chains. Each customer wants to maintain a common approach over all its supply chains. And each supplier wants to maintain the same system for each of the supply chains it works in.

The aim of this application is to propose a data exchange mechanism for managing the life support of complex products (http://www.oasis-open.org), including configuration definition, maintenance definition, maintenance planning and scheduling, and maintenance and usage recording (including configuration change).

For that, an upper ontology of several hundred items for the description of the product life cycle will be defined. There is no chance of the entire supply system (10,000's of businesses) developing a single detailed model. However, given the upper ontology, they will be free to specialize individual ontology terms (playing the role of place holders for local extension) to meet their precise needs.

PLCS is conceptually a co-operatively developed web in XML, with the live version being a set of runtime views assembled from files submitted by a dozen or so contributors. It may be useful, where ontologies diverge, to map terms between the diverging branches, either to indicate where terms can be harmonized to their equivalent, or to identify that there is a similarity link that is not exact equivalence.

Requires: R-ConceptualRelations, R-ConceptSchemeExtension, R-ConceptualMappingLinks

The PLCS vocabulary addresses hundreds of separate functions, including classification of items, classification of information usages (e.g. types of part identifier), classification of entity roles (e.g. date as start date) or classification of relationships (e.g. supersedes).

Typical examples of terms are:

Identification_code An Identification_code is an identifier_type which is encoded according to some convention. Typically but not necessarily concatenated from parts each with a meaning. E.g. tag number, serial number, package number and document number.
Part_identification_code A Part_identification_code is an Identification_code that identifies the types of parts. For example, a part number.

CONSTRAINT: An Identification_assignment classified as a Part_identification_code can only be assigned to Part Organization_name

Owner_of An Owner_of is an Organization_or_person_in_organization_assignment that is assigning a person or organization to something in the role of owner.

For example, the owner of the car.

The vocabulary has been encoded using OWL, and is managed via the Protege OWL editor.

Requires: R-TextualDescriptionsForConcepts

2.5 Use Case #5 — CHOICE@CATCH ranking of candidate terms for description of radio and TV programs

(Contributed by Véronique Malaisé and Hennie Brugman, Vrije Universiteit Amsterdam and Max Planck Institute for Psycholinguistics.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucRankingForDescriptionDetailed and at http://www.w3.org/2006/07/SWD/wiki/EucGtaaBrowser)

Radio and television programs at the Dutch national broadcasting archive (Sound and Vision) are typically associated with contextual text descriptions: web site texts, subtitles, program guide texts, texts from the production process, etc. These context documents are used by documentalists at Sound and Vision who manually describe programs using concepts from the GTAA thesaurus (Gemeenschappelijke Thesaurus Audiovisuele Archieven - Common Thesaurus for Audiovisual Archives).

The CHOICE project (part of the Dutch CATCH research program) uses natural language processing techniques to automatically extract candidate GTAA terms from the context documents. The application focused on in this section takes these candidate terms as input, and ranks them on the basis of the structure of the GTAA thesaurus. For example, the fact that "Voting" and "Democratization" are related in GTAA by a two-step path (via the "Election" term and two "related-to" links) will positively influence the ranking of these terms. Ranked terms will be presented to documentalists to speed up their description work.

The GTAA vocabulary covers a wide range of topics, as it is meant to describe anything that can be broadcast on TV or radio. It contains approximately 160,000 terms, divided into 6 disjoint facets: Keywords, Locations, Person Names, Organization-Group-Other Names, Maker Names, and Genres.

The thesaurus mainly uses constructs from the ISO 2788 standard, like Broader Term, Narrower Term, Related Term and Scope Notes. Terms from all facets of the GTAA may have Related Terms, Use/Use For and Scope Notes, but only Keywords and Genres can also have Broader Term/Narrower Term relations, organizing them into a set of hierarchies. In addition to these standard features, Keywords terms are thematically classified in 88 subcategories of 16 top Categories.

Preferred Term ambachten (crafts)
Related Terms ondernemingen (ventures) , beroepen (professions), artistieke beroepen (artistic professions)
Broader Term beroepen (professions)
Narrower Terms boekbinders (bookbinders), bouwvakkers (building workers), glasblazers (glassblowers)
Scope Note niet voor afzonderlijke ambachten maar alleen als verzamelbegrip, bijv. voor (markten van) oude ambachten (not for specific crafts, only in general meaning, e.g. (markets of) old crafts)
Categories 05 economie (economy), 09 techniek (technique)

Requires: R-ConceptualRelations, R-LabelRepresentation, R-SkosSpecialization

The application, envisioned as a SOAP web service, uses a Sesame RDF web repository containing the SKOS version of the GTAA thesaurus to retrieve the 'term contexts' of the terms in the input list, which is stored in a local RDF repository.

This term context includes, for one given term, all terms that are directly connected to it by Broader Term, Narrower Term or Related Term relations. This includes pre-computed inter-facet links that are not part of the ISO standard, though allowed by the GTAA data model. For example, one can link a "King" in the Person facet to the general subject "Kings" and the country which this King rules.

For the ranking, it is now assumed that candidate terms that are mutually connected by thesaurus relations (directly or indirectly) are more likely to be good descriptions than isolated candidate terms. Later on, it might be interesting to differentiate between types of thesaurus relations, or to use more complex patterns of these relations.
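
The assumption above can be illustrated with a small Python sketch that scores each candidate by the number of other candidates reachable within a few thesaurus links. The scoring rule, the step limit and the toy graph are assumptions made for this example; the actual CHOICE ranking may differ.

```python
from collections import deque

# Toy, undirected view of thesaurus links (BT, NT and RT taken together).
GRAPH = {
    "Voting": ["Election"],
    "Election": ["Voting", "Democratization"],
    "Democratization": ["Election"],
    "Weather": [],
}


def within_steps(start, graph, max_steps):
    """Return the terms reachable from start in at most max_steps links."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        term = queue.popleft()
        if distance[term] == max_steps:
            continue
        for neighbour in graph.get(term, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[term] + 1
                queue.append(neighbour)
    del distance[start]
    return set(distance)


def rank(candidates, graph, max_steps=2):
    """Order candidates by how many other candidates lie within max_steps links."""
    candidate_set = set(candidates)
    scores = {
        c: len(within_steps(c, graph, max_steps) & candidate_set)
        for c in candidates
    }
    # Higher score first; ties are broken alphabetically.
    return sorted(candidates, key=lambda c: (-scores[c], c))
```

With the toy graph, "Voting" and "Democratization" support each other through the two-step path via "Election", so both are ranked above the isolated candidate "Weather".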

The thesaurus-based recommendation system can also be integrated with a recommendation system that is based on co-occurrences between terms that are used in previously existing descriptions of programs.

2.6 Use Case #6 — BIRNLex: a lexicon for neurosciences

(Contributed by William Bug, Drexel University College of Medicine.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucBirnLexDetailed)

BIRNLex is an integrated ontology+lexicon used for various purposes — some end-user/interactive, others back-end/infrastructure — within the BIRN Project to support semantically-formal data annotation, semantic data integration, and semantically-driven, federated query resolution.

Requires: R-ConceptualMappingLinks, R-IndexingRelationship, R-LexicalMappingLinks

Below are examples of BIRNLex class definitions that illustrate the need for lexical support and links to external knowledge sources. The general design goals have been to use both the Dublin Core metadata elements and SKOS wherever possible. The goal is to use SKOS for all lexical qualities. There are certain annotation properties that should be shared across all biomedical knowledge resources. There are other required elements specific to the needs of BIRN (the group producing BIRNLex).

Class Anterior_ascending_limb_of_lateral_sulcus
birn_annot:birnlexCurator Bill Bug
birn_annot:birnlexExternalSource NeuroNames
birn_annot:bonfireID C0262186
birn_annot:curationStatus raw import
birn_annot:neuronames ID 49
birn_annot:UmlsCui C0262186
obo_annot:createdDate "2006-10-08"^^http://www.w3.org/2001/XMLSchema#date
obo_annot:modifiedDate "2006-10-08"^^http://www.w3.org/2001/XMLSchema#date
skos:prefLabel Anterior_ascending_limb_of_lateral_sulcus
skos:scopeNote human-only


Class Medium_spiny_neuron
birn_annot:birnlexCurator Maryann Martone
birn_annot:birnlexDefinition The main projection neuron found in caudate nucleus, putamen and nucleus accumbens...
birn_annot:bonfireID BF_C000100
birn_annot:curationStatus pending final vetting
dc:source Maryann Martone
obo_annot:createdDate "2006-07-15"^^http://www.w3.org/2001/XMLSchema#date
obo_annot:modifiedDate "2006-09-28"^^http://www.w3.org/2001/XMLSchema#date
skos:prefLabel Medium_spiny_neuron

Requires: R-CompatibilityWithDC, R-CompatibilityWithOWL-DL, R-ConceptualRelations, R-LabelRepresentation, R-ConceptSchemeExtension

The following is a subset of BIRNLex applications, either extant or in the offing:

In all of these applications, it is critical to have a clear, distinct, and shared representation for the associated lexicon. For instance, when integrating BIRN segmented brain images with those from other projects across the net, use of lexical variants from a variety of public terminologies and thesauri such as SNOMED and MeSH can provide a powerful means to largely automate semantic integration of like entities - e.g., corresponding brain region, equivalent behavioral assays described using different preferred labels/names. In providing a community shared formalism for representing the associated lexicon, SKOS can greatly simplify this task. If, for instance, the lexical repository (collection of Lexical Unique Identifier, each lexical variant of a term getting one LUI) contained in UMLS were represented according to SKOS, this would provide an extremely valuable resource to the community of semantically-oriented bioinformatics researchers, as well as a powerful tool to support latent semantic analysis or natural language processing when linking to unstructured text.

The following are the collection of terminologies and ontologies being linked into BIRNLex: Neuronames, Brainmap.org classification schemes, RadLex, Gene Ontology, Reactome, OBI, PATO, Subcellular Anatomy Ontology (CCDB - http://ccdb.ucsd.edu/), MeSH.

Neuronames concerns brain anatomy and comprises about 750 classes and thousands of associated lexical variants. Brainmap.org classification includes hierarchies to describe neuroanatomy, subject variables, stimulus conditions, and experimental paradigms associated with functional MRI of the nervous system. The Subcellular Anatomy Ontology is designed to describe the subcellular entities associated with ultrastructural and histological imaging of neural tissue. Currently the application is only dealing with English lexical entries.

BIRNLex curators are working with the National Center for Biomedical Ontology (NCBO) to adopt the OBO Foundry recommendations in the construction of BIRNLex. Use of SKOS elements can be useful, so that, for instance, software applications can draw on "skos:prefLabel", "obo_annot:synonym", "obo_annot:definition", etc.

The management of BIRNLex is currently done manually in Protege-OWL.

Requires: R-CompatibilityWithOWL-DL

However, the ultimate goal is to adopt a client-server infrastructure that will create an RDF-based backend store and support both curation of the ontology and annotation using the ontology via Java Portlet-based applications. BIRN has a core infrastructure staff dedicated to use of the GridSphere Java Portlet implementation framework (www.gridsphere.org).

2.7 Use Case #7 — RadLex: a lexicon for radiology

(Contributed by Curt Langlotz.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/EucRadlexDetailed)

RadLex provides a structured vocabulary of terms used in the field of radiology. Currently completed are listings of anatomic terms and "findings", which includes things that can be seen on or inferred from images produced by radiologists. These two sets include a total of about 7500 terms. A list of the terms used to describe the creation of such images, including information about the equipment used and the various imaging sequences performed, will be complete by the end of 2007.

An example application demonstrating functionality is an image annotation program that reads in RadLex and provides users the ability to search for and use particular RadLex terms to associate with images, post-coordinating them if necessary. Users would want to be able to retrieve RadLex terms by name or synonym.

Requires: R-ConceptualRelations, R-LabelRepresentation, R-TextualDescriptionsForConcepts, R-ConceptCoordination

RadLex, which can be searched and browsed online at www.radlex.org, is a taxonomy currently built predominantly using is-a relations. But there are also part-of and other relations (especially for anatomy), and new relations will be added as RadLex expands. Each term has a rich set of metadata fields to include provenance information and terminological data such as synonyms, definition, and related terms from other vocabularies.

The practical fields include:

and optionally, any

Requires: R-ConceptualRelations, R-AnnotationOnLabel, R-RelationshipsBetweenLabels, R-LexicalMappingLinks

The relationships used among terms include:

For instance, “nervous system” has a part called “brain”, and “nervous system” contains “nervous system spaces”. The view of the hierarchy itself does not reveal the relationships among the terms; this information is found within the term features, shown in this format on the right-hand side. In this framework, the hierarchy is generated from the different relationships among terms, using either SPARQL or a custom interface to an application that consumes the terminology.
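
A hierarchy view of this kind could be derived from typed relationships roughly as in the Python sketch below. The relation identifiers "has_part" and "contains" are placeholder names for the two relationships mentioned above, and the function is a simple stand-in for what a SPARQL query or a custom interface would do.

```python
# Typed relationships as (term, relation, related term); names are placeholders.
RELATIONS = [
    ("nervous system", "has_part", "brain"),
    ("nervous system", "contains", "nervous system spaces"),
]


def render(term, relations, relation_types, depth=0, seen=None):
    """Return an indented outline of term, following the selected relation types."""
    seen = set() if seen is None else seen
    lines = ["  " * depth + term]
    if term in seen:
        # Already expanded elsewhere: print it again but do not recurse (guards cycles).
        return lines
    seen.add(term)
    for subject, relation, target in sorted(relations):
        if subject == term and relation in relation_types:
            lines.extend(render(target, relations, relation_types, depth + 1, seen))
    return lines
```

Calling `render("nervous system", RELATIONS, {"has_part"})` yields a partonomy only, whereas passing both relation types produces a combined view; the same data can thus back several different hierarchies.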

There are 9 separate hierarchies in the vocabulary: Treatment; Image acquisition, Processing and Display; Modifier; Finding; Anatomic Location; Uncertainty (to be renamed Certainty); Teaching Attribute; Relationship; and Image Quality (as seen in the screenshots above). There are currently no relations holding between terms in different hierarchies, though this could be developed in future (e.g. linking of particular Findings to potential Anatomic Locations).

The RadLex vocabulary is provided in English, with plans to include other languages (e.g., German).

Requires: R-MultilingualLexicalInformation

Protégé has been used to create a machine-readable version of the vocabulary, which is available at http://www.radlex.org/radlex/docs/downloads.html. RadLex will be available in OWL-DL in the future.

Requires: R-CompatibilityWithOWL-DL

During the design of the vocabulary, basic guidelines from Cimino and Chute were used, such as ensuring that a term only corresponds to one concept. As the terminology is being developed into a more structured form, with more types of relationships, a term is allowed to have several parents as long as each parent is linked by a different relationship type (e.g. one IS-A parent and one PART-OF parent).

Potential changes in the vocabulary are submitted to the chair of the RadLex Steering Committee of the Radiological Society of North America, who consults with the relevant lexicon development committee. Accepted changes are periodically incorporated into the vocabulary. The first release was made public in November 2006.

Currently, a mapping is being developed between RadLex and the corresponding terms/codes in SNOMED (Systematized Nomenclature of Medicine) and the ACR (American College of Radiology) Index, the vocabularies that were used as a starting point for terminology development.

From a representational point of view, this mapping shall consist of equivalence and specialization links. Later, we expect people to compose atomic terms (post-coordination) to describe composite entities.

Requires: R-ConceptCoordination

2.8 Use Case #8 — NSDL Metadata Registry

(Contributed by Jon Phipps, Cornell University.
Complete description available at
http://www.w3.org/2006/07/SWD/wiki/RucMetadataRegistryExtended)

The NSDL Registry is intended to provide a complete vocabulary development and management environment for development of controlled vocabularies. Services are primarily directed at vocabulary owners and include provisions for:

The registry currently has a number of vocabularies registered. A sample entry of a vocabulary/scheme and a single concept is below (taken from http://metadataregistry.org/uri/NSDLEdLvl.html).

Scheme NSDLEdLvl
Name NSDL Education Level Vocabulary
Owner National Science Digital Library
Community Science, Mathematics, Engineering, Technology
URL http://metamanagement.comm.nsdl.org/cgi-bin/wiki.pl?VocabDevel
Concept NSDLEdLvl/1023
Label Middle School
Top Concept No
Status published
history note Term source: http://www.ed.gov
has narrower Grade 6
has narrower Grade 7
has broader Grades Pre-K to 12
alternative label Junior High School
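
The sample entry above could be turned into SKOS-style statements along the following lines. This is only a sketch: the field names and values are copied from the entry, but the mapping from registry field names to SKOS Core property names (`FIELD_TO_SKOS`) and the helper `to_statements` are assumptions made for illustration, not the registry's actual export. Broader and narrower concepts are kept as labels because the entry does not show their identifiers.

```python
# Assumed mapping from the registry's field names to SKOS Core property names.
FIELD_TO_SKOS = {
    "Label": "skos:prefLabel",
    "alternative label": "skos:altLabel",
    "history note": "skos:historyNote",
    "has narrower": "skos:narrower",
    "has broader": "skos:broader",
}

# (field, value) pairs copied from the sample concept entry.
ENTRY = [
    ("Label", "Middle School"),
    ("history note", "Term source: http://www.ed.gov"),
    ("has narrower", "Grade 6"),
    ("has narrower", "Grade 7"),
    ("has broader", "Grades Pre-K to 12"),
    ("alternative label", "Junior High School"),
]


def to_statements(concept_id, entry):
    """Turn (field, value) pairs into (subject, property, value) statements."""
    return [
        (concept_id, FIELD_TO_SKOS[field], value)
        for field, value in entry
        if field in FIELD_TO_SKOS
    ]


statements = to_statements("NSDLEdLvl/1023", ENTRY)
```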

2.9 Other use cases

The SWD Working Group maintains on its wiki site the complete list of descriptions that were sent following its call for use cases:

3 Requirements

The use cases presented in the previous section motivate a number of requirements that the SKOS specification must or should meet in order to fulfill its aim as a standard model for porting simple concept schemes to the Semantic Web. Depending on the level of consensus reached in the Working Group, these requirements are categorized into accepted and candidate requirements.

Note: in the following, to avoid ambiguities, vocabulary will be used to refer to the SKOS vocabulary, that is, the set of constructs (classes, properties) introduced in the SKOS model. Concept Scheme will be used to refer to the objects built with SKOS, i.e. the application-specific collections of concepts that are mentioned in SKOS use cases.

@@ Some requirements are linked to issues that are still being examined by the Working Group, as found on the wiki site http://www.w3.org/2006/07/SWD/wiki/SkosIssuesSandbox. @@

3.1 Accepted requirements

R-ConceptualRelations
Representation of relationships between concepts
The SKOS model shall provide semantic relationships between concepts, for display or search purposes. Typical examples are the hierarchical relations broader than (BT), narrower than (NT) and the non-hierarchical associative relation related to (RT).

Motivation: Tgn, Manuscripts, Aims, ProductLifeCycleSupport, RankingForDescription, etc.
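
A minimal sketch of how an application might exploit these relations for search or display is given below. It assumes the usual thesaurus conventions that BT and NT are inverses of each other and that RT is symmetric. The concept names `A`, `B` and `C` and the helper `with_inverses` are placeholders introduced for this example.

```python
def with_inverses(statements):
    """Add the inverse of every BT/NT statement and the symmetric RT statement."""
    closed = set(statements)
    for subject, relation, obj in statements:
        if relation == "BT":
            closed.add((obj, "NT", subject))
        elif relation == "NT":
            closed.add((obj, "BT", subject))
        elif relation == "RT":
            closed.add((obj, "RT", subject))
    return closed


# Placeholder concepts: A has broader concept B, and A is related to C.
stated = [("A", "BT", "B"), ("A", "RT", "C")]
closed = with_inverses(stated)
```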

R-ConceptSchemeExtension
Extension of concept schemes
A concept scheme might be locally extended with new concepts referring to existing ones, e.g. as specializations of these.

Motivation: Manuscripts, BirnLex, ProductLifeCycleSupport

R-ConceptualMappingLinks
Correspondence/Mapping links between concepts from different concept schemes
In order to build links between concepts coming from different concept schemes, SKOS should provide proper semantic relationships. Possible links, similar to the ones found in the existing SKOS and SKOS mapping [SWBP-SKOS-MAPPING] vocabularies, include concept equivalence and specialization/generalization relations.

Motivation: Manuscripts, Aims, ProductLifeCycleSupport, BirnLex, MetadataRegistry
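
The following sketch shows one way equivalence and specialization links between two concept schemes might be used to translate a query concept. The link names `equivalent` and `specializes`, the identifiers `schemeA:c1`, `schemeA:c2` and `schemeB:k1`, and the helper `targets` are all hypothetical; they do not correspond to properties of an existing SKOS vocabulary.

```python
# Hypothetical mapping links from concepts in scheme A to concepts in scheme B.
MAPPINGS = [
    ("schemeA:c1", "equivalent", "schemeB:k1"),
    ("schemeA:c2", "specializes", "schemeB:k1"),
]


def targets(concept, mappings, include_generalizations=True):
    """Return the concepts in the other scheme that `concept` is mapped to.

    Equivalent concepts are always returned. More general concepts (those the
    source concept specializes) are returned only when requested.
    """
    allowed = {"equivalent"}
    if include_generalizations:
        allowed.add("specializes")
    return sorted(
        {target for source, link, target in mappings
         if source == concept and link in allowed}
    )
```

Keeping the link type explicit lets an application decide whether a broader target is an acceptable substitute or whether only exact equivalents should be followed.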

R-LabelRepresentation
Representation of basic lexical values (labels) associated to concepts
The SKOS model shall provide means to represent the labels (preferred or not) of a concept, for display or search purposes.

Motivation: Tgn, Manuscripts, Aims, RankingForDescription, etc.

R-MultilingualLexicalInformation
Representation of lexical information in multiple natural languages
The lexical information specified in concept schemes (labels, but also definitions and notes) could come in different natural languages. A typical example is the case of a multilingual concept scheme with concepts having labels translated in several languages.

Motivation: Manuscripts, Aims, RadLex
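
As a small sketch of what multilingual labels make possible, the code below selects a display label for a requested language and falls back to a default language when no translation exists. The concept identifier `ex:brain`, the label data and the helper `label_for` are assumptions made for illustration and are not taken from any of the use case vocabularies.

```python
# Illustrative data: labels paired with language tags for one concept.
LABELS = {
    "ex:brain": [("brain", "en"), ("Gehirn", "de")],
}


def label_for(concept, lang, fallback="en"):
    """Return the label in `lang`, else in `fallback`, else None."""
    by_lang = {tag: text for text, tag in LABELS.get(concept, [])}
    return by_lang.get(lang, by_lang.get(fallback))
```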

R-SkosSpecialization
Local specialization of SKOS vocabulary
For particular situations, the designer of a SKOS concept scheme should be able to introduce new model-level classes and properties, and link them to existing SKOS constructs. Possible cases include the creation of specific kinds of textual definitions or notes for concepts, or the specification of new types of concepts.

Motivation: Manuscripts, Tgn, Aims, Biozen, RankingForDescription

@@ Linked to SKOS-I-extension-6, SKOS-I-SpecializationOfRelationships @@

R-TextualDescriptionsForConcepts
Representation of textual descriptions attached to concepts
The SKOS model shall provide means to represent descriptive notes that could help understanding the elements of concept schemes, e.g. scope notes explaining the way concepts are used to describe documents.

Motivation: Aims, ProductLifeCycleSupport, TacticalSituationObject, BirnLexDetailed, etc.

3.2 Candidate requirements

R-AnnotationOnLabel
Ability to represent annotations on lexical items
Labels, which are currently modeled as literals in SKOS, as well as possibly other literals, are valid subjects of discourse when modeling concept schemes, e.g. when recording the dates during which a particular label was in common use. However, in RDF only resources may be subjects of statements, and literals may only be objects of statements. The question then arises of how to annotate labels and other literals, that is, how to relate them as subjects to other entities.

Motivation: RadLex

@@ Linked to SKOS-I-AnnotationOnLabel @@
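
The constraint described above, and one possible way around it, can be sketched as follows. The code models RDF-style statements and rejects literal subjects; the label is then promoted to a resource so that it can be annotated. This is only one candidate approach, not a decision of the Working Group. The `ex:` identifiers, the property names and the annotation value are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""


@dataclass(frozen=True)
class Resource:
    uri: str


def statement(subject, predicate, obj):
    """Build an RDF-style statement, rejecting literal subjects."""
    if not isinstance(subject, Resource):
        raise TypeError("only resources may be subjects of statements")
    return (subject, predicate, obj)


concept = Resource("ex:concept1")
label = Resource("ex:label1")  # hypothetical: the label promoted to a resource

graph = [
    statement(concept, "ex:hasLabel", label),
    statement(label, "ex:literalForm", Literal("Junior High School", "en")),
    # The label resource can now be the subject of an annotation.
    statement(label, "ex:inCommonUseDuring", Literal("(date range placeholder)")),
]
```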

R-CompatibilityWithDC
Compatibility between SKOS and Dublin Core Abstract Model
Use of the SKOS model shall be compatible with use of the Dublin Core Abstract Model [DCAM]. When there are links between SKOS features and Dublin Core ones, these shall be specified.

Motivation: BirnLex

@@ Linked to SKOS-I-CompatibilityWithDC @@

R-CompatibilityWithISO11179
Compatibility between SKOS and ISO 11179 Part 3
The SKOS model shall be compatible with Part 3 of the ISO 11179 specification [ISO11179-3].

@@ Linked to SKOS-I-CompatibilityWithISO11179 @@

R-CompatibilityWithISO2788
Compatibility between SKOS and ISO 2788
The SKOS model shall be compatible with the ISO 2788 specification [ISO2788].

@@ Linked to SKOS-I-CompatibilityWithISO2788 @@

R-CompatibilityWithISO5964
Compatibility between SKOS and ISO 5964
The SKOS model shall be compatible with the ISO 5964 specification [ISO5964].

@@ Linked to SKOS-I-CompatibilityWithISO5964 @@

R-CompatibilityWithOWL-DL
OWL-DL compatibility
SKOS should provide a legal OWL-DL ontology, to be compatible with most common editors and reasoners.

Motivation: Biozen, BirnLex, RadLex

@@ Linked to SKOS-I-owlImport-7, SKOS-I-Semantics-10 @@

R-ConceptCoordination
Coordination of concepts
SKOS should provide the ability to create new concepts from existing ones, e.g. by using special qualifiers that add a shade of meaning to a normal concept.

Motivation: Manuscripts, RadLex, UDC, Rameau

R-ConceptSchemeContainment
Ability to explicitly represent the containment of any SKOS individual or statement within a concept scheme
It shall be possible to explicitly represent the containment, within a concept scheme, of any individual which is an instance of a SKOS class (e.g. skos:Concept) or of any statement that uses a SKOS property as predicate (e.g. skos:broader).

@@ Linked to SKOS-I-ConceptSchemeContainment @@

R-ConsistencyChecking
Checking the consistency of a concept scheme
Some SKOS applications might require testing the integrity of their concept scheme data. For example, conceptual relationships should only apply to individuals of type skos:Concept, and not for example between the (non-preferred) labels of concepts.

Motivation: GtaaBrowser, MetadataRegistry

@@ Linked to SKOS-I-Semantics-10 @@
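
The example check mentioned above can be sketched as follows: every statement using a conceptual relationship is tested to ensure that both its subject and its object are typed as skos:Concept. The data layout (a dictionary of types plus a list of statements), the `ex:` identifiers and the helper `check_relations` are assumptions made for this illustration.

```python
CONCEPTUAL_RELATIONS = ("skos:broader", "skos:narrower", "skos:related")


def check_relations(types, statements, relation_names=CONCEPTUAL_RELATIONS):
    """Return the statements whose subject or object is not a skos:Concept."""
    problems = []
    for subject, predicate, obj in statements:
        if predicate not in relation_names:
            continue
        if any(types.get(node) != "skos:Concept" for node in (subject, obj)):
            problems.append((subject, predicate, obj))
    return problems


# Hypothetical data: two typed concepts and one relation that wrongly
# points at a (non-preferred) label instead of a concept.
types = {"ex:c1": "skos:Concept", "ex:c2": "skos:Concept"}
statements = [
    ("ex:c1", "skos:broader", "ex:c2"),
    ("ex:c1", "skos:related", "Junior High School"),
]
problems = check_relations(types, statements)
```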

R-GroupingInConceptHierarchies
Ability to include grouping constructs in concept hierarchies in thesauri
Concept schemes can contain elements (arrays, guide terms, etc.) used to group normal concepts together, e.g. based on a shared semantic property. While these special elements cannot be used for description purposes, they can be introduced in a concept scheme's hierarchy by means of generalization and specialization links.

@@ Linked to SKOS-I-GroupingInConceptHierarchies, SKOS-I-collections-5 @@

R-IndexingAndNonIndexingConcepts
Ability to distinguish between concepts to be used for indexing and for non-indexing
SKOS should provide different classes for conceptual entities that can be used for indexing resources and for those that cannot be used for such a purpose (e.g. specific qualifiers that can only be used to narrow down the meaning of an existing concept).

Motivation: Manuscripts, UDC, Rameau

@@ Linked to SKOS-I-IndexingAndNonIndexingConcepts, SKOS-I-coordination-8 @@

R-IndexingRelationship
Ability to represent the indexing relationship between a resource and a concept that indexes it
The SKOS model should contain mechanisms to attach a given resource (e.g. corresponding to a document) to a concept the resource is about, e.g. to query for the resources described by a given concept.

Motivation: Manuscripts, Biozen, Aims, BirnLex

@@ Linked to SKOS-I-IndexingRelationship @@

R-LexicalMappingLinks
Correspondence/Mapping links between lexical labels of concepts in different concept schemes
In the process of mapping different concept schemes, it should be possible to identify correspondence links not only between concepts from these concept schemes, but also between the labels that can be attached to these concepts.

Motivation: RadLex, BirnLex

@@ Linked to SKOS-I-LexicalMappingLinks @@

R-MappingProvenanceInformation
Ability to record provenance information on mappings between concepts in different concept schemes
It shall be possible to record provenance information on mappings between concepts in different concept schemes.

Motivation: MetadataRegistry

@@ Linked to SKOS-I-MappingProvenanceInformation @@

R-RelationshipsBetweenLabels
Representation of links between labels associated to concepts
The SKOS model shall provide means to represent relationships between the terms associated with concepts. Typical examples are translation links between labels from different languages, or the link between one label and its abbreviation, when this stands for an alternative label for the concept.

Motivation: Manuscripts, Aims, RadLex

4 Conclusion

To elicit the requirements that a new version of the Simple Knowledge Organisation System (SKOS) should meet, the Semantic Web Deployment Working Group issued a call for use cases to the different communities concerned with the use of SKOS.

More than 25 submissions have been sent to the working group, which illustrates the variety of usages one can make of such a proposal. In this document, eight of them were selected as being the most representative.

Some of these use cases have come with very high-quality descriptions, and most correspond to development efforts that are presently being carried out, going therefore beyond pure research hypotheses. This gives a sound basis for the process of gathering requirements for SKOS, which the second part of this document describes.

Currently, requirements are divided into accepted and candidate requirements, reflecting the level of consensus they have reached in the Working Group at the time this document was created. In the near future, the Working Group will have to make a final decision regarding the candidate requirements, either accepting them or rejecting them. It will of course have to adapt the existing SKOS material so that it meets the accepted requirements.

References

[DCAM]
DCMI Abstract Model, A. Powell, M. Nilsson, A. Naeve, P. Johnston, 7 March 2005.
[ISO11179-3]
ISO/IEC 11179-3: 2003(E), Information Technology – Metadata Registries (MDR) – Part 3: Registry metamodel and basic attributes, Second edition. R. Gates, Editor, 15 February 2003.
[ISO2788]
ISO 2788:1986 Documentation - Guidelines for the establishment and development of monolingual thesauri. Second edition. ISO TC 46/SC 9, 1986.
[ISO5964]
ISO 5964:1985 Documentation - Guidelines for the establishment and development of multilingual thesauri. First edition. ISO TC 46/SC 9, 1985.
[SWBP-SKOS-CORE-GUIDE]
SKOS Core Guide, A. Miles, D. Brickley, Editors, W3C Working Draft (work in progress), 2 November 2005. Latest version available at http://www.w3.org/TR/swbp-skos-core-guide/.
[SWBP-SKOS-CORE-SPEC]
SKOS Core Vocabulary Specification, A. Miles, D. Brickley, Editors, W3C Working Draft (work in progress), 2 November 2005. Latest version available at http://www.w3.org/TR/swbp-skos-core-spec/.
[SWBP-SKOS-MAPPING]
SKOS Mapping Vocabulary Specification, A. Miles, D. Brickley, Editors, W3C Working Draft (work in progress), 11 November 2004. Latest version available at http://www.w3.org/2004/02/skos/mapping/spec/.
[SWBPD]
The Semantic Web Best Practices and Deployment Working Group
[SWD]
The Semantic Web Deployment Working Group
[SWD-Charter]
Semantic Web Deployment Working Group (SWDWG) Charter

Acknowledgments

The editors gratefully acknowledge contributions from Lora Aroyo, Hugh Barnes, Bruce Bargmeyer, Sean Barker, Sean Bechhofer, Pieter Bellekens, Hennie Brugman, Dario Cerizza, Irene Celino, Thierry Cloarec, Francesco Corcoglioniti, Sarah Currier, Emanuele Della Valle, Diane Hillmann, Chris Holmes, Bernard Horan, Julian Johnson, Simon Jupp, Johannes Keizer, Walter Koch, Véronique Malaisé, George Macgregor, Frédéric Martin, John McCarthy, Emma McCulloch, Alistair Miles, Mitsuharu Nagamori, Dennis Nicholson, Matthias Samwald, Margherita Sini, Aida Slavic, Davide Sommacampagna, Robert Stevens, Doug Tudhope, Andrea Turati, Bernard Vatant, Anna Veronesi.