Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document describes an RDF Schema and OWL ontology for representing Wordnet.
This document is an initial strawman draft being developed by the Wordnet Task Force of the Semantic Web Best Practices and Deployment Working Group.
The following section describes the intended status of this document if and when it is published.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
We encourage public comments. Please send comments to public-swbp-wg@w3.org
Publication as a draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
Open issues, todo items:
Sections where further work is needed are marked within the document with @@ and a comment. Open issues and comments can be found in Appendix F.
The Wordnet @@ref lexicon is proving to be a useful resource for semantic web developers. This document presents an RDF/OWL representation of the entire structure of Wordnet. By doing so, we allow Wordnet data to be accessed via RDF APIs and query languages, and to be mixed with non-Wordnet data, as well as with other lexically-oriented material, such as extensions to, and derivatives of, Wordnet and Wordnet-tagged corpora.
A related but distinct activity would be to describe the use of Wordnet as a basis for an RDF/OWL class and/or property hierarchy. Wordnet's noun term (hypernym) hierarchy captures "an X is a kind of Y" relationships between English category terms based on conventional usage. While there are several projects working in this area, it is not a task we address in this document.
This document does not explore the issues raised by mapping Wordnet structures onto RDF's own modelling constructs (e.g. noun terms and/or synsets into classes). Future revisions of this document, or companion documents, may address some of these issues, such as the different assumptions underlying lexical databases when contrasted with formal ontologies. Here we concentrate on reflecting into RDF/XML the core structures and content of Wordnet, without consideration for mapping those notions into RDF's own notions of classes, properties and instances.
This approach echoes that of SKOS @@ref , which reflects into RDF the broader/narrower relationships used by thesauri, without requiring that each thesaurus be re-engineered as an RDF/OWL class hierarchy. Unlike SKOS, the structuring vocabulary used here draws directly from the conceptual framework underpinning Wordnet, allowing for concepts such as 'antonym' to be used to relate concepts/synsets. It may be possible for future versions of this document and SKOS to share more common structure, since the structuring vocabularies address similar (yet distinct) problems.
This section describes the structure of the RDFS and OWL representation of Wordnet presented in this document. For a full explanation of Wordnet terms and concepts, the reader should refer to the Wordnet documentation@@ref.
In Wordnet, a word form@@ref is closest to the commonsense meaning of the term word. It is typically a sequence of characters such as "cat", or "dog", or "chat". The same word form can have different meanings in different languages: for example, in English the word form "chat" means to converse informally, whilst in French it means a cat. A word represents a word form in a language. The French word "chat" is a different word to the English word "chat", though they both have the same word form "chat". The same word can be used in different senses: for example, the word "dog" (in English) can mean a kind of animal or to follow. A word sense represents a word used in a particular sense.
As can be seen from figure 1, words are represented by resources of type wn:Word @@ref. The properties wn:hasWordForm and wn:hasLanguage (@@check these names) relate a word to its word form and language respectively. The property wn:hasSense relates words to their senses. (@@ is there some ordering information we are losing here?)
@@update to include Aldo's changes.
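The word model described above can be sketched as plain (subject, predicate, object) triples. This is a minimal illustration rather than the Task Force's encoding: the namespace behind the wn: prefix, the word and sense URIs, and the use of "en" and "fr" as language values are all assumptions made for the example.

```python
# Hypothetical namespace: the draft does not state the URI behind the wn: prefix.
WN = "http://example.org/wordnet/2-0/ontology#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def word_triples(word_uri, word_form, language, sense_uris):
    """Return (subject, predicate, object) triples describing one wn:Word."""
    triples = [
        (word_uri, RDF_TYPE, WN + "Word"),
        (word_uri, WN + "hasWordForm", word_form),
        (word_uri, WN + "hasLanguage", language),
    ]
    # One wn:hasSense triple per sense. Triples form a set, so the order of
    # sense_uris is not preserved in the resulting graph.
    triples.extend((word_uri, WN + "hasSense", sense) for sense in sense_uris)
    return triples


# Illustrative URIs and language values only; they are not taken from Wordnet.
english_chat = word_triples("urn:example:word:en:chat", "chat", "en",
                            ["urn:example:sense:converse"])
french_chat = word_triples("urn:example:word:fr:chat", "chat", "fr",
                           ["urn:example:sense:cat"])

graph = set(english_chat) | set(french_chat)

# Both words share the word form "chat" but remain distinct resources.
words_with_form_chat = {s for (s, p, o) in graph
                        if p == WN + "hasWordForm" and o == "chat"}
print(sorted(words_with_form_chat))
```

Because the senses end up as unordered triples, any ordering of senses in the source data would need an explicit representation, which is the concern raised in the note above.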
A central concept in Wordnet is the synset@@ref. A synset represents a set of synonyms, that is, word senses with similar meanings. Synsets may also be considered to represent concepts in a thesaurus or ontology, but such considerations are beyond the scope of this document at this time. For our present purposes, synsets are considered to be collections of word senses@@ref.
Synsets are represented by resources of type wn:SynSet. The properties wn:inSynSet and wn:hasWordSense relate synsets and word senses. A wn:WordSense resource can be thought of as representing a (word, synset) pair.
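The (word, synset) view of a word sense can be sketched as follows. The synset identifiers and member words are illustrative placeholders, and the property directions (wn:inSynSet from word sense to synset, wn:hasWordSense from synset to word sense) are an assumption read off the property names, not something the draft states.

```python
from collections import defaultdict
from typing import NamedTuple


class WordSense(NamedTuple):
    """A word sense modelled as a (word, synset) pair."""
    word: str    # identifier of a wn:Word
    synset: str  # identifier of a wn:SynSet


# Hypothetical identifiers, chosen only for illustration.
senses = [
    WordSense("plant", "synset-organism"),
    WordSense("flora", "synset-organism"),
    WordSense("plant", "synset-factory"),
    WordSense("works", "synset-factory"),
]

# Assumed direction of wn:inSynSet: word sense -> synset.
in_synset = {sense: sense.synset for sense in senses}

# Assumed direction of wn:hasWordSense: synset -> its word senses.
has_word_sense = defaultdict(set)
for sense in senses:
    has_word_sense[sense.synset].add(sense)

# The word "plant" yields two distinct word senses, one per synset.
plant_senses = [s for s in senses if s.word == "plant"]
print(plant_senses)
```

Modelling the sense as a pair makes it explicit that the same word can appear in several synsets while each word sense belongs to exactly one.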
Wordnet defines semantic relations between resources in this basic structure that represent linguistic and conceptual relationships between the terms. For example, the wn:hasHypernym and wn:hasHyponym properties represent hypernym (@@explain) and hyponym (@@explain) relations between synsets.
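As a rough illustration of these two relations: a hypernym is the more general term in an "X is a kind of Y" relationship, and a hyponym is the more specific one. The sketch below uses short labels in place of real synset identifiers (an assumption for readability), derives the hyponym relation by inverting the hypernym relation, and collects all more general synsets by following hypernym links repeatedly. Whether the ontology itself declares the two properties as inverses is not stated in this draft.

```python
from collections import defaultdict

# Hypothetical synset labels standing in for real synset identifiers.
has_hypernym = {
    "dog": {"canine"},
    "canine": {"carnivore"},
    "cat": {"feline"},
    "feline": {"carnivore"},
    "carnivore": set(),
}


def invert(relation):
    """Swap subject and object of every pair in a relation."""
    inverse = defaultdict(set)
    for subject, objects in relation.items():
        for obj in objects:
            inverse[obj].add(subject)
    return dict(inverse)


def ancestors(synset, relation):
    """Collect every synset reachable by following the relation repeatedly."""
    seen = set()
    stack = [synset]
    while stack:
        current = stack.pop()
        for parent in relation.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


has_hyponym = invert(has_hypernym)
print(sorted(has_hyponym["carnivore"]))
print(sorted(ancestors("dog", has_hypernym)))
```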
This section describes each of the Wordnet classes and properties. @@incomplete
wn:Word is the class of Wordnet words. A wn:Word is a word form in a specific language. The English word "chat" is different to the French word "chat".
wn:WordSense is the class of words with specific senses. Thus the word "plant" in the sense of a growing organism is a different wn:WordSense to the word "plant" in the sense of a factory.
All Wordnet resources are named with URIs@@ref having a common prefix. Throughout this document, we will refer to this prefix as $WNBASE.
$WNBASE/2-0/ontology
.
The word $$ is named by the URI $WNBASE/word/$$#.
The synset whose identifier is $$ is named by the URI $WNBASE/synset/$$#. @@I'm assuming the IDs are OK for inclusion in the URI - defining escaping rules if not.
A word sense, which is sense number N in the synset with identifier $$, is named by $WNBASE/sense/$$/N#. @@reconsider in the light of danbri noting the existence of sense IDs in Wordnet 2.
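The naming patterns above can be sketched as small helper functions. The value chosen for $WNBASE is a placeholder, and percent-encoding via urllib.parse.quote is an assumption: the draft leaves escaping rules undefined.

```python
from urllib.parse import quote

# Placeholder for $WNBASE; the draft does not fix the actual prefix.
WNBASE = "http://example.org/wordnet"


def _escape(identifier):
    # Assumed escaping rule: percent-encode everything outside the
    # unreserved set, including "/" so identifiers cannot add path segments.
    return quote(str(identifier), safe="")


def word_uri(word):
    """URI for a word, following $WNBASE/word/$$#."""
    return f"{WNBASE}/word/{_escape(word)}#"


def synset_uri(synset_id):
    """URI for a synset, following $WNBASE/synset/$$#."""
    return f"{WNBASE}/synset/{_escape(synset_id)}#"


def sense_uri(synset_id, sense_number):
    """URI for sense number N of a synset, following $WNBASE/sense/$$/N#."""
    return f"{WNBASE}/sense/{_escape(synset_id)}/{int(sense_number)}#"


# "12345" is a made-up synset identifier used only for illustration.
print(word_uri("hot dog"))
print(synset_uri("12345"))
print(sense_uri("12345", 2))
```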
RDF version of the schema. @@ fix formatting
@@TBD
@@TBD A listing of the ontology.
This section lists some simple queries, expressed in RDQL, that illustrate features of the design of this Wordnet ontology.
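The kind of lookup such queries perform can be approximated with a simple triple-pattern match. The sketch below is Python rather than RDQL, and the abbreviated resource names are hypothetical; it finds every sense of any word whose word form is "dog" by chaining two patterns.

```python
# Hypothetical, abbreviated names; real data would use full URIs.
triples = [
    ("word:dog", "wn:hasWordForm", "dog"),
    ("word:dog", "wn:hasSense", "sense:dog-animal"),
    ("word:dog", "wn:hasSense", "sense:dog-follow"),
    ("word:cat", "wn:hasWordForm", "cat"),
    ("word:cat", "wn:hasSense", "sense:cat-animal"),
]


def match(data, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in data
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Pattern 1: which words have the word form "dog"?
dog_words = {s for (s, _, _) in match(triples, p="wn:hasWordForm", o="dog")}

# Pattern 2: which senses do those words have?
dog_senses = {o for word in dog_words
              for (_, _, o) in match(triples, s=word, p="wn:hasSense")}

print(sorted(dog_senses))
```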
This section sets out the technical requirements to be satisfied by this design for representing Wordnet using semantic web languages. @@Currently incomplete.