Wordnet in RDFS and OWL

Editor's Draft

This version:: $Id: wordnet-sw-20040713.html,v 1.3 2004/08/05 15:45:45 bmcbride Exp $
Latest version:: ...
Previous versions:: This is the first public version
Editor:: @@TBA
Contributors:: @@TBA

This document describes an RDF Schema and OWL ontology for representing Wordnet.

Status of this Document

The following section describes the intended status of this document if and when it is published.

Sections where further work is needed are marked within the document with @@ and a comment. Open issues and comments can be found in Appendix F.

Publication as a draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Introduction

The Wordnet @@ref lexicon is proving to be a useful resource for semantic web developers. This document presents an RDF/OWL representation of the entire structure of Wordnet. By doing so, we allow Wordnet data to be accessed via RDF APIs and query languages, and to be mixed with non-Wordnet data, as well as with other lexically-oriented material, such as extensions to, and derivatives of, Wordnet and Wordnet-tagged corpora.

A related but distinct activity would be to describe the use of Wordnet as a basis for an RDF/OWL class and/or property hierarchy. Wordnet's noun term (hypernym) hierarchy captures "an X is a kind of Y" relationships between English category terms based on conventional usage. While there are several projects working in this area, it is not a task we address in this document.

This document does not explore the issues raised by the mapping of Wordnet structures into RDF (e.g. noun terms and/or synsets into classes). Future revisions of this document, or companion documents, may address some of these issues, such as the different assumptions underlying lexical databases when contrasted with formal ontologies. Here we concentrate on reflecting into RDF/XML the core structures and content of Wordnet, without consideration for mapping those notions into RDF's own notions of classes, properties and instances.

This approach echoes that of SKOS @@ref, which reflects into RDF the broader/narrower relationships used by thesauri, without requiring that each thesaurus be re-engineered as an RDF/OWL class hierarchy. Unlike SKOS, the structuring vocabulary used here draws directly from the conceptual framework underpinning Wordnet, allowing for concepts such as 'antonym' to be used to relate concepts/synsets. It may be possible for future versions of this document and SKOS to share more common structure, since the structuring vocabularies address similar (yet distinct) problems.

The Structure of Wordnet

This section describes the structure of the RDFS and OWL representation of Wordnet presented in this document. For a full explanation of Wordnet terms and concepts, the reader should refer to the Wordnet documentation @@ref.

In Wordnet, a word form @@ref is closest to the commonsense meaning of the term word. It is typically a sequence of characters such as "cat", or "dog", or "chat". The same word form can have different meanings in different languages; for example, in English the word form "chat" means to converse informally, whilst in French it means a cat. A word represents a word form in a language. The French word "chat" is a different word to the English word "chat", though they both have the same word form "chat". The same word can be used in different senses; for example, the word "dog" (in English) can mean a kind of animal or to follow. A word sense represents a word used in a particular sense.

As can be seen from figure 1, words are represented by resources of type wn:Word @@ref. The properties wn:hasWordForm and wn:hasLanguage (@@check these names) relate a word to its word form and language respectively. The property wn:hasSense relates words to their senses. (@@ is there some ordering information we are losing here?)
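The following is a minimal sketch of how the description above might look as a set of triples, using plain Python tuples rather than any particular RDF API. The resource names under `ex:` and the language tags `en` and `fr` are assumptions made for illustration; the draft does not fix them, and the property names are themselves marked as provisional.

```python
# Hypothetical prefix; the draft leaves the real namespace as $WNBASE.
WN = "wn:"

def describe_word(word_id, word_form, language, sense_ids):
    """Return the (subject, predicate, object) triples for one wn:Word."""
    triples = [
        (word_id, "rdf:type", WN + "Word"),
        (word_id, WN + "hasWordForm", word_form),
        (word_id, WN + "hasLanguage", language),
    ]
    # One wn:hasSense triple per sense. A set of triples has no inherent
    # order, so any ordering of the senses is not recorded here.
    triples += [(word_id, WN + "hasSense", s) for s in sense_ids]
    return triples

def objects(graph, subject, predicate):
    """Return the sorted objects of all triples matching subject and predicate."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

# Two distinct words sharing the word form "chat" (identifiers are made up).
english_chat = describe_word("ex:word/en/chat", "chat", "en", ["ex:sense/chat-talk"])
french_chat = describe_word("ex:word/fr/chat", "chat", "fr", ["ex:sense/chat-animal"])
graph = set(english_chat + french_chat)

print(objects(graph, "ex:word/en/chat", WN + "hasLanguage"))
print(objects(graph, "ex:word/fr/chat", WN + "hasLanguage"))
```

The two words are separate resources even though their wn:hasWordForm values are equal, which is the distinction the text draws between a word and a word form.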

A central concept in Wordnet is the synset @@ref. A synset represents a set of synonyms, that is, word senses with similar meanings. Synsets may also be considered to represent concepts in a thesaurus or ontology, but such considerations are beyond the scope of this document at this time. For our present purposes, synsets are considered to be collections of word senses @@ref.

Synsets are represented by resources of type wn:SynSet. The properties wn:inSynSet and wn:hasWordSense relate synsets and word senses. A wn:WordSense resource can be thought of as representing a (word, synset) pair.
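The view of a wn:WordSense as a (word, synset) pair can be sketched as below. This is an illustration only: it assumes that wn:inSynSet points from a word sense to its synset and that wn:hasWordSense points from a synset to its word senses, which the text above does not state explicitly, and all `ex:` identifiers are invented.

```python
from collections import namedtuple

# A word sense viewed as a (word, synset) pair.
WordSense = namedtuple("WordSense", ["word", "synset"])

def sense_triples(sense_id, sense):
    """Return the triples linking one word sense to its word and synset."""
    return [
        (sense_id, "rdf:type", "wn:WordSense"),
        (sense.word, "wn:hasSense", sense_id),
        (sense_id, "wn:inSynSet", sense.synset),      # assumed direction
        (sense.synset, "wn:hasWordSense", sense_id),  # assumed direction
    ]

def synonyms(triples, word):
    """Return the other words that share at least one synset with `word`."""
    word_of = {o: s for s, p, o in triples if p == "wn:hasSense"}    # sense -> word
    synset_of = {s: o for s, p, o in triples if p == "wn:inSynSet"}  # sense -> synset
    own = {synset_of[sense] for sense, w in word_of.items() if w == word}
    return sorted({w for sense, w in word_of.items()
                   if synset_of[sense] in own and w != word})

# Made-up data: "dog" as an animal and "dog" as "to follow".
senses = {
    "ex:sense/1": WordSense("ex:word/en/dog", "ex:synset/A"),
    "ex:sense/2": WordSense("ex:word/en/domestic_dog", "ex:synset/A"),
    "ex:sense/3": WordSense("ex:word/en/dog", "ex:synset/B"),
    "ex:sense/4": WordSense("ex:word/en/chase", "ex:synset/B"),
}
triples = [t for sid, s in senses.items() for t in sense_triples(sid, s)]

print(synonyms(triples, "ex:word/en/dog"))
```

Because the word "dog" has one sense in each synset, it picks up synonyms from both, while "chase" and "domestic_dog" are not synonyms of each other.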

Wordnet defines semantic relations between resources in this basic structure that represent linguistic and conceptual relationships between the terms. For example, the wn:hasHypernym and wn:hasHyponym properties represent hypernym (@@explain) and hyponym (@@explain) relations between synsets.
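As a rough illustration of how such relations might be used, the sketch below walks a chain of wn:hasHypernym links and derives the corresponding wn:hasHyponym triples. It assumes that wn:hasHypernym points from the more specific synset to the more general one and that wn:hasHyponym is its inverse; neither is confirmed by this draft. For simplicity it also keeps a single hypernym per synset, and the synset identifiers are invented.

```python
def add_inverse_hyponyms(triples):
    """Return the triples plus a wn:hasHyponym triple for each wn:hasHypernym."""
    out = set(triples)
    for s, p, o in triples:
        if p == "wn:hasHypernym":
            out.add((o, "wn:hasHyponym", s))  # assumed inverse relation
    return out

def hypernym_chain(triples, synset):
    """Follow wn:hasHypernym links upwards from `synset`, most specific first."""
    parent = {s: o for s, p, o in triples if p == "wn:hasHypernym"}
    chain, seen = [], {synset}
    while synset in parent:
        synset = parent[synset]
        if synset in seen:  # guard against accidental cycles in the data
            break
        seen.add(synset)
        chain.append(synset)
    return chain

# Made-up synsets: a dog is a kind of canine, a canine is a kind of carnivore.
data = [
    ("ex:synset/dog", "wn:hasHypernym", "ex:synset/canine"),
    ("ex:synset/canine", "wn:hasHypernym", "ex:synset/carnivore"),
]
full = add_inverse_hyponyms(data)

print(hypernym_chain(full, "ex:synset/dog"))
```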

Wordnet Classes and Properties

wn:Word

wn:Word is the class of Wordnet words. A wn:Word is a word form in a specific language. The English word "chat" is different to the French word "chat".

wn:WordSense

wn:WordSense is the class of words with specific senses. Thus the word "plant" in the sense of a growing organism is a different wn:WordSense to the word "plant" in the sense of a factory.

Naming Wordnet Resources

All Wordnet resources are named with URIs @@ref having a common prefix. Throughout this document, we will refer to this prefix as $WNBASE. All terms and concepts used in the Wordnet ontology are named in the namespace $WNBASE/2-0/ontology.

The synset whose identifier is $$ is named by the URI $WNBASE/synset/$$# @@I'm assuming the ids are OK for inclusion in the URI - defining escaping rules if not.

A word sense that is sense number N in the synset with identifier $$ is named by $WNBASE/sense/$$/N# @@reconsider in the light of danbri noting the existence of sense ids in Wordnet 2.
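The naming scheme can be sketched as a pair of small functions. The value chosen for $WNBASE below is a placeholder, the synset identifier is made up, and the percent-encoding step is only one possible answer to the open question about escaping; the draft has not defined any escaping rules.

```python
from urllib.parse import quote

# Placeholder for $WNBASE; the draft does not specify the real prefix.
WNBASE = "http://example.org/wordnet"

# Namespace for the terms of the Wordnet ontology, as described above.
ONTOLOGY_NS = WNBASE + "/2-0/ontology"

def synset_uri(synset_id):
    """Name a synset: $WNBASE/synset/$$#"""
    # Percent-encoding is an assumption, not a rule taken from the draft.
    return f"{WNBASE}/synset/{quote(str(synset_id), safe='')}#"

def sense_uri(synset_id, n):
    """Name sense number n of a synset: $WNBASE/sense/$$/N#"""
    return f"{WNBASE}/sense/{quote(str(synset_id), safe='')}/{int(n)}#"

print(ONTOLOGY_NS)
print(synset_uri("12345678"))    # "12345678" is an invented identifier
print(sense_uri("12345678", 2))
```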

References

Appendix A - Wordnet Ontology

Appendix B - Glossary

Appendix C - Example Use Case

Appendix D - Test Cases

This section lists some simple queries, expressed in RDQL, that illustrate features of the design of this Wordnet ontology.
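To give a feel for the kind of query intended, the sketch below imitates triple-pattern matching in plain Python. It is not RDQL and is not one of the test cases themselves; the data, the `ex:` identifiers and the helper function `query` are all hypothetical, and the property names follow the provisional ones used earlier in this document.

```python
def query(graph, patterns, binding=None):
    """Yield variable bindings satisfying every (s, p, o) pattern.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from query(graph, rest, candidate)

# Made-up data: two words sharing the word form "chat", plus an unrelated word.
graph = {
    ("ex:word/en/chat", "wn:hasWordForm", "chat"),
    ("ex:word/en/chat", "wn:hasLanguage", "en"),
    ("ex:word/fr/chat", "wn:hasWordForm", "chat"),
    ("ex:word/fr/chat", "wn:hasLanguage", "fr"),
    ("ex:word/en/dog", "wn:hasWordForm", "dog"),
    ("ex:word/en/dog", "wn:hasLanguage", "en"),
}

# "In which languages is there a word with the word form 'chat'?"
patterns = [("?w", "wn:hasWordForm", "chat"), ("?w", "wn:hasLanguage", "?lang")]
languages = sorted(b["?lang"] for b in query(graph, patterns))
print(languages)
```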

Appendix E - Requirements

This section sets out the technical requirements to be satisfied by this design for representing Wordnet using semantic web languages. @@Currently incomplete.