Specification of Core Model
Motivation
Ontologies have numerous applications and represent the conceptual backbone of the Semantic Web. Significant standardization efforts under the auspices of the W3C have produced “recommendations” for data and knowledge representation languages, namely the Resource Description Framework (RDF) and the Web Ontology Language (OWL). While such ontology languages allow us to define logical theories consisting of ungrounded symbols and corresponding axioms, a grounding in language is crucial in order to render such ontologies for human consumption and thus support meaningful interaction with them by human users. Going further, it seems reasonable to assume that access to the Semantic Web will to a large extent be mediated by language, as this is the natural means of expression and communication employed by humans.
However, current web-based knowledge representation languages such as OWL and RDF(S) lack the rich linguistic grounding that is required for language-mediated access to ontologies. OWL and RDF(S) rely on the property rdfs:label to capture the relation between a vocabulary element and its (preferred) lexicalization in a given language. This lexicalization provides a lexical anchor that makes the concept, property, individual etc. understandable to a user. The mechanisms for linguistic grounding available in OWL and RDF(S) are rudimentary at best. They are far from able to capture the linguistic and lexical information needed by NLP applications that work with a particular ontology. Examples of such NLP applications are:
- Natural language generation systems that produce coherent discourses verbalizing a set of triples.
- Question Answering systems that interpret user questions with respect to ontologies.
- Text interpretation systems that interpret texts with respect to a given ontological vocabulary, extracting triples with respect to this vocabulary.
- Information retrieval systems.
Mission and Goal
The mission of the Ontology-Lexicon community group is to:
- Develop models for the representation of lexica (and machine-readable dictionaries) relative to ontologies. These lexicon models are intended to represent lexical entries containing information about how ontology elements (classes, properties, individuals etc.) are realized in multiple languages. In addition, the lexical entries contain appropriate linguistic (syntactic, morphological, semantic and pragmatic) information that constrains the usage of the entry.
- Demonstrate the added value of representing lexica on the Semantic Web, focusing in particular on how the use of linked data principles can allow for the re-use of existing linguistic information from resources such as WordNet.
- Provide best practices for the use of linguistic data categories in combination with lexica.
- Demonstrate that the creation of such lexica in combination with the semantics contained in ontologies can improve the performance of NLP tools.
- Bring together people working on standards for representing linguistic information (syntactic, morphological, semantic and pragmatic) building on existing initiatives, and identifying collaboration tracks for the future.
- Cater for interoperability among existing models to represent and structure linguistic information.
- Demonstrate the added value of applications relying on the use of the combination of lexica and ontologies.
General Requirements on the Model
Five important meta-requirements can already be advanced:
- R1: The actual model will be an OWL ontology, while a specific lexicon instantiating the model will be a plain RDF document.
- R2: (“Multilinguality”): The model should support the specification of the linguistic grounding with respect to any language.
- R3: (“Semantics by reference”): The meaning of lexical entries will be specified through a principle we call semantics by reference, by which the semantics of a lexical entry with respect to a given ontology will essentially be specified by referencing the URI of the concept or property in question.
- R4: (“Openness”): The lexicon-ontology model will be “open” in two ways. First, it will be extensible by new constructs as needed, e.g. by a particular application. Second, it will not make unnecessary choices with respect to which linguistic data categories to use, leaving open the possibility of very different instantiations of the model.
- R5: (“Reuse of relevant standards”): We will aim to reuse as many standards as possible, in particular lexicon models such as LMF, terminology models such as TMF, and linguistic data categories.
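Requirements R2 and R3 can be illustrated with a minimal Python sketch. It is not part of the model: the class names, field names, and the example URI are hypothetical and chosen only for illustration. Each lexical entry carries no meaning of its own beyond a reference to the URI of an ontology entity, and entries in different languages can point to the same URI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class LexicalEntry:
    """Hypothetical lexical entry: a written form in some language."""
    written_form: str
    language: str   # language tag such as "en" or "de"
    reference: str  # URI of the ontology entity (semantics by reference)


@dataclass
class Lexicon:
    """Hypothetical container for the entries of one ontology lexicon."""
    entries: List[LexicalEntry] = field(default_factory=list)

    def add(self, entry: LexicalEntry) -> None:
        self.entries.append(entry)

    def lexicalizations(self, reference: str, language: str) -> List[str]:
        """Return the written forms for an ontology entity in one language."""
        return [
            e.written_form
            for e in self.entries
            if e.reference == reference and e.language == language
        ]


# Assumed example URI; it does not refer to a real ontology.
CITY = "http://example.org/ontology#City"

lexicon = Lexicon()
lexicon.add(LexicalEntry("city", "en", CITY))
lexicon.add(LexicalEntry("Stadt", "de", CITY))
lexicon.add(LexicalEntry("ciudad", "es", CITY))

print(lexicon.lexicalizations(CITY, "de"))
```

Because the meaning is given only by the referenced URI, adding a further language amounts to adding entries that point to the same URI; the ontology itself is left unchanged.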
Rationale for the Design of the Model
The main purpose of the lexonto model is to capture the meaning of lexical entries with respect to a given domain ontology in a so-called ontology lexicon. Let the ontology lexicon contain a set of lexical entries L. Let the ontology introduce an ontological vocabulary O consisting of predicates of any arity as well as constants. We will refer to such predicates and constants as ontology entities.
The main relation between a lexical entry and a vocabulary element of the domain ontology is the relation denotes, which is defined as follows:
- Definition
- A lexical entry l in L denotes an ontology entity o in O iff for any situation s in which l is used to refer to r, r is contained in the extension of o. More formally: denotes(l, o) iff forall s, r: (situation(s) & refersTo(s, l, r)) -> r in ext(o)
The domain of denotes is then clearly a Lexical Entry, while the range of denotes is what we informally call an extensional entity, i.e. an ontological entity that has an extension in some interpretation/model/world.
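The definition can be made concrete with a small Python sketch over a finite set of observed usage situations. This is an illustration under stated assumptions, not an implementation of the model: the function name, the data layout, and the example identifiers are hypothetical, and the definition is read as requiring every referent of the entry to lie in the extension of the denoted entity.

```python
from typing import Dict, Iterable, Set, Tuple


def denotes(
    entry: str,
    entity: str,
    situations: Iterable[Tuple[str, str]],
    ext: Dict[str, Set[str]],
) -> bool:
    """Check the denotes condition on a finite sample of situations.

    situations: (lexical entry, referent) pairs, one per usage situation.
    ext: maps each ontology entity to its extension in one interpretation.
    Note: an entry that is never used satisfies the condition vacuously.
    """
    return all(
        referent in ext[entity]
        for used_entry, referent in situations
        if used_entry == entry
    )


# Assumed toy interpretation; the identifiers are illustrative only.
ext = {
    "ex:City": {"Berlin", "Paris"},
    "ex:River": {"Rhine"},
}
situations = [
    ("city", "Berlin"),
    ("city", "Paris"),
    ("river", "Rhine"),
]

print(denotes("city", "ex:City", situations, ext))
print(denotes("city", "ex:River", situations, ext))
```

The check is universally quantified over situations, so a single use of an entry for a referent outside the extension is enough to make it fail.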
In OWL Manchester syntax, we can axiomatize this as follows:
ObjectProperty: ontolex:denotes
    Domain: ontolex:LexicalEntry
Class: ontolex:LexicalEntry
    SubClassOf: semio:Expression
Of course, a lexical entry can denote different ontology entities within one ontology as well as across several ontologies.
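The consequences of the two axioms above can be sketched in Python over plain triples. This is a hand-written illustration, not an OWL reasoner: the function name and the lex:, fin: and geo: prefixes are hypothetical. It applies the domain axiom (any subject of ontolex:denotes is a LexicalEntry) and the subclass axiom (every LexicalEntry is a semio:Expression) to one entry that denotes entities in two different ontologies.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

DENOTES = "ontolex:denotes"
LEXICAL_ENTRY = "ontolex:LexicalEntry"
EXPRESSION = "semio:Expression"
RDF_TYPE = "rdf:type"


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Derive the type triples entailed by the domain and subclass axioms."""
    inferred: Set[Triple] = set()
    for subject, predicate, obj in triples:
        is_entry = predicate == DENOTES or (
            predicate == RDF_TYPE and obj == LEXICAL_ENTRY
        )
        if is_entry:
            # Domain axiom: the subject of denotes is a LexicalEntry.
            inferred.add((subject, RDF_TYPE, LEXICAL_ENTRY))
            # Subclass axiom: every LexicalEntry is an Expression.
            inferred.add((subject, RDF_TYPE, EXPRESSION))
    return inferred


# One lexical entry denoting entities from two assumed ontologies.
triples = {
    ("lex:bank", DENOTES, "fin:Bank"),
    ("lex:bank", DENOTES, "geo:RiverBank"),
}

for triple in sorted(infer_types(triples)):
    print(triple)
```

Both denotes statements lead to the same two type triples for the entry, so denoting several entities does not change how the entry itself is classified.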
Here is a picture of the current proposal for the model: