Specification of Use Cases
This document specifies relevant use cases that the ontology-lexicon model is expected to support.
Each use case has an identifier, a title, and one or more owners.
A description of a use case consists of
- a general description/motivation,
- a detailed example involving specific ontologies and triples as well as
- a specification of minimal necessary knowledge that is needed to account for the example.
Ontology-based Information Extraction and Ontology Population from Text
Identifier: IE
Title: Ontology-based Information Extraction and Ontology Population from Text
Owners: Philipp Cimiano, Tobias Wunner, Paul Buitelaar
This use case describes how the ontology-lexicon model can be used as background knowledge in ontology-based information extraction systems. The system is assumed to be ontology-based in the sense that the triples it extracts from text conform to the vocabulary of the given ontology.
Imagine that our ontology models the gross domestic product as follows:
Assuming we want to populate our example ontology by analysing textual data, we might want to extract the triples below from the following sentence: "The GDP of Germany was $3.306 trillion in 2010.":
The same triples should be extracted from equivalent sentences in other languages, e.g.:
- Das Bruttoinlandsprodukt von Deutschland betrug 2010 3.306 Billionen Dollar. (DE)
- El producto interior bruto de Alemania ascendió a 3.306 billones de dólares en el 2010. (ES)
Ontology-based Question Answering
Identifier: QA
Title: Ontology-based Question Answering
Owners: Philipp Cimiano, Nitish Aggarwal
This use case describes how the ontology-lexicon model can be used as background knowledge in an ontology-based question answering system that interprets natural language questions with respect to a given ontology.
Imagine a user asking the question "Who painted the Mona Lisa?" to a semantic question answering engine that has indexed all the RDF data available on the Web, in particular DBpedia. DBpedia contains the triple:
<http://dbpedia.org/resource/Mona_Lisa> <http://dbpedia.org/property/artist> "Leonardo Da Vinci"
The ontology-lexicon should contain all the relevant lexico-semantic knowledge needed to interpret the question "Who painted the Mona Lisa?" as a SPARQL query such as the following:
select ?who where { <http://dbpedia.org/resource/Mona_Lisa> <http://dbpedia.org/property/artist> ?who }
The following sentences in other languages should be mapped to the same query:
This SPARQL query would retrieve the answer from the data graph above, i.e. "Leonardo Da Vinci". The lexico-linguistic knowledge necessary to map the above questions into the right SPARQL query includes at least the following:
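As a rough illustration of the QA use case, the following Python sketch maps the example question to the SPARQL query through a toy lexicon. The `LEXICON` and `ENTITIES` tables and the `question_to_sparql` function are assumptions made for illustration only; they are not part of the ontology-lexicon model, and only the single question pattern "Who <verb> <entity>?" is handled.

```python
import re

# Hypothetical toy lexicon: verb form -> URI of the property whose subject is
# the named entity and whose object is the answer.
LEXICON = {
    "painted": "http://dbpedia.org/property/artist",
}

# Hypothetical entity index: lower-cased surface form -> resource URI.
ENTITIES = {
    "the mona lisa": "http://dbpedia.org/resource/Mona_Lisa",
}

def question_to_sparql(question):
    """Map a question of the form 'Who <verb> <entity>?' to a SPARQL query.

    Returns None when the pattern, the verb, or the entity is not covered.
    """
    match = re.fullmatch(r"who (\w+) (.+?)\?", question.strip().lower())
    if match is None:
        return None
    prop = LEXICON.get(match.group(1))
    entity = ENTITIES.get(match.group(2))
    if prop is None or entity is None:
        return None
    return f"select ?who where {{ <{entity}> <{prop}> ?who }}"

query = question_to_sparql("Who painted the Mona Lisa?")
```

A real system would need inflection, argument mapping, and multilingual entries from the lexicon rather than a flat dictionary of verb forms.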
Natural Language Generation from Triples
Identifier: NLG
Title: Natural language generation from triples
Owners: Philipp Cimiano, Brian Davis
This use case describes how the ontology-lexicon model can be used as background knowledge in a system that verbalizes triples, i.e. one that generates natural language text from a given set of triples using the vocabulary defined by some ontology.
Imagine we have the following triples given:
Assume further that an NLP system should generate the following sentences for the above triples in different languages:
In order to generate such discourses from the above triples, the following lexico-linguistic knowledge is needed (for English, and similarly for the other languages):
Integration and publishing of legacy language resources
Identifier: LLD
Title: Lexical Linked Data
Owners: John McCrae, Aldo Gangemi, Armando Stellato
A large number of machine-readable lexical resources are already available in various formats and structured according to different schemata. However, these resources are not accessible on the Web, and the schemata are both implicit and mutually incompatible. Through a set of good practices proposed by this CG, it will be possible to publish the resources in a homogeneous format (RDF) and link them to the LOD cloud. In addition, aligning the resource schemata to an open lexicon ontology designed by this CG (or directly representing those resources according to the open lexicon ontology) will enable deeper and sounder semantic interoperability.
WordNet is a first example, as it is a widely used, broad-coverage resource for English. It has already been published as linked data in two versions: 2.0 (work of the W3C Task Force on porting WordNet to the Semantic Web) and 3.0 (work of the Free University of Amsterdam). However, the schema (ontology) used in RDF WordNet is a conservative refactoring of the original WordNet database schema and is not easily comparable to other resources using different schemata. For example, the class wn20:Synset in the WordNet ontology can be aligned to some other class in an RDF lexical dataset, e.g. to the class framenet:Frame. Other lexical resources have been ported to the LOD cloud, e.g. FrameNet (work of STLab at ISTC-CNR), each with its own ontology conservatively based on the original schema.
Wiktionary is another resource that could be improved through linked data publishing. The publicly available version of Wiktionary is in MediaWiki (presentation) mark-up, and hence is difficult for machines to understand and often inconsistent. There are a number of ongoing attempts to fix this, but there is no consensus on the correct representation of the data or on how it should interact with other resources in the linked data cloud (notably WordNet and DBpedia). It is clear, however, that Wiktionary contains significantly richer information than WordNet, so any existing format would need to be significantly extended to capture all the information in Wiktionary.
It is therefore very relevant for the interoperability of lexical resources at the ontology and data layers to establish equivalences or similarities between the respective original ontologies and an open lexicon format. For example, it is important that linking algorithms generate owl:sameAs or skos:closeMatch triples between data of the same or similar type, e.g. between a WordNet synset and a FrameNet frame, and not between a WordNet word and a FrameNet frame. In the first case, we could get a (useful) union of documents annotated with wn20Synset:Desire and those annotated with fnFrame:Desiring, while in the second case we could get a noisy union of documents annotated with wn20Word:desire and fnFrame:Desiring. The noise is evident if we consider a philologist's paper on the history of the word "desire" (annotated with wn20Word:desire) and a biochemistry article on the physiology of desire (annotated with fnFrame:Desiring).
One possibility (compatible with suggesting a lexicon model) is to keep existing resources already expressed in RDF and to find a good metamodel that fits their content and makes it compatible and comparable. Here is a simple example of a pattern for linking external linguistic resources to an ontology through a metamodel:
<wn20schema:NounSynset rdf:about="wn20instances:synset-entity-noun-1" rdfs:label="entity">
  <wn20schema:synsetId>100001740</wn20schema:synsetId>
</wn20schema:NounSynset>
<rdf:Description rdf:about="wn20schema:Synset">
  <rdfs:subClassOf rdf:resource="ml:SemanticDescriptor"/>
</rdf:Description>
<someOntology:XYZ>
  <otml:semanticDescriptor rdf:resource="wn20instances:synset-entity-noun-1"/>
</someOntology:XYZ>
where:
- the prefixes ml and otml indicate two possible subvocabularies for "meta-lexicon" and "ontology to metalexicon linking";
- ml:SemanticDescriptor indicates a very general concept of "semantic collector" in lexicons (such as synonymy sets, i.e. synsets, in WordNet);
- otml:semanticDescriptor is a pointer to ml:SemanticDescriptor(s).
Thus the WordNet synset 100001740 is declared in wn20schema as a NounSynset (and thus a Synset). Synset is an ml:SemanticDescriptor. A given XYZ concept from an ontology is then "decorated" with a reference to the WordNet synset. All names used here are provisional.
The requirements for this use case concern the ability of an open lexicon ontology to enable a smooth representation of data from existing LRs. This includes but is not limited to:
The objective here will be to develop a minimal yet sufficiently expressive model that allows various lexical resources to be represented in a uniform and principled way and supports linking between them.
Representation of Translations in the Web of Data
Identifier: TRANS
Title: Representation of Translations in the Web of Data
Owners: Elena Montiel-Ponsoda, John McCrae
This use case describes how the representation mechanisms provided by the lexicon-ontology model can be used to represent translations in the Web of Data, taking advantage of the linguistic descriptions associated with ontologies and data sets.
Example 1: Imagine that we have two data sets in the Web of Data about administrative and governmental organizations in Europe. One represents the administrative organization of Great Britain and is documented in English, and the other represents the administrative organization of Spain and has labels in Spanish. In each of the ontologies we find the concept of "head of the executive branch of the government". In the English ontology the label describing this concept is "Prime Minister". In the Spanish ontology, the corresponding label is "Presidente del Gobierno". For certain purposes, such as interoperability at a European level, we may want to express that one label is the cultural equivalent translation of the other. This means that, although they cannot be considered "exact conceptual equivalents", in certain contextual conditions it may be convenient or adequate to express that one is a translation of the other.
Example 2: Now let us imagine that we have one data set in the Web of Data about e-commerce. This could be described according to the GoodRelations ontology. We may want to translate the GoodRelations ontology into Spanish in order to annotate information about e-commerce in Spanish. In this case we also need to represent translation relations, even though the descriptions in English and Spanish point to the same ontology concepts. For example, we may want to express that the English label "payment methods" is translated as "medios de pago" in Spanish.
Benefits of rich Linguistic Descriptions for Ontology Translation
Identifier: LDOT
Title: Benefits of rich Linguistic Descriptions for Ontology Translation
Owners: Mihael Arcan, Elena Montiel-Ponsoda
This use case describes how rich linguistic descriptions associated with ontology elements can help in the activity of ontology translation or localization.
The linguistic descriptions associated with ontology elements can range from simple part-of-speech annotations or terminology variation to a deeper morphosyntactic analysis of labels or a description of syntactic frames (subcategorization). The following examples show how linguistic descriptions may contribute to obtaining the most adequate translation candidates.
Example 1: It is well known that part-of-speech annotations may help in disambiguating translation candidates. Let us take the example of "book". If "book" describes an element of the ontology, it can refer to the noun "a book", as a set of written or printed sheets bound together, or to the verb "to book", as in making a room reservation in a hotel. In Spanish, for example, the noun would be translated as "libro" and the verb as "reservar". Apart from the semantic context provided by the ontology, this simple linguistic analysis can already give some hints about the most appropriate translation.
Example 2: If term variation is captured in the linguistic descriptions associated with the ontology, i.e., if several variants (such as orthographical, dialectal or register variants) are used to describe ontology elements, the chances of getting the correct translation increase. Let us imagine that we have the concept "headache" in our ontology, defined as "a pain in the region of the head or the neck", and two register variants associated with it: "headache" and "cephalalgia". With both variants available, we are more likely to find the right translation in general and specialized dictionaries.
Example 3: A richer analysis of labels can also help: extracting the subterms of a complex label can improve the translation of labels that contain the same subterm. For example, "cost of raw materials, consumables and supplies, and of purchased merchandise" contains the financial subterm "raw materials, consumables and supplies", which is stored in the ontology and linked to the German translation "Roh-, Hilfs- und Betriebsstoffe". In such cases, it is possible to match subterms and translate them as one unit, which gives the proper translation instead of splitting the subterm into even smaller pieces.
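The subterm matching of Example 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `TERM_BASE`, `find_known_subterms` and `pretranslate` are hypothetical names, the term base holds only the one pair given in the example, and matching is plain substring search without any linguistic analysis.

```python
# Hypothetical term base: English subterm -> German translation (from Example 3).
TERM_BASE = {
    "raw materials, consumables and supplies": "Roh-, Hilfs- und Betriebsstoffe",
}

def find_known_subterms(label, term_base):
    """Return (subterm, translation) pairs for stored terms found in the label, longest first."""
    hits = [(term, translation) for term, translation in term_base.items() if term in label]
    return sorted(hits, key=lambda hit: len(hit[0]), reverse=True)

def pretranslate(label, term_base):
    """Replace every known subterm by its translation as one unit.

    The rest of the label is left untouched for a later translation step.
    """
    for term, translation in find_known_subterms(label, term_base):
        label = label.replace(term, translation)
    return label

label = "cost of raw materials, consumables and supplies, and of purchased merchandise"
partially_translated = pretranslate(label, TERM_BASE)
```

Trying the longest subterms first keeps a stored multi-word term from being broken up by a shorter term that happens to occur inside it.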
Ontology-based Machine Translation
Identifier: OMT
Title: Ontology-based Machine Translation
Owners: John McCrae, Elena Montiel-Ponsoda
Similarly to the above use case (LDOT), the use of ontology semantics should help in the translation of text documents. The use of an ontology-lexicon allows elements in text to be identified and extra information about their semantic content to be obtained, aiding principally in the problem of deducing valid ontology candidates. In addition, the use of more sophisticated linguistic descriptions allows the syntax of a term to be better understood by means of ontological constraints.
Example 1: An ontology-lexicon can be used to disambiguate terms in text based on their syntactic information and semantic context. For example, the sentence "Push the rudder to the left to bank the airplane" contains the highly ambiguous term "to bank". First, this can be disambiguated by applying part-of-speech tagging to deduce that the term is a verb, hence ruling out translations such as "Bank" (financial institution) or "Ufer" (river bank). Further, the surrounding words ("rudder", "airplane") indicate that the word is likely associated with the domain of air travel, so the correct translation should come from that domain. For reference, Google Translate generated "Schieben Sie das Ruder nach links, um das Flugzeug Bank", where "Bank" should be "zu neigen".
Example 2: Semantic role labeling has been shown by many authors to be beneficial for machine translation. This involves identifying the subcategorization frame of a verb and its arguments, for which a semantic role lexicon is required. If such a lexicon were bound to an ontology, it would be possible to constrain the possible applications of a frame not just by the syntactic role of the arguments but also by their semantic type. For example, the verb "to know" is translated into German as either "kennen" or "wissen": the former generally when the object is a person, and the latter when the object is a fact.
Example 3: Reference resolution is the process of finding the referent of anaphors such as pronouns. Languages use pronouns in very different ways: English uses pronouns differentiated by gender and number but uses the neuter gender for all inanimate objects, whereas German uses gendered pronouns for inanimate objects, based on the grammatical gender of the referent. Furthermore, some languages do not mark gender on pronouns (e.g., Turkish) or often omit pronouns altogether (e.g., zero anaphora in Japanese). As the referent does not necessarily occur close to the anaphor in the text, deep processing of the document is required, and selecting the correct translation requires knowledge about the grammatical gender, number and semantic class of the referent.
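The two-step disambiguation of Example 1 (filter by part of speech, then by domain context) could look roughly like the Python sketch below. The `SENSES` inventory, its cue words and the `choose_translation` function are hypothetical; the part-of-speech tag is assumed to come from an external tagger, and counting shared cue words is a simple stand-in for a real semantic-context model.

```python
# Hypothetical sense inventory for the English lemma "bank".
# Each sense has a part of speech, a German translation, and domain cue words.
SENSES = [
    {"pos": "noun", "translation": "Bank", "cues": {"money", "account", "loan"}},
    {"pos": "noun", "translation": "Ufer", "cues": {"river", "water", "shore"}},
    {"pos": "verb", "translation": "neigen", "cues": {"rudder", "airplane", "pilot"}},
]

def choose_translation(pos, context_words, senses):
    """Filter senses by part of speech, then pick the one sharing most cue words with the context."""
    candidates = [sense for sense in senses if sense["pos"] == pos]
    if not candidates:
        return None
    context = {word.lower() for word in context_words}
    best = max(candidates, key=lambda sense: len(sense["cues"] & context))
    return best["translation"]

sentence = "Push the rudder to the left to bank the airplane"
# The tag "verb" is assumed to be supplied by a part-of-speech tagger.
translation = choose_translation("verb", sentence.split(), SENSES)
```

In an ontology-lexicon, the cue sets would be replaced by the ontological domain of each sense's reference rather than hand-picked word lists.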
Semantic Search
Identifier: SS
Title: Semantic search
Owners: Nitish Aggarwal, Philipp Cimiano
This use case is concerned with how the ontology-lexicon model can support the monolingual and cross-lingual semantic search problem. Semantics of queries and documents can be interpreted by annotating them with domain ontologies, enriched with lexical information provided by the lemon model. The semantic annotation enables cross-lingual matching of a query in one language to relevant documents or document segments in other languages.
Example 1: A sophisticated cross-lingual semantic similarity assessment can help to match a query in one language to available information in other languages. For example, the German term "Umlaufvermögen" corresponds to the English term "current assets" and decomposes into two lexical entries:
1. Umlauf/JJ (modifier) - circulate, orbital, fluid
2. Vermögen/NN (head) - assets, wealth, money
The meaning obtained from these lexical entries also covers the English terms "fluid assets", "liquid assets" and "circulating money", as they have semantically similar heads and modifiers. This example also shows that the terms "current assets", "fluid assets" and "liquid assets" refer to the same concept in English. This semantic similarity can help to interpret a query such as "Current assets of Vestas in 2011", as described below. Imagine that we have a finance ontology as follows:
The concept appearing in the query, "current assets", is semantically similar to the ontology concept "liquid assets". The query is therefore interpreted as a search for a monetary value tagged with "liquid assets" in a report that is tagged with the company name "Vestas" and the year "2011".
Semantic Tagging of text
Identifier: CLWS
Title: Cross-lingual Web Service Retrieval
Owners: Philipp Cimiano, Maria Maleshkova (Open University)
This use case is a specialization of the above use case on cross-lingual information retrieval. It describes how the ontology-lexicon model could support the retrieval of web services across languages.
Assume that a web service is described by the following RDF description:
Support to Automatic Ontology Mediation through exploitation of Linguistic Metadata
Identifier: SAOM
Title: Support to Automatic Ontology Mediation through exploitation of Linguistic Metadata
Owners: Armando Stellato
Many ontology mapping/matching tools available from the ontology research communities need to be tuned and configured according to the specific scenario in which they are employed: the modeling language adopted (RDFS, OWL, SKOS), the language(s) in which labels are expressed (or whether local names of URIs are the sole source available), the ontology constructs used to provide these labels, etc. (see [1] and [2] for some application scenarios).
Though many of these tools are claimed to be usable (with little or no adaptation) in scenarios where mappings are computed on demand, the context of a mediation does not actually allow any fine-tuning of the matching process: the ontologies are not both locally available (e.g. they are wrapped by resource agents on the Web or available as linked data), and there is no need to align the whole ontologies, only to "negotiate the meaning" of the specific concepts involved in a remote query. This results, in practice, in:
A vocabulary supporting OM should provide:
The following informal "story" depicts a dialogue between two ontology agents, Merlin and Djinni, which benefit from the above descriptors for their ontologies. Text in square brackets explains what happens behind the scenes.
Merlin: Hi, I'm Merlin the Wizard. I see you are a Genie, so I suppose we can talk about magic.
Djinni: Oh yes, I like talking about magic. My reference ontology for magic is: Xxxxx/magic.owl
Merlin: Erm… sorry, mine is: YYYYY/mana.owl
Djinni: Well, OK, what is (are) your language(s)?
Merlin: Actually, I'm a good English speaker [ontology natively filled with English terms]
Djinni: Mmm… I only speak Arabic, and I'm able to express some of my ideas in very simple English [Freelang (http://www.freelang.net/) bilingual vocabulary, automatic translation with 23% coverage of ontology concepts]
Merlin: That's much better than nothing. I can summon a familiar of mine who is a good English speaker [a WordNet 3.0 resource agent], and I've just found in the yellow pages an English/Arabic translator [Dict (http://www.dict.org) English/Arabic dictionary Semantic Web Service]; maybe they can help us a bit…
Another example is the possibility of using a lexical resource as an interlingua. Given a dialogue such as the one above, if the two agents discover that they both have references to WordNet (also checked on quantitative grounds), they can use synsets as a less ambiguous language (though still not a formal constraint) to match their concepts.
The dialogue above could represent a real (formal) dialogue between two distinct agents, or an inspection done by a single agent having complete access to the ontology data (as with Linked Open Data). In both cases, it implies the availability of explicit knowledge about the linguistic expressiveness of the mediated ontologies. Potentially useful information is:
None of the above information is difficult to produce (i.e. it is not "unrealistic" to expect people to add this metadata), as it can be generated easily and automatically by dedicated tools. What is needed here is to reuse the core elements for describing linguistic resources at a meta-level (that is, elements able to wrap different lexical resources), which come from the requirements of the LLD use case, and to add specific elements of the lemon vocabulary that describe the ontology-lexicon interaction on qualitative and quantitative grounds, thus:
Lexicon driven Ontology Evolution
Identifier: LOE
Title: Lexicon driven Ontology Evolution
Owners: Dagmar Gromann, Thierry Declerck
Large-scale ontologies inevitably change and evolve, which makes it necessary to propagate these changes to dependent elements. This use case focuses on the evolution of the elements of an existing domain ontology itself, rather than of instance data, by applying bootstrapping and ontology learning methods. The ontology-lexicon model augments both ontology learning and ontology evolution by contributing rich linguistic resources to the process of obtaining valid ontology concepts and relations from text.
Either a changing view of the world or an altered usage situation may necessitate change in the ontology. Furthermore, an elaborate and rich lexical description of tokens occurring in natural language text might call for changing elements of the ontology. When structures in the domain are modified, the ontology has to evolve to reflect these changes, and its consistency needs to be maintained in all its parts.
Example 1: The Industrial Classification Benchmark ontology [3] contains the following concept:
On the basis of the following sentence, new ontology nodes can be derived as subclasses of "Financial Services":
These strategies had proved resilient for asset managers, including hedge funds.
Hearst pattern: [NP0 asset_NN managers_NNS], [VBG including] [NP1 hedge_NN funds_NNS]
This allows us to extract the following triple from the sentence above and its translations, and to extend the ICB ontology by one sub-concept "hedge fund":
This evolution is further underpinned by the fact that most companies classified as ICB 8771 "Asset Managers" explicitly refer to hedge funds in their company profiles, such as the Bank of New York Mellon Corp. or the State Street Corp. According to a recent article in The Economist, 67% of hedge funds were below their high-water marks at the end of 2011. Thus, some companies in the industry decided to amend "the lexicon of hedge funds" by changing their title to "alternative asset managers" and turning from "absolute" to "relative returns", as incisively depicted in a letter to investors from Zilch Capital, LLC. "Hedge fund" and "alternative asset manager" point to the same ontology concept but represent different aspects of it, i.e. they differ semantically. The ontology-lexicon model represents this term variation by linking each term to a different lexical sense, both of which point to the same ontology entity:
:hedge_fund lemon:canonicalForm [lemon:writtenRep "hedge fund"@en];
  lemon:otherForm [lemon:writtenRep "hedge funds"@en];
  lemon:sense [lemon:reference ontology:HedgeFund] .
:alternative_asset_manager lemon:canonicalForm [lemon:writtenRep "alternative asset manager"@en];
  lemon:otherForm [lemon:writtenRep "alternative asset managers"@en];
  lemon:sense [lemon:reference ontology:HedgeFund] .
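The Hearst-pattern step of this example ("NP0, including NP1") can be approximated with a regular expression, as in the hedged Python sketch below. It is not the method used by the authors: a real system would match over part-of-speech-tagged noun-phrase chunks as in the pattern shown above, whereas this sketch treats any one or two lowercase words as a noun phrase. The names `NP`, `INCLUDING` and `extract_hyponym_pairs` are hypothetical.

```python
import re

# Deliberately simple noun phrase: one or two lowercase words.
NP = r"[a-z]+(?: [a-z]+)?"
# Hearst pattern "NP0, including NP1": NP1 is a candidate hyponym of NP0.
INCLUDING = re.compile(rf"\b({NP}), including ({NP})\b")

def extract_hyponym_pairs(sentence):
    """Return (hyponym, hypernym) candidates found by the 'including' pattern."""
    return [(m.group(2), m.group(1)) for m in INCLUDING.finditer(sentence.lower())]

sentence = "These strategies had proved resilient for asset managers, including hedge funds."
pairs = extract_hyponym_pairs(sentence)
```

Each extracted pair is only a candidate for a new sub-concept; lemmatizing the plural forms and checking the hypernym against existing ontology labels are left to later steps.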
Ontology alignment
Identifier: OA
Title: Ontology alignment
Owners: Jorge Gracia, John McCrae
Ontology alignment or ontology matching is the task of discovering correspondences between entities in different ontologies. It forms the basis of several other tasks, such as information integration, ontology evolution, semantic-based query answering, etc. There is a range of techniques that can be used for ontology matching, such as graph-based comparison, common extension comparison, terminology similarity, etc. However, these techniques are ultimately grounded in comparisons of the lexical information contained in the ontology and as such rely on NLP tools.
External lexicons associated with ontologies could be used as a sort of interlingua and thus exploited to better discover correspondences between ontology entities on the basis of their lexical commonalities.
Some situations in which richer external lexical information would help OA:
1) Ontology alignment often consists of matching entities based on labels, and the labels often differ by term variation such as
Such term variations could be represented in the ontology lexica and used to get a better lexical mapping between the compared concepts.
2) If links from different ontology entities to common lexical entries (in external lexicons) can be discovered, lexical similarity between the entities could be inferred.
3) Even if the compared ontologies point to different lexicons, the lexical descriptions contained in those lexicons can be compared (in a way that is richer than traditional label-based comparison, as there are more features than "label" to consider) and lexical similarities inferred.
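Situation 2 above can be illustrated with a small Python sketch that scores two ontology entities by the lexical entries they share. The Jaccard coefficient is used here as one simple, assumed choice of similarity measure; the source does not prescribe one. The `LINKS` table, its entity and entry names, and the `lexical_overlap` function are all hypothetical.

```python
def lexical_overlap(entries_a, entries_b):
    """Jaccard overlap of the lexical entries linked to two ontology entities."""
    a, b = set(entries_a), set(entries_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Hypothetical links from ontology entities to entries of a shared external lexicon.
LINKS = {
    "onto1:Author": {"lex:author", "lex:writer"},
    "onto2:Writer": {"lex:writer", "lex:author", "lex:novelist"},
    "onto2:Reviewer": {"lex:reviewer"},
}

score_match = lexical_overlap(LINKS["onto1:Author"], LINKS["onto2:Writer"])
score_mismatch = lexical_overlap(LINKS["onto1:Author"], LINKS["onto2:Reviewer"])
```

A matcher would typically combine such a score with structural evidence and apply a threshold before proposing a correspondence.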
Ontology transformation enhanced by lexical information
Identifier: OT
Title: Ontology transformation
Owners: Ondřej Zamazal, Vojtěch Svátek
With the help of lexical information, different applications can work better with the alternative modelling styles employed in an ontology. Ontology transformation [4] in this context means modification of an ontology in terms of its structural and naming aspects. Lexical information would help to detect the ontology fragments to be transformed and to generate the new variants of those fragments.
Ontology transformation basically consists of three steps: detection of the ontology fragments to be transformed, generation of transformation instructions, and the transformation as such. Ontology fragments to be transformed are detected using the structural aspect (the axioms employed) and the naming aspect (the names of entities). Names of entities (the local fragment of an IRI) are usually very short, and their analysis can hardly reveal much of the underlying lexical background. Using labels improves the situation only slightly. Therefore, an explicit representation of the lexical characteristics of entity names could help to resolve this "detection bottleneck" and consequently increase the quality of the transformation result (see the example).
Currently, NLP-based techniques applied to local IRI fragments or labels of entities often fail due to a lack of material for proper analysis and conclusions.
Let us consider that we want to change (here, unfold) the representation of a concept by a named entity into its alternative modelling style, i.e. class A will be replaced by the definition 'p some B'. In particular, we would like to transform the 'AcceptedPaper' named entity into 'Paper and (hasDecision some Acceptance)'. In order to properly perform such a transformation, we would need the following lexical information:
- Basic syntactic knowledge (e.g., adjectival inflection for modification)
- Patterns matching syntactic constructs to semantic constructs
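The unfolding of 'AcceptedPaper' can be sketched in Python as follows. This is a minimal illustration, assuming that a lexicon supplies, for each modifier, the property and the nominalized filler class; `MODIFIER_LEXICON` and `unfold` are hypothetical names, and only two-token CamelCase names are handled.

```python
import re

# Hypothetical lexicon: modifier -> (property, nominalized filler class).
MODIFIER_LEXICON = {
    "Accepted": ("hasDecision", "Acceptance"),
}

def unfold(class_name, lexicon):
    """Unfold 'ModifierHead' into 'Head and (property some Filler)'.

    Returns None unless the name is a two-token CamelCase name with a known modifier.
    """
    tokens = re.findall(r"[A-Z][a-z]*", class_name)
    if len(tokens) != 2 or tokens[0] not in lexicon:
        return None
    prop, filler = lexicon[tokens[0]]
    return f"{tokens[1]} and ({prop} some {filler})"

expression = unfold("AcceptedPaper", MODIFIER_LEXICON)
```

The lexicon lookup is where the lexical information listed above would be used: it relates the modifier "Accepted" to the noun "Acceptance" and to a pattern that maps the modifier-head construct to an existential restriction.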
[1] Sure, Y., Corcho, O., Euzenat, J., and Hughes, T. (eds.): Proceedings of the 3rd Evaluation of Ontology-based Tools (EON), located at the 3rd International Semantic Web Conference (ISWC 2004), Hiroshima, Japan, November 2004.
[2] http://ontologymatching.org/
[3] This ontology is a transformation of the ICB taxonomy (http://www.icbenchmark.com/) into OWL, performed by the partner DFKI in the project Monnet (http://www.monnet-project.eu/).
Note 1. This use case relies heavily on the outcome of the LLD scenario proposed by John McCrae. I was originally going to publish two use cases, but when I noticed that John's overlapped almost completely with my first one, I preferred to publish only SAOM and add a dependency on the existing LLD. To John: I didn't add anything to the description of LLD, to avoid stating things that might not match your original intention. However, whenever you feel that some of the things described in SAOM are perfectly aligned with your idea, feel free to copy them into LLD and leave in SAOM only those things peculiar to ontology mapping support, plus links to LLD.