Section 0. Contact and confidentiality
Contact e-mail:
Irene Celino <irene DOT celino AT cefriel DOT it> (main contact)
Emanuele Della Valle <emanuele DOT dellavalle AT cefriel DOT it>
Dario Cerizza <dario DOT cerrizza AT cefriel DOT it>
Andrea Turati <andrea DOT turati AT cefriel DOT it>
Do you mind your use case being made public on the working group website and documents?
We'd be very glad if you published our use case.
Section 1. Application
In this section we ask you to provide some information about the application for which the vocabulary(ies) and/or vocabulary mappings are being used. Please note:
- If your use case does not involve any specific application, but consists rather in the description of a specific vocabulary, skip straight to Section 2.
- If your application makes use of links between different vocabularies, do not forget to fill in Section 3!
1.1. What is the title of the application?
Squiggle: an application framework for model-driven development of real-world Semantic Search Engines
1.2. What is the general purpose of the application?
- What services does it provide to the end-user?
Squiggle is a framework to support the development of domain-specific search engines that exploit the semantics of domain ontologies to improve the search functionalities. It supports both the conceptual indexing phase and the semantic search (the runtime interaction). A more detailed description can be found at http://squiggle.cefriel.it, where you can also find the links to some running search engines: Squiggle Music to find songs and artists (at http://squiggle.cefriel.it/music) and Squiggle Ski to find images of alpine skiers (at http://squiggle.cefriel.it/ski).
1.3. Provide some examples of the functionality of the application. Try to illustrate all of the functionalities in which the vocabulary(ies) and/or vocabulary mappings are involved.
- Syntactic search: when the user submits a textual query, Squiggle performs a traditional syntactic search over the index. This is not an innovative feature; we included it for parity with traditional search engines.
- Semantic interpretation of the user query: in parallel with the syntactic search, Squiggle analyzes the user query in order to identify possible meanings of the request. It does so by comparing the content of the query with the labels of the concepts contained in the domain ontology: the preferred labels (skos:prefLabel), the alternative labels (skos:altLabel) and the misspelled labels (skos:hiddenLabel). When it finds a match, Squiggle displays the preferred label of the matching concept in a lateral "Did you mean...?" disambiguation box, in the language of the user, by exploiting the support of xml:lang in RDF. The user can therefore disambiguate between the identified meanings of his/her query by selecting the one(s) that fit the original request.
- Semantic search: when the user disambiguates among the meanings suggested in the previous step, Squiggle performs a semantic search over its indexes and returns all the results that, during the conceptual indexing phase, were indexed against the selected concept(s), regardless of the syntactic variants used in the textual annotations.
- Semantic suggestions: after the user disambiguation, the meaning of the query is specified in terms of concepts from the domain ontology. Squiggle can therefore exploit the knowledge about the domain, i.e. the relationships between the identified concepts and other concepts within the ontology, to suggest other "meanings" to the user for expanding his/her search to related contents. For example, Squiggle is natively able to exploit some SKOS primitives (e.g., skos:related, skos:broader, skos:narrower, skos:relatedPartOf, etc.) and can be configured to exploit domain-specific relations.
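The semantic interpretation step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not Squiggle's actual implementation: the `Concept` class, the `interpret` function and the whole-word matching rule are hypothetical, and the sample concepts are taken from the extracts in section 2.5.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Concept:
    """Hypothetical in-memory stand-in for a concept of the domain ontology."""
    uri: str
    pref_labels: Dict[str, str]                              # language -> skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)      # skos:altLabel
    hidden_labels: List[str] = field(default_factory=list)   # skos:hiddenLabel

    def all_labels(self) -> List[str]:
        return list(self.pref_labels.values()) + self.alt_labels + self.hidden_labels


def interpret(query: str, concepts: List[Concept], user_lang: str = "en") -> List[Tuple[str, str]]:
    """Return (uri, preferred label) for every concept with a label found in the query."""
    # Assumption: a label matches when it occurs in the query as whole words, ignoring case.
    padded = " " + " ".join(query.lower().split()) + " "
    matches = []
    for concept in concepts:
        if any(f" {label.lower()} " in padded for label in concept.all_labels()):
            # Show the preferred label in the user's language, else fall back to any
            # available one (assumes each concept has at least one preferred label).
            shown = concept.pref_labels.get(user_lang) or next(iter(concept.pref_labels.values()))
            matches.append((concept.uri, shown))
    return matches


CONCEPTS = [
    Concept("m:Beatles", {"en": "The Beatles"}, ["Fab four"], ["Beetles"]),
    Concept("ski:GiantSlalom", {"en": "Giant Slalom"}, ["Slalom Gigante", "Riesenslalom"]),
]

# A misspelled query still resolves to the concept through its hidden label.
print(interpret("beetles yesterday", CONCEPTS))
```

The returned preferred labels are what a "Did you mean...?" box would display; the selected URI would then drive the semantic search.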
1.4. What is the architecture of the application?
- What are the main components?
- Are the components and/or the data distributed across a network, or across the Web?
Squiggle is composed of two main parts: the Conceptual Indexer, which takes as input the contents to be indexed and the domain ontology and produces as output a set of indexes, and the Semantic Searcher, which queries the indexes to return matching results and semantic suggestions in response to the user. There are two kinds of indexes: the syntactic ones, which are queried for textual matching (based on Apache Lucene), and the semantic ones, which are queried for ontological matching (based on Sesame). In the running systems, the components are on a single server; however, there is no technical reason that prevents distributing some of the components. The data to be indexed are distributed over the network, as for any other search engine, and the ontologies used to annotate and describe those data can be distributed as well. The search interface is available on the Web (try Squiggle Music at http://squiggle.cefriel.it/music and Squiggle Ski at http://squiggle.cefriel.it/ski).
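The split between the two kinds of indexes can be illustrated with a small sketch. It is only an approximation under loud assumptions: plain Python dictionaries stand in for the Lucene-based syntactic index and the Sesame-based semantic index, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_indexes(documents, concept_labels):
    """Conceptual indexing sketch.

    documents: document id -> textual annotation
    concept_labels: concept id -> list of labels (preferred, alternative, hidden)
    """
    syntactic = defaultdict(set)   # token -> document ids (stand-in for Lucene)
    semantic = defaultdict(set)    # concept -> document ids (stand-in for Sesame)
    for doc_id, text in documents.items():
        normalized = " ".join(text.lower().split())
        for token in normalized.split():
            syntactic[token].add(doc_id)
        for concept, labels in concept_labels.items():
            # Index the document against the concept if any label occurs as whole words.
            if any(f" {label.lower()} " in f" {normalized} " for label in labels):
                semantic[concept].add(doc_id)
    return syntactic, semantic


def syntactic_search(query, syntactic):
    """Documents whose annotation contains every token of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(syntactic.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= syntactic.get(token, set())
    return result


def semantic_search(concept, semantic):
    """Documents indexed against the concept, whatever wording they used."""
    return set(semantic.get(concept, set()))


DOCUMENTS = {
    "img1": "Giorgio Rocca in the Riesenslalom",
    "img2": "Rocca wins the giant slalom",
    "img3": "Downhill training",
}
CONCEPT_LABELS = {"ski:GiantSlalom": ["Giant Slalom", "Slalom Gigante", "Riesenslalom"]}

SYNTACTIC, SEMANTIC = build_indexes(DOCUMENTS, CONCEPT_LABELS)
print(sorted(syntactic_search("giant slalom", SYNTACTIC)))
print(sorted(semantic_search("ski:GiantSlalom", SEMANTIC)))
```

In this toy data the textual query only finds the document that literally contains "giant slalom", while the concept-based query also returns the one annotated with the German label.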
1.5. Briefly describe any special strategy involved in the processing of user actions, e.g. query expansion using the vocabulary structure.
As explained before, the vocabulary is used in the semantic interpretation of the query (by accessing all the labels in the knowledge base) and during the semantic suggestions (by following some relationships between the concepts).
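A minimal sketch of the suggestion step, assuming the knowledge base is available as a list of (subject, predicate, object) triples. The `suggest` function, the `m:performs` prefix and the sample triples are hypothetical; the real system queries a Sesame repository instead.

```python
from collections import defaultdict

# Relations followed by default; domain-specific ones can be added by configuration.
DEFAULT_RELATIONS = ("skos:related", "skos:broader", "skos:narrower", "skos:relatedPartOf")
# skos:related is symmetric, so it is also followed from object to subject.
SYMMETRIC = {"skos:related"}

# Hypothetical sample triples in the spirit of the Music ontology.
TRIPLES = [
    ("Rock", "skos:narrower", "HeavyMetal"),
    ("Queen", "skos:related", "PeterGabriel"),
    ("JohnLennon", "skos:relatedPartOf", "Beatles"),
    ("JohnLennon", "m:performs", "Imagine"),
]


def suggest(concept, triples, relations=DEFAULT_RELATIONS):
    """Group the concepts reachable in one step from `concept` by relation."""
    suggestions = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate not in relations:
            continue
        if subject == concept:
            suggestions[predicate].append(obj)
        elif obj == concept and predicate in SYMMETRIC:
            suggestions[predicate].append(subject)
    return dict(suggestions)


print(suggest("Rock", TRIPLES))
# A domain-specific relation is followed only when configured.
print(suggest("JohnLennon", TRIPLES, relations=DEFAULT_RELATIONS + ("m:performs",)))
```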
1.6. Are the functionalities associated with the controlled vocabulary(ies) integrated in any way with functionalities provided by other means? (For example, search and browse using a structured vocabulary might be integrated with free-text searching and/or some sort of social bookmarking or recommender system.)
As explained before, the end user is provided with an interface he/she is already accustomed to, i.e. a textual search engine; the semantically enriched functionalities are hidden "behind the scenes" and are employed to give the user the added value of semantic search.
1.7. Any additional information, references and/or hyperlinks.
Squiggle is a framework for building semantic search engines. Two domain-specific search engines were built on top of the Squiggle framework and are available on-line:
- Squiggle Music at http://squiggle.cefriel.it/music
- Squiggle Ski at http://squiggle.cefriel.it/ski
Some publications about Squiggle are available on the Web at http://swa.cefriel.it/Publications#squiggle-pub.
Section 2. Vocabulary(ies)
In this section we ask you to provide some information about the vocabulary or vocabularies you would like to be able to represent using SKOS. Please note:
- If you have multiple vocabularies to describe, you may repeat this section for each one individually or you may provide a single description that encompasses all of your vocabularies.
- If your use case describes a generic application of one or more vocabularies and/or vocabulary mappings, you may skip this section.
- If your vocabulary case contains cross-vocabulary links (between the vocabularies you presented or to external vocabularies), please fill in section 3!
2.1. What is the title of the vocabulary? If you're describing multiple vocabularies, please provide as many titles as you can.
The Squiggle framework uses the SKOS vocabulary. Squiggle Music uses a Music ontology. Squiggle Ski uses a Ski ontology.
2.2. Briefly describe the general characteristics of the vocabulary, e.g. scope, size...
The Music ontology contains more than 2 million triples in RDF/OWL. The knowledge base, derived from freely accessible sources like MusicBrainz (http://www.musicbrainz.org) and MusicMoz (http://www.musicmoz.org), describes artists and bands, songs, albums, music genres, etc. The Ski ontology contains more than 2000 triples in RDF/OWL. The knowledge base was derived from information of the International Ski Federation (http://www.fis-ski.com/) about athletes, disciplines, races, podiums, etc.
2.3. In which language(s) is the vocabulary provided?
- In the case of partial translations, how complete are these?
The Music ontology is not multilingual (the names of artists and songs are not "translatable"). The Ski ontology contains the names of the disciplines in 7 languages (English, Italian, German, French, Swedish, Norwegian and Finnish).
2.4. Please provide below some extracts from the vocabulary. Use the layout or presentation format that you would normally provide for the users of the vocabulary. Please ensure that the extracts you provide illustrate all of the features of the vocabulary.
Music ontology:
- Artists and Bands: Beatles, John Lennon, Red Hot Chili Peppers, etc.
- Songs: "All you need is love", "Imagine", "Otherside", etc.
- Music genres (the arrow below means "is broader than"):
  Rock --> Heavy Metal --> Death Metal
  World, Celtic, Pop --> Celtic Pop
Ski ontology:
- Athletes: Giorgio Rocca, Hermann Maier, Benjamin Raich, etc.
- Disciplines: Slalom, Giant Slalom, Downhill, etc.
2.5. Describe the structure of the vocabulary.
- What are the main building blocks?
- What types of relationship are used? If you can, provide examples by referring to the extracts given in paragraph 2.4.
Music ontology:
- The music genres are related via skos:broader and skos:narrower relations
  HeavyMetal skos:broader Rock
- The artists can be reciprocally connected
  Queen skos:related PeterGabriel
  and the bands can be connected to their members
  JohnLennon skos:relatedPartOf Beatles
- The artists can have multiple labels
  Beatles skos:prefLabel "The Beatles"
  Beatles skos:altLabel "Fab four"
  Beatles skos:hiddenLabel "Beetles"
- The songs are connected to the performing artists
  Imagine isPerformedBy JohnLennon
  JohnLennon performs Imagine
  performs rdfs:subPropertyOf skos:related
Ski ontology:
- Athletes are related to the disciplines they practice
  GiorgioRocca practice GiantSlalom
  practice rdfs:subPropertyOf skos:related
- Disciplines have different names in different languages
  GiantSlalom skos:prefLabel "Giant Slalom" (English)
  GiantSlalom skos:altLabel "Slalom Gigante" (Italian)
  GiantSlalom skos:altLabel "Riesenslalom" (German)
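Declaring the domain-specific properties as sub-properties of skos:related means that generic code can treat them as "related" links without knowing each property in advance. The following is a minimal sketch of that reasoning step, under stated assumptions: the function names, the `m:` and `ski:` prefixes and the in-memory triples are hypothetical, and a real deployment would rely on the inference support of the RDF store.

```python
# rdfs:subPropertyOf declarations: property -> its direct super-properties.
SUB_PROPERTY_OF = {
    "m:performs": ["skos:related"],
    "ski:practice": ["skos:related"],
}

# Hypothetical sample triples based on the extracts above.
TRIPLES = [
    ("m:JohnLennon", "m:performs", "m:Imagine"),
    ("ski:GiorgioRocca", "ski:practice", "ski:GiantSlalom"),
    ("m:Queen", "skos:related", "m:PeterGabriel"),
    ("m:Beatles", "skos:prefLabel", "The Beatles"),
]


def super_properties(prop, sub_property_of):
    """All properties that `prop` specializes, following rdfs:subPropertyOf transitively."""
    seen, stack = set(), [prop]
    while stack:
        current = stack.pop()
        for parent in sub_property_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def related_concepts(concept, triples, sub_property_of):
    """Objects linked to `concept` by skos:related or by any of its sub-properties."""
    found = []
    for subject, predicate, obj in triples:
        if subject != concept:
            continue
        if predicate == "skos:related" or "skos:related" in super_properties(predicate, sub_property_of):
            found.append(obj)
    return found


print(related_concepts("m:JohnLennon", TRIPLES, SUB_PROPERTY_OF))
print(related_concepts("ski:GiorgioRocca", TRIPLES, SUB_PROPERTY_OF))
```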
2.6. Is a machine-readable representation of the vocabulary already available (e.g. as an XML document)? If so, we would be grateful if you could provide some example data or point us to a hyperlink.
Yes, all the data were derived from available knowledge (as explained in section 2.2) and rendered in RDF/OWL using some of the SKOS primitives (as explained in section 2.5). The ontologies are not publicly available on the Web; some sample triples follow.
Music ontology:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:m="URN:it:cefriel:music#">
  <rdf:Description rdf:about="URN:it:cefriel:music#artist-8bfac288-ccc5-448d-9573-c33ea2aa5c30">
    <skos:prefLabel>Red Hot Chili Peppers</skos:prefLabel>
    <skos:altLabel>RHCP</skos:altLabel>
    <skos:altLabel>The Red Hot Chili Peppers</skos:altLabel>
    <skos:hiddenLabel>The Red Hot Chilli Peppers</skos:hiddenLabel>
    <skos:hiddenLabel>Red Hot Chilli Peppers</skos:hiddenLabel>
    <skos:hiddenLabel>Red Hot Chilly Peppers</skos:hiddenLabel>
    <m:hasStyle rdf:resource="URN:it:cefriel:music#style-2150036357"/>
    <m:hasStyle rdf:resource="URN:it:cefriel:music#style-2147564081"/>
    <m:hasStyle rdf:resource="URN:it:cefriel:music#style-2149982882"/>
    <m:performs>
      <rdf:Description rdf:about="URN:it:cefriel:music#song-1599985063">
        <skos:prefLabel>Californication</skos:prefLabel>
      </rdf:Description>
    </m:performs>
    <m:performs rdf:resource="URN:it:cefriel:music#song-0432808595"/>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#style-2149982882">
    <skos:prefLabel>Punk</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#style-2147564081">
    <skos:prefLabel>Pop</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#style-2150036357">
    <skos:prefLabel>Rock</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:music#song-0432808595">
    <skos:prefLabel>Otherside</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
Ski ontology:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ski="URN:it:cefriel:ski#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="URN:it:cefriel:ski#Athlete">
    <skos:prefLabel>Athlete</skos:prefLabel>
    <skos:altLabel>Atleta</skos:altLabel>
  </rdf:Description>
  <ski:Athlete rdf:about="URN:it:cefriel:ski#athlete-1298393383">
    <ski:practice rdf:resource="URN:it:cefriel:ski#Slalom"/>
    <ski:practice rdf:resource="URN:it:cefriel:ski#Combined"/>
    <skos:prefLabel>ROCCA Giorgio</skos:prefLabel>
  </ski:Athlete>
  <rdf:Description rdf:about="URN:it:cefriel:ski#Slalom">
    <skos:altLabel>Slalom Speciale</skos:altLabel>
    <skos:prefLabel>Slalom</skos:prefLabel>
    <skos:altLabel>Slalåm</skos:altLabel>
    <skos:altLabel>SL</skos:altLabel>
    <skos:altLabel>Speciale</skos:altLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:ski#Combined">
    <skos:altLabel>Combinata</skos:altLabel>
    <skos:prefLabel>Combined</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
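As a rough illustration of how the labels in such a file could be read, the sketch below collects the SKOS labels of every resource in a shortened copy of the Ski sample. It is an assumption-laden shortcut: it uses Python's standard XML parser rather than a full RDF parser, so it only handles node elements that carry an rdf:about attribute, and the `collect_labels` function is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the Ski ontology sample.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="URN:it:cefriel:ski#Slalom">
    <skos:prefLabel>Slalom</skos:prefLabel>
    <skos:altLabel>Slalom Speciale</skos:altLabel>
    <skos:altLabel>SL</skos:altLabel>
  </rdf:Description>
  <rdf:Description rdf:about="URN:it:cefriel:ski#Combined">
    <skos:prefLabel>Combined</skos:prefLabel>
    <skos:altLabel>Combinata</skos:altLabel>
  </rdf:Description>
</rdf:RDF>"""


def collect_labels(rdf_xml):
    """Map each resource URI to its preferred, alternative and hidden labels."""
    root = ET.fromstring(rdf_xml)
    labels = {}
    # iter() also visits nested node elements, e.g. a song described inside an artist.
    for node in root.iter():
        about = node.get(f"{{{RDF}}}about")
        if about is None:
            continue
        entry = labels.setdefault(about, {"prefLabel": [], "altLabel": [], "hiddenLabel": []})
        for kind in entry:
            for child in node.findall(f"{{{SKOS}}}{kind}"):
                entry[kind].append(child.text)
    return labels


LABELS = collect_labels(SAMPLE)
print(LABELS["URN:it:cefriel:ski#Slalom"])
```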
2.7. Are any software applications used to create and/or maintain the vocabulary?
- Are there any features which these software applications currently lack which are required by your use case?
The vocabulary maintenance is performed through an RDF/SKOS editor; no features required by our use case are lacking.
2.8. If a database application is used to store and/or manage the vocabulary, how is the database structured? Illustration by means of some table sample is welcome.
We use Sesame repositories with a MySQL backend to store the knowledge bases in RDF format, using Sesame pre-defined structures (therefore we didn't need to define any table structure).
2.9. Were any published standards, textbooks or written guidelines followed during the design and construction of the vocabulary?
- Did you decide to diverge from their recommendations in any way, and if so, how and why?
In the modeling of our SKOS-based ontologies, we made use of the "Quick Guide to Publishing a Thesaurus on the Semantic Web" (http://www.w3.org/TR/swbp-thesaurus-pubguide/) and the "SKOS Core Guide" (http://www.w3.org/TR/swbp-skos-core-guide/).
2.10. How are changes to the vocabulary managed?
The vocabulary maintenance is performed manually.
2.11. Any additional information, references and/or hyperlinks.
The sources of information to build the knowledge bases are:
- MusicBrainz (http://www.musicbrainz.org) and MusicMoz (http://www.musicmoz.org) for the Music ontology;
- the website of the International Ski Federation (http://www.fis-ski.com/) for the Ski ontology.
Section 3. Vocabulary Mappings
In this section we ask you to provide some information about the mappings or links between vocabularies you would like to be able to represent using SKOS. Please note:
- If your use case does not involve vocabulary mappings or links, you may skip this section!
3.1. Which vocabularies are you linking/mapping from/to?
We are mainly linking to SKOS and other common vocabularies like Dublin Core.
3.2. Please provide below some extracts from the mappings or links between the vocabularies. Use the layout or presentation format that you would normally provide for the users of the mappings. Please ensure that the examples you provide illustrate all of the different types of mapping or link.
The use of the vocabularies is explained in section 2.5, since we use them as building blocks to build the domain-specific ontologies.
3.3. Describe the different types of mapping used, with reference to the examples given in paragraph 3.2.
We use hyperonymy/hyponymy, meronymy/holonymy (part-of relation), multiple wordings (homonymy/pseudonymy/synonymy) and generic semantic relationships (when two items are "related").
3.4. Any additional information, references and/or hyperlinks.
See http://squiggle.cefriel.it for more information. Further details can be found in the publications about Squiggle, available on-line at http://swa.cefriel.it/Publications#squiggle-pub.