Questions (and Answers) on the Semantic Web
Munich, Germany, 7 October 2005
Ivan Herman, W3C
Slides of the presentation held at the Semantic Web Days, on the 7th of
October 2005, in Munich, Germany. The event is
coordinated by the European Networks of Excellence REWERSE
(Reasoning on the Web with Rules and Semantics) and
Knowledge Web (Realizing the Semantic Web).
You Know Everything On Semantic Web Already, So…
Questions?
Is the Semantic Web AI on the Web?
The Semantic Web is not AI…
- RDF and OWL are relatively simple things (compared to AI, that is…)
- They offer:
- a simple way to express and store metadata
- a way to “structure” and characterize the terms
- means to make some inference within a restricted framework
- and that is it!
- One goal in SW is to keep things relatively simple and not necessarily seek absolute completeness (the famous 80/20 rule…)
Some things may come to the SW
- Application dependent rules, e.g.,
- the “uncle” relationship: ∀x,z: ((∃y: (y parent x) ∧ (y brother z)) ⇒ (z uncle x))
- Fuzzy and/or probabilistic logic
- …
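The “uncle” rule above can be sketched in a few lines of Python. This is only an illustration: triples are plain (subject, predicate, object) tuples, and the names alice, bob and carol are made up.

```python
# Triples are (subject, predicate, object) tuples; all names are hypothetical.
def infer_uncles(triples):
    """Apply the rule: (y parent x) and (y brother z)  =>  (z uncle x)."""
    inferred = set()
    for y1, p1, x in triples:
        if p1 != "parent":
            continue
        for y2, p2, z in triples:
            # The same y must appear as subject of both premises.
            if p2 == "brother" and y2 == y1:
                inferred.add((z, "uncle", x))
    return inferred

facts = {
    ("alice", "parent", "carol"),   # (y parent x), read as on the slide
    ("alice", "brother", "bob"),    # (y brother z)
}
new_facts = infer_uncles(facts)
```

Here new_facts contains the single triple (bob uncle carol); nothing in RDF or OWL themselves would derive it, which is why such rules are listed as a possible addition.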
But AI is more…
- There are things that are not part of the SW (and will not be in the near future):
- associative thinking
- spatial reasoning
- recognition of images, text content, gestures, …
- complex decision procedures (like Deep Blue…)
- drawing conclusions from incomplete and/or context dependent information
- etc.
- Just as Prolog is not AI but merely a useful tool for it, SW might be just a good tool for AI
Isn’t the RDF Model (and RDF/XML) way too complex?
(just look at RDF/XML…)
RDF is a graph!
- An (s,p,o) triple can be viewed as a labelled edge in a graph
- i.e., a set of RDF statements is a directed, labelled graph
- both “objects” and “subjects” are the graph nodes
- “properties” are the edges
- One should “think” in terms of graphs, RDF/XML is only a tool for practical usage!
- RDF authoring tools often work with graphs, too (XML is done “behind the scenes”)
- If one thinks in graphs, things become simple!
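A minimal sketch of the “triples are labelled edges” view, in Python. The ex: names are placeholders, not real vocabulary terms, and a dictionary of outgoing edges is just one possible in-memory representation.

```python
from collections import defaultdict

def build_graph(triples):
    """Map each subject node to its outgoing (property, object) edges."""
    graph = defaultdict(list)
    for subject, prop, obj in triples:
        graph[subject].append((prop, obj))
    return graph

# Hypothetical statements; "ex:" stands for some example namespace.
triples = [
    ("ex:doc", "ex:author", "ex:ivan"),
    ("ex:ivan", "ex:worksFor", "ex:w3c"),
    ("ex:doc", "ex:title", "Questions on the Semantic Web"),
]
graph = build_graph(triples)

# Subjects and objects are both nodes; properties only label the edges.
nodes = {s for s, _, _ in triples} | {o for _, _, o in triples}
```

Note that ex:ivan is the object of one statement and the subject of another: it is one node with an incoming and an outgoing edge.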
RDF/XML has its Problems
- RDF/XML was developed in the “prehistory” of XML
- e.g., even namespaces did not exist!
- Coordination was not perfect, leading to problems
- the syntax cannot be checked with XML DTD-s
- XML Schemas are also a problem
- encoding is verbose and complex (simplifications lead to confusions…)
- but there is too much legacy code
- Don’t be influenced (and set back…) by the XML format
- the important point is the model, XML is just syntax
- other “serialization” methods may come to the fore
Other Encoding Examples…
- Turtle, N3, N-Triples (variants of one another):
:subject :pred_1 [:pred_2 :object_1; :pred_3 :object_2; ]
<triple>
<subject uri="..."/>
<predicate uri="..."/>
<object>A Literal</object>
</triple>
- OWL-DL “Abstract Syntax”:
Class(animate)
Class(animateMotion)
Class(animationEntity complete
unionOf(animate animateMotion …)
)
- Again: these are all just syntactic sugar!
Where is the “Web” in SW?
The “Web” is in the URI-s!
- On the SW, resources are identified by URI-s, e.g.:
- http://www.ivan-herman.net
- ftp://ftp.cwi.nl
- mailto:ivan@w3.org
- tel:+31641044153
- …
- Anybody can create metadata on any resource on the Web
- URI-s ground RDF into the Web (modulo the problems listed by Massimo yesterday…)
Merging
- URI-s make it easy to merge and share metadata and ontologies on the Web
- one of the most important features of RDF and OWL
- merging makes it easy to connect communities
- OWL brought sharing and distribution to the knowledge representation community
- it has facilities built into the language like: owl:sameAs, owl:equivalentProperty, owl:equivalentClass, … (all using URI-s)
- it can import other ontologies using URI-s via owl:imports
- one of the features that makes OWL-DL different from other DL dialects
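Why URI-s make merging easy can be sketched as follows. This is a simplification that treats a graph as a Python set of triples; the example.org URI-s, the ex:price property and the data are invented for illustration.

```python
def merge_graphs(*graphs):
    """Merge RDF graphs by taking the union of their triples."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged

# Two hypothetical sources describing the same resource.
library = {
    ("http://example.org/book/1", "dc:title", "A Hypothetical Book"),
    ("http://example.org/book/1", "dc:creator", "http://example.org/people/a"),
}
bookshop = {
    ("http://example.org/book/1", "dc:title", "A Hypothetical Book"),
    ("http://example.org/book/1", "ex:price", "25"),
}

merged = merge_graphs(library, bookshop)

# Because both sources use the same URI, their statements end up on one node.
about_book = sorted(p for s, p, o in merged if s == "http://example.org/book/1")
```

The duplicate title statement collapses into one triple, and no schema had to be changed to accommodate the new price property.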
Another Web Aspect: Open vs. Closed Worlds
- A source of difference between the Semantic Web and other systems
- closed World: “if something cannot be proven, it is false”
- open World: “if something cannot be proven, we do not know”
- i.e., the notion of “truth” is different
- Logic/expert systems very often refer to the former; the Semantic Web is based on the latter
- this makes more sense on a Web scale…
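The difference between the two notions of “truth” can be illustrated with a toy Python sketch. It only looks up stated facts (no inference), and the facts themselves are made up.

```python
def ask_closed(facts, statement):
    """Closed world: whatever cannot be shown is taken to be false."""
    return statement in facts

def ask_open(facts, statement):
    """Open world: whatever cannot be shown is unknown (None), not false."""
    return True if statement in facts else None

# Hypothetical facts.
facts = {("ex:ivan", "ex:worksFor", "ex:w3c")}
question = ("ex:ivan", "ex:worksFor", "ex:cwi")

closed_answer = ask_closed(facts, question)   # False
open_answer = ask_open(facts, question)       # None, i.e. "we do not know"
```

On the Web somebody else may publish the missing statement tomorrow, so the open world answer is the safer one.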
Why should I use RDF?
(Couldn’t I simply use XML with XML Schema instead?)
(or: Couldn’t I simply use a relational database instead?)
It Depends…
- XML’s model is
- a tree, i.e., a strong hierarchy
- applications may rely on hierarchy position (e.g., li in HTML)
- relatively simple syntax and structure
- not easy to combine trees
- RDF’s model is
- a loose collection of relations
- applications may do “database”-like search
- not easy to recover hierarchy
- easy to combine relations in one big collection (great for the integration of heterogeneous information)
RDF’s Force is its Flexibility
- If you want to modify your XML structure:
- you have to modify your DTD or Schema (and you may not have access and/or permission to those…)
- tools depending on the hierarchy (e.g., XSLT) might go wrong…
- Similar problems with a DBMS:
- you have to modify the database record definition (and you may not have the right to do so…)
- In the triple store model you just merge…
Finding New Relationships
- RDF(+OWL) helps in finding new relationships
- e.g., in Life Sciences:
- most of the drug experiments are unsuccessful
- but the information from each experiment may be valuable
- by “binding” this information, new insights can be gained
- Sharing and aggregation of data becomes easier
- may be determinant for future R&D, for example
- great tool for general community building
But... RDF Does Not Make XML Obsolete!
- Do not try to describe an HTML page in terms of triples:
- it is technically doable…
- but things would be much more complicated!
- I.e.: the choice depends on what you want to do!
What are these OWL layers?
Reminder…
- OWL has three layers:
- OWL Full
- OWL DL (i.e., OWL Description Logic)
- OWL Lite
OWL Full
- No constraints on the various RDFS+OWL constructs
- e.g., one can make statements on RDFS constructs
- declare rdf:type to be functional…
- … thereby modifying their semantics
- unrestricted class hierarchies can be defined
- etc.
- Is a real superset of RDFS
- But: reasoning over an OWL Full ontology may be undecidable!
Example for a Possible Problem (in OWL Full)
<!-- A: everything whose rdf:type values all belong to B -->
<owl:Class rdf:ID="A">
<owl:equivalentClass>
<owl:Restriction>
<owl:onProperty rdf:resource=".../22-rdf-syntax-ns#type"/>
<owl:allValuesFrom rdf:resource="#B"/>
</owl:Restriction>
</owl:equivalentClass>
</owl:Class>
<!-- B: the complement of A -->
<owl:Class rdf:ID="B">
<owl:complementOf>
<owl:Class rdf:about="#A"/>
</owl:complementOf>
</owl:Class>
- Question: does the following make sense?
<owl:Thing rdf:ID="C">
<rdf:type rdf:resource="#A"/>
</owl:Thing>
- if C is of type A then it must be of the complement type, i.e., not of A …
OWL Description Logic (DL)
Goal: maximal subset of OWL Full against which current research can assure that a decidable reasoning procedure is realizable
- Class, Thing, ObjectProperty, DatatypeProperty are strictly separated
- No statements on RDFS resources (e.g., rdf:type)
- No characterization of datatype properties possible
- No cardinality constraint on transitive properties
- …
OWL Lite
- Goal: provide a minimal useful subset, easily implemented
- All of DL’s restrictions, plus some more:
- class construction can be done only through intersection or property constraints
- cardinality restriction with 0 and 1 only
- …
“Description Logic”
- The term refers to an area in knowledge representation
- a special type of “structured” First Order Logic (logic with safety guards…)
- formalism based on “concepts” (i.e., classes), “roles” (i.e., properties), and “individuals”
- based on model theoretic semantics (like RDF, RDFS, and OWL!)
- There are several variants of Description Logic
- i.e., OWL DL is an embodiment of a Description Logic
- for connoisseurs: OWL DL ≈ SHOIN(D), OWL Lite ≈ SHIF(D)
- some major differences: usage of URI-s, reference to XML Schema datatypes, version control…
Note on OWL layers
- OWL Layers were defined to reflect compromises on expressibility vs. implementability
- They were the subject of passionate discussions…
- Some applications just need to express and interchange terms (with possible scruffiness); OWL Full is fine
- they may build application specific reasoning instead of using a general one
- Some applications need rigor; then OWL DL might be the right choice
- Research may lead to new decidable subsets of OWL!
- see, e.g., H.J. ter Horst’s paper at ISWC2004
Why “OWL” and not “WOL”?
- Some urban legends…
- e.g., reference to Owl from Winnie the Pooh, who misspelled his name as “WOL”
- A reference to an AI project at MIT of the mid 70’s by Bill Martin, called “One World Language”…
- an early attempt for a KR language and associated ontology, intended to be a universal language for encoding meaning for computers
- “Why not be inconsistent in at least one aspect of a language which is all about consistency” (Guus Schreiber)
Where do the metadata and ontologies come from?
(Should we really expect the author to type in all this metadata?)
It May Be Around Already…
- Part of the metadata information is present in tools … but thrown away at output
- e.g., a business chart can be generated by a tool…
- …it “knows” the structure, the classification, etc. of the chart but, usually, this information is lost
- Storing it in metadata would be easy!
- “SW-aware” tools are coming (even if you do not know it…):
- Photoshop CS stores metadata in RDF in, say, jpg files (referred to as XMP)
- RSS feeds are generated by (almost) all blogging systems (a HUGE amount of RDF data!)
- easy to get RDF data from images stored on flickr
- …
RDF Can Also Be Generated
- Use intelligent “scrapers” or “wrappers” to extract a structure (hence RDF) from a Web page…
- using conventions in, e.g., class names
- using the header conventions
- … and then generate RDF automatically (e.g., via an XSLT script)
Formalizing the Scraper Approach: GRDDL
- GRDDL formalizes the scraper approach. For example:
<html xmlns="http://www.w3.org/1999/">
<head profile="http://www.w3.org/2003/g/data-view">
<title>Some Document</title>
<link rel="transformation"
href="http://www.w3.org/2000/06/dc-extract/dc-extract.xsl" />
<meta name="DC.Subject" content="Some subject"/>
...
</head>
...
</html>
- yields, by running the file through dc-extract.xsl
<rdf:Description rdf:about="">
<dc:subject>Some subject</dc:subject>
</rdf:Description>
- The user still has to provide dc-extract.xsl, but the mechanism is general
- Still a W3C Team Submission, may get a more formal status
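To make the idea concrete, here is a rough Python stand-in for what a transformation like dc-extract.xsl does. It is not GRDDL itself (GRDDL locates and runs the XSLT named in the page); the class name and the mapping from "DC.X" to "dc:x" are assumptions of this sketch.

```python
from html.parser import HTMLParser

class DublinCoreScraper(HTMLParser):
    """Collect <meta name="DC.*" content="..."> entries as triples."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if tag == "meta" and name.startswith("DC."):
            # Assumed convention: "DC.Subject" becomes the property "dc:subject".
            prop = "dc:" + name[3:].lower()
            # The empty subject stands for the page itself, like rdf:about="".
            self.triples.append(("", prop, attrs.get("content", "")))

page = """
<html>
  <head>
    <title>Some Document</title>
    <meta name="DC.Subject" content="Some subject"/>
  </head>
</html>
"""

scraper = DublinCoreScraper()
scraper.feed(page)
scraper.close()
triples = scraper.triples
```

The resulting triple corresponds to the dc:subject statement in the RDF/XML output shown above.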
Another Future Solution: XHTML2
- XHTML2 defines general attributes to add metadata to any element
<span property="dc:date">March 23, 2004</span>
<span property="dc:title">High-tech rollers hit casino for £1.3m</span>
By <span property="dc:creator">Steve Bird</span> …
- may yield, by running the file through some processor
<rdf:Description rdf:about="">
<dc:date>March 23, 2004</dc:date>
<dc:title>High-tech rollers hit casino for £1.3m</dc:title>
<dc:creator>Steve Bird</dc:creator>
</rdf:Description>
- Note: the same text is part of XHTML and is an RDF Literal
- XHTML2 is still in a Working Draft phase, though…
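The “some processor” mentioned above could look roughly like the following Python sketch. It is a simplification: it only handles text directly inside an element carrying a property attribute, ignores nesting subtleties and void elements, and the class name is made up.

```python
from html.parser import HTMLParser

class PropertyScraper(HTMLParser):
    """Turn <span property="p">text</span> into ("", p, text) triples."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._open = []   # property (or None) of each currently open element

    def handle_starttag(self, tag, attrs):
        self._open.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text is an RDF literal only if the innermost element has a property.
        if self._open and self._open[-1]:
            self.triples.append(("", self._open[-1], data))

snippet = (
    '<span property="dc:date">March 23, 2004</span> '
    'By <span property="dc:creator">Steve Bird</span>'
)

scraper = PropertyScraper()
scraper.feed(snippet)
scraper.close()
triples = scraper.triples
```

The text “By” is skipped because it is not inside an element with a property attribute; the same characters that the browser displays become the literals.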
And for Ontologies?
- The hard work is to create the ontologies in general
- requires a good knowledge of the area to be described
- some communities have good expertise already (e.g., librarians)
- OWL is just a tool to formalize ontologies
- Large scale ontologies are often developed in a community process
- leading to versioning issues, too
- OWL includes predicates for versioning, deprecation, “same-ness”, …
- There is also R&D in generating them from a corpus of data
- still mostly a research subject
- Sharing ontologies may be vital in the process (remember the “Where is the Web” question?)
There are Already Ontologies Around…
Must I use OWL for my vocabularies?
- No! You can be a very proper SW citizen without using OWL…
- OWL is very powerful, but may be too powerful for your application, your vocabulary…
- Sometimes all that you need is to properly organize your terms, without any inference
Simple Knowledge Organisation System (SKOS)
- Goal: porting (“Webifying”) thesauri: representing and sharing classifications, glossaries, thesauri, etc, as developed in the “Print World”
- Examples of existing knowledge structures:
Example: Entries in a Glossary (1)
- “Assertion”
- “(i) Any expression which is claimed to be true. (ii) The act of claiming something to be true.”
- “Class”
- “A general concept, category or classification. Something used primarily to classify or categorize other things.”
- “Resource”
- “(i) An entity; anything in the universe. (ii) As a class name: the class of everything; the most inclusive category possible.”
(from the RDF Semantics Glossary)
Example: Entries in a Glossary (2)
Example: Entries in a Glossary (3)
Example: Taxonomy (1)
Illustrates “broader” and “narrower”
- General
-
- SemWeb
-
(From MortenF’s weblog categories. Note that the categorization is arbitrary!)
Example: Thesaurus (1)
- Term
- Economic cooperation
- Used For
- Economic co-operation
- Broader terms
- Economic policy
- Narrower terms
- Economic integration, European economic cooperation, European industrial cooperation, Industrial cooperation
- Related terms
- Interdependence
- Scope Note
- Includes cooperative measures in banking, trade, industry etc., between and among countries
(from UK Archival Thesaurus)
SKOS Core Overview
- Basic description (Concept, ConceptScheme, inScheme, hasTopConcept)
- Labelling (prefLabel, altLabel, prefSymbol, altSymbol …)
- Documentation (definition, scopeNote, changeNote, historyNote, editorialNote, publicNote, privateNote)
- Semantic relations (broader, narrower, related)
- Subject indexing (subject, isSubjectOf, primarySubject, isPrimarySubjectOf)
- Grouping (Collection, OrderedCollection, CollectableProperty, member, memberList)
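As an illustration, part of the “Economic cooperation” thesaurus entry shown earlier could be held as SKOS-style triples and navigated without any inference engine. In this Python sketch the ex: identifiers are invented, and following skos:broader links transitively is a choice made by the sketch, not something it claims SKOS mandates.

```python
# Hypothetical concept identifiers; labels are taken from the thesaurus entry.
triples = [
    ("ex:EconomicCooperation", "skos:prefLabel", "Economic cooperation"),
    ("ex:EconomicCooperation", "skos:altLabel", "Economic co-operation"),
    ("ex:EconomicCooperation", "skos:broader", "ex:EconomicPolicy"),
    ("ex:EconomicIntegration", "skos:broader", "ex:EconomicCooperation"),
    ("ex:EconomicCooperation", "skos:related", "ex:Interdependence"),
]

def broader_closure(triples, concept):
    """All concepts reachable from `concept` by following skos:broader."""
    seen = set()
    stack = [concept]
    while stack:
        current = stack.pop()
        for s, p, o in triples:
            if s == current and p == "skos:broader" and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen

ancestors = broader_closure(triples, "ex:EconomicIntegration")
labels = [o for s, p, o in triples
          if s == "ex:EconomicCooperation" and p.endswith("Label")]
```

This is the kind of “properly organize your terms” usage where plain graph navigation is enough.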
Complementarity of SKOS and OWL (Why Have SKOS?)
- OWL’s precision not always necessary or even appropriate
- “OWL a sledge hammer / SKOS a nutcracker”, or “OWL a Harley / SKOS a bike”
- complement each other, can be used in combination to optimize cost/benefit
- Role of SKOS is
- to bring the worlds of library classification and Web technology together
- to be simple and undemanding enough in terms of cost and required expertise
- SKOS should be finalized in 2006
“Core” Vocabularies
- A number of public “core” vocabularies evolve to be used by applications, e.g.:
- SKOS Core: about knowledge systems
- Dublin Core: about information resources, digital libraries, with extensions for rights, permissions, digital rights management for, e.g., books, mainstream journal content by PRISM
- FOAF: about people and their organizations
- DOAP: on the descriptions of (mainly open source) software projects
- MusicBrainz: on the description of CDs, music tracks, …
- …
- They share the underlying RDF model (provides mechanisms for extensibility, sharing, …)
How do I extract triples from an RDF Graph?
Querying RDF Graphs
- The fundamental idea: use graph patterns to define subgraphs:
- a pattern contains unbound symbols
- by binding the symbols, subgraphs of the RDF graph may be matched
- if there is such a match, the query returns the bound resources or a subgraph
- This is how SPARQL (Query Language for RDF) is defined
- is a programming-language-independent query language
- is in a last call working draft phase (Recommendation in 2006?)
Simple SPARQL Example
SELECT ?cat ?val
WHERE { ?x rdf:value ?val. ?x category ?cat }
- Returns:
[["Total Members",100],["Total Members",200],…,["Full Members",10],…]
- Note the role of ?x: it helps define the pattern, but is not returned
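The graph pattern idea behind this query can be imitated in a few lines of Python. This is a toy matcher, not a SPARQL implementation, and the small data set (blank node labels, values) is invented to resemble the example.

```python
def match_pattern(pattern, triple, bindings):
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Unbound symbol: bind it, or check it against an earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def select(variables, patterns, triples):
    """Match all patterns against the graph and return the selected bindings."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for bindings in solutions:
            for triple in triples:
                candidate = match_pattern(pattern, triple, bindings)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return [[solution[v] for v in variables] for solution in solutions]

# Hypothetical data: two nodes, each with a value and a category.
triples = [
    ("_:a", "rdf:value", 100),
    ("_:a", "category", "Total Members"),
    ("_:b", "rdf:value", 10),
    ("_:b", "category", "Full Members"),
]

rows = select(
    ["?cat", "?val"],
    [("?x", "rdf:value", "?val"), ("?x", "category", "?cat")],
    triples,
)
```

As in the SPARQL query, ?x ties the two patterns to the same node but does not appear in the returned rows.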
Other SPARQL Features
- Define optional patterns
- Limit the number of returned results; remove duplicates, sort them,…
- Add functional constraints to pattern matching
- Use datatypes and/or language tags when matching a pattern
- …
- SPARQL is in last call Working Draft, i.e., the technical aspects are now fixed (modulo further community comments and/or implementation problems)
SPARQL Usage in Practice
- Locally, i.e., bound to a programming environment like RDFLib or Jena
- details are language dependent
- Remotely, e.g., over the network or into a database
- very important usage: a growing number of RDF depositories…
- separate documents define the protocol and the result format
- return is in XML: can be fed, e.g., into XSLT for direct display
- An application pattern evolves: use (XHTML) forms to create a SPARQL Query to a database and display the result in XHTML
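A sketch of the remote usage in Python, using only the standard library. The endpoint URL and the sample response are invented; the sketch assumes the query travels as a "query" parameter of an HTTP GET and that the XML result uses the sparql-results namespace shown in the code. No network call is made here.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ENDPOINT = "http://example.org/sparql"   # hypothetical SPARQL endpoint
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

def build_request_url(endpoint, sparql):
    """Encode the query text as the 'query' parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": sparql})

def parse_results(xml_text):
    """Turn an XML result document into a list of {variable: value} rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]   # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return rows

url = build_request_url(ENDPOINT, "SELECT ?cat ?val WHERE { ?x rdf:value ?val }")

# Made-up response, shaped like the XML result format.
sample_response = """
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="cat"/><variable name="val"/></head>
  <results>
    <result>
      <binding name="cat"><literal>Total Members</literal></binding>
      <binding name="val"><literal>100</literal></binding>
    </result>
  </results>
</sparql>
"""
rows = parse_results(sample_response)
```

Because the result is plain XML, the same document could equally be handed to an XSLT script for display, as noted above.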
Why Yet Another W3C Query Language?
- After all, we already have XQuery in the making…
- A query language reflects the underlying data model!
- XQuery is adapted to XML (to be precise: to XML Infosets)
- SPARQL is adapted to the RDF Model
- remember: RDF/XML is only syntax, and not the only serialization of RDF!
Isn't This Research Only?
(or: does this have any industrial relevance whatsoever?)
Not Any More…
- SW has indeed a strong foundation in research results…
- …but we see more and more companies embracing it!
- Remember:
- the Web was born at CERN…
- …was first picked up by high energy physicists…
- …then by academia at large…
- …then by small businesses and start-ups…
- “big business” came only later!
- network effect kicked in early…
- Semantic Web is now at #4, and moving to #5!
- Let us see some examples (we already saw some yesterday!)
Lots of Tools
- (Graphical) Editors: IsaViz (Xerox Research/W3C), RDFAuthor (Univ. of Bristol), Protege 2000 (Stanford Univ.), SWOOP (Univ. of Maryland), Orient (IBM)
- Programming Environments: Jena (for Java, includes OWL reasoning), RDFLib (for Python), Redland (in C, with interfaces to Tcl, Java, PHP, Perl, Python, …), SWI-Prolog, IBM’s Semantic Toolkit, …
- Inclusion into Oracle (a.k.a. Oracle's Network Data Model)
- you can store RDF triples in Oracle and retrieve them in SPARQL
- part of the Oracle Database 10.2 and beyond
- Triple based database systems: Kowari, Gateway, Sesame, …
- RDF and OWL validators: W3C’s RDF Validator, BBN OWL Validator, Pellet OWL Reasoner …
Data integration examples
- Semantic integration of corporate resources or different databases
- RDF/RDFS/OWL based vocabularies as an “interlingua” among system components (early experimentation at Boeing, see, e.g., a WWW11 paper)
- Similar approaches: Sculpteur project, MITRE Corp., MuseoSuomi, …
- There are companies specializing in the area
Portals
- Vodafone's Live Mobile Portal
- search application (e.g. ringtone, game, picture) using RDF
- better search: page views per download decreased 50%
- increased revenue: ringtone up 20% in 2 months
- RDF was key factor in making this possible
- Sun’s SwordFish
- Public queries for support, handbooks, etc, go through an internal RDF engine:
- the queries are answered via an internal RDF database
- Nokia has a somewhat similar support portal
Baby CareLink
- Centre of information for the treatment of premature babies
- Provides an OWL based service as a Web Service
- combines disparate vocabularies like medical, insurance, etc
- users can add new entries to ontologies
- complex questions can be asked through the service
Are we done?
Not Yet…
- The “core” infrastructure is around
- New infrastructural elements are being developed:
- querying RDF data (e.g., SPARQL)
- specialized vocabularies (e.g., SKOS)
- …
- There is also a need for a very strong outreach:
- outreach to user communities (life sciences, geospatial information systems, libraries and digital repositories, …)
- a separate Interest Group on health care and life sciences may start soon
- intersection of SW with other technologies (Web Services, Privacy issues, …)
- There is a separate Working Group on “Deployment and Best Practices”
- A separate Working Group on SW outreach may start soon
Rules
- OWL can be used for simple inferences
- Applications may want to express domain-specific knowledge, like “Horn clauses”:
- (prem-1 ∧ prem-2 ∧ …) ⇒ (concl-1 ∧ concl-2 ∧ …)
- e.g.: for any «X», «Y» and «Z»: “if «Y» is a parent of «X», and «Z» is a brother of «Y» then «Z» is the uncle of «X»”
- But it gets more complicated; there is a large corpus of rule based systems and languages (mostly vendor specific), though not necessarily bound to the Web (yet)
- Several attempts already to combine Semantic Web with Rules (Metalog, RuleML, SWRL, WRL, cwm, …)
- note: cwm, for example, defines Horn predicates in terms of graph patterns, a connection to SPARQL…
W3C’s Rules Workshop
- W3C held a Workshop in April 2005; lots of issues and use cases were identified
- Some interesting scenarios have already been identified:
- exchange mail filtering rules among clients
- rules to search into data (databases, RDF stores)
- rules to analyse medical, banking, or other data (merging rules of different origins)
- interest from financial services, business rules, life science community,…
Current Plans
- Start a Working Group at the end of 2005 (if the charter is accepted by the W3C Members)
- Work in two “phases”:
- construct an extensible format for rule interchange
- define more complex extensions
- The goal is to finish phase 1 quickly (end 2006?), by restricting it to a well manageable subset of possible rule sets
- probably: “Full Horn Logic”
- Second phase would look at more complex alternatives (as extensions)
Lots of Theoretical Questions to Solve
- Open vs. Closed Worlds, monotonicity
- Probabilistic and/or fuzzy logic extensions
- Syntax issues (XML?, RDF?, Another abstract syntax?)
- Relationship to RDFS and mainly to OWL: “One Tower” vs. “Two Towers” model:
Beyond Rules: Trust
- Can I trust metadata on the Web?
- is the author who he/she claims to be, and can I check his/her credentials?
- can I trust the inference engine?
- etc.
- Some of the building blocks are available (e.g., Signature/Encryption) but some are missing:
- how to “name” a full graph
- a “canonical” form of triples (necessary for unambiguous signatures or to compare graphs)
- how to “express” trust? (e.g., trust in context)
- protocols to check, for example, a signature
- …
- It is on the “future” stack of W3C and the SW Community …
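Why a “canonical” form matters for signatures can be sketched in Python. The approach below (sort the serialized triples, then hash) is a deliberately naive assumption that only works for graphs without blank nodes; handling blank nodes is exactly the hard, missing part. The triples are made up.

```python
import hashlib

def canonical_digest(triples):
    """Hash a blank-node-free graph independently of statement order."""
    # One line per triple, duplicates removed, in a fixed (sorted) order.
    lines = sorted({f"{s} {p} {o} ." for s, p, o in triples})
    payload = "\n".join(lines).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

# The same hypothetical graph, written down in two different orders.
graph_1 = [
    ("<http://example.org/a>", "<http://example.org/p>", "<http://example.org/b>"),
    ("<http://example.org/a>", "<http://example.org/q>", '"x"'),
]
graph_2 = list(reversed(graph_1))

same = canonical_digest(graph_1) == canonical_digest(graph_2)
changed = canonical_digest(graph_1) != canonical_digest(graph_1[:1])
```

Without such a fixed form, two serializations of the same graph would produce different digests, and a signature over one would not verify against the other.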
A Number of Other Issues…
- A lot of R&D is going on:
- improve the inference algorithms and implementations
- improve scalability, reasoning with OWL Full
- temporal & spatial reasoning, fuzzy logic
- better modularization (import or refer to part of ontologies)
- procedural attachments (e.g., in rules)
- exact relationships between Description Logics and other types of logic (and their usage on the Web)
- ontology management on the Web
- …
- This mostly happens outside of W3C, though
- W3C is not a research entity…
Now For Real…
Other Questions?