RIF Framework for Logic Dialects

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

Set of Documents

Please Comment By 19 February 2008

No Endorsement

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

Patents

1 Overview of RIF-FLD

The RIF Framework for Logic-based Dialects (RIF-FLD) is a formalism for specifying all logic-based dialects of RIF, including RIF-BLD. It is a logic in which both syntax and semantics are described through a number of mechanisms that are commonly used in various logic languages but are rarely brought together. RIF-FLD gives precise definitions to these mechanisms, but leaves some concrete details out. Each dialect that is based on RIF-FLD is expected to specialize these general mechanisms (and possibly leave out some elements of RIF-FLD) to produce its concrete syntax and model-theoretic semantics.

This document is intended for the designers of future RIF dialects. The reader who is interested in using a particular dialect, such as RIF-BLD, or in implementing such a dialect can go directly to the description of the dialect in question.

RIF-FLD has the following main components:

Syntactic framework. This framework defines the mechanisms for specifying the formal presentation syntax of RIF's logic dialects. The presentation syntax is used in RIF to define the semantics of the dialects and to illustrate the main ideas with examples. The presentation syntax of a dialect is not intended to be a concrete syntax for that dialect. For instance, RIF deliberately leaves out details such as the delimiters of the various syntactic components, escape symbols, parenthesizing, precedence of operators, and the like. Instead, since RIF is an interchange format, RIF dialects use XML as their concrete syntax.
Semantic framework. The semantic framework describes the mechanisms that are used for specifying the models of RIF logic-based dialects.
XML serialization framework. This framework defines the general principles that logic-based dialects are to use in specifying their concrete XML-based syntaxes. For each dialect, its concrete XML syntax is a derivative of the dialect's presentation syntax. It can be seen as a serialization of that syntax.

The framework described in this document is very general, and it captures most of the popular logic-based languages found in Databases, Logic Programming, and on the Semantic Web. However, it is expected that the needs of some newly developed dialects may stimulate further evolution of RIF-FLD.

Syntactic framework. The syntactic framework defines three main classes of RIF terms:

Positional terms. These are the usual terms, which are most commonly used in first-order logic. RIF-FLD defines positional terms in a slightly more general way in order to enable dialects with higher-order syntax, such as HiLog [CKW93].
Terms with named arguments. These are like positional terms except that each argument of a term is named and the order between the arguments is immaterial. Terms with named arguments correspond to rows in relational tables, where column headings correspond to argument names.
Frames. A frame term represents an assertion about an object and its properties. These terms correspond to molecules of F-logic [KLW95]. There is a certain syntactic similarity between terms with named arguments and frames, since object properties resemble named arguments. However, the semantics of these two kinds of terms is quite different.

RIF dialects can choose to support all or some of the aforesaid categories of terms. The syntactic framework also defines the following mechanisms for specializing these terms:

Symbol spaces.

Symbol spaces are used to separate the set of all non-logical symbols (symbols used as variables, individual constants, predicates, and functions) into distinct subsets. These subsets can then be given different semantics. A symbol space has one or more identifiers and a lexical space, which defines the "shape" of the symbols in that symbol space. For instance, some symbol spaces can be used to identify any object, and syntactically they look like IRIs (for instance, rif:iri in RIF Basic Logic Dialect). Other symbol spaces may be used to describe the data types used in RIF (for example, xsd:integer).

Signatures.

Signatures determine which terms and formulas are well-formed. A signature is a generalization of the notion of a sort in classical first-order logic [Enderton01]. Each non-logical symbol (and some logical symbols, like =) has an associated signature. A signature defines, in a precise way, the syntactic contexts in which the symbol is allowed to occur. For instance, the signature associated with a symbol, p, might allow p to appear in a term of the form f(p), but disallow it to occur in a term like p(a,b). The signature for f, on the other hand, might allow that symbol to appear in f(p) and f(p,q), but disallow f(p,q,r) and f(f). In this way, it is possible to control which symbols are used for predicates and which for functions, where variables can occur, and so on.

Semantic framework. This framework defines the notion of a semantic structure or interpretation (both terms are used in the literature [Enderton01, Mendelson97], but here we will mostly use the first). Semantic structures are used to interpret RIF formulas and to define logical entailment. As with the syntax, this framework includes a number of mechanisms that RIF logic-based dialects can specialize to suit their needs. These mechanisms include:

Truth values. RIF-FLD is designed to accommodate the dialects that support reasoning with inconsistent and uncertain information. Most of the logics that were designed to deal with these situations are multi-valued. Consequently, RIF-FLD postulates that there is a set of truth values, TV, which includes the values t (true) and f (false) and possibly others. For example, RIF Basic Logic Dialect is two-valued, but other dialects can be three-valued, four-valued, and so on.
Data types. Some symbol spaces (which are part of the RIF syntactic framework) may have special semantics. For instance, symbols in the symbol space of strings (xsd:string) are always interpreted as sequences of Unicode characters, and a ≠ b for any pair of distinct symbols. Symbol spaces that have special semantics are called data types.
Entailment. This notion is fundamental to logic-based dialects. Given a set of formulas (e.g., facts and rules) G, entailment determines which other formulas necessarily follow from G. Entailment is the main mechanism underlying query answering in databases, logic programming, and the various reasoning tasks in Description Logic.

Roughly speaking, a set of formulas, G, logically entails another formula, g, if for every semantic structure I in some set S, if I makes G true, then I also makes g true. Almost all known logics define entailment this way. The difference lies in which set S they use. For instance, logics that are based on the classical first-order predicate calculus, such as Description Logic, assume that S is the set of all semantic structures. In contrast, logic programming languages, which use default negation, assume that S contains only the so-called "minimal" Herbrand models of G and, furthermore, only the minimal models of a special kind. See [Shoham87] for a more detailed exposition of this subject.
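To make the role of S concrete, here is a minimal propositional sketch in Python. It assumes that a semantic structure is just a true/false assignment to a fixed finite set of atoms, that formulas are Python predicates over such assignments, and that "minimal" means minimal with respect to set inclusion of the true atoms; it ignores the further restriction to minimal models "of a special kind". All function names are illustrative only.

```python
from itertools import product

# Propositional simplification: a "semantic structure" is a dict atom -> bool,
# and a formula is a Python predicate over such a dict.

def all_structures(atoms):
    """Every two-valued assignment to the given atoms."""
    for bits in product([False, True], repeat=len(atoms)):
        yield dict(zip(atoms, bits))

def satisfies(I, formulas):
    return all(phi(I) for phi in formulas)

def true_atoms(I):
    return {a for a, v in I.items() if v}

def minimal_models(atoms, G):
    """Models of G whose set of true atoms is minimal w.r.t. set inclusion."""
    ms = [I for I in all_structures(atoms) if satisfies(I, G)]
    return [I for I in ms if not any(true_atoms(J) < true_atoms(I) for J in ms)]

def entails(S, G, g):
    """G entails g relative to S: every I in S that makes G true also makes g true."""
    return all(g(I) for I in S if satisfies(I, G))

atoms = ["p", "q"]
G = [lambda I: I["p"]]       # the single fact p
g = lambda I: not I["q"]     # "q is not true"

classical = entails(all_structures(atoms), G, g)   # S = all structures
minimal = entails(minimal_models(atoms, G), G, g)  # S = minimal models of G
print(classical, minimal)  # False True
```

With S ranging over all structures, {p} does not entail "q is not true", because one model makes both p and q true; restricting S to the single minimal model, in which only p is true, makes the entailment hold. This is the kind of behavior that default negation relies on.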

XML serialization framework. This framework defines the general principles for serializing the various parts of the presentation syntax of RIF-FLD.

2 Syntactic Framework

2.1 Syntax of a RIF Dialect as a Specialization of RIF-FLD

The syntax for a RIF dialect can be obtained from the general syntactic framework of RIF by specializing the following parameters (which are defined in this document):

The alphabet of RIF-FLD can be restricted.
An assignment of signatures to each constant symbol.
- Signatures determine which terms in the dialect are well-formed and which are not. The exact way this assignment is defined depends on the dialect. The assignment can be explicit or implicit (for instance, derived from the context in which each symbol is used).
The choice of the types of terms supported by the dialect.
- The RIF logic framework introduces the following types of terms:
  - constant
  - variable
  - positional
  - with named arguments
  - equality
  - frame
  - class membership
  - subclass
- A dialect might support all of them or a subset.
The choice of symbol spaces supported by the dialect.
- Symbol spaces determine the "shapes" of the symbols that are allowed by the syntax of the dialect.
The choice of the formulas supported by the dialect.
- RIF-FLD allows the following kinds of formulas to be built:
  - Atomic
  - Conjunction
  - Disjunction
  - Classical negation
  - Default negation
  - Rule
  - Quantification: universal and existential
- A dialect might support all of these formulas or it might impose various restrictions. For instance, the formulas in the conclusion and the premises of rules might be restricted, certain quantifications might be prohibited, classical or default negation (or both) might not be allowed, etc.

2.2 Alphabet

The alphabet of RIF-FLD consists of a countably infinite set of constant symbols Const, a countably infinite set of variable symbols Var (disjoint from Const), a countably infinite set of argument names ArgNames (disjoint from both Const and Var), connective symbols And and Or, quantifiers Exists and Forall, the symbols =, #, ##, :-, ->, Naf, Neg, and auxiliary symbols, such as "(" and ")". The set of connective symbols, quantifiers, =, etc., is disjoint from Const and Var. Variables are written as Unicode strings preceded with the symbol "?". The syntax for constant symbols is given in Section Symbol Spaces.

The language of RIF-FLD is the set of formulas constructed using the above alphabet according to the rules spelled out below.

2.3 Terms

The most basic construct of a logic language is a term. RIF-FLD supports several kinds of terms: constants, variables, the regular positional terms, plus terms with named arguments, equality, classification terms, and frames. The word "term" will be used to refer to any kind of term. Formally, terms are defined as follows:

Constants and variables. If t ∈ Const or t ∈ Var then t is a simple term.
Positional terms. If t and t₁, ..., t_n are terms then t(t₁ ... t_n) is a positional term. Positional terms in RIF-FLD generalize the regular notion of a term used in first-order logic. For instance, the above definition allows variables everywhere.
Terms with named arguments. A term with named arguments is of the form t(s₁->v₁ ... s_n->v_n), where t, v₁, ..., v_n are terms (positional, with named arguments, frame, etc.), and s₁, ..., s_n are (not necessarily distinct) symbols from the set ArgNames. The term t here represents a predicate or a function; s₁, ..., s_n represent argument names; and v₁, ..., v_n represent argument values. Terms with named arguments are like regular positional terms except that the arguments are named and their order is immaterial. Note that a term with no arguments, like f(), is considered both a positional term and a term with named arguments.
Equality terms. An equality term has the form t = s, where t and s are terms.
Classification terms. There are two kinds of classification terms: class membership terms (or just membership terms) and subclass terms.
- t#s is a membership term if t and s are arbitrary terms.
- t##s is a subclass term if t and s are arbitrary terms.
Frame terms. t[p₁->v₁ ... p_n->v_n] is a frame term (or simply a frame) if t, p₁, ..., p_n, v₁, ..., v_n, n ≥ 0, are arbitrary terms. As in the case of the terms with named arguments, the order of the properties p_i->v_i in a frame is immaterial.

Classification and frame terms are used to describe objects in object-based logics like F-logic [KLW95].

The above definition is very general. It makes no distinction between constant symbols that represent individuals, predicates, and function symbols. The same symbol can occur in multiple contexts at the same time. For instance, if p, a, and c are symbols then p(p(a) p(a p c)) is a term. Even variables and general terms are allowed to occur in the position of predicates and function symbols, so p(a)(?v(a c) p) is also a term.

Frame, classification, and other terms can be freely nested, as exemplified by p(?X q#r[p(1 2)->s](d->e f->g)). Some language environments, like FLORA-2 [FL2], OO jDREW [OOjD], and CycL [CycL], support fairly large (partially overlapping) subsets of RIF-FLD terms, but most languages support much smaller subsets. RIF dialects are expected to carve out the appropriate subsets of RIF-FLD terms, and the general form of the RIF logic framework allows a considerable degree of freedom.
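The following Python sketch gives one possible abstract-syntax representation of the RIF-FLD term kinds and renders the example terms of this section back into presentation syntax. The class names and the show function are hypothetical. Arguments are separated by spaces, as in the formal definitions, and because the presentation syntax fixes no operator precedence, the tree built for the nested example is only one possible reading of it.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Var:
    name: str  # printed with the "?" prefix

@dataclass(frozen=True)
class Pos:
    head: "Term"               # any term may occur in the head position
    args: Tuple["Term", ...]

@dataclass(frozen=True)
class Named:
    head: "Term"
    args: Tuple[Tuple[str, "Term"], ...]  # (name from ArgNames, value) pairs

@dataclass(frozen=True)
class Equal:
    left: "Term"
    right: "Term"

@dataclass(frozen=True)
class Member:
    obj: "Term"
    cls: "Term"

@dataclass(frozen=True)
class Subclass:
    sub: "Term"
    sup: "Term"

@dataclass(frozen=True)
class Frame:
    obj: "Term"
    slots: Tuple[Tuple["Term", "Term"], ...]  # (property, value) pairs

Term = Union[Const, Var, Pos, Named, Equal, Member, Subclass, Frame]

def show(t: Term) -> str:
    """Render a term in space-separated presentation syntax (no precedence handling)."""
    if isinstance(t, Const):
        return t.name
    if isinstance(t, Var):
        return "?" + t.name
    if isinstance(t, Pos):
        return show(t.head) + "(" + " ".join(show(a) for a in t.args) + ")"
    if isinstance(t, Named):
        return show(t.head) + "(" + " ".join(n + "->" + show(v) for n, v in t.args) + ")"
    if isinstance(t, Equal):
        return show(t.left) + " = " + show(t.right)
    if isinstance(t, Member):
        return show(t.obj) + "#" + show(t.cls)
    if isinstance(t, Subclass):
        return show(t.sub) + "##" + show(t.sup)
    if isinstance(t, Frame):
        return show(t.obj) + "[" + " ".join(show(p) + "->" + show(v) for p, v in t.slots) + "]"
    raise TypeError(f"not a term: {t!r}")

p, a, c = Const("p"), Const("a"), Const("c")
t1 = Pos(p, (Pos(p, (a,)), Pos(p, (a, p, c))))          # same symbol in several roles
t2 = Pos(Pos(p, (a,)), (Pos(Var("v"), (a, c)), p))      # terms and variables as heads
q, r, s, e, g = (Const(x) for x in "qrseg")
frame = Frame(r, ((Pos(p, (Const("1"), Const("2"))), s),))
t3 = Pos(p, (Var("X"), Named(Member(q, frame), (("d", e), ("f", g)))))
for t in (t1, t2, t3):
    print(show(t))
```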

The mechanism that allows "carving out" of such subsets is called a signature and works as follows. The RIF-FLD language associates a signature with each symbol (both constant and variable symbols) and uses signatures to define what is called well-formed terms. Each RIF dialect is expected to select appropriate signatures for the symbols in its alphabet, and only the terms that are well-formed according to the selected signatures are allowed in that particular dialect.

2.4 Signatures

In this section we introduce the concept of a signature, which is a key mechanism that allows RIF-FLD to control the context in which the various symbols are allowed to occur. Much of this development is inspired by [CK95]. It should be kept in mind that signatures are not part of the logic language in RIF, since they do not appear anywhere in the RIF formulas. Instead, they are part of a separate language for signatures, which is akin to grammar rules in that it determines which sequences of tokens are in the language and which are not. In some dialects (for example, RIF-BLD), signatures are derived from the context and no separate language for signatures is used. Other dialects may choose to specify signatures explicitly. In that case, they will need to define a concrete language for specifying signatures.

Let SigNames be a non-empty, partially-ordered finite or countably infinite set of symbols, disjoint from Const, Var, and ArgNames, called signature names. We require that this set includes at least the following signature names:

atomic -- used to represent the syntactic context where atomic formulas are allowed to appear.
= -- used for representing contexts where equality terms can appear.
# -- a signature name reserved for membership terms.
## -- a signature name reserved for subclass terms.
-> -- a signature name reserved for frame terms.

Dialects are expected to introduce additional signature names. For instance, RIF-BLD introduces one other signature name, term. The partial order on SigNames is dialect-specific; it is used in the definition of well-formed terms below.

We use the symbol < to represent the partial order on SigNames. Informally, α < β means that terms with signature α can be used wherever terms with signature β are allowed. We will write α ≤ β if either α = β or α < β.

A signature is a statement of the form η{e₁, ..., e_n, ...} where η ∈ SigNames is the name of the signature and {e₁, ..., e_n, ...} is a countable set of arrow expressions. Such a set can thus be infinite, finite, or even empty. In RIF-BLD, signatures can have at most one arrow expression. Other dialects (for example, those based on HiLog [CKW93]) may require polymorphic symbols and thus allow signatures with more than one arrow expression in them.

An arrow expression is defined as follows:

If κ, κ₁, ..., κ_n ∈ SigNames, n≥0, are signature names then (κ₁ ... κ_n) ⇒ κ is a positional arrow expression. For instance, () ⇒ term and (term) ⇒ term are arrow expressions, if term is a signature name.
If κ, κ₁, ..., κ_n ∈ SigNames, n≥0, are signature names and p₁, ..., p_n ∈ ArgNames are argument names then (p₁->κ₁ ... p_n->κ_n) ⇒ κ is an arrow expression with named arguments. For instance, (arg1->term arg2->term) ⇒ term is an arrow expression with named arguments. The order of the arguments in arrow expressions with named arguments is immaterial, so any permutation of arguments yields the same expression.
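A minimal sketch of how an implementation might represent the two kinds of arrow expressions so that the "order is immaterial" rule holds automatically. The class and function names are hypothetical. Named arguments are stored as a sorted tuple of (argument name, signature name) pairs, so every permutation yields the same value; sorting rather than using a set keeps any repeated argument names. Positional parameters keep their order.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

@dataclass(frozen=True)
class PositionalArrow:
    params: Tuple[str, ...]  # (κ1 ... κn): order matters
    result: str              # κ

@dataclass(frozen=True)
class NamedArrow:
    params: Tuple[Tuple[str, str], ...]  # canonical, sorted (argument name, signature) pairs
    result: str

def named_arrow(pairs: Iterable[Tuple[str, str]], result: str) -> NamedArrow:
    """Build (p1->κ1 ... pn->κn) ⇒ κ; sorting makes all permutations compare equal."""
    return NamedArrow(tuple(sorted(pairs)), result)

e1 = named_arrow([("arg1", "term"), ("arg2", "term")], "term")
e2 = named_arrow([("arg2", "term"), ("arg1", "term")], "term")
print(e1 == e2)  # True
print(PositionalArrow(("atomic", "term"), "term") == PositionalArrow(("term", "atomic"), "term"))  # False
```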

A set S of signatures is coherent iff

S contains the special signature atomic{ }, which represents the context of atomic formulas.
S contains the signature ={e₁, ..., e_n, ...} for the equality symbol. All arrow expressions e_i here have the form (κ κ) ⇒ γ (both arguments in an equation must have the same signature) and at least one of these expressions must have the form (κ κ) ⇒ atomic (i.e., some equations should be allowed as atomic formulas). Dialects may further specialize this signature.
S contains the signature #{e₁, ..., e_n, ...} where all arrow expressions e_i are binary (have two arguments) and at least one has the form (κ γ) ⇒ atomic. Dialects may further specialize this signature.
S contains the signature ##{e₁, ..., e_n, ...} where all arrow expressions e_i have the form (κ κ) ⇒ γ (both arguments must have the same signature) and at least one of these arrow expressions has the form (κ κ) ⇒ atomic. Dialects may further specialize this signature.
S contains the signature ->{e₁, ..., e_n, ...}, where all arrow expressions e_i are ternary (have three arguments) and at least one of them is of the form (κ₁ κ₂ κ₃) ⇒ atomic. Dialects may further specialize this signature.
S has at most one signature for any given signature name.
Whenever S contains a pair of signatures, ηA and κB, such that η < κ, then B ⊆ A. Here ηA denotes a signature with the name η and the associated set of arrow expressions A; similarly, κB is a signature named κ with the set of arrow expressions B. The requirement that B ⊆ A ensures that symbols that have signature η can be used wherever symbols with signature κ are allowed.
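The coherence conditions can be checked mechanically. The sketch below is a hypothetical Python helper, assuming that only positional arrow expressions occur, that each arrow expression is encoded as ((κ₁ ... κ_n), κ), and that the strict order < on SigNames is supplied as an already transitively closed set of (η, κ) pairs.

```python
from typing import Dict, List, Set, Tuple

Arrow = Tuple[Tuple[str, ...], str]  # ((κ1 ... κn), κ): positional arrow expressions only

def _shape_ok(arrows: Set[Arrow], arity: int, same_args: bool) -> bool:
    """All arrows have the given arity (and identical argument signatures if
    same_args), and at least one of them yields atomic."""
    return (all(len(params) == arity and (not same_args or len(set(params)) == 1)
                for params, _ in arrows)
            and any(result == "atomic" for _, result in arrows))

def is_coherent(sigs: List[Tuple[str, Set[Arrow]]], less: Set[Tuple[str, str]]) -> bool:
    names = [name for name, _ in sigs]
    if len(names) != len(set(names)):        # at most one signature per name
        return False
    table: Dict[str, Set[Arrow]] = dict(sigs)
    if table.get("atomic") != set():         # the special signature atomic{ }
        return False
    required = {"=": (2, True), "#": (2, False), "##": (2, True), "->": (3, False)}
    for name, (arity, same_args) in required.items():
        if name not in table or not _shape_ok(table[name], arity, same_args):
            return False
    for eta, kappa in less:                  # η < κ requires κ's arrows ⊆ η's arrows
        if eta in table and kappa in table and not table[kappa] <= table[eta]:
            return False
    return True

T = "term"
base = [
    ("atomic", set()),
    (T, set()),
    ("=", {((T, T), "atomic")}),
    ("#", {((T, T), "atomic")}),
    ("##", {((T, T), "atomic")}),
    ("->", {((T, T, T), "atomic")}),
    ("f1", {((T,), T)}),
    ("f2", {((T,), T), ((T, T), T)}),
]
print(is_coherent(base, {("f2", "f1")}))           # True: f1's arrows are a subset of f2's
print(is_coherent(base, {("f1", "f2")}))           # False: f2 has an arrow f1 lacks
print(is_coherent(base + [("f1", set())], set()))  # False: two signatures named f1
```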

2.5 Well-formed Terms and Formulas

Signatures are used to control the context in which various symbols are allowed to occur, as explained next.

Each variable symbol is associated with exactly one signature from a coherent set of signatures. A constant symbol can have one or more signatures, and different symbols can be associated with the same signature. Since signature names uniquely identify signatures in coherent signature sets, we will often refer to signatures simply by their names. For instance, if one of f's signatures is atomic{ }, we may simply say that symbol f has signature atomic.

Next we define well-formed terms and their signatures. Like the constant symbols, well-formed terms can have more than one signature.

A constant or variable symbol with signature η is a well-formed term with signature η.
A positional term t(t₁ ... t_n), 0≤n, is well-formed and has a signature σ iff
- t is a well-formed term that has a signature that contains an arrow expression of the form (σ₁ ... σ_n) ⇒ σ; and
- Each t_i is a well-formed term whose signature is γ_i, such that γ_i ≤ σ_i.

As a special case, when n=0 we obtain that t( ) is a well-formed term with signature σ, if t's signature contains the arrow expression () ⇒ σ.

A term with named arguments t(p₁->t₁ ... p_n->t_n), 0≤n, is well-formed and has a signature σ iff
- t is a well-formed term that has a signature that contains an arrow expression with named arguments of the form (p₁->σ₁ ... p_n->σ_n) ⇒ σ; and
- Each t_i is a well-formed term whose signature is γ_i, such that γ_i ≤ σ_i.

As a special case, when n=0 we obtain that t( ) is a well-formed term with signature σ, if t's signature contains the arrow expression () ⇒ σ.

An equality term of the form t₁=t₂ is well-formed and has a signature κ iff
- The signature = has an arrow expression (σ σ) ⇒ κ; and
- t₁ and t₂ are well-formed terms with signatures γ₁ and γ₂, respectively, such that γ_i ≤ σ, i=1,2.
A membership term of the form t₁#t₂ is well-formed and has a signature κ iff
- The signature # has an arrow expression (σ₁ σ₂) ⇒ κ; and
- t₁ and t₂ are well-formed terms with signatures γ₁ and γ₂, respectively, such that γ_i ≤ σ_i, i=1,2.
A subclass term of the form t₁##t₂ is well-formed and has a signature κ iff
- The signature ## has an arrow expression (σ σ) ⇒ κ; and
- t₁ and t₂ are well-formed terms with signatures γ₁ and γ₂, respectively, such that γ_i ≤ σ, i=1,2.
A frame term of the form t[s₁->v₁ ... s_n->v_n] is well-formed and has a signature κ iff
- The signature -> has arrow expressions (σ σ₁₁ σ₁₂) ⇒ κ, ..., (σ σ_n1 σ_n2) ⇒ κ (these n expressions need not be distinct).
- t, s_j, and v_j are well-formed terms with signatures γ, γ_j1, and γ_j2, respectively, such that γ ≤ σ and γ_ji ≤ σ_ji, where j=1,...,n and i=1,2.

Note that, according to the above definition, f() and f are distinct terms. We define atomic formulas as follows:

A term is a well-formed atomic formula iff it is a well-formed term one of whose signatures is η, such that η = atomic or η < atomic.

Note that equality, membership, subclass, and frame terms are always atomic formulas, since atomic is always one of their signatures.

More general formulas are constructed out of atomic formulas with the help of logical connectives. A formula is a statement that can have one of the following forms:

Atomic: If φ is a well-formed atomic formula then it is also a well-formed formula.
Conjunction: If φ₁, ..., φ_n, n ≥ 0, are well-formed formulas then so is And(φ₁ ... φ_n). As a special case, And() is allowed and is treated as a tautology, i.e., a formula that is always true.
Disjunction: If φ₁, ..., φ_n, n ≥ 0, are well-formed formulas then so is Or(φ₁ ... φ_n). When n=0, we get Or() as a special case; it is treated as a formula that is always false.
Classical negation: If φ is a well-formed formula then Neg φ is a well-formed formula.
Default negation: If φ is a well-formed formula then Naf φ is a well-formed formula.
Rule: If φ and ψ are well-formed formulas then φ :- ψ is a well-formed formula.
Quantification: If φ is a well-formed formula and ?V₁, ..., ?V_n are variables then Exists ?V₁ ... ?V_n(φ) and Forall ?V₁ ... ?V_n(φ) are well-formed formulas.

Example 1 (The use of signatures)

We illustrate the above definitions with the following examples. In addition to atomic, let there be another signature, term{ }, which is also used in RIF-BLD.

Consider the term p(p(a) p(a b c)). If p has the (polymorphic) signature mysig{(term)⇒term, (term term)⇒term, (term term term)⇒term} and a, b, c each has the signature term{ } then p(p(a) p(a b c)) is a well-formed term with signature term{ }. If instead p had the signature mysig2{(term term)⇒term, (term term term)⇒term} then p(p(a) p(a b c)) would not be a well-formed term since then p(a) would not be well-formed (in this case, p would have no arrow expression which allows p to take just one argument).

For a more complex example, let r have the signature mysig3{(term)⇒atomic, (atomic term)⇒term, (term term term)⇒term}. Then r(r(a) r(a b c)) is well-formed. The interesting twist here is that r(a) is an atomic formula that occurs as an argument to a function symbol. However, this is allowed by the arrow expression (atomic term)⇒term, which is part of r's signature. If r's signature were mysig4{(term)⇒atomic, (atomic term)⇒atomic, (term term term)⇒term} instead, then r(r(a) r(a b c)) would be not only a well-formed term, but also a well-formed atomic formula.

An even more advanced example of signatures is when the right-hand side of an arrow expression is something other than term or atomic. For instance, let John, Mary, NewYork, and Boston have signatures term{ }; flight and parent have signature h₂{(term term)⇒atomic}; and closure has signature hh₁{(h₂)⇒p₂}, where p₂ is the name of the signature p₂{(term term)⇒atomic}. Then flight(NewYork Boston), closure(flight)(NewYork Boston), parent(John Mary), and closure(parent)(John Mary) would be well-formed formulas. Such formulas are allowed in languages like HiLog [CKW93], which support predicate constructors like closure in the above example.
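The well-formedness rules for positional terms can be turned into a small recursive checker that reproduces the claims of Example 1. This is a sketch under simplifying assumptions: only positional terms are handled, a constant or variable symbol is represented by a plain string, the order < on signature names is passed in as a set of pairs (empty in this example), and all names are illustrative. A result of set() means that the term is not well-formed.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple, Union

Arrow = Tuple[Tuple[str, ...], str]  # ((σ1 ... σn), σ)

@dataclass(frozen=True)
class App:
    """A positional term head(args ...); the head may itself be any term."""
    head: "Term"
    args: Tuple["Term", ...]

Term = Union[str, App]  # a str stands for a constant or variable symbol

def app(head: Term, *args: Term) -> App:
    return App(head, args)

def signatures(t: Term, symbols: Dict[str, Set[str]], arrows: Dict[str, List[Arrow]],
               less: FrozenSet[Tuple[str, str]] = frozenset()) -> Set[str]:
    """Signature names of t; an empty result means t is not well-formed."""
    def leq(x: str, y: str) -> bool:
        return x == y or (x, y) in less

    if isinstance(t, str):
        return set(symbols.get(t, set()))
    head_sigs = signatures(t.head, symbols, arrows, less)
    arg_sigs = [signatures(a, symbols, arrows, less) for a in t.args]
    result: Set[str] = set()
    for h in head_sigs:
        for params, res in arrows.get(h, []):
            # arity must match and each argument needs some signature γ_i ≤ σ_i
            if len(params) == len(arg_sigs) and all(
                    any(leq(g, s) for g in gs) for gs, s in zip(arg_sigs, params)):
                result.add(res)
    return result

def is_atomic(t: Term, symbols, arrows, less=frozenset()) -> bool:
    """Well-formed atomic formula: one of t's signatures is ≤ atomic."""
    return any(s == "atomic" or (s, "atomic") in less
               for s in signatures(t, symbols, arrows, less))

T, A = "term", "atomic"
ARROWS = {
    "mysig":  [((T,), T), ((T, T), T), ((T, T, T), T)],
    "mysig2": [((T, T), T), ((T, T, T), T)],
    "mysig3": [((T,), A), ((A, T), T), ((T, T, T), T)],
    "mysig4": [((T,), A), ((A, T), A), ((T, T, T), T)],
    "h2":     [((T, T), A)],
    "hh1":    [(("h2",), "p2")],
    "p2":     [((T, T), A)],
}
consts = {x: {T} for x in ["a", "b", "c", "John", "Mary", "NewYork", "Boston"]}

t = app("p", app("p", "a"), app("p", "a", "b", "c"))
print(signatures(t, {**consts, "p": {"mysig"}}, ARROWS))   # {'term'}
print(signatures(t, {**consts, "p": {"mysig2"}}, ARROWS))  # set()

u = app("r", app("r", "a"), app("r", "a", "b", "c"))
print(is_atomic(u, {**consts, "r": {"mysig3"}}, ARROWS))   # False
print(is_atomic(u, {**consts, "r": {"mysig4"}}, ARROWS))   # True

hilog = {**consts, "flight": {"h2"}, "parent": {"h2"}, "closure": {"hh1"}}
print(is_atomic(app(app("closure", "flight"), "NewYork", "Boston"), hilog, ARROWS))  # True
```

Because the arity is checked before the arguments are matched, a zero-argument term t( ) receives signature σ exactly when t's signature contains () ⇒ σ, and the bare symbol t keeps its own, different, signature set.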

2.6 Symbol Spaces

Throughout this document, the xsd: prefix stands for the XML Schema namespace URI http://www.w3.org/2001/XMLSchema#, the rdf: prefix stands for http://www.w3.org/1999/02/22-rdf-syntax-ns#, and rif: stands for the URI of the RIF namespace, http://www.w3.org/2007/rif#. Syntax such as xsd:string should be understood as a compact URI [CURIE] -- a macro that expands to a concatenation of the character sequence denoted by the prefix xsd and the string string.

The set of all constant symbols in a RIF dialect is partitioned into a number of subsets, called symbol spaces, which are used to represent XML Schema data types, data types defined in other W3C specifications, such as rdf:XMLLiteral, and to distinguish other sets of constants. Constant symbols that belong to the various symbol spaces have special presentation syntax and semantics.

Formally, a symbol space is a named subset of the set of all constants, Const. The semantic aspects of symbol spaces will be described in Section Semantic Framework. Each symbol in Const belongs to exactly one symbol space.

Each symbol space has an associated lexical space and an identifier.

The lexical space of a symbol space is a non-empty set of Unicode character strings.
The identifier of a symbol space is an absolute IRI.

To simplify the language, we will often use symbol space identifiers to refer to the actual symbol spaces (for instance, we may use "symbol space xsd:string" instead of "symbol space identified by xsd:string").

To refer to a constant in a particular RIF symbol space, we use the following presentation syntax:

     LITERAL^^SYMSPACE

where LITERAL is a Unicode string, called the lexical part of the symbol, and SYMSPACE is an identifier of the symbol space in the form of an absolute IRI string. LITERAL must be an element in the lexical space of the symbol space. For instance, 1.2^^xsd:decimal and 1^^xsd:decimal are legal symbols because 1.2 and 1 are members of the lexical space of the XML Schema data type xsd:decimal. On the other hand, a+2^^xsd:decimal is not a legal symbol, since a+2 is not part of the lexical space of xsd:decimal.
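A hedged Python sketch of checking a constant written as LITERAL^^SYMSPACE. For brevity it identifies symbol spaces by their compact names (xsd:decimal) rather than by absolute IRIs, and it knows only the xsd:decimal lexical space, whose pattern below follows XML Schema Part 2; the table and function names are hypothetical.

```python
import re

# Lexical spaces keyed by symbol-space identifier (compact form, for brevity).
# xsd:decimal: optional sign, then digits with an optional fractional part.
LEXICAL_SPACES = {
    "xsd:decimal": re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)"),
}

def parse_constant(symbol: str):
    """Split LITERAL^^SYMSPACE and check LITERAL against SYMSPACE's lexical space."""
    literal, sep, symspace = symbol.rpartition("^^")
    if not sep:
        raise ValueError(f"no ^^ separator in {symbol!r}")
    pattern = LEXICAL_SPACES.get(symspace)
    if pattern is None:
        raise ValueError(f"unsupported symbol space {symspace!r}")
    if not pattern.fullmatch(literal):
        raise ValueError(f"{literal!r} is not in the lexical space of {symspace}")
    return literal, symspace

for s in ["1.2^^xsd:decimal", "1^^xsd:decimal", "a+2^^xsd:decimal"]:
    try:
        print(parse_constant(s))
    except ValueError as err:
        print("illegal:", err)
```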

The set of all symbol spaces that partition Const is considered to be part of the logic language used by RIF rule sets.

RIF supports the following symbol spaces. Rule sets that are exchanged through RIF can use additional symbol spaces as explained below.

xsd:string (http://www.w3.org/2001/XMLSchema#string)

and all the symbol spaces that correspond to the subtypes of xsd:string as specified in [XML-SCHEMA2].

xsd:decimal (http://www.w3.org/2001/XMLSchema#decimal)

and all the symbol spaces that correspond to the subtypes of xsd:decimal as specified in [XML-SCHEMA2].

xsd:time (http://www.w3.org/2001/XMLSchema#time).
xsd:date (http://www.w3.org/2001/XMLSchema#date).
xsd:dateTime (http://www.w3.org/2001/XMLSchema#dateTime).

The lexical spaces of the above symbol spaces are defined in the document [XML-SCHEMA2].

rdf:XMLLiteral (http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral).

This symbol space represents XML content. The lexical space of rdf:XMLLiteral is defined in the document [RDF-CONCEPTS].

rif:text (for text strings with language tags attached).

This symbol space represents text strings with a language tag attached. The lexical space of rif:text is the set of all Unicode strings of the form ...@LANG, i.e., strings that end with @LANG where LANG is a language identifier as defined in [RFC-3066].
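The lexical space of rif:text can be recognized by splitting at the last "@" and checking the remainder against the RFC 3066 tag grammar: a primary subtag of 1 to 8 letters, followed by any number of hyphen-separated subtags of 1 to 8 letters or digits. The function below is an illustrative sketch, not part of RIF; splitting at the last "@" lets the text part itself contain "@".

```python
import re

# RFC 3066 Language-Tag: Primary-subtag *("-" Subtag),
# Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT).
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*")

def split_rif_text(lexical: str):
    """Split a rif:text lexical form "...@LANG" into (text, LANG)."""
    text, sep, lang = lexical.rpartition("@")
    if not sep or not LANGUAGE_TAG.fullmatch(lang):
        raise ValueError(f"{lexical!r} is not in the lexical space of rif:text")
    return text, lang

print(split_rif_text("Hello@en"))          # ('Hello', 'en')
print(split_rif_text("me@example@en-US"))  # ('me@example', 'en-US')
```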

rif:iri (for internationalized resource identifiers or IRIs).

Constant symbols that belong to this symbol space are intended to be used in a way similar to RDF resources [RDF-SCHEMA]. The lexical space consists of all absolute IRIs as specified in [RFC-3987]; it is unrelated to the XML primitive type anyURI. A rif:iri constant is supposed to be interpreted as a reference to one and the same object regardless of the context in which that constant occurs.

rif:local (for constant symbols that are not visible outside of a particular set of RIF formulas).

Symbols in this symbol space are used locally in their respective rule sets. This means that occurrences of the same rif:local-constant in different rule sets are viewed as unrelated distinct constants, but occurrences of the same constant in the same rule set must refer to the same object. The lexical space of rif:local is the same as the lexical space of xsd:string.
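One way a consumer could honor these rules when combining rule sets is to rename rif:local constants apart, as in the Python sketch below. The representation (constants as (lexical part, symbol space) pairs, facts as tuples of constants) and the "~index" renaming scheme are assumptions made for illustration; the scheme presumes that no existing rif:local constant already carries such a suffix.

```python
from typing import List, Tuple

Constant = Tuple[str, str]   # (lexical part, symbol-space identifier)
Atom = Tuple[Constant, ...]  # a fact p(c1 ... cn), flattened to its constants
RuleSet = List[Atom]

def rename_locals(rule_set: RuleSet, tag: str) -> RuleSet:
    """Rename every rif:local constant of one rule set consistently within that set."""
    def rename(c: Constant) -> Constant:
        lexical, space = c
        return (f"{lexical}~{tag}", space) if space == "rif:local" else c
    return [tuple(rename(c) for c in atom) for atom in rule_set]

def merge(rule_sets: List[RuleSet]) -> RuleSet:
    """Union of rule sets in which same-named rif:local constants stay distinct."""
    merged: RuleSet = []
    for i, rs in enumerate(rule_sets):
        merged.extend(rename_locals(rs, str(i)))
    return merged

rs1 = [(("p", "rif:iri"), ("x", "rif:local"))]
rs2 = [(("p", "rif:iri"), ("x", "rif:local"))]
for atom in merge([rs1, rs2]):
    print(atom)
```

The rif:iri constant p is shared by both rule sets, while the two occurrences of the local constant x become distinct constants.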

Notes on RIF-compliant support for symbol spaces.

A RIF-compliant inference engine must support the following symbol spaces: xsd:string, xsd:decimal, xsd:time, xsd:date, xsd:dateTime, rdf:XMLLiteral, rif:text, rif:iri, rif:local. Such an engine can support additional symbol spaces.
A RIF-producing system includes a RIF-compliant inference engine and a transformation from the language of that engine into valid RIF XML format. Such an engine must support all the symbol spaces that are mentioned in the documents produced by the aforesaid transformation. In particular, this transformation must not produce invalid constant symbols, i.e., symbols whose lexical part is not an element of the lexical space of the symbol's symbol space.
A RIF-consuming system includes a RIF-compliant inference engine and a transformation from RIF XML to the language of the engine. A consumer engine is not required to support all symbol spaces that are subspaces of the symbol spaces supported by the producer engine. For instance, a RIF-producer system might support xsd:short, a subspace of xsd:decimal, but RIF consumers do not need to support xsd:short. The consumer is allowed to replace the constants in an unsupported symbol space with the corresponding constant symbols in a supported superspace. For example, "123"^^xsd:short can be replaced with "123"^^xsd:decimal and "abc123"^^xsd:IDREF with "abc123"^^xsd:string. Such substitutions are permitted because they do not affect the inferences that can be made from RIF documents (see Section RIF Semantic Framework).
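The consumer-side substitution can be sketched as walking up the XML Schema derivation hierarchy until a supported symbol space is reached. The table lists only the chains needed for the two examples above (xsd:short → xsd:int → xsd:long → xsd:integer → xsd:decimal and xsd:IDREF → xsd:NCName → xsd:Name → xsd:token → xsd:normalizedString → xsd:string, as in XML Schema Part 2); the function name widen is hypothetical. The lexical part is kept unchanged, since these types are derived by restriction and their lexical forms are also lexical forms of the base type.

```python
# Part of the XML Schema Part 2 built-in type hierarchy, as type -> base type.
BASE_TYPE = {
    "xsd:short": "xsd:int",
    "xsd:int": "xsd:long",
    "xsd:long": "xsd:integer",
    "xsd:integer": "xsd:decimal",
    "xsd:IDREF": "xsd:NCName",
    "xsd:NCName": "xsd:Name",
    "xsd:Name": "xsd:token",
    "xsd:token": "xsd:normalizedString",
    "xsd:normalizedString": "xsd:string",
}

def widen(lexical: str, space: str, supported: set):
    """Replace an unsupported symbol space by its nearest supported superspace."""
    while space not in supported:
        if space not in BASE_TYPE:
            raise ValueError(f"no supported superspace for {space}")
        space = BASE_TYPE[space]
    return lexical, space

consumer = {"xsd:decimal", "xsd:string"}
print(widen("123", "xsd:short", consumer))     # ('123', 'xsd:decimal')
print(widen("abc123", "xsd:IDREF", consumer))  # ('abc123', 'xsd:string')
```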

3 Semantic Framework

3.1 Semantics of a RIF Dialect as a Specialization of RIF-FLD

The RIF-FLD semantic framework defines the notions of semantic structures and of models of RIF formulas. The semantics of a dialect is derived from these notions by specializing the following parameters.

The effect of the syntax.
- The syntax of a dialect may limit the kinds of terms that are supported. For instance, if the dialect does not support frames or terms with named arguments then the parts of the semantic structures whose purpose is to interpret the unsupported types of terms become redundant.
Truth values.
- The RIF-FLD semantic framework allows formulas to have truth values from an arbitrary partially ordered set of truth values, TV. A concrete dialect must select a concrete partially or totally ordered set of truth values.
Data types.
- A data type is a symbol space that has a fixed interpretation in any semantic structure. RIF-FLD defines a set of core data types that each dialect is expected to support, but its semantics does not limit support to just the core types. RIF dialects can introduce additional data types, and each dialect is expected to define the exact set of data types that it supports.
Logical entailment.
- Logical entailment in RIF-FLD is defined with respect to an unspecified set of intended models. A RIF dialect must define which models are considered to be intended. For instance, one dialect might specify that all models are intended (which leads to classical first-order entailment), another may consider only the minimal models as intended, while a third one might only use so-called well-founded or stable models.

All of the above notions are defined in the remainder of this document.

3.2 Truth Values

Each RIF dialect is expected to define the set of truth values, denoted by TV. This set must have a partial order, called the truth order, denoted <_t. As a special case, <_t can be a total order in some dialects. We write a ≤_t b if either a <_t b or a and b are the same element of TV. In addition,

TV must be a complete lattice with respect to <_t, i.e., the least upper bound (lub_t) and the greatest lower bound (glb_t) must exist for any subset of TV.
TV is required to have two distinguished elements, f and t, such that f ≤_t elt and elt ≤_t t for every elt∈TV.
TV has an operator of negation, ~: TV → TV, such that
- ~ is self-inverse, i.e., applying ~ twice gives the identity mapping.
- ~t = f (and thus ~f = t).

RIF dialects can have additional truth values. For instance, the semantics of some versions of NAF, such as the well-founded negation, requires three truth values: t, f, and u (undefined), where f <_t u <_t t. Handling of contradictions and uncertainty usually requires at least four truth values: t, u, f, and i (inconsistent). In this case, the truth order is partial: f <_t u <_t t and f <_t i <_t t.
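The four-valued truth space just described can be checked against the requirements of this section with a small sketch. It assumes that ~ leaves u and i fixed (~u = u, ~i = i); the framework itself only requires ~ to be self-inverse with ~t = f, so this choice is illustrative. lub and glb are computed by brute force, which is adequate only for a finite TV.

```python
# Truth values of the four-valued example; truth order:
# f <_t u <_t t and f <_t i <_t t, with u and i incomparable.
F, U, INC, T = "f", "u", "i", "t"
TV = [F, U, INC, T]
STRICTLY_BELOW = {(F, U), (F, INC), (F, T), (U, T), (INC, T)}   # pairs a <_t b


def leq(a, b):
    """a <=_t b"""
    return a == b or (a, b) in STRICTLY_BELOW


def lub(values):
    """Least upper bound (brute force over the finite TV); lub of {} is f."""
    values = list(values)
    uppers = [x for x in TV if all(leq(v, x) for v in values)]
    return next(x for x in uppers if all(leq(x, y) for y in uppers))


def glb(values):
    """Greatest lower bound; glb of {} is t."""
    values = list(values)
    lowers = [x for x in TV if all(leq(x, v) for v in values)]
    return next(x for x in lowers if all(leq(y, x) for y in lowers))


# Assumed negation: swaps t and f, leaves u and i fixed.
NEG = {T: F, F: T, U: U, INC: INC}

assert all(leq(F, x) and leq(x, T) for x in TV)   # f is least, t is greatest
assert all(NEG[NEG[x]] == x for x in TV)          # ~ is self-inverse
assert NEG[T] == F                                # ~t = f
print(lub([U, INC]), glb([U, INC]))               # t f
```

Since u and i are incomparable, their least upper bound is t and their greatest lower bound is f, which is why this TV is a lattice but not a total order.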

3.3 Primitive Data Types

A primitive data type (or just a data type, for short) is a symbol space that has

an associated set, called the value space, and
a mapping from the lexical space of the symbol space to the value space, called the lexical-to-value-space mapping.

Semantic structures are always defined with respect to a particular set of data types, denoted by DTS. In a concrete dialect, DTS always includes the data types supported by that dialect. All RIF dialects are expected to support the following primitive data types:

xsd:long
xsd:integer
xsd:decimal
xsd:string
xsd:time
xsd:dateTime
rdf:XMLLiteral
rif:text

Their value spaces and the lexical-to-value-space mappings are defined as follows:

For the XML Schema data types of RIF, namely xsd:long, xsd:integer, xsd:decimal, xsd:string, xsd:time, and xsd:dateTime, the value spaces and the lexical-to-value-space mappings are defined in the XML Schema specification [XML-SCHEMA2].
The value space for the primitive data type rdf:XMLLiteral is defined in RDF [RDF-CONCEPTS].
The value space of rif:text is the set of all pairs of the form (string, lang), where string is a Unicode character sequence and lang is a lowercase Unicode character sequence which is a natural language identifier as defined by RFC 3066 [RFC-3066]. The lexical-to-value-space mapping of rif:text, denoted L_rif:text, maps each symbol string@lang in the lexical space of rif:text to (string, lower-case(lang)), where lower-case(lang) is lang written in all-lowercase letters.

The value space and the lexical-to-value-space mapping for rif:text defined here are compatible with RDF's semantics for strings with language tags [RDF-SEMANTICS].
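A minimal sketch of L_rif:text, assuming symbols are given as Python strings of the form string@lang. The function name L_rif_text is hypothetical, and the sketch does not check that lang is a well-formed RFC 3066 language identifier; it only splits at the last @ and lowercases the tag.

```python
def L_rif_text(symbol):
    """Hypothetical sketch: map 'string@lang' to (string, lang in lowercase)."""
    # Split at the last '@' so that '@' may also occur inside the string part.
    string, sep, lang = symbol.rpartition("@")
    if not sep or not lang:
        raise ValueError(f"not in the lexical space of rif:text: {symbol!r}")
    return (string, lang.lower())


print(L_rif_text("Hello@en-US"))   # ('Hello', 'en-us')
# Symbols differing only in the case of the tag denote the same value.
print(L_rif_text("chat@FR") == L_rif_text("chat@fr"))   # True
```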

Although the lexical and the value spaces might sometimes look similar, one should not confuse them. Lexical spaces define the syntax of the constant symbols in the RIF language that belong to the various primitive data types. In contrast, value spaces define the meaning of those constants. The lexical and the value spaces are often not even isomorphic. For instance, 1.2^^xsd:decimal and 1.20^^xsd:decimal are two legal and distinct constants in RIF because both 1.2 and 1.20 belong to the lexical space of xsd:decimal. However, these two constants are interpreted by the same element of the value space of the xsd:decimal type. Therefore, 1.2^^xsd:decimal = 1.20^^xsd:decimal is a RIF tautology. Likewise, RIF semantics for data types implies certain inequalities. For instance, abc^^xsd:string ≠ abcd^^xsd:string is a tautology, since the lexical-to-value-space mapping of the xsd:string type maps these two constants to distinct elements in the value space of xsd:string.
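The distinction can be illustrated with Python's decimal module, whose Decimal values compare equal when they denote the same number regardless of trailing zeros. This is a simplified sketch: Decimal accepts some strings (such as "1e3" or "NaN") that are not in the lexical space of xsd:decimal, and the helper names L and I_C are assumptions made for illustration.

```python
from decimal import Decimal

# Simplified lexical-to-value-space mappings (XML Schema lexical rules not validated).
L = {
    "xsd:decimal": Decimal,   # "1.2" and "1.20" denote the same number
    "xsd:string": str,        # a string denotes itself
}


def I_C(lexical, dt):
    """Interpret the constant lexical^^dt through the mapping of its data type."""
    return L[dt](lexical)


# Two distinct constants ...
print(("1.2", "xsd:decimal") != ("1.20", "xsd:decimal"))          # True
# ... denoting one value, so their equality is a tautology.
print(I_C("1.2", "xsd:decimal") == I_C("1.20", "xsd:decimal"))    # True
# Distinct values, so this inequality is a tautology.
print(I_C("abc", "xsd:string") != I_C("abcd", "xsd:string"))      # True
```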

3.4 Semantic Structures

The central step in specifying a model-theoretic semantics for a logic-based language is defining the notion of a semantic structure, also known as an interpretation. Semantic structures are used to assign truth values to RIF-FLD formulas.

A semantic structure, I, is a tuple of the form <TV, DTS, D, I_C, I_V, I_F, I_frame, I_SF, I_sub, I_isa, I₌, I_Truth>. Here D is a non-empty set of elements called the domain of I. We will continue to use Const to refer to the set of all constant symbols and Var to refer to the set of all variable symbols. TV denotes the set of truth values that the semantic structure uses and DTS is the set of primitive data types used in I.

The other components of I are total mappings defined as follows:

I_C maps Const to elements of D.
- This mapping interprets constant symbols.
I_V maps Var to elements of D.
- This mapping interprets variable symbols.
I_F maps D to functions D* → D (here D* is the set of all sequences of any finite length over the domain D)
- This mapping interprets positional terms.

I_SF interprets terms with named arguments. It is a total mapping from D to the set of total functions of the form SetOfFiniteBags(ArgNames × D) → D. This is analogous to the interpretation of positional terms with two differences:
- Each pair <s,v> ∈ ArgNames × D represents an argument/value pair instead of just a value in the case of a positional term.
- The argument to a term with named arguments is a finite bag of argument/value pairs rather than a finite ordered sequence of simple elements.
- Bags are used here because the order of the argument/value pairs in a term with named arguments is immaterial and the pairs may repeat. For instance, p(a->b a->c).
I_frame is a total mapping from D to total functions of the form SetOfFiniteBags(D × D) → D.
- This mapping interprets frame terms. An argument, d ∈ D, to I_frame represents an object and a finite bag {<a1,v1>, ..., <ak,vk>} represents a bag (multiset) of attribute-value pairs for d. We will see shortly how I_frame is used to determine the truth valuation of frame terms.

Bags are used here because the order of the attribute/value pairs in a frame is immaterial and the pairs may repeat. For instance, o[a->b a->c] means that the value of the attribute a on the object o is a set that contains b and c.

I_sub gives meaning to the subclass relationship. It is a total function D × D → D.
- The operator ## is required to be transitive, i.e., c1 ## c2 and c2 ## c3 must imply c1 ## c3. This is ensured by a restriction in Section Interpretation of Formulas.
I_isa gives meaning to class membership. It is a total function D × D → D.
- The relationships # and ## are required to have the usual property that all members of a subclass are also members of the superclass, i.e., o # cl and cl ## scl must imply o # scl. This is ensured by a restriction in Section Interpretation of Formulas.
I₌ gives meaning to equality. It is a total function D × D → D.
I_Truth is a total mapping D → TV.
- It is used to define truth valuation of formulas.
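For a two-valued structure (TV = {f, t}), the transitivity requirement on ## and the requirement that members of a subclass be members of the superclass can be checked, and enforced by closure, over finite relations. In this sketch the hypothetical sets sub and isa list the pairs whose ## and # formulas are true, and the hypothetical function close computes the smallest supersets that satisfy both requirements. It is an illustration under these assumptions, not part of RIF-FLD.

```python
def satisfies(sub, isa):
    """Two-valued check: sub holds pairs (c1, c2) with c1 ## c2 true,
    isa holds pairs (o, c) with o # c true."""
    transitive = all((c1, c3) in sub
                     for (c1, c2) in sub for (c2b, c3) in sub if c2 == c2b)
    inherited = all((o, scl) in isa
                    for (o, cl) in isa for (cl2, scl) in sub if cl == cl2)
    return transitive and inherited


def close(sub, isa):
    """Smallest supersets of sub and isa satisfying both requirements (fixpoint)."""
    sub, isa = set(sub), set(isa)
    while True:
        new_sub = {(a, d) for (a, b) in sub for (c, d) in sub if b == c} - sub
        new_isa = {(o, d) for (o, c) in isa for (c2, d) in sub if c == c2} - isa
        if not new_sub and not new_isa:
            return sub, isa
        sub |= new_sub
        isa |= new_isa


sub = {("student", "person"), ("person", "agent")}
isa = {("mary", "student")}
print(satisfies(sub, isa))        # False
sub2, isa2 = close(sub, isa)
print(satisfies(sub2, isa2))      # True
print(sorted(isa2))               # [('mary', 'agent'), ('mary', 'person'), ('mary', 'student')]
```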

We also define the following mapping I:

I(k) = I_C(k), if k is a symbol in Const
I(?v) = I_V(?v), if ?v is a variable in Var
I(f(t₁ ... t_n)) = I_F(I(f))(I(t₁),...,I(t_n))
I(f(s₁->v₁ ... s_n->v_n)) = I_SF(I(f))({<s₁,I(v₁)>,...,<s_n,I(v_n)>})
- Here we use {...} to denote a bag of argument/value pairs.
I(o[a₁->v₁ ... a_k->v_k]) = I_frame(I(o))({<I(a₁),I(v₁)>, ..., <I(a_k),I(v_k)>})
- Here {...} denotes a bag of attribute/value pairs.
I(c1##c2) = I_sub(I(c1), I(c2))
I(o#c) = I_isa(I(o), I(c))
I(x=y) = I₌(I(x), I(y))
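The definition of I is a structural recursion over terms. The sketch below assumes a hypothetical "free" structure in which each term denotes a tagged tuple built from the denotations of its parts; real dialects choose D and the component mappings differently. Bags are represented as frozensets of (pair, count) items, so the order of argument/value or attribute/value pairs is ignored while repetitions are preserved. The class names are illustrative only.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Pos:      # f(t1 ... tn)
    fun: object
    args: tuple


@dataclass(frozen=True)
class Named:    # f(s1->v1 ... sn->vn); args is a tuple of (name, term) pairs
    fun: object
    args: tuple


@dataclass(frozen=True)
class Frame:    # o[a1->v1 ... ak->vk]; slots is a tuple of (term, term) pairs
    obj: object
    slots: tuple


def bag(pairs):
    """Hashable multiset: order is ignored, repeated pairs are counted."""
    return frozenset(Counter(pairs).items())


class FreeStructure:
    """Hypothetical structure: each term denotes a tagged tuple of its parts' denotations."""

    def __init__(self, I_V):
        self.I_V = I_V                              # variable name -> domain element

    def I_C(self, k):
        return ("const", k)

    def I_F(self, d):
        return lambda *xs: ("pos", d, xs)

    def I_SF(self, d):
        return lambda b: ("named", d, b)

    def I_frame(self, d):
        return lambda b: ("frame", d, b)

    def I(self, t):
        if isinstance(t, Const):
            return self.I_C(t.name)
        if isinstance(t, Var):
            return self.I_V[t.name]
        if isinstance(t, Pos):
            return self.I_F(self.I(t.fun))(*(self.I(a) for a in t.args))
        if isinstance(t, Named):
            return self.I_SF(self.I(t.fun))(bag((s, self.I(v)) for s, v in t.args))
        if isinstance(t, Frame):
            return self.I_frame(self.I(t.obj))(
                bag((self.I(a), self.I(v)) for a, v in t.slots))
        raise TypeError(f"unsupported term: {t!r}")


S = FreeStructure({"?x": ("const", "b")})
t1 = Named(Const("p"), (("a", Const("b")), ("c", Var("?x"))))
t2 = Named(Const("p"), (("c", Var("?x")), ("a", Const("b"))))
print(S.I(t1) == S.I(t2))   # True: named arguments are unordered
```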

The effect of signatures. For every signature, sg, supported by the dialect, there is a subset D_sg ⊆ D, called the domain of the signature. Terms that have a given signature, sg, are supposed to be mapped by I to D_sg, and if a term has more than one signature it is supposed to be mapped into the intersection of the corresponding signature domains. To ensure this, the following is required:

If sg < sg' then D_sg⊆D_sg'.
If k is a constant that has signature sg then I_C(k) ∈ D_sg.
If ?v is a variable that has signature sg then I_V(?v) ∈ D_sg.
If sg has an arrow expression of the form (s1 ... sn)⇒s then, for every d∈D_sg, I_F(d) must map D_s1× ... ×D_sn to D_s.
If sg has an arrow expression of the form (p1->s1 ... pn->sn)⇒s then, for every d∈D_sg, I_SF(d) must map the set {<p1,D_s1>, ..., <pn,D_sn>} to D_s.
If the signature -> has the arrow expressions (sg,s₁,r₁)⇒k, ..., (sg,s_n,r_n)⇒k, then, for every d∈D_sg, I_frame(d) must map {<D_s1,D_r1>, ..., <D_sn,D_rn>} to D_k.
If the signature # has an arrow expression (s r)⇒k then I_isa must map D_s×D_r to D_k.
If the signature ## has an arrow expression (s s)⇒k then I_sub must map D_s×D_s to D_k.
If the signature = has an arrow expression (s s)⇒k then I₌ must map D_s×D_s to D_k.
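For a finite structure, the signature conditions above can be checked exhaustively. The domain, the signature names (term, number, plus), and the interpretation of + as addition modulo 3 are hypothetical choices made for illustration; only the conditions for sub-signatures, constants, and positional arrow expressions are checked.

```python
from itertools import product

# Hypothetical finite structure: domain D and signature domains D_sg.
D = {1, 2, 3, "a", "+"}
D_sg = {"term": D, "number": {1, 2, 3}, "plus": {"+"}}
SUBSIG = [("number", "term"), ("plus", "term")]       # pairs sg < sg'
CONST_SIG = {"1": "number", "a": "term", "+": "plus"}
I_C = {"1": 1, "a": "a", "+": "+"}
ARROWS = {"plus": [(("number", "number"), "number")]}  # (s1 s2) => s


def I_F(d):
    if d == "+":
        return lambda x, y: (x + y - 1) % 3 + 1       # addition mod 3, representatives 1..3
    return lambda *xs: "a"


def respects_signatures():
    subsig_ok = all(D_sg[s] <= D_sg[t] for s, t in SUBSIG)            # D_sg ⊆ D_sg'
    const_ok = all(I_C[k] in D_sg[sg] for k, sg in CONST_SIG.items())  # I_C(k) ∈ D_sg
    arrow_ok = all(                                                    # I_F(d) maps into D_s
        I_F(d)(*xs) in D_sg[res]
        for sg, arrows in ARROWS.items()
        for args, res in arrows
        for d in D_sg[sg]
        for xs in product(*(D_sg[s] for s in args)))
    return subsig_ok and const_ok and arrow_ok


print(respects_signatures())   # True
```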

The effect of data types. The data types in DTS impose the following restrictions. If dt is a symbol space identifier of a data type, let LS_dt denote the lexical space of dt, VS_dt denote its value space, and L_dt: LS_dt → VS_dt the lexical-to-value-space mapping. Then the following must hold:

VS_dt ⊆ D; and
For each constant lit^^dt such that lit ∈ LS_dt, I_C(lit^^dt) = L_dt(lit).

That is, I_C must map the constants of a data type dt in accordance with L_dt.

RIF-FLD does not impose special requirements on I_C for constants in symbol spaces that do not correspond to primitive data types in DTS. Dialects may have such requirements, however. An example of such a restriction could be a requirement that no constant in a particular symbol space (such as rif:local) can be mapped to VS_dt of a data type dt.

3.5 Interpretation of Formulas

Truth valuation for well-formed formulas in RIF-FLD is determined using the following function, denoted TVal_I:

Constants: TVal_I(k) = I_Truth(I(k)), if k ∈ Const.
Variables: TVal_I(?v) = I_Truth(I(?v)), if ?v ∈ Var.
Positional atomic formulas: TVal_I(r(t₁ ... t_n)) = I_Truth(I(r(t₁ ... t_n)))
Atomic formulas with named arguments: TVal_I(p(s₁->v₁ ... s_k->v_k)) = I_Truth(I(p(s₁->v₁ ... s_k->v_k))).
Equality: TVal_I(x = y) = I_Truth(I(x = y)).
- To ensure that equality has precisely the expected properties, it is required that I_Truth(I(x = y)) = t if and only if I(x) = I(y) and that I_Truth(I(x = y)) = f otherwise.
Subclass: TVal_I(sc ## cl) = I_Truth(I(sc ## cl)).
- To ensure that the operator ## is transitive, i.e., c1 ## c2 and c2 ## c3 imply c1 ## c3, the following is required: For all c1, c2, c3 ∈ D, glb_t(TVal_I(c1 ## c2), TVal_I(c2 ## c3)) ≤_t TVal_I(c1 ## c3).
Membership: TVal_I(o # cl) = I_Truth(I(o # cl)).
- To ensure that all members of a subclass are also members of the superclass, i.e., o # cl and cl ## scl imply o # scl, the following is required: For all o, cl, scl ∈ D, glb_t(TVal_I(o # cl), TVal_I(cl ## scl)) ≤_t TVal_I(o # scl).
Frame: TVal_I(o[a₁->v₁ ... a_k->v_k]) = I_Truth(I(o[a₁->v₁ ... a_k->v_k])).
- Since the different attribute/value pairs are supposed to be understood as conjunctions, the following is required:
  - TVal_I(o[a₁->v₁ ... a_k->v_k]) = glb_t(TVal_I(o[a₁->v₁]), ..., TVal_I(o[a_k->v_k]))
Conjunction: TVal_I(And(c₁ ... c_n)) = glb_t(TVal_I(c₁), ..., TVal_I(c_n)).
Disjunction: TVal_I(Or(c₁ ... c_n)) = lub_t(TVal_I(c₁), ..., TVal_I(c_n)).
Negation: TVal_I(neg φ) = ~TVal_I(φ) and TVal_I(naf φ) = ~TVal_I(φ)

where ~ is the self-inverse operator of negation on TV introduced in Section Truth Values. Note that both classical and default negation are interpreted the same way in any concrete semantic structure. The difference between the two kinds of negation comes into play when logical entailment is defined.
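When TV is totally ordered, glb_t and lub_t reduce to minimum and maximum. The sketch below assumes the three truth values f <_t u <_t t encoded as 0, 0.5, and 1, with ~x = 1 - x (which is self-inverse and maps t to f), and formulas written as nested tuples; the encoding and the function name tval are assumptions. It shows that, within a single structure, neg and naf yield the same value.

```python
# Three truth values f <_t u <_t t, encoded as numbers so that <=_t is <=.
F, U, T = 0.0, 0.5, 1.0


def tval(phi, I_truth):
    """phi is a nested tuple, e.g. ("And", ("atom", "p"), ("naf", ("atom", "q")))."""
    op = phi[0]
    if op == "atom":
        return I_truth[phi[1]]
    if op == "And":      # glb_t of the conjuncts; glb of no conjuncts is t
        return min((tval(c, I_truth) for c in phi[1:]), default=T)
    if op == "Or":       # lub_t of the disjuncts; lub of no disjuncts is f
        return max((tval(c, I_truth) for c in phi[1:]), default=F)
    if op in ("neg", "naf"):   # both interpreted by ~ in a single structure
        return 1.0 - tval(phi[1], I_truth)
    raise ValueError(f"unknown connective: {op}")


I_truth = {"p": T, "q": U}
phi = ("And", ("atom", "p"), ("naf", ("atom", "q")))
print(tval(phi, I_truth))   # 0.5, i.e. u
print(tval(("neg", ("atom", "q")), I_truth) == tval(("naf", ("atom", "q")), I_truth))  # True
```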

Quantification: TVal_I(Exists ?v₁ ... ?v_n (φ)) = lub_t(TVal_I*(φ)) and TVal_I(Forall ?v₁ ... ?v_n (φ)) = glb_t(TVal_I*(φ)).

Here lub_t (respectively, glb_t) is taken over all interpretations I* of the form <TV, DTS, D, I_C, I*_V, I_F, I_frame, I_SF, I_sub, I_isa, I₌, I_Truth>, which are exactly like I, except that the mapping I*_V is used instead of I_V. I*_V is defined to coincide with I_V on all variables except, possibly, on ?v₁,...,?v_n.
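Over a finite domain, the lub_t and glb_t in the quantifier clauses range over finitely many reassignments I*_V. The sketch assumes totally ordered numeric truth values (so lub_t is max and glb_t is min), a formula body given as a Python function of the variable assignment, and a finite domain; tval_quantified and x_greater_than_y are hypothetical names.

```python
from itertools import product


def tval_quantified(kind, variables, body, domain, I_V):
    """lub_t (Exists) or glb_t (Forall) of body over all I*_V that agree with I_V
    except possibly on the quantified variables."""
    values = []
    for choice in product(domain, repeat=len(variables)):
        I_star_V = dict(I_V)                      # same as I_V on the other variables
        I_star_V.update(zip(variables, choice))   # reassign ?v1 ... ?vn
        values.append(body(I_star_V))
    return max(values) if kind == "Exists" else min(values)


def x_greater_than_y(V):
    return 1.0 if V["?x"] > V["?y"] else 0.0


D = [1, 2, 3]
print(tval_quantified("Exists", ["?x"], x_greater_than_y, D, {"?y": 2}))  # 1.0
print(tval_quantified("Forall", ["?x"], x_greater_than_y, D, {"?y": 2}))  # 0.0
```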

Rules: TVal_I(head :- body) = t, if TVal_I(head) ≥_t TVal_I(body); TVal_I(head :- body) = f otherwise.

Note that rules and equality formulas are two-valued even if TV has more than two values.

A model of a set R of formulas is a semantic structure I such that TVal_I(φ) = t for every φ∈R.
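Because a rule is evaluated by comparing the truth values of its head and body, a rule can be true in a structure where some atoms are undefined, and the rule itself is still only ever t or f. The sketch assumes the three truth values encoded as 0, 0.5, and 1, propositional rules given as (head, [body atoms]) with conjunctive bodies, and hypothetical function names tval_rule and is_model.

```python
# Truth values ordered by <=: f = 0, u = 0.5, t = 1.
F, U, T = 0.0, 0.5, 1.0


def tval_rule(head_value, body_value):
    """TVal(head :- body) is t if TVal(head) >=_t TVal(body), and f otherwise."""
    return T if head_value >= body_value else F


def is_model(rules, I_truth):
    """True iff every rule gets the value t; an empty body counts as t."""
    return all(
        tval_rule(I_truth[h], min((I_truth[b] for b in body), default=T)) == T
        for h, body in rules)


rules = [("p", ["q", "r"])]                          # p :- And(q r)
print(is_model(rules, {"p": U, "q": T, "r": U}))     # True: u >=_t glb_t(t, u) = u
print(is_model(rules, {"p": F, "q": T, "r": U}))     # False: f <_t u
```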

3.6 Intended Models

The semantics of a set of formulas, R, is the set of its intended semantic structures. RIF-FLD does not specify what these intended structures are, leaving this to RIF dialects. There are different theories of what the intended sets of semantic structures are supposed to look like.

For classical first-order logic, every semantic structure is intended. For RIF-BLD, which is based on Horn rules, intended semantic structures are defined only for rulesets: an intended semantic structure of a RIF-BLD ruleset R is the unique minimal Herbrand model of R. For the dialects in which rule bodies may contain literals negated with the negation-as-failure connective naf, only some of the minimal Herbrand models of a ruleset are intended. Each dialect of RIF is supposed to define the notion of intended semantic structures precisely. The two most common theories of intended semantic structures are the so-called well-founded models [GRS91] and stable models [GL88].

The following example illustrates the notion of intended semantic structures. Suppose R consists of a single rule p :- naf q. If naf were interpreted as classical negation, neg, then this rule would simply be equivalent to p ∨ q, and so it would have two kinds of models: those where p is true and those where q is true. In contrast to first-order logic, most rule-based systems do not treat p and q symmetrically. Instead, they view the rule p :- naf q as a statement that p must be true if it is not possible to establish the truth of q. Since it is, indeed, impossible to establish the truth of q, such theories would derive p even though it does not logically follow from p ∨ q. The logic underlying rule-based systems also assumes that only the minimal Herbrand models are intended (minimality here is with respect to the set of true facts). Furthermore, although our example has two minimal Herbrand models (one where p is true and q is false, and the other where p is false but q is true), only the first model is considered to be intended.
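The example can be reproduced mechanically. The sketch enumerates the Herbrand interpretations of p and q, keeps the models of the rule read classically, selects the minimal ones, and then applies the stable-model test of [GL88] (the Gelfond-Lifschitz reduct) to single out the intended one. The rule encoding and the helper names are hypothetical, and brute-force enumeration is only feasible for tiny sets of atoms.

```python
from itertools import combinations

ATOMS = ["p", "q"]
# A normal rule as (head, positive body atoms, atoms under naf). Here: p :- naf q.
RULES = [("p", [], ["q"])]


def interpretations():
    """All Herbrand interpretations, as sets of true atoms."""
    return [set(c) for n in range(len(ATOMS) + 1) for c in combinations(ATOMS, n)]


def classical_model(M):
    """Read naf as classical negation: each rule is a material implication."""
    return all(h in M or not (set(pos) <= M and not set(neg) & M)
               for h, pos, neg in RULES)


def minimal(models):
    return [M for M in models if not any(N < M for N in models)]


def least_model(definite_rules):
    """Least model of naf-free rules, computed by iterating to a fixpoint."""
    M = set()
    while True:
        new = {h for h, pos in definite_rules if set(pos) <= M} - M
        if not new:
            return M
        M |= new


def is_stable(M):
    """Drop rules whose naf atoms are true in M, drop the naf literals, compare."""
    reduct = [(h, pos) for h, pos, neg in RULES if not set(neg) & M]
    return least_model(reduct) == M


models = [M for M in interpretations() if classical_model(M)]
print(minimal(models))                                  # [{'p'}, {'q'}]
print([M for M in interpretations() if is_stable(M)])   # [{'p'}]
```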

The above concept of intended models and the corresponding notion of logical entailment with respect to the intended models, defined below, is due to [Shoham87].

3.7 Logical Entailment

We will now define what it means for a set of RIF formulas to entail a RIF formula. We assume that each ruleset has an associated set of intended semantic structures.

Let R be a set of RIF formulas and φ a closed RIF formula. We say that R entails φ, written as R |= φ, if and only if for every intended semantic structure I of R and every ψ ∈ R, it is the case that TVal_I(ψ) ≤_t TVal_I(φ).

This general notion of entailment covers both first-order logic and non-monotonic logics that underlie many rule-based languages [Shoham87].
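The entailment definition can be applied directly once the intended structures are known. In this two-valued sketch, formulas are Python functions from an interpretation (a set of true atoms) to 0 or 1, and the intended models of p :- naf q are written out by hand: all of its models under the classical reading, or only the model {p} when the intended models are the stable models, following the example in Section Intended Models. The function names are hypothetical.

```python
def entails(intended, R, phi):
    """R |= phi iff TVal_I(psi) <= TVal_I(phi) for every intended I and every psi in R."""
    return all(psi(I) <= phi(I) for I in intended for psi in R)


def rule(I):
    """TVal of p :- naf q in one two-valued structure: t iff p or q is true."""
    return 1 if "p" in I or "q" in I else 0


def p(I):
    return 1 if "p" in I else 0


classical_intended = [{"p"}, {"q"}, {"p", "q"}]   # every model of the rule
stable_intended = [{"p"}]                         # only the stable model

print(entails(classical_intended, [rule], p))     # False: fails in {q}
print(entails(stable_intended, [rule], p))        # True
```

The same formula set thus entails p under one choice of intended models and not under the other, which is how the framework accommodates both monotonic and non-monotonic dialects.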

W3C Editor's Draft 22 February 2008
