Semantics of a RIF Dialect as a Specialization of RIF-FLD
The RIF-FLD semantic framework defines the notions of semantic structures and of models of RIF formulas. The semantics of a dialect is derived from these notions by specializing the following parameters.
The effect of the syntax.
- The syntax of a dialect may limit the kinds of terms that are supported. For instance, if the dialect does not support frames or terms with named arguments then the parts of the semantic structures whose purpose is to interpret the unsupported types of terms become redundant.
Truth values.
The RIF-FLD semantic framework allows formulas to have truth values from an arbitrary partially ordered set of truth values, TV. A concrete dialect must select a concrete partially or totally ordered set of truth values.
Data types.
- A data type is a symbol space that has a fixed interpretation in any semantic structure. RIF-FLD defines a set of core data types that each dialect is expected to support, but its semantics does not limit support to just the core types. RIF dialects can introduce additional data types, and each dialect is expected to define the exact set of data types that it supports.
Logical entailment.
Logical entailment in RIF-FLD is defined with respect to an unspecified set of intended models. A RIF dialect must define which models are considered to be intended. For instance, one dialect might specify that all models are intended (which leads to classical first-order entailment), another may consider only the minimal models as intended, while a third one might only use so-called well-founded or stable models.
All of the above notions are defined in the remainder of this document.
Truth Values
Each RIF dialect is expected to define the set of truth values, denoted by TV. This set must have a partial order, called the truth order, denoted <t. As a special case, <t can be a total order in some dialects. We write a ≤t b if either a <t b or a and b are the same element of TV. In addition,
TV must be a complete lattice with respect to <t, i.e., the least upper bound (lubt) and the greatest lower bound (glbt) must exist for any subset of TV.
TV is required to have two distinguished elements, f and t, such that f ≤t elt and elt ≤t t for every elt∈TV.
TV has an operator of negation, ~: TV → TV, such that
~ is idempotent, i.e., applying ~ twice gives the identity mapping.
~t = f (and thus ~f = t).
RIF dialects can have additional truth values. For instance, the semantics of some versions of NAF, such as the well-founded negation, requires three truth values: t, f, and u (undefined), where f <t u <t t. Handling of contradictions and uncertainty usually requires at least four truth values: t, u, f, and i (inconsistent). In this case, the truth order is partial: f <t u <t t and f <t i <t t.
Primitive Data Types
A primitive data type (or just a data type, for short) is a symbol space that has
an associated set, called the value space, and
a mapping from the lexical space of the symbol space to the value space, called lexical-to-value-space mapping.
Semantic structures are always defined with respect to a particular set of data types, denoted by DTS. In a concrete dialect, DTS always includes the data types supported by that dialect. All RIF dialects are expected to support the following primitive data types:
xsd:long
xsd:integer
xsd:decimal
xsd:string
xsd:time
xsd:dateTime
rdf:XMLLiteral
rif:text
Their value spaces and the lexical-to-value-space mappings are defined as follows:
For the XML Schema data types of RIF, namely xsd:long, xsd:integer, xsd:decimal, xsd:string, xsd:time, and xsd:dateTime, the value spaces and the lexical-to-value-space mappings are defined in the XML Schema specification [XML-SCHEMA2].
The value space for the primitive data type rdf:XMLLiteral is defined in RDF [RDF-CONCEPTS].
The value space of rif:text is the set of all pairs of the form (string, lang), where string is a Unicode character sequence and lang is a lowercase Unicode character sequence which is a natural language identifier as defined by RFC 3066 [RFC-3066]. The lexical-to-value-space mapping of rif:text, denoted Lrif:text, maps each symbol string@lang in the lexical space of rif:text to (string, lower-case(lang)), where lower-case(lang) is lang written in all-lowercase letters.
The value space and the lexical-to-value-space mapping for rif:text defined here are compatible with RDF's semantics for strings with named tags [RDF-SEMANTICS].
Although the lexical and the value spaces might sometimes look similar, one should not confuse them. Lexical spaces define the syntax of the constant symbols in the RIF language that belong to the various primitive data types. In contrast, value spaces define the meaning of those constants. The lexical and the value spaces are often not even isomorphic. For instance, 1.2^^xsd:decimal and 1.20^^xsd:decimal are two legal -- and distinct -- constants in RIF because 1.2 and 1.20 belong to the lexical space of xsd:decimal. However, these two constants are interpreted by the same element of the value space of the xsd:decimal type. Therefore, 1.2^^xsd:decimal = 1.20^^xsd:decimal is a RIF tautology. Likewise, RIF semantics for data types implies certain inequalities. For instance, abc^^xsd:string ≠ abcd^^xsd:string is a tautology, since the lexical-to-value-space mapping of the xsd:string type maps these two constants into distinct elements in the value space of xsd:string.
Semantic Structures
The central step in specifying a model-theoretic semantics for a logic-based language is defining the notion of a semantic structure, also known as an interpretation. Semantic structures are used to assign truth values to RIF-FLD formulas.
A semantic structure, I, is a tuple of the form <TV, DTS, D, IC, IV, IF, Iframe, ISF, Isub, Iisa, I=, ITruth>. Here D is a non-empty set of elements called the domain of I. We will continue to use Const to refer to the set of all constant symbols and Var to refer to the set of all variable symbols. TV denotes the set of truth values that the semantic structure uses and DTS is the set of primitive data types used in I.
The other components of I are total mappings defined as follows:
IC maps Const to elements of D.
- This mapping interprets constant symbols.
IV maps Var to elements of D.
- This mapping interprets variable symbols.
IF maps D to functions D* → D (here D* is a set of all sequences of any finite length over the domain D)
- This mapping interprets positional terms.
ISF interprets terms with named arguments. It is a total mapping from Const to the set of total functions of the form SetOfFiniteBags(ArgNames × D) → D. This is analogous to the interpretation of positional terms with two differences:
Each pair <s,v> ∈ ArgNames × D represents an argument/value pair instead of just a value in the case of a positional term.
- The argument to a term with named arguments is a finite set of argument/value pairs rather than a finite ordered sequence of simple elements.
Bags are used here because the order of the argument/value pairs in a term with named arguments is immaterial and the pairs may repeat. For instance, p(a->b a->c).
Iframe is a total mapping from D to total functions of the form SetOfFiniteBags(D × D) → D.
This mapping interprets frame terms. An argument, d ∈ D, to Iframe represent an object and a finite bag {<a1,v1>, ..., <ak,vk>} represents a bag (multiset) of attribute-value pairs for d. We will see shortly how Iframe is used to determine the truth valuation of frame terms.
Bags are used here because the order of the attribute/value pairs in a frame is immaterial and the pairs may repeat. For instance, o[a->b a->c] means that the value of the attribute a on the object o is a set that contains b and c.
Isub gives meaning to the subclass relationship. It is a total function D × D → D.
The operator ## is required to be transitive, i.e., c1 ## c2 and c2 ## c3 must imply c1 ## c3. This is ensured by a restriction in Section Interpretation of Formulas.
Iisa gives meaning to class membership. It is a total function D × D → D.
The relationships # and ## are required to have the usual property that all members of a subclass are also members of the superclass, i.e., o # cl and cl ## scl must imply o # scl. This is ensured by a restriction in Section Interpretation of Formulas.
I= gives meaning to the equality. It is a total function D × D → D.
ITruth is a total mapping D → TV.
- It is used to define truth valuation of formulas.
We also define the following mapping I :
I(k) = IC(k), if k is a symbol in Const
I(?v) = IV(?v), if ?v is a variable in Var
I(f(t1 ... tn)) = IF(I(f))(I(t1),...,I(tn))
I(f(s1->v1 ... sn->vn)) = ISF(I(f))({<s1,I(v1)>,...,<sn,I(vn)>})
- Here we use {...} to denote a bag of argument/value pairs.
I(o[a1->v1 ... ak->vk]) = Iframe(I(o))({<I(a1),I(v1)>, ..., <I(an),I(vn)>})
- Here {...} denotes a bag of attribute/value pairs.
I(c1##c2) = Isub(I(c1), I(c2))
I(o#c) = Iisa(I(o), I(c))
I(x=y) = I=(I(x), I(y))
The effect of signatures. For every signature, sg, supported by the dialect, there is a subset Dsg ⊆ D, called the domain of the signature. Terms that have a given signature, sg, are supposed to be mapped by I to Dsg, and if a term has more than one signature it is supposed to be mapped into the intersection of the corresponding signature domains. To ensure this, the following is required:
If sg < sg' then Dsg⊆Dsg'.
If k is a constant that has signature sg then IC(k) ∈ Dsg.
If ?v is a variable that has signature sg then IV(?v) ∈ Dsg.
If sg has an arrow expression of the form (s1 ... sn)⇒s then, for every d∈Dsg, IF(d) must map Ds1× ... ×Dsn to Ds.
If sg has an arrow expression of the form (p1->s1 ... pn->sn)⇒s then, for every d∈Dsg, ISF(d) must map the set {<p1,Ds1>, ..., <pn,Dsn>} to Ds.
If the signature -> has an arrow expressions (sg,s1,r1)⇒k, ..., (sg,sn,rn)⇒k, then, for every d∈Dsg, Iframe(d) must map {<Ds1,Dr1>, ..., <Dsn,Drn>} to Dk.
If the signature # has an arrow expression (s r)⇒k then Iisa must map Ds×Dr to Dk.
If the signature ## has an arrow expression (s s)⇒k then Isub must map Ds×Ds to Dk.
If the signature = has an arrow expression (s s)⇒k then I= must map Ds×Ds to Dk.
The effect of data types. The data types in DTS impose the following restrictions. If dt is a symbol space identifier of a data type, let LSdt denote the lexical space of dt, VSdt denote its value space, and Ldt: LSdt → VSdt the lexical-to-value-space mapping. Then the following must hold:
VSdt ⊆ D; and
For each constant lit^^dt ∈ LSdt, IC(lit^^dt) = Ldt(lit).
That is, IC must map the constants of a data type dt in accordance with Ldt.
RIF-FLD does not impose special requirements to IC for constants in the lexical spaces that do not correspond to primitive datatypes in DTS. Dialects may have such requirements, however. An example of such a restriction could be a requirement that no constant in a particular symbol space (such as rif:local) can be mapped to VSdt of a data type dt.
Interpretation of Formulas
Truth valuation for well-formed formulas in RIF-BLD is determined using the following function, denoted TValI:
Constants: TValI(k) = ITruth(I(k)), if k ∈ Const.
Variables: TValI(?v) = ITruth(I(?v)), if ?v ∈ Var.
Positional atomic formulas: TValI(r(t1 ... tn)) = ITruth(I(r(t1 ... tn)))
Atomic formulas with named arguments: TValI(p(s1->v1 ... sk->vk)) = ITruth(I(p(s1-> v1 ... sk->vk))).
Equality: TValI(x = y) = ITruth(I(x = y)).
To ensure that equality has precisely the expected properties, it is required that ITruth(I(x = y)) = t if and only if I(x) = I(y) and that ITruth(I(x = y)) = f otherwise.
Subclass: TValI(sc ## cl) = ITruth(I(sc ## cl)).
To ensure that the operator ## is transitive, i.e., c1 ## c2 and c2 ## c3 imply c1 ## c3, the following is required: For all c1, c2, c3 ∈ D, glbt(TValI(c1 ## c2), TValI(c2 ## c3)) ≤t TValI(c1 ## c3).
Membership: TValI(o # cl) = ITruth(I(o # cl)).
To ensure that all members of a subclass are also members of the superclass, i.e., o # cl and cl ## scl implies o # scl, the following is required: For all o, cl, scl ∈ D, glbt(TValI(o # cl), TValI(cl ## scl)) ≤t TValI(o # scl).
Frame: TValI(o[a1->v1 ... ak->vk]) = ITruth(I(o[a1->v1 ... ak->vk])).
- Since the different attribute/value pairs are supposed to be understood as conjunctions, the following is required:
TValI(o[a1->v1 ... ak->vk]) = glbt(TValI(o[a1->v1]), ..., TValI(o[ak->vk]))
- Since the different attribute/value pairs are supposed to be understood as conjunctions, the following is required:
Conjunction: TValI(And(c1 ... cn)) = glbt(TValI(c1), ..., TValI(cn)).
Disjunction: TValI(Or(c1 ... cn)) = lubt(TValI(c1), ..., TValI(cn)).
Negation: TValI(neg φ) = ~TValI(φ) and TValI(naf φ) = ~TValI(φ)
where ~ is the idempotent operator of negation on TV introduced in Section Truth Values. Note that both classical and default negation are interpreted the same way in any concrete semantic structure. The difference between the two kinds of negation comes into play when logical entailment is defined.
Quantification: TValI(Exists ?v1 ... ?vn (φ)) = lubt(TValI*(φ)) and TValI(Forall ?v1 ... ?vn (φ)) = glbt(TValI*(φ)).
Here lubt (respectively, glbt) is taken over all interpretations I* of the form <TV, DTS, D, IC, I*V, IF, Iframe, ISF, Isub, Iisa, ITruth>, which are exactly like I, except that the mapping I*V, is used instead of IV. I*V is defined to coincide with IV on all variables except, possibly, on ?v1,...,?vn.
Rules: TValI(head :- body) = t, if TValI(head) ≥t TValI(body); TValI(head :- body) = f otherwise.
Note that rules and equality formulas are two-valued even if TV has more than two values.
A model of a set R of formulas is a semantic structure I such that TValI(φ) = t for every φ∈R.
Intended Models
The semantics of a set of formulas, R, is the set of its intended semantic structures. RIF-FLD does not specify what these intended structures are, leaving this to RIF dialects. There are different theories of how the intended sets of semantic structures are supposed to look like.
For the classical first-order logic, every semantic structure is intended. For RIF-BLD, which is based on Horn rules, intended semantic structures are defined only for rulesets: an intended semantic structure of a RIF-BLD ruleset R is the unique minimal Herbrand model of R. For the dialects in which rule bodies may contain literals negated with the negation-as-failure connective naf, only some of the minimal Herbrand models of a rule set are intended. Each dialect of RIF is supposed to define the notion of intended semantic structures precisely. The two most common theories of intended semantic structures are the so called well-founded models [GRS91] and stable models [GL88].
The following example illustrates the notion of intended semantic structures. Suppose R consists of a single rule p :- naf q. If naf were interpreted as classical negation, not, then this rule would be simply equivalent to p \/ q, and so it would have two kinds of models: those where p is true and those where q is true. In contrast to first-order logic, most rule-based systems do not consider p and q symmetrically. Instead, they view the rule p :- naf q as a statement that p must be true if it is not possible to establish the truth of q. Since it is, indeed, impossible to establish the truth of q, such theories would derive p even though it does not logically follow from p \/ q. The logic underlying rule-based systems also assumes that only the minimal Herbrand models are intended (minimality here is with respect to the set of true facts). Furthermore, although our example has two minimal Herbrand models -- one where p is true and q is false, and the other where p is false, but q is true, only the first model is considered to be intended.
The above concept of intended models and the corresponding notion of logical entailment with respect to the intended models, defined below, is due to [Shoham87].
Logical Entailment
We will now define what it means for a set of RIF formulas to entail a RIF formula. We assume that each ruleset has an associated set of intended semantic structures.
Let R be a set of RIF formulas and φ a closed RIF formula. We say that R entails φ, written as R |= φ, if and only if for every intended semantic structure I of R and every ψ ∈ R, it is the case that TValI(ψ) ≤ TValI(φ).
This general notion of entailment covers both first-order logic and non-monotonic logics that underlie many rule-based languages [Shoham87].