TimBL
Web of Indexes
In WWW, an index is a document like any other. An index may be
built to cover a certain domain of information. For example, at
CERN there is a CERN computer center document index. There is a
separate functional telephone book index. Indexes may be built
by the original information provider, or by a third party as a
value-added service.
Indexes may point to other indexes. An index search on one
index may turn up another index in the result hit list. In
this case, the following algorithm seems appropriate.
Index context
Most index searches nowadays, though some look like
intelligent semantically aware searches, are basically
associative keyword searches. That is, a document matches a
search if there is a large correlation (with or without boolean
operations) between the set of words it or its abstract
contains and the set of words specified in the search. Let us
consider extending these searches to linked indexes.
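As a point of reference, an associative keyword search of this
kind might be sketched as follows. This is only an illustration:
the function names, the sample documents and the scoring rule
(the fraction of the query keywords found in the document) are
assumptions, not a description of any existing index server.

```python
def keyword_score(query, document_words):
    """Fraction of the query keywords that occur in the document's words.

    This is one simple stand-in for the "correlation" between the two
    word sets; real servers may weight or combine words differently.
    """
    query_set = {w.lower() for w in query.split()}
    doc_set = {w.lower() for w in document_words.split()}
    if not query_set:
        return 0.0
    return len(query_set & doc_set) / len(query_set)


def keyword_search(query, documents, threshold=0.5):
    """Return document names scoring at least `threshold`, best first."""
    scored = [(keyword_score(query, words), name)
              for name, words in documents.items()]
    return [name for score, name in sorted(scored, reverse=True)
            if score >= threshold]


# Hypothetical sample data: document name -> words of its abstract.
DOCS = {
    "newsletter": "CERN newsletter March issue",
    "phonebook": "CERN telephone book",
}

print(keyword_search("CERN NEWSLETTER", DOCS))
```

Raising the threshold to 1.0 would require every query keyword to
be present, which corresponds to a boolean AND of the keywords.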
Each index has a certain context. This may be represented by
a set of keywords which may be considered to apply implicitly
to everything indexed. For example, in the CERN computer
center documentation index, one may imagine that everything
in it will be considered as pertaining to the CERN computer
center. We might represent the context by the keyword list
"CERN computer center documentation physics support".
Context narrowing
Suppose we search a general physics index with the
keywords "CERN NEWSLETTER". That index may contain an entry
with keyword "CERN" pointing to the CERN index. Therefore, a
search on the first index will turn up the CERN index. We
should then search the CERN index, but looking only for the
keyword "NEWSLETTER". The keyword "CERN" is discarded, as it is
assumed by the new context. In this simple model, we can assume
that the context words could be used directly as the keywords
for the index itself.
A simple algorithm, then, would be for the server to discard
from a search list any keywords matching the index's context
-- but is this really what we want to do? Perhaps those
keywords have a more refined meaning within the context. For
example, if I am looking for documents about document storage
schemes at CERN, I might search the index with the keyword
"documents". I don't want this to be discarded because it is
in the context: I am looking for documents about documents.
It is understood that we are already within the context of
computer center documentation, so to ask about documentation
in this context implies more than that I am looking for a
document.
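The simple algorithm, and the problem with it, can be seen in a
few lines. This is a sketch under assumptions: the function name
is hypothetical, keywords are compared by exact case-insensitive
match, and the query uses the word "documentation" so that it
coincides exactly with a context word (matching "documents"
against "documentation" would need some form of stemming, which
is not shown).

```python
def discard_context_keywords(query_keywords, index_context):
    """Naive rule: drop every query keyword that appears in the context."""
    context = {w.lower() for w in index_context}
    return [w for w in query_keywords if w.lower() not in context]


# The context keyword list suggested in the text for the CERN index.
CERN_CONTEXT = ["CERN", "computer", "center",
                "documentation", "physics", "support"]

# Works as intended: "CERN" is implied by the context and is dropped.
print(discard_context_keywords(["CERN", "NEWSLETTER"], CERN_CONTEXT))

# The problem case: the reader wants documentation about document
# storage, but "documentation" is thrown away because it is a context word.
print(discard_context_keywords(["documentation", "storage"], CERN_CONTEXT))
```

In the second call only "storage" survives, so the part of the
request which had a more refined meaning within the context is
lost.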
A more refined approach would therefore be to strip from the
search those keywords which were used in order to find the
index. The keyword list for the entry of one index within
another then reflects the change in context.
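A minimal sketch of this refined approach follows. The data
model (Entry, Index), the function name search_linked and the
sample indexes are all hypothetical, and the sketch assumes that
an entry is a hit when it shares at least one keyword with the
query. When a hit is itself an index, only the keywords which
matched its entry are stripped before that index is searched.

```python
from dataclasses import dataclass, field
from typing import List, Set, Union


@dataclass
class Entry:
    keywords: Set[str]            # lower-case keywords for this entry
    target: Union[str, "Index"]   # a document name, or another index


@dataclass
class Index:
    name: str
    entries: List[Entry] = field(default_factory=list)


def search_linked(index, query, _seen=None):
    """Search an index, following entries that point to other indexes."""
    seen = _seen if _seen is not None else set()
    if id(index) in seen:         # indexes may point at each other
        return []
    seen.add(id(index))
    wanted = {w.lower() for w in query}
    hits = []
    for entry in index.entries:
        matched = wanted & entry.keywords
        if not matched:
            continue
        if isinstance(entry.target, Index):
            # Strip only the keywords used to find this index.
            remaining = wanted - matched
            if remaining:
                hits.extend(search_linked(entry.target, remaining, seen))
            else:
                hits.append(entry.target.name)
        else:
            hits.append(entry.target)
    return hits


# Hypothetical sample indexes following the example in the text.
cern = Index("CERN index", [
    Entry({"newsletter"}, "CERN Newsletter"),
    Entry({"documents", "storage"}, "Document storage schemes"),
])
physics = Index("General physics index", [
    Entry({"cern"}, cern),
    Entry({"fermilab"}, "Fermilab overview"),
])

print(search_linked(physics, ["CERN", "NEWSLETTER"]))
print(search_linked(cern, ["documents"]))
```

Note that a direct search of the CERN index for "documents" is
left intact, since no keyword was used to reach that index.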
Context broadening
We have discussed here only a narrowing of context, not a
broadening. One can imagine also a reference to a broader
context index. In this case, perhaps one should add to the
search some keywords which come from the original context but
were not expressed. This would be dangerous, and people would
not like it as they often feel that they are expressing their
request in absolute terms even when they are not. Also, they
may have been trying to escape from too restricting a context.
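If broadening were attempted at all, one cautious possibility is
to keep the implied keywords separate from those the reader
actually typed, so that the reader (or the server) can decide
whether to apply them. The sketch below assumes this design; the
function name and the two-list return value are hypothetical.

```python
def broaden_query(query_keywords, original_context):
    """Split a broadened query into explicit and implied keywords.

    `explicit` holds what the reader asked for; `implied` holds context
    keywords from the original index that the reader did not express.
    """
    explicit = [w.lower() for w in query_keywords]
    implied = [w.lower() for w in original_context
               if w.lower() not in explicit]
    return explicit, implied


explicit, implied = broaden_query(["NEWSLETTER"],
                                  ["CERN", "computer", "center"])
print(explicit)
print(implied)
```

Keeping the two lists apart means the implied keywords need never
be applied silently, which addresses the objection that readers
believe they are expressing their request in absolute terms.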
One should also consider a search which traces hypertext links as well as
using indexes.
See also: Navigational techniques, Hypertext and IR