Scalable, efficient access to popular resources requires widespread replication, or mirroring, of these resources. With current mirroring schemes, a different name (i.e., URL) is given to each copy of a replicated file. Web crawlers must access all the mirrored copies and deduce which ones are duplicates. A user who accesses a mirrored copy, perhaps after being given a list of alternative mirror sites by an overloaded server, has no way of verifying that the retrieved mirror copy is identical to the original. Thus, there is a need for a single location-independent name for all copies of a file, so that metadata can be attached to this name rather than to the individual copies. This metadata should include a digitally signed file fingerprint so that a user can verify the integrity of a retrieved file copy. There is also a need for users to be able to verify the authenticity and integrity of metadata that comes from different sources.
The Resource Cataloging and Distribution System (RCDS), under development at the University of Tennessee, addresses these needs. The system components include catalog servers, location servers, and file servers. Resource providers assign location-independent names to resources and submit metadata to an RCDS catalog server. An authorized file server that mirrors a copy of a file registers its name-to-location binding with an RCDS location server. An RCDS catalog server provides a centralized location from which Web crawlers can gather metadata. For clients such as Web browsers, an RCDS catalog server resolves a name to associated metadata, which may include names for individual files. An RCDS location server resolves a name to a list of locations. The RCDS catalog server design provides for attaching a digital signature to an assertion or to a set of assertions, where an assertion consists of an attribute-value pair.
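The division of labor described above can be illustrated with a minimal in-memory sketch. It is not the RCDS implementation or its protocol: the class names, the method names, the URN, and the mirror URLs are all hypothetical, and an HMAC with a shared key stands in for a true public-key digital signature so that the example needs only the Python standard library.

```python
import hashlib
import hmac
from dataclasses import dataclass, field


def sign(key: bytes, data: bytes) -> str:
    # ASSUMPTION: HMAC-SHA256 stands in for a real public-key signature.
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def encode_assertions(assertions: dict) -> bytes:
    # ASSUMPTION: a canonical byte encoding (sorted attributes,
    # length-prefixed fields) chosen for this sketch only.
    parts = []
    for attr in sorted(assertions):
        for text in (attr, assertions[attr]):
            raw = text.encode("utf-8")
            parts.append(len(raw).to_bytes(4, "big") + raw)
    return b"".join(parts)


@dataclass
class CatalogServer:
    # Maps a location-independent name to (assertions, signature).
    records: dict = field(default_factory=dict)

    def submit(self, name: str, assertions: dict, key: bytes) -> None:
        signature = sign(key, encode_assertions(assertions))
        self.records[name] = (dict(assertions), signature)

    def resolve(self, name: str):
        return self.records[name]


@dataclass
class LocationServer:
    # Maps a location-independent name to the locations of its copies.
    bindings: dict = field(default_factory=dict)

    def register(self, name: str, location: str) -> None:
        self.bindings.setdefault(name, []).append(location)

    def resolve(self, name: str) -> list:
        return list(self.bindings.get(name, []))


def verify_copy(catalog: CatalogServer, name: str, data: bytes, key: bytes) -> bool:
    # Check the signature over the metadata, then the file fingerprint.
    assertions, signature = catalog.resolve(name)
    expected = sign(key, encode_assertions(assertions))
    if not hmac.compare_digest(signature, expected):
        return False
    return hashlib.sha256(data).hexdigest() == assertions["fingerprint"]


# Hypothetical usage: one provider, one name, two mirrors.
key = b"provider-secret"
content = b"example file contents"
name = "urn:example:report-1"

catalog = CatalogServer()
locations = LocationServer()
catalog.submit(
    name,
    {"title": "Example report", "fingerprint": hashlib.sha256(content).hexdigest()},
    key,
)
locations.register(name, "http://mirror-a.example/report-1")
locations.register(name, "http://mirror-b.example/report-1")

mirrors = locations.resolve(name)
copy_is_valid = verify_copy(catalog, name, content, key)
```

In this sketch the metadata and its signature are bound to the name rather than to any one copy, so a client that retrieves the file from either mirror can run the same check against the same catalog record.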
Ideally, RCDS should use a standard format for assertion metadata, so that it presents a standard interface to clients such as Web browsers and Web crawlers, but no suitable standard currently exists. Text representations of metadata are problematic because editing and other processing can introduce changes that invalidate a digital signature computed over the byte contents. The Harvest SOIF format is in practice a text-based format, although it allows arbitrary content for the value of an attribute. A digital signature could conceivably be attached to an entire SOIF record, if the record could be guaranteed not to change during transfer and processing, although this would not allow for selective signing of subsets of assertions. To be suitable for use with RCDS, SOIF would also need to allow a URN, in addition to a URL, as the identifier of an object.
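The fragility of signatures over text can be shown with a short sketch. The record below uses a generic "attribute: value" layout invented for illustration, not SOIF syntax, and a SHA-256 digest stands in for the value that a signature would cover. The per-assertion encoding is one possible approach under these assumptions, not the RCDS design.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# The same record before and after a line-ending conversion, such as
# one an editor or a transfer step might apply.
record_lf = "title: Example report\nauthor: A. Author\n"
record_crlf = record_lf.replace("\n", "\r\n")

# A digest over the whole text record does not survive the conversion.
whole_record_survives = digest(record_lf.encode()) == digest(record_crlf.encode())


def parse(record: str) -> dict:
    # Hypothetical parser for the illustrative layout above.
    pairs = {}
    for line in record.splitlines():
        attr, _, value = line.partition(":")
        pairs[attr.strip()] = value.strip()
    return pairs


def assertion_digest(attr: str, value: str) -> str:
    # Digest one attribute-value pair in a length-prefixed byte encoding,
    # independent of how the surrounding text was laid out.
    fields = [attr.encode("utf-8"), value.encode("utf-8")]
    return digest(b"".join(len(f).to_bytes(4, "big") + f for f in fields))


per_assertion_lf = {a: assertion_digest(a, v) for a, v in parse(record_lf).items()}
per_assertion_crlf = {a: assertion_digest(a, v) for a, v in parse(record_crlf).items()}

# Digests over individual parsed assertions are unaffected by the conversion,
# and any subset of them could be signed separately.
per_assertion_survives = per_assertion_lf == per_assertion_crlf
```

Here the whole-record digest changes although no assertion changed, whereas digests computed per assertion from the parsed values stay the same and can be selected individually for signing.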