The Synopsis File System: From Files to File Objects

                    Mic Bowman   Ranjit John
                     Transarc Corporation
                  The Gulf Tower, 707 Grant St.
                     Pittsburgh, PA 15219
                      {mic@transarc.com}

Wide-area file systems like AFS and DFS enable file sharing between users
in large distributed organizations. The file system does not enforce any
structure on the data in the files, leaving that to individual applications.
This lack of structure makes it difficult for users to locate information
and for applications to share data. Consider the simple problem of adding an
appointment to someone's schedule. The file system makes it possible to
share the file that contains the schedule (assuming you know where it is
located), but it does not contain information about the programs that must
be run to manipulate the information within the file. Ideally, the file
system should provide an encapsulation that helps others find the calendar
and, simultaneously, the operations that act on it.

To solve this problem, we have implemented a new file system, called the
Synopsis file system (SynFS), that enhances the traditional file system
with facilities for storing, locating and manipulating objects. The
Synopsis file system defines a logical, uniform interface to files through
an object-based extension to the traditional file system interface. In
addition to the traditional untyped files, SynFS defines an interface to
a typed synopsis.  As the name indicates, a synopsis is an object that
contains a summary of the file. A file system uses static directories to
group similar files. SynFS adds digests to dynamically classify synopses;
a digest is similar to a database view. Path names serve to identify files.
To discover synopses, SynFS adds content-based addressing based on synopsis
properties. Finally, for operational encapsulation, SynFS adds method
invocation as a way of operating on a synopsis.
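
The contrast between static directories and digests can be illustrated
with a minimal sketch. It is not the SynFS implementation: the Synopsis
and Digest classes, the find helper, the paths, and the property names are
all hypothetical, chosen only to show membership computed at lookup time
and lookup by properties rather than by path name.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Synopsis:
    """Hypothetical stand-in for a synopsis: a typed summary of one file."""
    path: str
    type_name: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Digest:
    """Hypothetical digest: a named predicate, evaluated like a database view."""
    name: str
    predicate: Callable[[Synopsis], bool]

    def members(self, synopses: List[Synopsis]) -> List[Synopsis]:
        # Membership is computed on demand, not stored as in a directory.
        return [s for s in synopses if self.predicate(s)]


def find(synopses: List[Synopsis], **wanted: str) -> List[Synopsis]:
    # Content-based addressing: select synopses by property values.
    return [
        s for s in synopses
        if all(s.properties.get(k) == v for k, v in wanted.items())
    ]


# Illustrative data only; none of these paths come from the paper.
store = [
    Synopsis("/home/mic/calendar", "Calendar", {"owner": "mic"}),
    Synopsis("/home/mic/fs.cc", "C++", {"owner": "mic"}),
    Synopsis("/home/ranjit/calendar", "Calendar", {"owner": "ranjit"}),
]

calendars = Digest("calendars", lambda s: s.type_name == "Calendar")
calendar_paths = [s.path for s in calendars.members(store)]
mic_paths = [s.path for s in find(store, owner="mic")]
```

Because the digest stores a predicate rather than a list of entries, a
synopsis added to the store later appears in the digest without any
further bookkeeping.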

SynFS leverages technology from the object and World-Wide Web communities
to address two specific problems. First, to address the problem of
adding structure and a well-defined interface to data in files, we borrow
the concept of encapsulation and typing from the object community. A
synopsis is an object that contains data and a set of methods that operate
on the data. Every synopsis is typed, and its type identifies the data and
the methods that can be invoked. Types are defined in a declarative language
similar to ODMG ODL. The SynFS type system is extensible in that new types
can be created after the system is installed. Type definitions are stored
in a globally accessible type repository. These types have methods
implemented in a scripting language (currently we use Tcl), which allows
code to be shipped whenever a type is retrieved.
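
One way to picture an extensible type repository with inherited methods
is sketched here. It is written in Python rather than Tcl, and it does not
reproduce the SynFS type language: TypeDef, TypeRepository, and the
describe and add_appointment methods are hypothetical names used only to
show types being defined after startup and methods being resolved through
a parent type.

```python
from typing import Callable, Dict, Optional


class TypeDef:
    """Hypothetical type definition: a name, an optional parent, methods."""

    def __init__(self, name: str, parent: Optional[str] = None) -> None:
        self.name = name
        self.parent = parent
        self.methods: Dict[str, Callable[..., object]] = {}


class TypeRepository:
    """Hypothetical repository mapping type names to their definitions."""

    def __init__(self) -> None:
        self._types: Dict[str, TypeDef] = {}

    def define(self, name: str, parent: Optional[str] = None) -> TypeDef:
        # New types can be registered at any time after startup.
        type_def = TypeDef(name, parent)
        self._types[name] = type_def
        return type_def

    def resolve(self, type_name: str, method: str) -> Callable[..., object]:
        # Walk from the type up through its ancestors to find the method.
        current = self._types.get(type_name)
        while current is not None:
            if method in current.methods:
                return current.methods[method]
            current = self._types.get(current.parent) if current.parent else None
        raise LookupError(f"{type_name} has no method {method!r}")


repo = TypeRepository()
base = repo.define("Synopsis")
base.methods["describe"] = lambda data: f"{len(data)} entries"

calendar = repo.define("Calendar", parent="Synopsis")
calendar.methods["add_appointment"] = (
    lambda data, when, what: data.append((when, what))
)

# The caller needs only the type name and method name, not the file format.
schedule = []
repo.resolve("Calendar", "add_appointment")(schedule, "Mon 10:00", "review")
summary = repo.resolve("Calendar", "describe")(schedule)
```

Here the method bodies are ordinary callables held in memory; in the
system described above they are scripts that travel with the type
definition when it is retrieved.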

Second, to address the problem of representing the information content
in a file, we borrow a uniform presentation language (HTML) from the
Web community.  In a traditional file system, a user must either view the
raw data in a file or run a specific application to see its information
content.  The raw data conveys nothing about the meaning of the file's
contents, while a specific application limits the scope of interaction.
SynFS uses a special method, 'display' (which is
defined in the base type Synopsis and is inherited by all other types)
to dynamically format the information content of a file using HTML.  For
example, the display method for the C++ type highlights comments, function
and class declarations, and other C++-specific constructs in a C++ source
file.  HTML makes it possible for SynFS to present additional semantic
information about a file without limiting the scope of interaction.
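
The idea of an inherited display method that a type may override can be
sketched as follows. This is an assumption-laden illustration, not the
SynFS code: the class names are hypothetical, the output markup is
arbitrary, and the C++ handling is deliberately naive (it marks only '//'
comments and would misread '//' inside a string literal).

```python
import html


class Synopsis:
    """Hypothetical base type: display shows the content as escaped text."""

    def __init__(self, text: str) -> None:
        self.text = text

    def display(self) -> str:
        return "<pre>" + html.escape(self.text) + "</pre>"


class TextSynopsis(Synopsis):
    """Hypothetical subtype that simply inherits the base display method."""


class CppSynopsis(Synopsis):
    """Hypothetical C++ subtype: display also emphasizes '//' comments."""

    def display(self) -> str:
        rendered_lines = []
        for line in self.text.splitlines():
            # Split each line at the first '//' (naive: ignores strings).
            code, marker, comment = line.partition("//")
            rendered = html.escape(code)
            if marker:
                rendered += "<em>" + html.escape(marker + comment) + "</em>"
            rendered_lines.append(rendered)
        return "<pre>" + "\n".join(rendered_lines) + "</pre>"


page = CppSynopsis("int x = a < b; // compare").display()
plain = TextSynopsis("a & b").display()
```

Both calls return HTML, so a single viewer can present either file; only
the amount of type-specific markup differs.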

In summary, the Synopsis file system provides a functioning demonstration
of the integration of technology and concepts from the object, Web and
database communities. The enhanced interface uses these technologies to
improve the file system's ability to support file location and data
sharing. Further, the global type system enables applications to
collaborate and share data across autonomous organizations.