Formalizing Web Technology
Daniel W. Connolly
World Wide Web Consortium
Massachusetts Institute of Technology
Lab for Computer Science
Web World
April 19-21, Santa Clara, CA
$Id: ww9401.html,v 1.2 1995/04/18 18:58:29 connolly Exp $
Formalizing Web Technology
- Proven Value of the Core Technology
- Distributed Hypermedia is an idea whose time has come.
- Forces for Change
- The web represents a significant market force, and resources are
being pooled from many directions to satisfy the needs and desires of
that market.
- Capturing the State of the Art
- Where are we right now?
- Stabilizing Forces
- Deployment of new features does not come without cost.
- Breaking Down HTML
- The HTML 2.0 spec is a good stake in the ground, but it should be broken
into smaller, more modular documents.
- W3C - The Center of Evolution
- The core technology of the web should be in the hands of a "community trust," where
anyone can contribute, and everyone gains.
- Looking ahead
- How will new features affect the technology base? What research and developments
are on the horizon?
There are few novel technologies in the World-Wide Web. It is simply an effective application of
ideas that have been tested and proven:
- Sharing information makes people more effective
- The Internet is an excellent basis for a distributed information system
HTTP is a very simple information retrieval protocol. As such, it has provided an
extensible basis for a number of valuable applications.
- HyperText and HyperMedia are an effective way to represent human
knowledge
HTML is a simple structured document representation, capable of representing
many common forms of communications. URLs comprise a simple hierarchical
document address space, which can accommodate many of the existing information
systems on the Internet.
- A direct manipulation interface (i.e. "point and click") is easy to use
NCSA Mosaic was an instant, overnight success.
The result: The web is now a vital, global
information system.
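
To make the simplicity of URLs and HTTP concrete, here is a minimal sketch in Python. It is an illustration only: the host www.example.org and the helper names build_get_request and path_segments are assumptions, not part of any specification. It splits a URL into its hierarchical parts and builds the text of a bare HTTP/1.0 GET request, without touching the network.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.0 GET request for `url`."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        raise ValueError("this sketch only handles http: URLs")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # A request line, no headers, then a blank line ends the request.
    return f"GET {path} HTTP/1.0\r\n\r\n"


def path_segments(url: str) -> list[str]:
    """Split the hierarchical path of a URL into its segments."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]


# Hypothetical address, used only for illustration.
url = "http://www.example.org/hypertext/WWW/TheProject.html"
print(urlsplit(url).netloc)
print(path_segments(url))
print(repr(build_get_request(url)))
```

The whole request is one line of text plus a blank line, which is much of why HTTP was easy to implement and to extend.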
- The consumers on the web represent a substantial commercial market, but
the web does not currently support secure, reliable transactions.
- Finding information on the web is difficult, and opportunities for automated
searching have been demonstrated.
- The same information could be delivered for less cost with caching and
replication.
- Until HTML (or some other ubiquitous data format) provides the expressive
capability of contemporary word-processing and desktop publishing
packages, information providers will feel constrained.
How Do We Increase the Quality of Service and
Security, and provide for Resource Discovery?
Maintaining Confidence in the Technology
- People resist change
The technology will be perceived as stable as long as individual sites and users can
choose between staying with their old applications and upgrading to participate in
the new features. If they are forced to change their operation in response to changes
that they did not ask for, they will be upset.
- Mistakes are costly
Once technology is deployed, it never really goes away. Mistakes represent a
documentation, development, and support burden for a long time. It is critical to
experiment and gain experience before wide deployment.
- Consumers demand quality software products
Internet tools have moved from research projects, to user-supported software, and
now to the consumer market. Commercial development takes time: time to learn the
technology and develop products, including testing, support, and documentation.
- Mission critical applications must not be compromised
Otherwise, the vast resources available for development of mission critical
applications will simply be applied somewhere other than the web.
To make a change with confidence, we must be able to assess the scope of the change.
Minimal design, i.e. modularization and information hiding, is necessary to be able to
identify the scope of effect of changes.
- TimBL's original writings
- NCSA documentation: NCSA httpd, CGI, CCI
- HTML+
- HTML interoperability group
- HTML, HTTP, URI working groups
- IIIR, integrated directory services, quality information services, HTTP
security
- HTML Syntax
- how to decide whether a sequence of characters is a valid HTML document, and if so,
how to create a parse tree.
- Interpretation of HTML Idioms
- an informal description of the meaning and suggested rendering of an HTML parse
tree.
- The text/html Internet Media Type
- registration of HTML as a MIME type. Charset issues. Newline Issues. Appendices
specifically addressing SMTP transport and HTTP transport issues. Security issues.
- World-Wide Web User Agents and Applications
- Specific techniques: basic HREF links, ISINDEX, FORMS, ISMAP, .mailcap,
$WWW_HOME, mailto:, proxies, security issues. Suggestions for documentation,
default configuration, etc.
- World-Wide Web Hypermedia Architecture
- formal discussion of the WWW hypertext model: documents, anchors, links,
searching. Formal discussion of common abstractions from ftp, http, gopher, WAIS,
etc. Definition of correct caching/proxy behavior.
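
As a rough illustration of the split between "HTML Syntax" (characters to parse tree) and the hypermedia model (anchors and links), the sketch below builds a small parse tree and then walks it to collect link targets. It is a sketch under stated assumptions: it uses Python's lenient html.parser, which does not validate a document against the HTML 2.0 DTD as an SGML parser would, and the names Node, TreeBuilder, parse, links, and the VOID list are invented for this example.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (a partial list, assumed for this sketch).
VOID = {"br", "hr", "img", "isindex", "base", "link", "meta", "input"}


class Node:
    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.children = []  # child Nodes and text strings


class TreeBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def parse(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def links(node):
    """Yield the HREF of every anchor found in the tree."""
    for child in node.children:
        if isinstance(child, Node):
            if child.tag == "a" and "href" in child.attrs:
                yield child.attrs["href"]
            yield from links(child)


doc = ('<html><head><title>Demo</title></head><body><h1>Demo</h1>'
       '<p>See <a href="next.html">the next page</a>.</body></html>')
tree = parse(doc)
print(list(links(tree)))
```

Note that the link walker depends only on the tree, not on the character-level syntax. That is the kind of module boundary the proposed documents would formalize.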
- Pooling our resources
The development of the web has been a research/volunteer effort. The current
demand is more than that community can support. In fact, it's more than almost any
one company or organization could shoulder. A consortium allows all the interested
and motivated parties to contribute without taking on the entire burden.
- An Open Market
Various companies will carve out their niche in the vast marketplace of W3 products
and services, but none of these companies has the last word. The core technology will
remain royalty-free, which allows it to spread quickly.
- Open discussion balanced with rapid progress
The Internet Engineering Task Force working groups provide a forum for open
communication and consensus building, and the W3 consortium provides resources to
research and develop the technologies. Consortium members will have early access to
the technology in order to be able to support it when it is publicly released.
- HTML Syntax
- Elements for new features: super/subscript, tables, etc.
- ISO special character entities, and how they show up in the parse tree
- Conformance testing.
- Entity declarations, marked sections.
- Math markup?
- Interpretation of HTML Idioms
- Tables, Figures. Super/subscript.
- DSSSL-Lite.
- Toolbars (next/previous/up).
- Vendor- and application-specific extensions.
- The text/html Internet Media Type
- Character sets
- versions, levels, format negotiation issues
- Vendor- and application-specific extensions.
- World-Wide Web User Agents
- File upload
- Embedded presentation
- Mandatory display of copyrights. Display of security information
- Desktop message bus (CCI/OLE/Tooltalk/AppleEvents)
- Distributed editing, annotation, and other forms of collaboration.
- Resource discovery technology (e.g. Harvest, Verity) will have user interface
implications.
- World-Wide Web Hypermedia Architecture
- link relationships
- embedding, compound document architecture
- the web as a knowledge base
- isomorphisms with HyTime
- Publishing model (URNs/URCs, copyright, payment, replication, authentication,
access control).
- Common attributes (aka "meta-information") and taxonomies for distributed
searching.
- HTTP
- Security
- Variations on Proxy: no-cache
- Session management, and application-level packets.
- Transactions
- Desktop message-bus, UDP version of the protocol.
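
The "no-cache" variation on proxying listed under HTTP can be pictured with a toy model. This is a sketch only, assuming the HTTP/1.0 "Pragma: no-cache" request header as the signal; the CachingProxy class and the origin function are hypothetical stand-ins for a real proxy and origin server.

```python
class CachingProxy:
    """A toy cache keyed by URL; `fetch` stands in for the origin server."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: url -> body
        self.cache = {}

    def get(self, url, headers=None):
        # Header names are case-insensitive, so normalize them first.
        headers = {k.lower(): v for k, v in (headers or {}).items()}
        no_cache = "no-cache" in headers.get("pragma", "").lower()
        if not no_cache and url in self.cache:
            return self.cache[url]  # served from the cache
        body = self.fetch(url)  # forwarded to the origin
        self.cache[url] = body
        return body


calls = []


def origin(url):
    """Pretend origin server that records each request it receives."""
    calls.append(url)
    return f"body of {url} (fetch {len(calls)})"


proxy = CachingProxy(origin)
u = "http://www.example.org/"  # hypothetical address
proxy.get(u)
proxy.get(u)  # answered from the cache
proxy.get(u, {"Pragma": "no-cache"})  # forces a fresh fetch
print(len(calls))
```

A real definition of correct proxy behavior would also have to cover expiry, validation, and error cases, none of which this sketch attempts.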