Style Guide for online hypertext

This guide is designed to help you create a WWW hypertext database that effectively communicates your knowledge to the reader. It has been prepared in the light of comments by readers, and many demands by providers of online documentation. Some of the points made may be influenced by personal preference, and some may be common sense, but a collection of points has been demanded, and so here it is.

The guide is designed to be read sequentially, but feel free to depart from this. The sections are as follows:

Introduction
Etiquette for server administrators
Overall structure of a work
Within each document
Test your document
Ramblings: Related but more random thoughts. Under construction.
Background reading
Reader comments

The above lists all the parts of this guide except for individual reader comments.

To print this document

A single long page with all of them excluding reader comments is available for printing (but has dysfunctional links and is not in correct html).

This document is open to comment!

Suggestions are strongly invited: if you think of anything, mail it to timbl@w3.org, mentioning the Style Guide for Online Hypertext or its URL. I'm also interested in the URLs of other style guides, corporate house style guides, or your favorite book on style (hypertext or otherwise).

Introduction

You are going to write (or generate) some online hypertext. Because hypertext is potentially unconstrained, you are a little daunted. Do not be. You can write a document as simply as you like. In many ways, the simpler the better.

You will be writing a number of separate files. These files will be linked to each other, and to external documents, to make your final work.

You may think of your work as a "document", and if it were on paper, then you would call it that. In the online case though, we tend to refer to each individual file as a document. A document may correspond, in the book analogy, to a section or a subsection, or even a footnote. In this guide, we'll refer to the whole collection as a work.

The document is the unit by which information is picked up. At any one time, a document is completely loaded into the reader's computer. It is also normally the amount you edit at any one time, though with a good editor you will probably have a number of documents open at a time.

This guide has a bit on etiquette for each server, which mainly applies to the server administrator: the rest applies to anyone putting information onto any server. The section on structure discusses how you organize your material into documents. Another section discusses how to organize your material within a document.

Tim BL

Structure

If you have in mind a body of information to put across to your reader, you probably have a mental organization for it. Normally this is a sort of hierarchical tree, like the chapters of a book if you were to write a book.

Keep this structure. It helps readers to have a tree structure as a basis for the book: it gives them a feeling of knowing where they are. You can also use this structure for organizing your files in directories.

You should also bear in mind:

The reader's structure.

Remember always the audience for whom you are writing. If they are novices in the subject, it will normally help if you are firm about the structure of your work, so that they can learn the structure of the knowledge itself. For example, if you feel that the subject falls into three distinct areas, then that is an important thing to teach.

If, however, your readers will already have some knowledge in the subject, then they will already have formed their own structure for it. In this case they will consciously or subconsciously know where they expect to find things. If your structure is different from theirs, enforcing it too strongly will confuse them and put them off.

You may in this case have to resist a strong tendency to put across your own structure strongly and to the detriment of all others. There are two solutions.

If you have a single well-defined audience in mind, who will share a similar world view, then try to write exactly for that world view rather than yours.

If you are simultaneously writing for more than one group, then you must provide for both.

When you make a reference, qualify it with a clue to allow some people to skip it. For example, "If you really want to know how it works inside, see the Internals guide", or "A step-by-step introduction is in the tutorial".

Provide links for both readers' views. Your work will be more connected than a simple tree, but with proper qualification, no one should get lost.

Provide two separate tree "roots". For example, you can write a step-by-step tutorial and a functionally direct reference tree for the same data. Both will at the lowest level have the same data, but while the first will deal with the simple things first, the second may be functionally grouped. This is just like having several indexes to a book. The tutorial might also include information which the reference work does not.

Overlapping Trees

Here is an example of a work (describing some programming functions, say) with two separate structures:

			Tutorial			Reference
			   |				    |
		  Let's do it together		       -----------------
		from simple to difficult	      |			|
			    |			by Functional      Alphabetical
			    |			    group	     by name
		  Task oriented examples	      |			|
			    |			       -----------------
			    |				    |
		  Examples of use of		   Syntax definition for
		  specific functions   <-------->    specific functions

The novice user starts at the top left, and works his way down. Where he needs specific details, he will get down to the examples and from them a link to the underlying definitive descriptions of each. As far as he is concerned, he is reading a tree-structured work. In fact, he is reading the same information as the expert who, coming in to check on one particular function, then looks up an example of its use.

How big to make each document

The most important point here is that a document should put across a well-defined concept. It is not generally worth splitting one idea arbitrarily into two bits in order to make the bits smaller. Nor is it a good idea to put together ideas which are really separate just to make a bigger document.

A document can be as small as a footnote.

There are two upper limits on a document's size. One is that long documents will take longer to transfer, and so a reader will not be able to simply jump to it and back as fast as he or she can think. This depends a lot on the link speed of course.

The other limit is the difficulty for a reader to scroll through large documents. Readers with character based terminals don't generally read more than a few screens. They often only absorb what is on the first screen, as if that is not interesting they won't be bothered to scroll down. Readers are also put off by being left at the top of a large document.

Readers with graphic interfaces generally scroll through long documents with a scroll bar. When the scroll bar is moved a small amount, the document should move a sufficiently small amount so that some of the original window-full is still left in the window. This allows the reader to scan the document. If the document is any bigger, then it is basically unreadable, in that any movement of the scroll bar loses the place and leaves the reader disoriented.

An advantage of longer documents is that they are easier for readers with scroll bars to read through in an uninterrupted flow, if that is how the document is written.

Also, one doesn't have to go to the trouble of making (or generating) so many links and keeping them up to date if things are altered. If making the links is a problem, just settle for one link to a contents page. Some browsers have "next" and "previous" buttons to allow a document to be browsed serially according to a list.

(In fact, one can normally scroll up and down explicitly page by page, but this gives the same feeling as the terminal interface.)

A rough guide, then, for the size of a document is:

For online help, menus giving access to other things: small enough to fit on 24 lines. Check this by using a terminal browser.
For textual documents, of the order of half a letter-sized (A4) page to 5 pages.

Refer or copy?

When you are setting up an information system which refers to information which is available elsewhere, be very careful before taking a copy.

Here are some reasons for leaving it where it is:

When it is updated, you will either have to have a way of finding out, and make a fresh copy, or you will end up with an out of date copy.
If you feel that your copy will be easier to access, remember that this is relative. You will have readers from other places who may find the original is closer. If the original has a serious access problem, you could find another server (maybe offer your own) as the definitive storage point.

Here are reasons for copying it:

If the information is transient, like a news article, you will have to take a copy.
You may want to refer to a particular version of something which will be changed and not saved in an archive.

You should be very wary before referring to your own private collections of the following, of which plenty of established collections exist:

Internet FAQ (Frequently Asked Questions) lists
RFCs (Request For Comments -- Internet standards etc)
Information by subject
The "best" URLs on the web. Make a personalized list of pointers to things of specific interest. It will be more valuable!

Within each document

This section of the style guide deals with the layout of text within a "document", the unit of retrieval of information on the web.

To be completed.

You should try to:

Sign It!

An important aspect of information which helps keep it up to date is that one can trace its author. Doing this with hypertext is easy -- all you have to do is put a link to a page about the author (or simply to the author's phone book entry).

Make a page for yourself with your mail address and phone number. At the bottom of files for which you are responsible, put a small note -- say just your initials -- and link it to that page. The address style (typically right justified) is useful for this.

Your author page is also a convenient place to put any disclaimers, copyright notices, etc. which law or convention require. It saves cluttering up the messages themselves with a long signature.

(If you are using the WorldWideWeb.app hypertext editor, then you can put this link from your default blank page so that it turns up on the bottom of each new document automatically)

The status of your information

Some information is definitive, some is hastily put together and incomplete. Both are useful to readers, so do not be shy to put information up which is incomplete or out of date -- it may be the best there is. However, do remember to state what the status is. When was it last updated? Is it complete? What is its scope? For a phone book for example, what set of people are in it?

Not every document needs a status declaration, if there is something in the overview page of the work which covers it.

You can of course also give a feel for the status of the text by its language ... bad spelling, missing capitals, and relaxed grammar all indicate informal notes. Careful use of verbs such as "shall" and "should", and the introduction of Long Capitalized Noun Phrases (LCNPs) will give at least the impression of an ISO standard. ;-)

Date it

In some cases it can be useful to put creation dates and last modified dates on your work. (Note that this is the sort of thing which one could make a server do automatically with a little programming).

Figure out whether putting one might later save the reader from following out of date information.
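
As an illustration of the "little programming" mentioned above, here is a minimal sketch in Python of how a last-modified note could be produced from a file's modification time. The function name `last_modified_note`, the wording of the note and the date format are assumptions made for this example; they are not part of any particular server.

```python
import os
import time


def last_modified_note(path):
    """Build a short 'last modified' note from the file's modification time.

    The wording and the YYYY-MM-DD format are illustrative assumptions.
    """
    mtime = os.path.getmtime(path)  # seconds since the epoch
    return "Last modified: " + time.strftime("%Y-%m-%d", time.gmtime(mtime))
```

A server or generation script could append the returned string to the bottom of each document as it is served or built.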

Linking to context

A major difference between writing part of a serial text, and an online document, is that your readers may have jumped in from anywhere. Even though you have only made links to it from one place, any other person may want to refer to that particular point, and so will make a link to that particular part of your work from their own. So you can't rely on your reader having followed your path through your work.

Of course if you are writing a tutorial, it will be important to keep the flow from one document to the next in the order you intended for its primary audience. You may not wish to cater specially for those who jump in out of the blue, but it is wise to leave them with enough clues so as not to be hopelessly lost. Some ways of doing this are:

Watch that your text and vocabulary stand by themselves. Starting a document with "The next thing we consider is..." or "The only solution to this problem is..." will certainly confuse.
Sometimes the opening words refer to the context, and can be linked to background information. For example, in the WWW project documentation, the first occurrence of the acronym WWW is often linked back to the central project document.
The navigation hints at the top or bottom of the document can give explicit pointers. Examples are at the bottom of this document.

It can also be useful to imagine as you are writing that you yourself may wish to reuse the document some day.

Navigational Icons

Icons make great navigational hints. It is very effective to have the same consistent icon throughout the work, always (except on the "top" page) linked back to the top page. This kills two birds with one stone: it gives consistency to the work, so readers know when they are in it and when they are outside it, and it also gives them a quick way of getting back to the top of it.

You can do the same thing with sections, so that at the top (or bottom) of each page you might have a small string of icons, the first to go back to the top of the work, the second to go back to the chapter, the third to go back to the section within the chapter, for example.

[This style guide was for a long time empty of icons because I was editing it with the old hypertext editor which doesn't handle images. I may fix that with time -tbl]

TITLE

The title of a document is specified by the TITLE element. The TITLE element should occur in the HEAD of the document.

There may only be one title in any document. It should identify the content of the document in a fairly wide context.

The title is not part of the text of the document, but is a property of the whole document. It may not contain anchors, paragraph marks, or highlighting. The title may be used to identify the node in a history list, to label the window displaying the node, etc. It is not normally displayed in the text of a document itself. Contrast titles with headings. The title should ideally be less than 64 characters in length, because many applications will display document titles in window titles, menus, etc. where there is only limited room. Whilst there is no limit on the length of a title (as it may be automatically generated from other data), information providers are warned that it may be truncated if long.

Examples of use

Appropriate titles might be

		<TITLE>Rivest and Neuman. 1989(b)</TITLE>

		<TITLE>A Recipe for Maple Syrup Flap-Jack</TITLE>

		<TITLE>Introduction -- AFS user's Guide</TITLE>

Examples of inappropriate titles are those which are only meaningful within context,

		<TITLE>Introduction</TITLE>

or too long,

	<TITLE>Remarks on the Quantum-Gravity effects of "Bean
	Pole" diversification in Mononucleosis patients in Developing
	Countries under Economic Conditions Prevalent during
	the Second half of the Twentieth Century, and Related Papers:
	a Summary</TITLE>
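
The two kinds of inappropriate title above can be caught mechanically. The following is a rough sketch in Python, not an established tool: the function name `check_title`, the list of context-dependent words and the warning texts are assumptions made for illustration, and the 64-character threshold is the guideline figure from the text rather than a hard limit.

```python
import re

# Hypothetical list of titles that only make sense within context.
CONTEXT_ONLY = {"introduction", "overview", "index", "contents", "summary"}
MAX_LENGTH = 64  # guideline length suggested in the text


def check_title(html):
    """Return a list of warnings about the TITLE element of an HTML string."""
    titles = re.findall(r"<title\b[^>]*>(.*?)</title>", html,
                        re.IGNORECASE | re.DOTALL)
    if len(titles) != 1:
        return ["expected exactly one TITLE element, found %d" % len(titles)]
    # Collapse line breaks and runs of spaces before measuring.
    title = " ".join(titles[0].split())
    warnings = []
    if len(title) >= MAX_LENGTH:
        warnings.append("title is %d characters and may be truncated" % len(title))
    if title.lower() in CONTEXT_ONLY:
        warnings.append("title is only meaningful within context")
    return warnings
```

A title such as "Introduction -- AFS user's Guide" produces no warnings, while a bare "Introduction" is flagged as context-dependent.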

Device Independence

The hypertext you write is stored in the HTML language, which does not contain information about the fonts and paragraph shapes and spacing which should be used for displaying the document.

This gives great advantages in that your document will be rendered successfully on whatever platform it is viewed, including a plain text terminal.

You should be aware that different clients do use different spacing and fonts. You should be careful to use the structuring elements such as headers and lists in the way in which they were intended. If you don't like the rendering on your particular client, don't try to fix it by using inappropriate elements, or trying for example to force extra spacing with empty elements. This may well end up being interpreted differently by other clients and looking very strange. You can in many cases configure how the client displays each element.

For example:

Always use heading levels in order, with one heading level 1 at the top of the document, and if necessary several level 2 headings, and then if necessary several level 3 headings under each level 2 heading. If you don't like the way heading level 2 is formatted, fix it on your client, don't just skip to heading level 3.
Don't put extra spaces or blank lines into your text to pad it out, except in pre-formatted (PRE) sections.
Don't refer in your text to facets of particular browsers. Asking someone to "click here" won't make sense without a mouse, just as asking someone to "select a link by number" will betray the fact that you were using the line mode browser. Just leave a link. The instructions get boring as the user will normally know how to select a link.
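
The first rule in the list above (one level 1 heading, and no skipped levels) is easy to check by program. Here is a minimal sketch in Python using the standard `html.parser` module. The class name `HeadingOrderChecker`, the function `heading_problems` and the wording of the messages are assumptions for this example; the sketch only looks at the order of H1 to H6 start tags and is not a full HTML checker.

```python
from html.parser import HTMLParser


class HeadingOrderChecker(HTMLParser):
    """Collect the levels of H1..H6 elements in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> arrives as "h2".
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_problems(html):
    """Return a list of messages describing heading-order problems."""
    checker = HeadingOrderChecker()
    checker.feed(html)
    checker.close()
    levels = checker.levels
    problems = []
    if levels and levels[0] != 1:
        problems.append("first heading is level %d, not level 1" % levels[0])
    if levels.count(1) > 1:
        problems.append("more than one level 1 heading")
    # A heading may go at most one level deeper than the one before it.
    for previous, current in zip(levels, levels[1:]):
        if current > previous + 1:
            problems.append("level %d follows level %d" % (current, previous))
    return problems
```

Going back up from level 3 to level 2 is accepted, since that is simply the start of the next section; only jumps downward by more than one level are reported.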

Printable hypertext

In an ideal world, paper might not be necessary. In a next to ideal world, one would have enough time to write a hypertext version of a document and also to write a completely separate paper version. However, in the real world, you will probably want to generate any printed documents and online documents from the same file.

Suppose the HTML files will be the master, and you will generate the printable from this, by making one long document, and possibly printing it via translation into TeX, or some word processor format, for example. You might not initially, but you might want to one day.

Try to avoid references in the text to online aspects. "See the section on device independence" is better than "For more on device independence, click here." In fact we are talking about a form of device independence.

Unfortunately, the recommended practices of signing each document and giving navigational links tend to mess up the printable copy, though one can of course develop ways of stripping them out if they follow a common format.

For example, the most common comment about this document was that it is difficult to print. I therefore made a single page version of the whole thing with a few scripts, and put a pointer to it from the cover page. But then people still ask, not having read the cover page. (The scripts were just bits of "sed", which I am not supporting. I have put rules in at the top and bottom of each page and the scripts use these to chop off bits which are not needed in the printed copy.)
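
The chopping step described above can be sketched in a few lines. The original scripts were written in sed and are not reproduced here; what follows is a Python stand-in written for illustration only. It assumes that each page has a horizontal rule (HR) after the navigation at the top and another before the signature at the bottom, and the function name `printable_body` is hypothetical.

```python
import re

# Matches an <HR> tag in any letter case, with or without attributes.
RULE = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def printable_body(html):
    """Return the text between the first and last <HR> of a page.

    Pages with fewer than two rules are returned unchanged, since there is
    nothing reliable to chop on.
    """
    rules = list(RULE.finditer(html))
    if len(rules) < 2:
        return html
    return html[rules[0].end():rules[-1].start()].strip()
```

Concatenating the result for each document, in reading order, gives the body of a single long page for printing.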

Make your (hyper)text readable

This is just a little rant about two style issues in hypertext that I'm seeing more of and don't like much.

The first is the _here_ syndrome, e.g.:

	Information about Blah Blah Blah is available by clicking _here_.

where the word _here_ is the link. This style is really awkward; when you click on 'here', you have to look around to make sure it is the *right* here. Let me urge you, when you construct your HTML page, to make sure that the thing-you-click is actually some kind of title for what it is when you click there. E.g. say

	Information about _Blah Blah Blah_ is now available.

And use:

	Information on _how to do searches_ is available.

instead of

	For information on how to do searches, choose _this link_.

Not quite as bad, but still awkward, is where someone will use a topic word as a link, but the text still talks about the links:

   	Here are links to a _CREDITS_ page and _technical details_ ...

Instead, try to write something like

   	Many thanks go to _various people_ for their contributions.

   	_Technical details_ of this system are available now.

I.e., make your HTML page such that you can read it even if you don't follow any links.
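
One way to catch the "here" syndrome before publishing is to list the text of every link and look for words that say nothing about the target. The sketch below uses Python's standard `html.parser` module; the set of vague phrases, the class name `LinkTextCollector` and the function `vague_links` are assumptions chosen for this example.

```python
from html.parser import HTMLParser

# Hypothetical list of link texts that say nothing about the target.
VAGUE = {"here", "click here", "this link", "link"}


class LinkTextCollector(HTMLParser):
    """Collect the visible text of each <A HREF=...> anchor."""

    def __init__(self):
        super().__init__()
        self.link_texts = []
        self._current = None  # text pieces of the anchor being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self._current = []

    def handle_data(self, data):
        if self._current is not None:
            self._current.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            # Normalize whitespace so "click\n here" matches "click here".
            self.link_texts.append(" ".join("".join(self._current).split()))
            self._current = None


def vague_links(html):
    """Return the link texts that match one of the vague phrases."""
    collector = LinkTextCollector()
    collector.feed(html)
    collector.close()
    return [text for text in collector.link_texts if text.lower() in VAGUE]
```

An empty result does not prove the links are well titled, of course; it only rules out the most common offenders.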

Avoid talking about mechanics

Announcements of internet services have typically been followed by pages for information about how to use FTP, mail servers, etc, to get at the information. The WWW is designed to make all this unnecessary.

The temptation is to strip out these instructions and leave a link like:

	There is now WWW access to our large FTP archive which was
	previously only available by FTP, NFS and mail.  This
	collection includes much public domain software and text
	whose copyright has expired.

The web is read by people who don't need or, often, want to know about FTP and NFS - or even WWW! So the following is better:

	Our archive includes much public domain software and text
	whose copyright has expired.

Keeping on the subject of discourse rather than the mechanisms and protocols keeps the text shorter, which means people are more likely to read it.

Even when you are working within the web metaphor, use links, don't talk about them. For example

	You can read more about this in the tutorial which is
	linked to the home page

obviously would be better as

		The tutorial has more about this.

Another common one is

	The tutorial contains sections on mowing, sharpening the mower,
	and buying a mower.

Give the reader a break, and let him or her jump straight there!

	The tutorial contains sections on _mowing_, _sharpening the mower_,
	and _buying a mower_.

Test your document

In a way your hypertext is like a book, which you should have proof-read. In a way, it is like a program which you should have tested. At least get someone from the target group for which you wrote the document to read it and give you some feedback. Other ideas are:

Read the document with several different client programs, to ensure that you have formatted it in a device independent way.
Monitor the readership of your document. You can do this by analysing the server log files. You may find that some parts are not being read, perhaps because people are looking in the wrong place for them. You may see that people often follow a path and backtrack. If you can guess what they were looking for, you can make the clues around the link more helpful. (Remember to keep log information confidential until you have removed user information from it.)
Make it clear whether you will accept criticism or suggestions from your readers, and how they should send it.
Ask people to solve problems using the document, and report on their success. If they fail, find out what they were looking for, whether it was in the document at all,

How much testing?

Testing takes time. The decision of how much testing you do is based on the quality of the document you wish to provide. You are balancing your reader's time and effort against yours. If your document is "selling" an idea, or if you are selling the document or providing a service, you will want to make it as easy as possible for the reader. If many people will read your work, a little of your time will save a lot of theirs.

If however you are documenting some obscure part of a system in which no one other than yourself is likely to be interested, or if you feel that your readers are lucky to have anything available at all, there is no point wasting time testing it. In the event of someone needing the information, they might have to go to some extra trouble to follow several links to find what they want, and then to understand what you have written. This may be the most efficient way of working. I emphasize this because there is very much information which is for a fleeting moment in people's minds, or is hastily scribbled down on some file, and which may be important to posterity. It is better for this information to be available even in unpolished form than for it to be hidden out of embarrassment for its form. Before electronic technology, the effort of publishing was such that this information was never seen, and it was a waste, and considered an insult to one's readers, to publish something which was not of high quality. Nowadays, there is "publishing" at all levels, and both high quality and hasty documents have their value. It is important, though, to make it clear what the quality of a document is when making a reference to it, to avoid disappointment.

Monitoring the server log files will tell you which documents are really being read. You can use your time most efficiently to improve the quality of those. Of course, analysing the server log files also takes time!
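
As a rough illustration of such an analysis, the Python sketch below counts successful reads per document. It assumes the server writes its log in the Common Log Format, which your server may or may not do; the function name `count_reads` is hypothetical. Only the requested path and status code are extracted, so the counts carry no user information.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line,
# e.g. "GET /path HTTP/1.0" 200
REQUEST = re.compile(r'"GET (\S+)[^"]*" (\d{3})')


def count_reads(log_lines):
    """Count successful GET requests per path, ignoring who made them."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST.search(line)
        if match and match.group(2) == "200":
            counts[match.group(1)] += 1
    return counts
```

Calling `count_reads(open(logfile))` and then `most_common(10)` on the result lists the ten most-read documents, which are the ones most worth polishing; `logfile` here stands for wherever your server keeps its access log.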

Testing your HTML

If you are using hypertext editing software, then your files should always contain valid HTML. Currently though, many people are editing HTML files as plain text files and having to get the markup right themselves. If you are in this category, then it is well worth running the HTML you write through an HTML checker. (There are some pointers to these from the W3C HTML overview.) It is also a good idea when you use a new HTML generation tool to test its output once. (There are pointers to clients, some with HTML editing capability, and pointers to more lists in the client list.)

Acceptable Content

Other sections of the style guide have dealt with the layout and structure of text. Here I digress to broaden the notion of style into a consideration of what is acceptable in the actual content of information put on the web.

The web is intended to be a mapping of the knowledge of society in general, and not constrained into a particular format or level. One can therefore expect to find anything on it, from scribbled note to encyclopaedia, and styles and manners will vary. However, for information which is to be generally accessible there are some questions of acceptability which apply simply because the material has been published.

There is a tendency on the Internet to regard news and email as very informal media, in which tolerance is expected. However, you never know who may link to or be led to your public web document. In some countries there may be legal requirements as well as informal ethical codes. I would not advocate any global censorship in this regard, but that does not mean you should not think about which aspects below are relevant to the document you are writing.

Due Credit

The visibility of someone's work depends, on the web more than anywhere, on where it is referenced. If an academic paper purports to be a description of some state of affairs, the academic code requires that it refer to related work which may be of interest.

Commercial servers selling products are not in practice bound by such a code: you don't find hypertext links to competitors' products. Therefore, make it clear whether your list of services is intended to be fair, or is commercially biased. Both forms are appreciated by the public, but an advertisement masquerading as something else is not.

Acceptable content

Pornography is just the most often discussed form of content which is generally disapproved of and illegal. There are others: libel, material infringing copyright or other intellectual property right, and material inciting to criminal activity are also things which you would be wise to avoid. Bear in mind in this context

your reader
whoever pays for your time, equipment and connectivity
your government

and try not to upset any of them.

Where you feel that something may offend your reader, you can to a certain extent protect both yourself and them by making an access path which goes through a warning page, and never yourself distribute URIs for anything behind that page without a similar warning. This is not, of course, foolproof.

The PICS initiative of the W3C consortium is aimed at allowing you (or anyone else) to rate your pages as to their acceptability. The idea is that parents and schools can then use rating systems of their choice to select suitable content for their children. This technology is expected to be available commercially some time in 1996.

Acceptable language

Unacceptable language is the simplest form of unacceptable content. Standards tend to vary from one country to another: in the US they are high. Here is a non-exhaustive non-definitive checklist of a few sorts of language to avoid.

Sexist: Any language which makes reference to the male or female of the human species when either could in fact apply. References to users for example as "he" or "she" can be equally offensive.
"Adult": Sexual innuendo or explicit vocabulary obviously is a no-no.
Racist: Politically correct language for referring to each ethnic group seems to vary from month to month. Because there are such strong feelings, it is wise to be careful.
Religiously assumptive: Whilst it is reasonable to discuss religion, inadvertently assuming that your reader is of a particular religion or not can be offensive.


Thanks to my brother Michael for describing the "how and why" trees he uses in his team-building training.

Why "why?"

Why is it important to be able to ask why?

Because without scientific curiosity we would not have scientific discovery.
Because democracy depends on an informed public (because, in principle, the public are needed to make decisions directly or indirectly);
Because an unquestioning individual can be hoodwinked and manipulated;
Because our society is constructed of a set of assumptions built on each other and on shifting conditions, and only by constant inspection can this framework of assumptions and beliefs be maintained in a sufficiently consistent state for us to stay out of serious trouble.

(If this were a good piece of hypertext, it would have well researched links to webs on science, society, politics, ethics and philosophy -- but it isn't and it doesn't. Suggestions for related material welcome.)


Background reading

Some other documents which may be of relevance, if you are reading the Style Guide for Online Hypertext:

The HTML Specification and references from it
A Beginner's Guide to writing HTML
World-Wide Web server software - a list of pointers
Web Etiquette -- for Server Administrators
The Web Designer, N Laviolette. Many references

Personal views:

Thoughts on style from Jorn Barger.

This list is far from complete. There are many books now (1994) about how to build a web site and write HTML. If they have a common failing it is that they assume a particular browser. Pick one which inspires you, as they are generally full of enthusiasm and good ideas. Browse the web, get your own ideas, and mail me mentioning the guide.

Disclaimer and Copyright

MIT / LCS 545 Technology Square, Cambridge MA 02139, USA

This information is provided in good faith but no warranty can be made for its accuracy. Opinions expressed are entirely those of myself and/or my colleagues and cannot be taken to represent views past present or future of our employers.

Feel free to quote, but reproduction of this material in any form of storage, paper, etc. is forbidden without the express written permission of the author. Intellectual property rights in this material may be held by the author, CERN and/or MIT. All rights are reserved.

If you notice something incorrect or have any comment which you don't think is a FAQ, feel free to mail me. If I don't get around to answering, please forgive me -- I try to answer everything!

Tim Berners-Lee