Readability of the Web

Intelligent Navigation

Rating documents

We could take into account:

The author's view

The author could rate the interest of his documents (optional).

Most looked at

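Each time a document is looked at (or a link to it is followed), it could increment a "popularity" value, which would otherwise decrease as time goes by, e.g.

                pop(t) = exp(-alpha*(t - tp)),

where t is the current time, t_old is the last time someone looked at the file or followed the link, pop_old is the popularity value just after that visit, and tp = t_old + ln(pop_old)/alpha. This is the same as pop(t) = pop_old*exp(-alpha*(t - t_old)): between visits, the value decays exponentially at rate alpha.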
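The decaying popularity value can be sketched as below. This is only an illustration under one assumption the text leaves open: each visit adds a fixed increment (1.0 here) to the already-decayed value. The class name Popularity and the increment parameter are hypothetical; pop_old and t_old are the value just after the last visit and the time of that visit, and tp() recovers the shifted time tp = t_old + ln(pop_old)/alpha.

```python
import math


class Popularity:
    """Exponentially decaying popularity of one document or link.

    Assumption (not fixed by the text): every visit adds `increment`
    to the current, already-decayed value.
    """

    def __init__(self, alpha: float, increment: float = 1.0):
        self.alpha = alpha          # decay rate per unit of time
        self.increment = increment  # hypothetical per-visit bump
        self.pop_old = 0.0          # value just after the last visit
        self.t_old = 0.0            # time of the last visit

    def value(self, t: float) -> float:
        """pop(t) = pop_old * exp(-alpha * (t - t_old))."""
        return self.pop_old * math.exp(-self.alpha * (t - self.t_old))

    def visit(self, t: float) -> None:
        """Record a visit at time t: decay up to t, then add the increment."""
        self.pop_old = self.value(t) + self.increment
        self.t_old = t

    def tp(self) -> float:
        """tp = t_old + ln(pop_old) / alpha; only defined once pop_old > 0."""
        return self.t_old + math.log(self.pop_old) / self.alpha
```

Storing pop_old and t_old rather than tp avoids taking ln(0) for a document nobody has looked at yet; tp can still be derived on demand, and exp(-alpha*(t - tp())) gives the same value as value(t).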

The reader's view

Any reader might rate the interest of the document, e.g. as "boring" or "interesting" (optional).

The reader should be allowed at any time to choose the weight of each rating, with default weights that he could set in a defaults file.

The best-rated link, as well as any link above a threshold interest value (e.g. the average value over all the links of the text, or a constant), should be colored in a special way.
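Combining the ratings with the reader's weights and choosing which links to color could look like the sketch below. It assumes every rating has already been normalized to [0, 1] and that a rating missing for a document is simply left out of its weighted average; the rating names ("author", "popularity", "readers") and the function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Optional, Set


def combined_rating(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the ratings available for one document.

    Assumption: ratings are normalized to [0, 1]; missing ones are skipped.
    """
    total = sum(weights.get(name, 0.0) for name in ratings)
    if total == 0:
        return 0.0
    return sum(weights.get(name, 0.0) * value for name, value in ratings.items()) / total


def links_to_highlight(scores: Dict[str, float],
                       threshold: Optional[float] = None) -> Set[str]:
    """The best-rated link, plus every link rated above the threshold.

    With no constant threshold, the average over all links of the text is used.
    """
    if not scores:
        return set()
    if threshold is None:
        threshold = mean(scores.values())
    best = max(scores, key=scores.get)
    return {best} | {link for link, score in scores.items() if score > threshold}
```

For instance, with weights {"author": 1.0, "popularity": 2.0, "readers": 1.0} (as if read from the reader's defaults file), a link rated only on popularity is judged on that rating alone, while fully rated links are judged on all three.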

Search in the Web where no index is provided

The problem dealt with here is: I am in an HTML file, I know what I am looking for and some keywords for it, and I want to see whether anything about it is available FROM the current file through its links.

Where to search?

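A breadth-first search seems to be the only practical way, if we don't want our grandchildren to get the answer for us: a depth-first search could wander down an endless chain of links before ever coming back to a file close to the starting point...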

The search might detect whether a file has already been looked through, and save the results for it so that it is not studied twice.

For best results, after studying each link of a given file, the search should study the links of the best-rated file found so far, then the links of the next best-rated file, wherever it is. This turns the breadth-first traversal into a best-first search.
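The traversal described above is a best-first search, which can be sketched with a priority queue. Here links_of(url) and rate(url) are hypothetical stand-ins for fetching a file's links and rating its interest; the ratings dictionary doubles as the record of files already looked through, so no file is rated twice. The max_files cap is an added assumption to keep the sketch finite.

```python
import heapq
import itertools
from typing import Callable, Dict, Iterable, List, Tuple


def best_first_search(start: str,
                      links_of: Callable[[str], Iterable[str]],
                      rate: Callable[[str], float],
                      max_files: int = 100) -> List[Tuple[str, float]]:
    """Always expand the best-rated file found so far, wherever it is.

    Stops expanding once at least `max_files` files have been rated.
    Returns every rated file, best first.
    """
    ratings: Dict[str, float] = {start: rate(start)}
    tie = itertools.count()  # breaks ties between equal ratings
    frontier = [(-ratings[start], next(tie), start)]  # max-heap via negation
    while frontier and len(ratings) < max_files:
        _, _, url = heapq.heappop(frontier)
        for link in links_of(url):      # study every link of this file...
            if link in ratings:
                continue                # ...skipping files already rated
            ratings[link] = rate(link)
            heapq.heappush(frontier, (-ratings[link], next(tie), link))
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)
```

A plain breadth-first search would use a FIFO queue instead of the heap; the heap is what lets the search jump to the best file so far, even if it is deeper than files not yet studied.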

How to rate the interest of a file?

We could take into account:

The text: How many times the keywords are used in the text.
The titles: Each time one of the keywords appears in a title, the interest of the file should increase considerably.
Its own links: The file should receive some feedback from the interest ratings of the files it links to, which may themselves have been raised if their own linked files show sufficient interest, and so on.
General interest: The general information described above would also be used: the author's rating, the readers' average rating, the number of readers who looked through it per unit of time...
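One way to turn these criteria into a number is sketched below, under several assumptions that are not in the text: keywords are single words, a title hit counts as much as title_weight body hits, general interest is added with weight gamma, and the feedback from a file's links is beta times the best corrected score among them. All names and constants are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def local_interest(text: str, titles: Iterable[str], keywords: Iterable[str],
                   title_weight: float = 10.0) -> float:
    """Count keyword hits in the text, with hits in titles weighted heavily.

    Keywords are assumed to be single words; title_weight is a
    hypothetical constant standing for "increase very much".
    """
    kws = [k.lower() for k in keywords]

    def hits(s: str) -> int:
        words = re.findall(r"\w+", s.lower())
        return sum(words.count(k) for k in kws)

    return hits(text) + title_weight * sum(hits(t) for t in titles)


def corrected_interest(url: str, local: Dict[str, float],
                       links: Dict[str, List[str]], general: Dict[str, float],
                       depth: int = 2, beta: float = 0.5,
                       gamma: float = 1.0) -> float:
    """Local score plus general interest plus feedback from linked files.

    Each linked file's score is itself corrected by its own links, down to
    `depth` levels, which also stops cycles of links from recursing forever.
    beta and gamma are hypothetical weights.
    """
    base = local[url] + gamma * general.get(url, 0.0)
    if depth == 0:
        return base
    children = [corrected_interest(c, local, links, general, depth - 1, beta, gamma)
                for c in links.get(url, [])]
    return base + beta * max(children, default=0.0)
```

Taking the maximum over linked files rewards a file that leads to at least one very relevant file; a sum or average would be other reasonable choices.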

How long?

This is of course the most important question. Given unlimited time, a search can be quite accurate, but it will be very inefficient from the reader's point of view.

There are many ways to stop the search:

File found: A file that seems sufficiently related to the keywords is found, and the reader wants to go straight to it. The reader would have to define the boundary between "related enough" and "not related enough" somehow, or use a default value.
Depth reached: The reader could set a depth as a limit (e.g. the search should not follow more than three consecutive links).
Time over: The reader could set a maximum time for the search, in addition to any other limits he uses. He should also be allowed to stop the search at any time and get the best result so far.
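The three stopping rules can be combined in one best-first search loop, sketched below. links_of, rate and the parameter names are hypothetical; good_enough plays the role of the "enough related" limit, max_depth the maximum number of consecutive links, and max_seconds the time limit. The search always keeps the best file so far, so a user interrupt could be handled the same way as the deadline check (not shown).

```python
import heapq
import itertools
import time


def search_with_limits(start, links_of, rate, good_enough=None, max_depth=3,
                       max_seconds=None, clock=time.monotonic):
    """Best-first search that stops on whichever limit is reached first.

    good_enough: stop as soon as a file rates at least this ("file found").
    max_depth:   never follow more than this many consecutive links.
    max_seconds: overall time budget ("time over").
    Returns (best_url, best_rating, reason).
    """
    deadline = None if max_seconds is None else clock() + max_seconds
    ratings = {start: rate(start)}
    tie = itertools.count()
    frontier = [(-ratings[start], next(tie), start, 0)]
    best = start
    while frontier:
        if deadline is not None and clock() >= deadline:
            return best, ratings[best], "time over"
        _, _, url, depth = heapq.heappop(frontier)
        if depth >= max_depth:
            continue  # depth reached: rated, but its links are not followed
        for link in links_of(url):
            if link in ratings:
                continue
            ratings[link] = rate(link)
            if ratings[link] > ratings[best]:
                best = link
            if good_enough is not None and ratings[link] >= good_enough:
                return link, ratings[link], "file found"
            heapq.heappush(frontier, (-ratings[link], next(tie), link, depth + 1))
    return best, ratings[best], "exhausted"
```

The clock parameter is only there so that the time limit can be tested with a fake clock; by default it uses time.monotonic.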

Then what?

Once the search is over, the reader could have two choices:

To get the best-rated file, and then the others in order of decreasing interest. The best-rated file is not necessarily linked directly from the reader's current file.
To get the best-rated path, which means that the best link he should use would be highlighted in some way at each stage.
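Both outputs can be derived from the same search results, as in the sketch below. It assumes the search recorded, for each file, the rating it was given and the parent file through which it was first reached; these dictionaries and the function names are hypothetical.

```python
from typing import Dict, List


def ranked_files(ratings: Dict[str, float]) -> List[str]:
    """First choice: every file found, by decreasing interest."""
    return sorted(ratings, key=ratings.get, reverse=True)


def best_path(start: str, parent: Dict[str, str],
              ratings: Dict[str, float]) -> List[str]:
    """Second choice: the chain of links from `start` to the best-rated file.

    Assumes parent[f] is the file through which f was first reached and that
    at least one file other than `start` was rated.
    """
    target = max((f for f in ratings if f != start), key=ratings.get)
    path = [target]
    while path[-1] != start:
        path.append(parent[path[-1]])
    return path[::-1]
```

At each stage of the path, the link to highlight is simply the next file: dict(zip(path, path[1:])) maps each file on the path to the link the reader should follow from it.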

The reader should be able to let the search continue while starting to read the documents already found.

The difference between the two choices is now clearer:

When he only wants the best-rated files, the reader will get files that may not be closely related to one another. When he takes the best-rated path, the reader will follow links that were created by a human being, in an order that we may suppose to be logical.

When the reader wants some detail in a field he knows well, he could use the first search method; when he needs more structured information in an unfamiliar field, he could use the second.

Increased Speed

Depending on its resources, the client could dedicate part of its memory to the file(s) the reader might ask for next: it would guess which ones they are and fetch and keep them while the reader is reading the current document. How much to fetch in advance should depend on the amount of memory available, the size of the documents, how hard the guess is, how congested the network is...
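A possible prefetching policy is sketched below. Everything in it is an assumption: each candidate link comes with a guessed probability of being followed next (e.g. derived from the link ratings) and a known size in bytes (assumed positive), guesses below min_probability are treated as too uncertain to be worth fetching, and a congestion level between 0 and 1 shrinks the memory budget. Candidates are taken greedily by probability per byte, a common knapsack heuristic rather than an optimal choice.

```python
from typing import List, Tuple


def choose_prefetch(candidates: List[Tuple[str, float, int]],
                    memory_budget: int,
                    congestion: float = 0.0,
                    min_probability: float = 0.1) -> List[str]:
    """Pick which linked files to fetch while the reader is still reading.

    candidates: (url, guessed probability of being asked next, size in bytes > 0).
    congestion: 0.0 (idle network) to 1.0 (blocked network); hypothetical input.
    """
    budget = memory_budget * (1.0 - congestion)
    chosen, used = [], 0
    # Most expected benefit per byte first.
    for url, prob, size in sorted(candidates, key=lambda c: c[1] / c[2], reverse=True):
        if prob < min_probability:
            continue  # guess too uncertain to spend memory on
        if used + size <= budget:
            chosen.append(url)
            used += size
    return chosen
```

With a busy network the same candidates yield fewer prefetched files, matching the idea that prefetching should back off when the network is congested.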

The smartest approach might be to request only the first page at first, so that the rest of the file can be transferred while the reader is reading the beginning of it.