Architectural Considerations for Language Versioning for the Web

Editors: Larry Masinter (Adobe), Jonathan Rees (Creative Commons)
Date: 11 June 2009
This version
Latest version: http://www.w3.org/2001/tag/doc/versioning-html
Status: This is an editor's draft produced as part of the W3C Technical Architecture Group's work on ISSUE-41.

Abstract

Introduction

This document is an attempt to focus on the issues of how languages evolve, with an aim toward providing guidance for W3C Working Groups considering how to write extensible languages, specifically whether to provide version indicators and how to control extensibility of languages when introducing new versions of languages.

The TAG has spent many years looking at versioning in general, and there are a number of drafts on which consensus has been elusive. The primary motivation for continuing this work has been to specifically look at the issues and requirements for versioning in HTML, between HTML4 and HTML5.

W3C defines languages. Languages evolve, creating new "versions" of the language. Evolution may be incremental or drastic. After evolution, new readers need to work with older content, and new content should not "break" inappropriately when read by older readers. How should language designers plan for evolution, and how should they advise implementors of consumers and producers (creators) of content about how to "future proof" what they are making?

Background

The history of TAG work in versioning is summarized in Appendix I. More recently, the TAG reviewed the history of previous work (http://lists.w3.org/Archives/Public/www-tag/2009Apr/0028.html) and updated ISSUE-41 (http://lists.w3.org/Archives/Public/www-tag/2009Apr/0061.html).

Terminology

[[Provisional definitions added by JAR 2009-06-11]]

Producer
An agent that generates texts for transmission to one or more consumers.
Consumer
An agent that performs some action in response to a text.
Text
A string of characters that might be transmitted from a producer to a consumer.
Language
A set of constraints and permissions on consumers of texts.
Version
A particular language belonging to a series or family of closely related languages; a sublanguage.
Language specification
A document that defines a language (or version).
Version indicator
A phrase describing or prescribing the language version meant to apply to a text.
Phrase
Part of a text.

(Why constrain only consumer behavior, and not producer behavior? Because a restriction on a producer - "don't say X" - can always be taken to be a permission on a consumer - "it is OK to treat X as invalid input".)

Nature of Language Changes for New Versions

Analyze the kinds of language changes that have occurred in the past and might occur in the future, so we can correlate them to the utility of version indicators (not really started)

Reasons for Language Changes

Why do languages evolve, in ways that might need to be called out as separate versions? Specifically, why might HTML evolve?

How can we define HTML5 today such that, if problems are discovered that require incompatible language changes, we don't have rampant compatibility problems when implementations are updated to a later version?

In the history of computer science, it is difficult to come up with any language that has not evolved, been extended, or otherwise "versioned" as long as the language has been in use. This history of extension and change applies to network protocols, character encoding standards, programming languages, and certainly to every known technology found on the web.

It is difficult to come up with cases where a language hasn't gone through at least some minor incompatible change.

The standards process is established as a way of evolving specifications and implementations that reduces the likelihood of complete failure to interoperate. It certainly does not guarantee that no incompatible changes will be needed in the future.

Reasons why Languages (and HTML in particular) will need changes in the future:

  1. Requirements change: This is the main reason for evolution of languages -- people want the language to support some new feature that hadn't been thought of at the time of the original language design. Often requirements can be accommodated without actually changing the behavior of anything else, but at times, something resembling a "version" is necessary.
  2. Difficulties are uncovered after CR: two implementations aren't representative. The "Candidate Recommendation" exit criteria require only two implementations, and do not even require spanning the breadth of applicable hardware and software. Can HTML5/CSS3 work well on an electronic paper display such as the Kindle? Can it work well in a collaborative multi-pointer system? Is there a single "focus" or "tab order"? Does it work well with the typical "remote control" devices used for TVs? These are current platforms on which the language is not required to work well in order to exit CR.
  3. Ambiguities appear: Implementors get together and write a specification. They're happy because the spec matches what they implement -- or so they think! However, all of the implementors were part of the spec development process and... amazingly... there are some things they know and agree on that aren't part of the spec (no matter how brilliant and wise the spec editor). Later, someone else comes along and implements the spec as written, but their implementation is incompatible, either because of confusing wording or because of missing information. There is then a desire to update the spec to resolve the ambiguity. However, there is no way for authors to create material that acknowledges that the author has chosen the new (unambiguous) definition over the previous (ambiguous) one.

Certainly there are other reasons for language evolution and there's some overlap between these.

Version Indicators: How New Versions of languages might be marked

There are many ways in which the "version" or "nature" of an entity might be indicated. This section enumerates the kinds of version indicators available generally (out of band, in-band global, in-band local) and specifically for HTML (MIME types, comments, DOCTYPE, new tags, namespaces).

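Version indicators can be either out of band (carried outside the content, e.g. MIME types) or in-band (carried in the content itself).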

In-content version indicators can be either global in scope (e.g. DOCTYPE) or local (e.g. new tags, namespaces).

Nature of Version Changes

Languages can change through augmentation, restriction, clarification, or incompatible change.

[[JAR: Because languages are contracts involving two parties, it is very easy to get confused here. We need an analysis that considers the idea that a language is a set of constraints on behavior. A language change can differentially relax or constrict the freedoms of the consumer. Consumers can be permitted to do things with texts that they weren't allowed to do before (e.g. perform new behavior with a text that previously had to be treated as an error), or prohibited from doing something that was allowed before (e.g. considering something to be an error).]]

Whether language changes can be recognized without version indicators depends on the type of change. Some augmentations might be recognized by the appearance of syntax that wasn't previously recognized (i.e., the "version indicator" is the use of the feature itself). Old implementations might also ignore augmentations, or merely process them incorrectly, rather than recognize them as intended with a formerly unimplemented interpretation. Restrictions, clarifications, and incompatible changes, however, cannot readily be detected by scanning.
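As a rough illustration of detecting an augmentation by scanning, here is a Python sketch. It assumes two hypothetical versions, V1 and V2, that differ only in vocabulary (V2 adds a `video` element). It uses a crude regular-expression tag scan rather than a real parser. The names `VERSIONS`, `element_names`, and `earliest_version` are illustrative and are not taken from any specification.

```python
import re

# Hypothetical vocabularies for two versions; element names are illustrative only.
VERSIONS = [
    ("V1", {"html", "head", "body", "p", "a"}),
    ("V2", {"html", "head", "body", "p", "a", "video"}),
]

def element_names(text: str) -> set:
    """Collect start-tag names with a crude regex scan (not a real HTML parser)."""
    return {name.lower() for name in re.findall(r"<\s*([A-Za-z][A-Za-z0-9-]*)", text)}

def earliest_version(text: str):
    """Return the first version whose vocabulary covers every element used, else None."""
    names = element_names(text)
    for label, vocabulary in VERSIONS:
        if names <= vocabulary:
            return label
    return None

print(earliest_version("<p>hello</p>"))            # V1
print(earliest_version("<p><video></video></p>"))  # V2
print(earliest_version("<fribble>x</fribble>"))    # None
```

Scanning can only find changes that alter the vocabulary. Suppose a restriction or clarification changes how `p` must be processed. The tag set is unchanged, so this check gives the same answer before and after that change.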

Version Detection: How can consumers detect versions

The use of "modes" in browsers ("quirks mode", "near standards mode", etc.) seems likely to have some correlation with "versions".

It is possible to avoid having out-of-band or global-scope version indicators for augmentations. That does not mean in-band global indicators have no advantages or uses. If there are multiple languages (such as Algol 60 vs. Algol 68) or just multiple "modes", a global-scope in-band version indicator allows switching between one interpreter and another. However, if the version is indicated in-band but can only be found by parsing the content, then it isn't possible to evolve the syntax or the parser.

A version indicator can also serve to modulate the interpretation of the text in question. That is, depending on what the version indicator is, interpreting agents might have to interpret the same text in two different ways.

Utility of Version Indicators outside of Publication / Distribution

Review the compatibility and development workflow strategies for using different kinds of version indicators (future content with current readers, distinguishing current from future content with future readers): http://lists.w3.org/Archives/Public/www-tag/2009Apr/0064.html

A version indicator can also serve to syntactically characterize the text in question.

One use case for embedded version indicators is to track versions during authoring, production and deployment, before content is sent over the wire. Authors and authoring tools may well know which version of a language they are editing or producing content for, which features they are assuming, and so forth. If there is no way to mark the intended version in the content itself, version indicators are likely to be carried outside the content, where they are subject to loss. As has been seen with MIME types, external metadata risks becoming separated from the content, and authors often lack control over deployment. Right now, new HTML features seem to be deployed on the web by advanced sites. These sites "sniff" the User Agent version string and use it to determine which version of an HTML page should be generated. This process is subject to some significant failures. The main reason is that new or otherwise unrecognized browsers have no way of telling such sniffers that they, too, intend to interpret the same features as one or another proprietary browser. We need to consider the use cases of language version management during pre-publication processes. We also need to consider the use case of "browser version" sniffing and its failure cases. This touches on the "content negotiation" issue (as the sub-case of negotiating versions).
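Here is a minimal sketch of the sniffing failure mode. It assumes a server that keys on hypothetical product tokens (`BrowserA/3`, `BrowserB/5`) in the User-Agent string. It is not how any particular site works. It only shows why a capable but unlisted browser is served the legacy page.

```python
# Hypothetical product tokens; real User-Agent strings are far messier than this.
KNOWN_CAPABLE_TOKENS = ("BrowserA/3", "BrowserB/5")

def choose_page_version(user_agent: str) -> str:
    """Server-side sniffing: pick which version of a page to generate for a client."""
    if any(token in user_agent for token in KNOWN_CAPABLE_TOKENS):
        return "new-features"
    # Any other client, including a new browser that implements exactly the same
    # features, falls through here: it has no way to tell the sniffer "me too".
    return "legacy"

print(choose_page_version("Mozilla/5.0 BrowserA/3.1"))  # new-features
print(choose_page_version("NewBrowser/1.0"))            # legacy
```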

Evaluation Criteria: reasons for and against using version indicators

Evaluate the use of version indicators against possible future language changes to determine the reasons for and against using version indicators (to be done). Version indicators should be approached with some skepticism.

Sometimes versions can be detected by scanning the text to see whether it syntactically conforms to the corresponding language specification; the purpose of the indicator is to make it unnecessary for the consumer to do this.

Motivation of Implementors of agents

See Appendix III. That formalism ought to be modified to account for differential payoff to producer and consumer.

Situation: a producer generates a text containing parts that are not understood by the consumer

We can then distinguish between points in producer/consumer payoff space according to whether their coordinates along the producer and consumer axes are good (positive payoff), punishing (negative payoff), or neutral (zero payoff).

The payoff to the producer will shape the producer's behavior, and the payoff to the consumer will shape the consumer's behavior.

The payoff to the producer depends on what the consumer does (unless communicating just "feels good" or is required by law), but will often be indirect. E.g. in advertising, it's not the payoff from any particular transaction that matters, but the amortized payoff from many transactions. When a text is broadcast to multiple audiences with different capacities, it matters a lot whether the producer knows that this is the case.

Example: Creators of good children's TV shows (which I hereby define to be the ones I like to watch) know that there are two audiences and craft their material so that both appreciate the content. Creators of bad ones don't and only aim for one audience. But including material meant only for grownups (positive consumer 2 payoff), perhaps disguised or deemphasized so as not to make those who don't understand anxious (negative consumer 1 payoff), is, to me at least, the essence of craft and quality.

We have the same situation with content creators.

Those exercising craft have a more difficult job in creation and testing - they have to think - and this extra effort will only be made if the perceived benefit outweighs the cost. In a sense material that is knowingly destined for different audiences constitutes multiple communications channels, and the question of server/client compatibility (payoff) might be better thought of not as a language extensibility problem but a multiplexing problem.

If in-line language extensibility (think: child-inaccessible puns) is outlawed, the new material will be communicated *somehow*. (This is similar to the rewrite-based extensibility question in programming languages. Macros happen whether a language spec supports them or not; it's just a question of how extensibility will be managed - forking new languages (think: content-types), external preprocessors, or in-language macro facilities.)

It's not clear what purpose version markers in HTML, indicating the presence of stuff not understood by some clients, should have for clients that don't understand that stuff. I guess it could lead to "upgrade your browser" or "get this plugin" dialogs, or "save this file or choose application", the payoff of which is the ultimate benefit minus the annoyance of having to deal with the dialog. But in some cases knowing that you don't understand will only lead to consumer anxiety.

Analogy: Suppose I go to Peru and am spoken to in Spanish, which I don't understand. How can I tell how important what's being said should be to me? Sometimes very, sometimes not; sometimes I can tell whether it has must-understand status, sometimes I can't; in the latter case I only have anxiety (perhaps a lot, if it's an armed soldier who's speaking) and in a sense I might prefer to have heard nothing, like the child who happily doesn't know that an adult pun has just flown past in their TV show.

Adoption of these recommendations

Any decision to require or encourage the use of version indicators as a way of modulating behavior will require current browsers to agree to do work that will only pay off in the future. Getting that agreement requires buy-in from the affected parties. However, I don’t want to start with the presumption that “they won’t go along” without first making the case that allowing for future incompatible extensions in current browsers is good practice, even though such changes should be avoided if at all possible.

Specific HTML recommendations

Use of <!DOCTYPE HTML> vs. a DOCTYPE identifying a specific version of HTML5. A version indicator is useful, and traditional, for authoring software. Some other DOCTYPE might signal validation behavior. No incompatible changes are expected.

Ownership of application/xhtml+xml and the way in which the application/xhtml+xml migration might or might not be assigned to one or another development path

Conclusions

References

 

Appendix I: Review of past W3C TAG work on versioning

Appendix II: Use of Version Indicators in Other Languages

It is empirically true that one can version a language without having inline version indicators. For example, Algol 60 and Algol 68 do not have version indicators. Supplying version indicators is a design choice.

Appendix III: Versioning Formalism

JAR's attempt at a formalism for language compatibility. First published here:

http://w3.org/mid/063031F1-1645-4A4C-A350-2DF0077B9722@creativecommons.org

This framework was inspired by a paper on animal communication by Peter L. Hurd (J. Theor. Biol. 174:217-222 1995 [3]).

Two agents, a producer and a consumer, are playing a game that goes as follows:

  1. An objective o is chosen from O = a space of possible objectives
  2. Given o, the producer's (sender's) choice of text is via a function S: O->M where M = a space of possible texts (messages, strings)
  3. Given m, the consumer's (receiver's) choice of action is a function R: M->A where A = a space of possible actions (meanings, interpretations)
  4. Given o and a, success is judged according to a success (or "payoff") criterion Z(a,o). I.e. if Z(R(S(o)),o) then communication has been successful.

The simplest situation would be where the objective space is simply the action space, and the text specifies the desired action:

Z(a,o) iff a = o.
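The game can be sketched in Python under the simplest criterion, Z(a,o) iff a = o. The codebook mapping objectives to texts and texts to actions is a hypothetical example, and `None` stands for "no text" or "no action". The function `communicates` just evaluates Z(R(S(o)),o) for one round.

```python
def communicates(o, S, R, Z) -> bool:
    """One round: the producer sends m = S(o), the consumer performs a = R(m), Z(a, o) judges success."""
    m = S(o)
    a = R(m)
    return Z(a, o)

def Z_equal(a, o) -> bool:
    # Simplest case: objectives are actions, and success means the desired action was taken.
    return a == o

# Hypothetical codebook shared by producer and consumer (not from the source).
S = {"stop": "red", "go": "green"}.get   # O -> M (None when the producer has no text)
R = {"red": "stop", "green": "go"}.get   # M -> A (None stands for "no action")

results = {o: communicates(o, S, R, Z_equal) for o in ("stop", "go", "caution")}
print(results)  # {'stop': True, 'go': True, 'caution': False}
```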

Note: M = text space includes all possible texts, including those that are not used for communication.

Note: A = action space includes all possible actions, not just those that might be achieved through communication. Examples of actions: producing a certain visual display of the information in a hypertext document, or generating the computational result specified by a computer program.

The functions S and R are not uniquely determined by A, so the producer and consumer will need to agree on a correspondence. I'll define a "language" to be a contract that might be entered into between a producer and a consumer, presumably for the purpose of maximizing communication success.

The simplest kind of language would be to agree on the particular function R that is to be implemented by the consumer. Then given an objective o the producer can choose any text for which Z(R(S(o)),o).

However, we are interested in language extensibility. A language defined to merely specify the behavior of R cannot be extended because the consumer cannot change the interpretation of any text for fear that a producer unaware of the change might send it, in which case there would be no way to guarantee that R's action would meet the objective. Therefore we consider languages in which some texts are reserved for future expansion (i.e. sent texts are limited):

[[JAR 2009-06-11: I need to tweak the terminology and presentation, since the following definition of "language" is not consistent with the previous ones. I like the previous ones better so will need to rework the following.]]

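definition: A language L is a pair (F,I) where F, a subset of M, is the set of final texts (those a producer may send), and I: M->A assigns an action to every text in M, including texts outside F.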

The unused texts are those that are in M but not in F. Note that I defines actions for unused texts. This will come in handy later.

definition: A producer speaks L if the image of S is a subset of F.

definition: A consumer understands L if R(m) = I(m) for all m in F.

(The producer will generally choose to further constrain S in order to maximize success.)
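Here is a sketch of these definitions for a finite M. It assumes a language is represented as a Python set F and a dictionary I, and that producers and consumers are finite lookup tables. The traffic-light texts anticipate the example later in this appendix. `Language`, `speaks`, and `understands` are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Language:
    F: frozenset  # final texts: those a producer may send (a subset of M)
    I: dict       # interpretation: an action for every text in M, used or not

def speaks(S: dict, L: Language) -> bool:
    """Producer S (a finite table objective -> text) speaks L if its image lies within F."""
    return set(S.values()) <= L.F

def understands(R: dict, L: Language) -> bool:
    """Consumer R (a finite table text -> action) understands L if R(m) = I(m) for all m in F."""
    return all(R.get(m) == L.I[m] for m in L.F)

L = Language(F=frozenset({"red", "green"}),
             I={"red": "stop", "green": "go", "yellow": "unassigned"})

print(speaks({"stop": "red", "go": "green"}, L))      # True
print(speaks({"stop": "red", "caution": "yellow"}, L))  # False: yellow is not in F
print(understands({"red": "stop", "green": "go"}, L))  # True: unused texts are unconstrained
```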

Now consider a language change - a language L changing to become a language L'.

definition: L' is backward compatible with L iff any consumer that understands L' understands L in the same way on F.

Put more simply (trivial theorem): L' is backward compatible with L iff I'(m) = I(m) for m in F.

We would also like to have some notion of forward compatibility: Any producer that speaks L' also speaks L. Since a consumer that understands L cannot know in advance what new texts (in F') are supposed to mean, we have a problem: what should R(m) be when m is new?

To answer this, we introduce the notion of adequate defaulting. The idea is that there might be communication success if we substitute some action a^ for the unknown future desired action a. In this situation we write a^ ~> a. To simplify the formalism we also allow a ~> a for all a in A.

definition: A consumer respects L iff R(m) ~> I(m) for all m in M.

We can define forward compatibility as follows: L' is forward compatible with L iff any consumer that respects L also respects L', i.e. R(m) ~> I(m) implies R(m) ~> I'(m) for all m in M. Trivial theorem: forward compatibility holds iff I(m) ~> I'(m) for all m in M.
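The two compatibility conditions can be checked mechanically when M is finite. This sketch assumes ~> is given as an explicit set of (a^, a) pairs, with a ~> a added automatically. The particular pairs and the `bad` language are hypothetical.

```python
def defaults_to(a_hat, a, adequate) -> bool:
    """a_hat ~> a: a_hat adequately substitutes for a (a ~> a always holds)."""
    return a_hat == a or (a_hat, a) in adequate

def backward_compatible(new, old) -> bool:
    """L' is backward compatible with L iff I'(m) = I(m) for every m in F."""
    (F, I), (_, I_new) = old, new
    return all(I_new[m] == I[m] for m in F)

def forward_compatible(new, old, M, adequate) -> bool:
    """L' is forward compatible with L iff I(m) ~> I'(m) for every m in M."""
    (_, I), (_, I_new) = old, new
    return all(defaults_to(I[m], I_new[m], adequate) for m in M)

M = {"red", "yellow", "green"}
ADEQUATE = {("unassigned", "caution")}  # hypothetical: the only adequate default besides a ~> a
old = ({"red", "green"}, {"red": "stop", "green": "go", "yellow": "unassigned"})
new = (M, {"red": "stop", "green": "go", "yellow": "caution"})
bad = (M, {"red": "go", "green": "go", "yellow": "caution"})  # reinterprets red

print(backward_compatible(new, old), forward_compatible(new, old, M, ADEQUATE))  # True True
print(backward_compatible(bad, old), forward_compatible(bad, old, M, ADEQUATE))  # False False
```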

definition: L weakly extends to L' iff L' is backward and forward compatible with L, i.e.
  • F' is a superset of F
  • I(m) = I'(m) for all m in F
  • I(m) ~> I'(m) for all m in F' - F

In order for forward compatibility to be transitive, we also need to make sure nothing happens with non-final texts to break future extensibility:

definition: L extends to L' iff it weakly extends to L' and preserves or improves the defaults defined by L.

definition: L is extensible if there is an L' that extends it.

Note: If ~> is transitive (as it is, for example, when it is a partial order), then "extends" and the other relations are all transitive. [requires proof?]

 

Example:
  • A = O = {stop, go, caution, reverse, unassigned}
  • M = {red, yellow, green}
  • unassigned ~> o for all o in O
  • L = (F,I), L'=(F',I')
  • I = {green:go, red:stop, yellow:unassigned}
  • F = {red, green}
  • I' = {green:go, red:stop, yellow:caution}
  • F' = {red, green, yellow}
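The example can be checked against the definition of "weakly extends" with a short, self-contained sketch. It reads each entry of I and I' as text:action, so red maps to stop. It treats ~> as exactly "unassigned ~> o for all o in O" plus a ~> a, and it represents each language as a pair (F, I).

```python
M = {"red", "yellow", "green"}
O = {"stop", "go", "caution", "reverse", "unassigned"}
# unassigned ~> o for all o in O; a ~> a is handled inside defaults_to.
ADEQUATE = {("unassigned", o) for o in O}

def defaults_to(a_hat, a) -> bool:
    return a_hat == a or (a_hat, a) in ADEQUATE

def weakly_extends(L, L2) -> bool:
    """L weakly extends to L2: F2 superset of F, I = I2 on F, I(m) ~> I2(m) on F2 - F."""
    (F, I), (F2, I2) = L, L2
    return (F2 >= F
            and all(I[m] == I2[m] for m in F)
            and all(defaults_to(I[m], I2[m]) for m in F2 - F))

L = ({"red", "green"}, {"green": "go", "red": "stop", "yellow": "unassigned"})
L2 = ({"red", "green", "yellow"}, {"green": "go", "red": "stop", "yellow": "caution"})

print(weakly_extends(L, L2))  # True
print(weakly_extends(L2, L))  # False: F is not a superset of F'
```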

We want to be able to "kill off" a text - to decide in a future extension that it shouldn't have any meaning and shouldn't be sent. We can specify I(m) = k with Z(k,o) always false, and then the producer won't send it. This only works if either

  1. k is not on any "upgrade path" to an action that succeeds (i.e. not k ~> a), or
  2. k is not upgradable (the killed text m is in F).

(2) puts fewer constraints on A: it requires only one special undefined action (e.g. unassigned, as in the example) instead of two that behave differently in the partial order. However, (1) is better formally, as one simply specifies a ~> k for any a, and then one may upgrade any text's action from a to k. (2) would require a change in some of our definitions to allow the meaning of a text to pass from a default action a to k, which is not related to it by ~>. (Maybe we could introduce a set K of killed texts as a component of a language.)

Note: If an action can only be done using a defaulted text, why would a producer not "cheat" by sending a text outside F in order to achieve that objective? We can remove the temptation by making sure that a defaulted objective is always achievable by a non-defaulted text: For all m, if Z(I(m),o), then there is some m' in F such that Z(I(m'),o).
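The no-cheating condition is a finite check when M and O are finite. This sketch uses Z(a,o) iff a = o and the example's texts. The second language is a hypothetical variant, constructed to show a failure: yellow means caution but is not final.

```python
def no_cheat_incentive(F, I, M, O, Z) -> bool:
    """For all m and o: if Z(I(m), o), then some m' in F also has Z(I(m'), o)."""
    for m in M:
        for o in O:
            if Z(I[m], o) and not any(Z(I[m2], o) for m2 in F):
                return False
    return True

def Z_equal(a, o) -> bool:
    return a == o

M = {"red", "yellow", "green"}
O = {"stop", "go", "caution", "reverse", "unassigned"}
I_new = {"red": "stop", "green": "go", "yellow": "caution"}

# L' from the example: yellow is final, so "caution" is reachable without cheating.
print(no_cheat_incentive({"red", "yellow", "green"}, I_new, M, O, Z_equal))  # True
# Hypothetical variant: same interpretation, but yellow is not final.
print(no_cheat_incentive({"red", "green"}, I_new, M, O, Z_equal))  # False
```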

Correspondence with [1]:

final = in defined set OR undefined by virtue of having been "killed"

text's action succeeds for some objective = in accept set

Enrichments:

We could extend the framework to nondeterministic producer and consumer behavior, to quantitative payoffs, and/or to differential producer/consumer payoffs. Evolutionary biologists do these things in order to explain the presence and absence of cheating in natural communication systems. There is probably no need to do this in a protocols and formal languages engineering setting, except by way of explaining why a producer would choose a richer action when a default would suffice.

[[JAR: I have revised my view on the importance of both differential payoff and nondeterminism. Given the difficult and contentious nature of the issues we're facing, I think it will pay off (so to speak) if the presentation were changed to recognize these phenomena.]]

We should be able to model distributed extensibility in this setting: how precoordination (social agreement on how to split up the extension space) can enable the existence of upper bounds among the compatibility relations.

Thanks to Alan Bawden for checking my math. Any remaining errors are mine. - Jonathan

[1] http://www.w3.org/2001/tag/doc/versioning
[2] http://www.w3.org/2001/tag/doc/versioning-strategies
[3] http://scholar.google.com/scholar?hl=en&lr=&cluster=11821292421815354248

Appendix IV: Other considerations

This section is for things that haven't been integrated into the main doc. To modulate the interpretation of the text in question. That is, depending on what the version indicator is, interpreting agents might have to interpret the same text in two different ways.

This choice has profound consequences for the design of future versions. Suppose that an A (old) text is marked with indicator A. #1 does not in itself imply that a text generated by an A-interpreter will lead to the desired payoff for a B producer, for any text. That will only be true if when we designed the versioning regime we made a stipulation that all future versions will have this property (new producers "must be" happy with what old consumers do with all texts). If we stipulate only sense #1, then future version designers do not have the freedom to transition a given interpretation of a given text from acceptable (in A) to less acceptable (in B) - or vice versa.

If there is in fact forward and backward compatibility in a language series, there may be no strong need for a version indicator, other than as a convenience (so that agents who care don't have to scan the document to see whether it contains constructs they don't understand).

So in any discussion, you need to be clear about the sense of the version indicator.

Sense 1 is economical in that a consumer can always just use a B-interpreter to interpret according to language A. There is strong incentive for a consumer to assume it even when doing so isn't in spec. Sense 2 is harder to implement since the consumer needs two interpreters or two interpretation modes, one for A texts and one for B texts.

Version indicators can be helpful, but they just push the problem off one level. They are really part of the language(s) in question, so they have to be evaluated according to exactly the same criteria that one would apply to a language series that doesn't have them. Suppose you have language versions A and B, and then a "sum" language C = A + B whose texts consist of a version indicator followed by a text of either A or B. (If A and B both already have version indicators, you *may* be able to take C texts = A texts union B texts.) You still have to agree ahead of time, before language B is invented, on how to interpret texts of C. That is, everyone concerned needs a priori knowledge of how to parse and understand version indicators, even if it's just to say that rejecting unknown versions, or unknown texts, is OK. When you design a language series initially, you may set aside a place for version indicators, and specify that the indicator "sublanguage" is extensible (i.e. new indicators may come along). You may get the indicator language wrong in the first place, e.g. by defining it to specify sense 1 instead of sense 2 or vice versa. Then you may find yourself stuck, either underconstraining the series (so that old consumers can't consume new content with confidence) or overconstraining the series (so that new content will be rejected by conforming old consumers).

So version indicators only support extensibility (or whatever other goal you're after) if the future consequences for both old and new consumers are articulated and documented before the whole process gets started.
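To make the point concrete, here is a hypothetical sketch of a sum language C = A + B in which a text is an indicator line followed by a body. The class name, the text layout, and the two policies (`reject`, `use-latest`) are assumptions for illustration only. The sketch shows two things. The policy for unknown indicators has to be chosen when the object is constructed, before any B interpreter exists. Registering an indicator behaves like a single assignment.

```python
from typing import Callable, Dict

Interpreter = Callable[[str], str]

class SumLanguage:
    """A text of C is an indicator line followed by a body; the indicator selects an interpreter."""

    POLICIES = ("reject", "use-latest")

    def __init__(self, on_unknown: str) -> None:
        # The treatment of not-yet-invented indicators must be fixed up front.
        if on_unknown not in self.POLICIES:
            raise ValueError(f"unknown policy {on_unknown!r}")
        self.on_unknown = on_unknown
        self.interpreters: Dict[str, Interpreter] = {}

    def register(self, indicator: str, interpreter: Interpreter) -> None:
        # Single assignment: once an indicator's meaning is revealed, it never changes.
        if indicator in self.interpreters:
            raise ValueError(f"indicator {indicator!r} is already defined")
        self.interpreters[indicator] = interpreter

    def interpret(self, text: str) -> str:
        indicator, _, body = text.partition("\n")
        if indicator in self.interpreters:
            return self.interpreters[indicator](body)
        if self.on_unknown == "reject" or not self.interpreters:
            return f"error: unknown version {indicator}"
        # "use-latest": hand the body to the most recently registered interpreter.
        latest = list(self.interpreters.values())[-1]
        return latest(body)

strict = SumLanguage("reject")
strict.register("A", lambda body: "A:" + body)
print(strict.interpret("A\nhello"))  # A:hello
print(strict.interpret("B\nhello"))  # error: unknown version B
```

A lenient consumer built with `use-latest` would instead pass the B body to its A interpreter. Whether that counts as acceptable behavior is exactly what the indicator "sublanguage" must settle in advance.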

Notes: Saying that C = A + B where B is not yet invented is not as nonsensical as it sounds. An extension may be thought of as a secret that is somehow known in principle, but not revealed to producers and consumers until some future date. I think of versioning and extension as being similar to the concept of single assignment or "future" in programming languages.

2. JAR used a different definition of "language" in his formal writing... I think that language (or language version) as a class/predicate of interpreters, or equivalently requirements/specification/constraints on interpreters, is probably a more useful definition than either language as a set of strings or language as a single interpretation function on a set of strings.

Languages can and have been evolved without changes that require implementations to implement forks to consume content from different versions of the language in different ways. On the Web, for example, URIs have evolved that way, and with some unnecessary exceptions, so have CSS, HTML, and the DOM APIs.

In fact, in the case of CSS and HTML, the only versioning has been quirks vs standards mode, a versioning that wasn't sanctioned by the specifications contemporary to its introduction, and which would have been unnecessary had the deviations from the original design required by deployed content been codified as standard, as we have been doing for the past few years with CSS 2.1 and HTML5.

Forking the language makes implementations orders of magnitude more complex. Watching the Internet Explorer engineers' pained expressions when one discusses the implications of their decision to ship multiple versions of their rendering engine makes this abundantly clear. It also makes the language less suitable for constrained devices (instead of one language to support, one effectively ends up with multiple languages to support), harder to test (instead of testing one language implementation, one effectively ends up testing multiple implementations, as well as their interactions in edge cases), and harder to document (instead of just specifying the weird behaviours that end up de-facto part of the language due to wide deployment of implementation bugs, one has to also specify the other behaviours expected in each version).

Language designers should strive to make their languages versionless at the syntax level.

Version indicators only support extensibility (or whatever other goal you're after) if the future consequences for both old and new consumers are articulated and documented before the whole process gets started.

That's not uniquely true of version indicators.

That's true no matter what technique is used to distinguish one version from another. The alternative, where there aren't any version identifiers, requires consumers to deal with both old and new markup as well.

For some languages and some applications, it may be reasonable to define a universal semantics for all versions, such as the HTML rule of ignoring wrappers it doesn't recognize. (Not that that hasn't introduced problems of its own, with special elements created over time just to work around the consequences of the "ignore wrappers" rule.)
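Here is a minimal sketch of an "ignore unrecognized wrappers" rule. It assumes a toy tree of (element name, children) tuples and a hypothetical vocabulary of known elements. It is not the HTML parsing algorithm. It only shows the wrapper being dropped while its content is still processed.

```python
from typing import Tuple, Union

# A node is either text or (element name, list of child nodes).
Node = Union[str, Tuple[str, list]]

KNOWN_ELEMENTS = {"p", "em"}  # hypothetical vocabulary of an older consumer

def render(node: Node) -> str:
    """Render a tree, treating unrecognized elements as transparent wrappers."""
    if isinstance(node, str):
        return node
    name, children = node
    inner = "".join(render(child) for child in children)
    if name in KNOWN_ELEMENTS:
        return f"[{name}]{inner}[/{name}]"
    # Unknown wrapper: drop the tag itself but keep processing its content.
    return inner

doc = ("p", ["Hello ", ("fribble", ["there"]), "!"])
print(render(doc))  # [p]Hello there![/p]
```

Because the content of an unknown wrapper is always processed, content meant only for newer consumers leaks through to older ones. That is the kind of consequence the special workaround elements mentioned above were created to handle.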

For other languages and other applications, it may not be reasonable to define a universal semantics. Applications must be expected in that case to do something else. Version identifiers offer a convenient mechanism to help users distinguish between versions, even if machines don't need them: "Unexpected element 'fribble' encountered in this V1.2.3 document. The element 'fribble' is not defined in V1.2.3."
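Here is a sketch of how a version identifier can feed a diagnostic like the one quoted above. It assumes a hypothetical table of elements defined per version. The element names other than 'fribble' are invented for the example.

```python
from typing import Optional

# Hypothetical vocabulary for version 1.2.3 of some language.
DEFINED_ELEMENTS = {"1.2.3": {"doc", "para", "note"}}

def check_element(name: str, version: str) -> Optional[str]:
    """Return a human-readable diagnostic naming the declared version, or None if the element is defined."""
    if name in DEFINED_ELEMENTS.get(version, set()):
        return None
    return (f"Unexpected element '{name}' encountered in this V{version} document. "
            f"The element '{name}' is not defined in V{version}.")

print(check_element("fribble", "1.2.3"))
```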

To allow any particular input to be flagged as an error is itself (what I would call) semantics. We're having so much trouble with "error", "must accept", "must reject", "must understand", and so on, that it ought to be useful to deconstruct a bit and just talk about the desirability (payoffs) of these various outcomes for producer and consumer. I expect it to be helpful to treat outcomes such as reject, ignore, default, and understand uniformly, and talk about semantics (or specification) not as giving the single "correct" outcome for all consumers but as saying which possible outcomes are acceptable across consumers of varying abilities and inclinations. So ask not "should the consumer accept (or reject) X" but rather "should it be OK with the producer if the consumer rejects (or accepts) X".
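One way to read this suggestion is as a small data model: attach hypothetical (producer, consumer) payoffs to each outcome and ask which outcomes the producer can tolerate. For illustration only, the sketch assumes that an outcome is acceptable to the producer when the producer's payoff is not negative. The numbers are made up.

```python
from enum import Enum

class Outcome(Enum):
    REJECT = "reject"
    IGNORE = "ignore"
    DEFAULT = "default"
    UNDERSTAND = "understand"

def classify(payoff: float) -> str:
    """Sign of one payoff coordinate: good, neutral, or punishing."""
    if payoff > 0:
        return "good"
    if payoff < 0:
        return "punishing"
    return "neutral"

def ok_with_producer(payoffs: dict) -> set:
    """Outcomes the producer can live with: assumed here to be those whose producer payoff is not punishing."""
    return {outcome for outcome, (producer, _consumer) in payoffs.items()
            if classify(producer) != "punishing"}

# Hypothetical (producer, consumer) payoffs for one text containing construct X.
payoffs_for_x = {
    Outcome.REJECT: (-1, 0),
    Outcome.IGNORE: (0, 0),
    Outcome.DEFAULT: (1, 1),
    Outcome.UNDERSTAND: (2, 2),
}
print(sorted(o.value for o in ok_with_producer(payoffs_for_x)))
# ['default', 'ignore', 'understand']
```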

Appendix V: Feedback on the June 1 draft