This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 14363 - Update the registration mechanisms
Summary: Update the registration mechanisms
Status: RESOLVED MOVED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: Other / OS: other
Importance: P3 editorial
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:
Duplicates: 12854
Depends on:
Blocks: 18300
Reported: 2011-10-03 11:32 UTC by contributor
Modified: 2016-04-18 20:17 UTC
CC List: 11 users

See Also:


Attachments

Description contributor 2011-10-03 11:32:48 UTC
Specification: http://www.w3.org/TR/2011/WD-html5-20110525/
Multipage: http://www.whatwg.org/C#top
Complete: http://www.whatwg.org/c#top

Comment:
Section 4.2.5.2 appears to be saying that conformance checkers must obtain the
list of valid meta names by screen-scraping a public wiki? *Seriously*? That's
a joke, right?

Posted from: 86.179.45.246
User agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.187 Safari/535.1
Comment 1 John Foliot 2011-10-03 15:43:49 UTC
W3C Reference URL: http://www.w3.org/TR/2011/WD-html5-20110525/semantics.html#other-metadata-names

Outside of the crude method of data retrieval, I also have concerns over the following section:

Status
   Ratified
      The name has received wide peer review and approval. 

Please define "wide peer review". At issue is the accuracy and validity of the assertion of Ratified.

If I show it to a few of my friends via an IRC chat at 2:00 AM, and they all agree that it looks good, does that constitute a wide peer review? Can I then claim my newly minted metadata name Ratified? 


Proposal to resolve this bug:
Remove section "4.2.5.2 Other metadata names" from the W3C specification until such time as a more robust method of adding metadata names to the collection is established. Six friends with the key to a public wiki hardly seems accountable, and such a list would likely be ignored by conformance checkers because of the high overhead imposed on them to remain up to date.
Comment 2 Ian 'Hixie' Hickson 2011-10-03 18:45:02 UTC
Not a joke, no. It's in fact the same mechanism the HTML working group agreed to use for rel="" values. Welcome to the new Web.

The exact mechanism needs work, but it's not a big problem.

(In reply to comment #1)
> 
> If I show it to a few of my friends via an IRC chat at 2:00 AM, and they all
> agree that it looks good, does that constitute a wide peer review? Can I then
> claim my newly minted metadata name Ratified?

Certainly within that community you should be able to use it, sure. That's always been the way Web standards work. If you have a community who want to do something, you just write a spec and agree to it and then within that community, that's how the technology works. The HTML spec actually calls that out explicitly; see the last few paragraphs of the "Extensibility" section.
Comment 3 John Foliot 2011-10-04 02:12:17 UTC
(In reply to comment #2)
> 
> Certainly within that community you should be able to use it, sure. That's
> always been the way Web standards work. If you have a community who want to do
> something, you just write a spec and agree to it and then within that
> community, that's how the technology works. The HTML spec actually calls that
> out explicitly; see the last few paragraphs of the "Extensibility" section.

Please define "wide peer review". 
At issue is the accuracy and validity of the assertion of Ratified.
This should be measurable and verifiable by any concerned 3rd party, and the specification should specify how this is done.
Comment 4 Jon Ribbens 2011-10-04 17:20:55 UTC
There are a couple of problems with what it says currently:

(a) There is no way defined to parse the list of acceptable names from the Wiki.
(b) Anyone anywhere could, at any time, blank the wiki page and hey presto a very large percentage of all HTML 5 documents in the world are suddenly invalid.
(c) It makes the W3C HTML standard dependent on an anonymous third-party website.

(a) in particular is surely a show-stopper.

I suggest that the list should be hosted on the w3c.org site, and should be in a computer-readable format. This list can then, behind the scenes, be automatically scraped from the Wiki if that's what you want to happen (and it could do things like alert someone at the W3C if the list suddenly changes significantly). You could also say that conformance checkers should pay attention to the HTTP Expires header when fetching the list, as an indication of how long to cache it.
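For illustration, here is a minimal sketch of what that could look like, assuming a hypothetical machine-readable list at a w3.org URL and a simple JSON array format (the URL and the format are assumptions for the example, not anything currently defined anywhere):

# Sketch only: fetch a hypothetical machine-readable list of registered meta
# names and cache it until the time given in the HTTP Expires header.
# The URL and the JSON format are assumptions, not part of any spec.
import json
import time
import urllib.request
from email.utils import parsedate_to_datetime

REGISTRY_URL = "https://www.w3.org/registries/meta-names.json"  # hypothetical
_cache = {"names": None, "expires": 0.0}

def registered_meta_names():
    """Return the set of registered meta names, refetching only after Expires."""
    if _cache["names"] is not None and time.time() < _cache["expires"]:
        return _cache["names"]
    with urllib.request.urlopen(REGISTRY_URL) as resp:
        entries = json.load(resp)              # e.g. ["viewport", "robots", ...]
        expires_header = resp.headers.get("Expires")
    if expires_header:
        expires = parsedate_to_datetime(expires_header).timestamp()
    else:
        expires = time.time() + 24 * 60 * 60   # no header: cache for one day
    _cache["names"] = {name.lower() for name in entries}
    _cache["expires"] = expires
    return _cache["names"]

The point is just that the fetch, the parse, and the cache lifetime would all be defined, instead of each checker scraping the wiki its own way.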
Comment 5 John Foliot 2011-10-04 18:26:28 UTC
(In reply to comment #4)
> There are a couple of problems with what it says currently:
> 
> (a) There is no way defined to parse the list of acceptable names from the
> Wiki.
> (b) Anyone anywhere could, at any time, blank the wiki page and hey presto a
> very large percentage of all HTML 5 documents in the world are suddenly
> invalid.
> (c) It makes the W3C HTML standard dependent on an anonymous third-party
> website.
> 
> (a) in particular is surely a show-stopper.
> 

...and I would add that b) is a highly plausible and very scary scenario as well. There is zero security in the proposed model if anyone can, at any time, make modifications unchecked.

Like it or not, certain things require a trusted gate-keeper.
Comment 6 Ian 'Hixie' Hickson 2011-10-06 23:12:42 UTC
(In reply to comment #4)
> There are a couple of problems with what it says currently:
> 
> (a) There is no way defined to parse the list of acceptable names from the
> Wiki.

I wouldn't expect any software to literally crawl the wiki. You'd do it manually, or have a custom script to do it.


> (b) Anyone anywhere could, at any time, blank the wiki page and hey presto a
> very large percentage of all HTML 5 documents in the world are suddenly
> invalid.

*shrug*. Vandalism happens. It is trivially reverted. This is not an issue.

Someone could crack into the HTML spec's Web server and change the required DOCTYPE to <!DOCTYPE LADYGAGA>, but that wouldn't make all the pages invalid. What matters is what people think matters.

Heck, I could change the spec tomorrow to say all documents are invalid. That wouldn't mean that all documents were invalid, it would just mean the spec was wrong.


> (c) It makes the W3C HTML standard dependent on an anonymous third-party
> website.

Anonymous?


> I suggest that the list should be hosted on the w3c.org site, and should be in
> a computer-readable format.

The W3C hasn't fared well with having computer-readable data in the past. (DTDs have caused the W3C to essentially DDOS itself by having lots of badly authored software read it continuously.)


Anyway, the whole registration mechanism really needs updating in general. Just need to work out what the right solution is first.
Comment 7 Jon Ribbens 2011-10-07 00:44:18 UTC
(In reply to comment #6)
> I wouldn't expect any software to literally crawl the wiki. You'd do it
> manually, or have a custom script to do it.

That's my whole point. Every "conformance checker" would do the scraping slightly differently, because there's no defined "correct way" of doing it. It would be incredibly fragile. Computers trying to parse non-computer-readable formats is never a good idea; mandating it as part of a fundamental standard is inconceivable. Nobody's going to do it manually; that's ridiculous.

> > (b) Anyone anywhere could, at any time, blank the wiki page and hey presto a
> > very large percentage of all HTML 5 documents in the world are suddenly
> > invalid.
> 
> *shrug*. Vandalism happens. It is trivially reverted. This is not an issue.

I realise that you do indeed have a lot of authority here, but nevertheless "argument from authority" is still a logical fallacy. It is not "not an issue" simply because you say so.

> Someone could crack into the HTML spec's Web server and change the required
> DOCTYPE to <!DOCTYPE LADYGAGA>, but that wouldn't make all the pages invalid.

People hacking into secure servers is one thing. People trivially changing deliberately insecure public wikis is another.

> > (c) It makes the W3C HTML standard dependent on an anonymous third-party
> > website.
> 
> Anonymous?

Have you checked the 'whois' for whatwg.org recently? Or, for that matter, the whatwg.org web site?

The final HTML specification should not be fundamentally dependent on any site other than w3.org, ietf.org, or similar.

> The W3C hasn't fared well with having computer-readable data in the past.
> (DTDs have caused the W3C to essentially DDOS itself by having lots of badly
> authored software read it continuously.)

And this problem is somehow avoided by having the list hosted on a less-well-funded web site instead?

> Anyway, the whole registration mechanism really needs updating in general.
> Just need to work out what the right solution is first.

Excellent, well, hopefully things will improve.
Comment 8 Ian 'Hixie' Hickson 2011-10-21 22:37:00 UTC
> That's my whole point. Every "conformance checker" would do the scraping
> slightly differently, because there's no defined "correct way" of doing it.

Well we should definitely have a defined way to determine what the registered types are, sure. I don't see why this is a problem.


> > *shrug*. Vandalism happens. It is trivially reverted. This is not an issue.
> 
> I realise that you do indeed have a lot of authority here, but nevertheless
> "argument from authority" is still a logical fallacy. It is not "not an issue"
> simply because you say so.

Why would vandalism be an issue? It's not an issue because you say it is, either. :-)


> Have you checked the 'whois' for whatwg.org recently? Or, for that matter, the
> whatwg.org web site?

Currently, I pay for it.


> The final HTML specification should not be fundamentally dependent on any site
> other than w3.org, ietf.org, or similar.

I don't see why. Even if it was dependent on a site that went dark two months from now, it would just be updated to point to another site then.


> > The W3C hasn't fared well with having computer-readable data in the past.
> > (DTDs have caused the W3C to essentially DDOS itself by having lots of badly
> > authored software read it continuously.)
> 
> And this problem is somehow avoided by having the list hosted on a
> less-well-funded web site instead?

The problem is apparently not made worse, at least.
Comment 9 Jon Ribbens 2011-10-21 23:37:25 UTC
(In reply to comment #8)
> > That's my whole point. Every "conformance checker" would do the scraping
> > slightly differently, because there's no defined "correct way" of doing it.
> 
> Well we should definitely have a defined way to determine what the registered
> types are, sure. I don't see why this is a problem.

The problem is that currently you *don't* have a defined way to determine what the registered types are. If that's a known defect with the specification that will be fixed before it's finalised then that's fine.

> Why would vandalism be an issue? It's not an issue because you say it is,
> either. :-)

Because it's trivially easy and could potentially cause significant problems for people doing conformance checking (in that their tools will suddenly indicate that most websites are invalid).

> > Have you checked the 'whois' for whatwg.org recently? Or, for that matter,
> > the whatwg.org web site?
> 
> Currently, I pay for it.
> 
> > The final HTML specification should not be fundamentally dependent on any
> > site other than w3.org, ietf.org, or similar.
> 
> I don't see why. Even if it was dependent on a site that went dark two months
> from now, it would just be updated to point to another site then.

Both of these replies tend to indicate that we have a different idea of what a "standard" is. Generally speaking, one would expect a standard to be released on a certain date and not to change after that, or at least, not to change on a daily basis. A standard is supposed to have some stability, and people are supposed to be able to place some trust in it. Contracting out part of a standard to a wiki would be, um, novel.

> > > The W3C hasn't fared well with having computer-readable data in the past.
> > > (DTDs have caused the W3C to essentially DDOS itself by having lots of
> > > badly authored software read it continuously.)
> > 
> > And this problem is somehow avoided by having the list hosted on a
> > less-well-funded web site instead?
> 
> The problem is apparently not made worse, at least.

Or rather, it's made a lot worse. Instead of the potential victim of the accidental DDoS being a sizeable organisation with the funds and experience to cope with the situation, it's just you. No criticism of you personally intended, but most people have more limited means in terms of time and money than most organisations, and it leaves HTML with a "bus factor" of 1, which is somewhat unfortunate.
Comment 10 Ian 'Hixie' Hickson 2011-12-02 18:12:23 UTC
(In reply to comment #9)
> > 
> > Well we should definitely have a defined way to determine what the registered
> > types are, sure. I don't see why this is a problem.
> 
> The problem is that currently you *don't* have a defined way to determine what
> the registered types are. If that's a known defect with the specification that
> will be fixed before it's finalised then that's fine.

Yes, this needs to be cleared up.


> > Why would vandalism be an issue? It's not an issue because you say it is,
> > either. :-)
> 
> Because it's trivially easy and could potentially cause significant problems
> for people doing conformance checking (in that their tools will suddenly
> indicate that most websites are invalid).

Tools shouldn't be just scraping the sites automatically (and don't, in practice).


> > I don't see why. Even if it was dependent on a site that went dark two months
> > from now, it would just be updated to point to another site then.
> 
> Both of these replies tend to indicate that we have a different idea of what a
> "standard" is. Generally speaking, one would expect a standard to be released
> on a certain date and not to change after that, or at least, not to change on a
> daily basis.

HTML will continue to evolve until it is dead. It's a living standard.


> Or rather, it's made a lot worse. Instead of the potential victim of the
> accidental DDoS being a sizeable organisation with the funds and experience to
> cope with the situation, it's just you. No criticism of you personally
> intended, but most people have more limited means in terms of time and money
> than most organisations, and it leaves HTML with a "bus factor" of 1, which is
> somewhat unfortunate.

This is false. If I were to die suddenly, people would just lift up the spec and wiki and put it elsewhere, assuming they couldn't get the site reassigned to them, which would be the more likely situation.
Comment 11 Ian 'Hixie' Hickson 2011-12-02 18:12:28 UTC
*** Bug 12854 has been marked as a duplicate of this bug. ***
Comment 12 Jon Ribbens 2011-12-02 19:18:24 UTC
(In reply to comment #10)
> Tools shouldn't be just scraping the sites automatically (and don't, in
> practice).

It's what they *must* do, according to the current HTML5 specification.

> HTML will continue to evolve until it is dead. It's a living standard.

> This is false. If I were to die suddenly, people would just lift up the spec
> and wiki and put it elsewhere, assuming they couldn't get the site reassigned
> to them, which would be the more likely situation.

I'm sure this argument has been done to death elsewhere, but suffice to say that your usage of the word "standard" appears to me to be somewhat... non-standard. This state of affairs is rather alarming for such an important specification.
Comment 13 contributor 2012-07-18 07:14:45 UTC
This bug was cloned to create bug 17899 as part of operation convergence.
Comment 14 Edward O'Connor 2012-10-12 23:08:41 UTC
Any revamp of the registration mechanisms will happen too late for HTML5. We'll revisit this in HTML.next.
Comment 15 Robin Berjon 2013-01-21 15:58:45 UTC
Mass move to "HTML WG"
Comment 16 Robin Berjon 2013-01-21 16:01:31 UTC
Mass move to "HTML WG"
Comment 17 Jon Ribbens 2013-02-13 02:03:15 UTC
This surely blocks "HTML5" because without resolving this issue you cannot by definition have a "standard" by any normal meaning of the word.
Comment 18 Michael[tm] Smith 2015-06-17 02:54:45 UTC
This doesn't affect UA behavior and isn't really a priority in any other way, because regardless of what spec change is made or not made here, it's not clear that in practice it will actually make any difference.
Comment 19 Jon Ribbens 2015-06-17 11:20:25 UTC
It's very obvious in what way it makes a practical difference: whether or not it is possible to make HTML 5 conformance checkers that actually check whether documents fully conform to the standard, or whether they just have to ignore parts of the standard as being "too vague to be checkable".
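To make that concrete: with a well-defined registry to consult, the meta-name part of the check is only a few lines. A minimal sketch in Python, where the built-in names come from the spec's own list and the registry set and the parsed (name, line) pairs are assumed inputs for the example:

# Minimal sketch: flag <meta name> values that are neither defined in the spec
# nor present in a registry of extension names. The registry set and the
# parsed (name, line) pairs are assumed inputs, purely for illustration.
SPEC_DEFINED = {"application-name", "author", "description", "generator", "keywords"}

def check_meta_names(meta_names, registry):
    """Yield a message for each meta name that is not spec-defined or registered."""
    for name, line in meta_names:              # e.g. [("viewport", 12), ...]
        if name.strip().lower() not in SPEC_DEFINED | registry:
            yield "line %d: unregistered meta name '%s'" % (line, name)

Whether the registry comes from a wiki or from w3.org doesn't matter for this code; what matters is that it exists in a defined, fetchable form.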
Comment 20 Travis Leithead [MSFT] 2016-04-18 20:17:10 UTC
HTML5.1 Bugzilla Bug Triage: Moved to GitHub issue: https://github.com/w3c/html/issues/213

If this resolution is not satisfactory, please copy the relevant bug details/proposal into a new issue at the W3C HTML5 Issue tracker: https://github.com/w3c/html/issues/new where it will be re-triaged. Thanks!