This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Informally, "safe content" is content that you can put in a script (or style) element in a polyglot document; conversely, content that is not safe should be placed in an external file and referenced.

However, http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#external-script-and-style says

> Polyglot markup uses external scripts if that document's script or style sheet uses < or & or ]]> or --.

The restriction on -- is not needed: <script> a-- </script> would parse the same way in XML or HTML. Its inclusion appears to be related to the side comment on not using <!-- comments in scripts, but its inclusion in the list of strings that force the use of external files appears to be bogus.

Conversely, the following section http://dev.w3.org/html5/html-xhtml-author-guide/html-xhtml-authoring-guide.html#in-line-script-and-style says

> Safe content is content that does not contain a < or & character.

Here, despite what the previous section says, there is (correctly) no ban on -- and (incorrectly) no ban on ]]>.

Proposal: Take the definition of "safe content" out of 9.1 and place it in section 9, immediately before 9.1 and 9.2, so that both can reference it. Then 9.1 can say scripts _must_ use an external reference if the script uses unsafe content, and 9.2 can say scripts may be inline if they contain only safe content.

As a definition of "safe content" I think

Content is not "safe" if it contains (after any XML or HTML entity or character references are expanded) the characters < or & or the substring ]]>
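The claim that <script> a-- </script> parses the same way in XML and HTML can be checked with a small script. The following is a minimal sketch, assuming Python's standard-library parsers are acceptable stand-ins: html.parser is not a full HTML5 parser, and the helper names (ScriptText, html_script_text, xml_script_text, xml_accepts) are made up for this illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ScriptText(HTMLParser):
    """Collects the raw text found inside <script> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        if self.in_script:
            self.chunks.append(data)


def html_script_text(markup):
    """Return the script text as seen by the HTML-style parser."""
    parser = ScriptText()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


def xml_script_text(markup):
    """Return the script text as seen by the XML parser."""
    return ET.fromstring(markup).text


def xml_accepts(markup):
    """Return True if the markup is well-formed XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


sample = "<script> a-- </script>"
print(repr(html_script_text(sample)), repr(xml_script_text(sample)))
print(xml_accepts("<script> a < b </script>"))
```

Under these assumptions, both parsers report the same text for the -- sample, while a script containing a bare < or & is rejected by the XML parser outright.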
(In reply to comment #0)

I agree w.r.t. '--'. The situations in which '--' (and '-->') inside <script>/<style> is potentially harmful are already considered non-conforming by HTML5 itself. Hence it is "unsafe" (in some sense) even in HTML5 itself. Therefore I agree that it does not make sense to mention '--' in *this* definition of "unsafe". But I think 'unsafe' is perhaps not the most telling word. How about simply 'not polyglot'?

...snip...

> As a definition of "safe content" I think
>
> Content is not "safe" if it contains (after any xml or html entity or character
> references are expanded) the characters < or & or the substring ]]>

The phrase "after any xml or html entity or character references are expanded" is quite confusing. It is clear that XML's "expansionism" is the reason why there is a problem. However, it sounds, for instance, as if you are saying that ]]> is dangerous ... And it sounds as if it were somehow possible to avoid expansion in XML. Is it?

I would like to propose the following, as more hands-on and correct:

NEW DEFINITION PROPOSAL:
"""
A <script> or <style> is not considered polyglot (that is: the XML interpretation will differ from the HTML interpretation) if it contains:

1) any < (this would begin a tag in XML only)
2) any & (this would begin a reference/entity in XML only)
3) any ]]> (this would be seen as a CDATA end in XML only)

NOTE:
* Point 1) means that '<!--' and '<![CDATA[' inside script and style are not polyglot.
* Point 2) means that HTML entities, XML entities and character references inside script and style are not considered polyglot.
"""
(In reply to comment #0)

* A (more) positive definition compared to the one in comment #1.
* Instead of 'safe content'/'[not] polyglot' => '[un]ambiguous code/content'.

NOTE: 'safe' gives the wrong connotations; it is reminiscent of the vague rules of Appendix C.

"""
9.x Unambiguous content in <script> and <style>

Except for the well-defined exceptions (e.g. xml:lang="foo"), ambiguous strings (strings that XML interprets differently from HTML and vice versa) are not used in Polyglot Markup. For the content of <script> and <style> this means that the following strings MUST NOT occur:

1) '<' - because XML sees it as a tag/comment/CDATA starter even inside <script>/<style>. As a consequence, '<!--' and '<![CDATA[' may not occur in the content of polyglot <script>/<style> elements.
2) '&' - because XML sees it as a reference/entity starter even inside <script>/<style>. As a consequence, HTML entities, XML entities and character references may not occur in the content of polyglot <script>/<style> elements.
3) ']]>' - because XML sees it as a CDATA end mark.

NOTE: When necessary, a possible workaround might be to include the properly escaped code inside the @src attribute of <style> and <script>.
"""
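The three-item rule proposed above is mechanical enough to check automatically. The following is a minimal sketch, assuming a plain substring test is sufficient; the names AMBIGUOUS_STRINGS, ambiguous_strings_in and may_be_inline are hypothetical and do not come from the authoring guide.

```python
# Strings that, per the proposal above, must not occur in the content of a
# polyglot <script> or <style> element (names here are illustrative only).
AMBIGUOUS_STRINGS = ("<", "&", "]]>")


def ambiguous_strings_in(content):
    """Return the ambiguous strings that occur in the given script/style text."""
    return [s for s in AMBIGUOUS_STRINGS if s in content]


def may_be_inline(content):
    """Return True if the content contains none of the ambiguous strings."""
    return not ambiguous_strings_in(content)


for snippet in ("a--", "if (a < b && c) {}", "x[y[0]]>1"):
    print(repr(snippet), ambiguous_strings_in(snippet), may_be_inline(snippet))
```

A build step could run such a check over every inline script and style sheet and move any content that fails it to an external file, which is the split between 9.1 and 9.2 suggested in comment #0.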
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the Editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the Tracker Issue; or you may create a Tracker Issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: Changed Section 9, Script and Style, as requested in these comments.
Rationale: This change defines "ambiguous strings" and clarifies the roles of these characters in polyglot markup.

new revision: 1.98; previous revision: 1.97