Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification defines various APIs for programmatic access to HTML and generic XML parsers by web applications for use in parsing and serializing DOM nodes.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
Comments submitted in regard to this document should have their subject line prefixed with the string [DOM-Parsing] to facilitate tracking on the www-dom mailing list.
This document was published by the Web Applications Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to www-dom@w3.org. All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Various issues are listed in the rest of the document.
This specification currently requires using the XML parser for some APIs when operating on an XML document. It is unclear whether consensus can be found for this approach.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc.) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript.
Unless otherwise stated, string comparisons are done in a case-sensitive manner.
If an algorithm calls into another algorithm, any exception thrown by the latter (unless it is explicitly caught) must cause the former to terminate and the exception to be propagated up to its caller.
The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]
Some of the terms used in this specification are defined in [DOM4], [HTML5], and [XML10].
Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.
If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.
When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. When someone applying this specification to their activities decides that they will recognise the requirements of such an extension specification, it becomes an applicable specification for the purposes of conformance requirements in this specification.
The term context object means the object on which the method or attribute being discussed was called.
The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element.
If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.
If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.
Let fragment be a new DocumentFragment whose node document is context element's node document.
This ensures the node document for the new nodes is correct.
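The steps above can be sketched in Python. This is a hypothetical model, not a user-agent implementation: the `Document`, `Node`, and `DocumentFragment` classes and the `html_algorithm` / `xml_algorithm` callables stand in for a real DOM and for the HTML and XML fragment parsing algorithms. The sketch also assumes, as the note about the node document implies, that the nodes produced by algorithm are appended to the new DocumentFragment.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Document:
    is_html: bool  # True: HTML document; False: XML document


@dataclass(eq=False)
class Node:
    name: str
    node_document: Optional[Document] = None


@dataclass(eq=False)
class DocumentFragment:
    node_document: Document
    children: List[Node] = field(default_factory=list)

    def append(self, node: Node) -> None:
        # Appending places the node in the fragment's node document.
        node.node_document = self.node_document
        self.children.append(node)


# Hypothetical stand-in for a fragment parsing algorithm:
# (markup, context element) -> list of new nodes.
FragmentParser = Callable[[str, Node], List[Node]]


def fragment_parsing_algorithm(markup: str, context: Node,
                               html_algorithm: FragmentParser,
                               xml_algorithm: FragmentParser) -> DocumentFragment:
    doc = context.node_document
    if doc is None:
        raise ValueError("context element has no node document")
    # Choose the algorithm from the context element's node document.
    algorithm = html_algorithm if doc.is_html else xml_algorithm
    # Assumed step: run the chosen algorithm on markup.
    new_children = algorithm(markup, context)
    # The fragment shares the context element's node document.
    fragment = DocumentFragment(node_document=doc)
    for child in new_children:
        fragment.append(child)
    return fragment


# Usage with dummy parsers that only record which branch ran.
html_doc = Document(is_html=True)
body = Node("body", html_doc)
frag = fragment_parsing_algorithm(
    "<p>hi</p>", body,
    html_algorithm=lambda m, ctx: [Node("html:" + m)],
    xml_algorithm=lambda m, ctx: [Node("xml:" + m)],
)
print(frag.children[0].name)  # html:<p>hi</p>
```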
To serialize a Node node, the user agent must run the following steps:
To produce an HTML serialization of a Node node, the user agent must run the appropriate steps, depending on node's interface:
Element
Document
DocumentFragment
Run the HTML fragment serialization algorithm on node. Return the returned string.
Comment
Text
DocumentType
ProcessingInstruction
To produce an XML serialization of a Node node, the user agent must run the appropriate steps, depending on node's interface:
Element
Return the concatenation of the following strings:
"<" (U+003C LESS-THAN SIGN);
the value of node's tagName attribute;
Issue: escaping / throwing.
">" (U+003E GREATER-THAN SIGN);
"</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS);
the value of node's tagName attribute;
">" (U+003E GREATER-THAN SIGN).
Document
Run the XML fragment serialization algorithm on node. Return the string this produced.
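The Element rule earlier in this list can be sketched in Python as follows. The sketch assumes, based on this document's separate definition of the XML serialization of an element's attributes, that the attributes follow the tag name and that the children's serializations sit between the start and end tags. The `Element` and `Text` classes, the helper names, and the minimal attribute escaping are all hypothetical; the open escaping / throwing question is not resolved here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Text:
    data: str


@dataclass
class Element:
    tag_name: str
    attributes: Dict[str, str] = field(default_factory=dict)
    children: List[Union["Element", Text]] = field(default_factory=list)


def escape_attribute_value(value: str) -> str:
    # Assumed minimal escaping for a double-quoted attribute value;
    # "&" is replaced first so the later entities are not re-escaped.
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;"))


def serialize_attributes(element: Element) -> str:
    # Hypothetical stand-in for the XML serialization of the attributes.
    return "".join(f' {name}="{escape_attribute_value(value)}"'
                   for name, value in element.attributes.items())


def serialize_element(element: Element) -> str:
    # Child elements recurse; text data is emitted unchanged.
    content = "".join(serialize_element(child) if isinstance(child, Element)
                      else child.data
                      for child in element.children)
    return ("<" + element.tag_name + serialize_attributes(element) + ">"
            + content
            + "</" + element.tag_name + ">")


print(serialize_element(Element("p", {"class": "a"}, [Text("hi"), Element("br")])))
# <p class="a">hi<br></br></p>
```

Because the rule always emits both a start tag and an end tag, an empty element such as `br` comes out as `<br></br>` rather than `<br/>`.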
Comment
Let markup be the concatenation of "<!--", node's data, and "-->".
If markup matches the Comment production, return markup. Otherwise, throw a DOMException with name InvalidStateError.
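The Comment production of [XML10] is `'<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'`, so the check above reduces to three conditions on node's data: it consists only of XML Chars, it contains no "--", and it does not end with "-". A minimal Python sketch under that reading; `serialize_comment` and the `InvalidStateError` class are hypothetical stand-ins, not DOM APIs.

```python
import re

# XML 1.0 Char production: tab, LF, CR, and the allowed Unicode ranges.
_XML_CHARS = re.compile(r"[\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*")


class InvalidStateError(Exception):
    """Stand-in for a DOMException whose name is InvalidStateError."""


def serialize_comment(data: str) -> str:
    markup = "<!--" + data + "-->"
    # Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
    # so data must consist of Chars, must not contain "--",
    # and must not end with "-".
    if _XML_CHARS.fullmatch(data) is None or "--" in data or data.endswith("-"):
        raise InvalidStateError(f"cannot serialize comment data {data!r}")
    return markup


print(serialize_comment(" note "))  # <!-- note -->
```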
Text
Let data be node's data.
If node has its serialize as CDATA flag set, run the following steps:
If data does not match the CData production, throw a DOMException with name InvalidStateError and terminate the entire algorithm.
Return the concatenation of "<![CDATA[", data, and "]]>".
Otherwise, return data.
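In [XML10], the CData production is `(Char* - (Char* ']]>' Char*))`: any run of XML Chars that does not contain "]]>". The Python sketch below models the Text rule under that reading, with the serialize as CDATA flag passed in as a boolean. The function and exception names are hypothetical.

```python
import re

# XML 1.0 Char production.
_XML_CHARS = re.compile(r"[\t\n\r\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]*")


class InvalidStateError(Exception):
    """Stand-in for a DOMException whose name is InvalidStateError."""


def serialize_text(data: str, serialize_as_cdata: bool) -> str:
    if serialize_as_cdata:
        # CData ::= (Char* - (Char* ']]>' Char*)): only Chars, never "]]>".
        if _XML_CHARS.fullmatch(data) is None or "]]>" in data:
            raise InvalidStateError(f"cannot serialize {data!r} as CDATA")
        return "<![CDATA[" + data + "]]>"
    # Without the flag, this rule returns data unchanged.
    return data


print(serialize_text("a<b & c", True))  # <![CDATA[a<b & c]]>
```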
DocumentFragment
Let markup be the empty string.
For each child of node, in order, produce an XML serialization of the child and concatenate the result to markup.
Return markup.
DocumentType
ProcessingInstruction
The XML serialization of the attributes of an element element is the result of the following algorithm:
DOMParser interface
enum SupportedType {
"text/html",
"text/xml",
"application/xml",
"application/xhtml+xml",
"image/svg+xml"
};
The DOMParser() constructor must return a new DOMParser object.
[Constructor]
interface DOMParser {
Document parseFromString (DOMString str, SupportedType type);
};
parseFromString
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser and return the newly created document.
The scripting flag must be set to "disabled".
meta elements are not taken into account for the encoding used, as a Unicode stream is passed into the parser.
"text/xml"
"application/xml"
"application/xhtml+xml"
"image/svg+xml"
Parse str with an XML parser.
If parsing fails, let document be a new XMLDocument.
Let root be a new Element, with its local name set to "parsererror" and its namespace set to "http://www.mozilla.org/newlayout/xml/parsererror.xml".
At this point user agents may append nodes to root, for example to describe the nature of the error.
In any case, the returned document's content type must be the type argument.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
str | DOMString | ✘ | ✘ | |
type | SupportedType | ✘ | ✘ | |
Return type: Document
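Python's standard library has no DOMParser, but the XML branch of parseFromString() can be approximated with `xml.dom.minidom`, whose `parseString()` raises `xml.parsers.expat.ExpatError` on malformed input. The sketch below is an analogy under that assumption. The function name `parse_xml_from_string` is hypothetical, the "text/html" branch and the content-type requirement are not modeled, and the error text placed inside root is only one example of the optional description user agents may append.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

PARSERERROR_NS = "http://www.mozilla.org/newlayout/xml/parsererror.xml"
XML_TYPES = {"text/xml", "application/xml", "application/xhtml+xml", "image/svg+xml"}


def parse_xml_from_string(s: str, mime_type: str) -> minidom.Document:
    if mime_type == "text/html":
        raise NotImplementedError("the HTML parser branch is not modeled here")
    if mime_type not in XML_TYPES:
        # Web IDL rejects values outside the SupportedType enum with a TypeError.
        raise TypeError(f"{mime_type!r} is not a SupportedType value")
    try:
        return minidom.parseString(s)
    except ExpatError as err:
        # Error path: a new document whose root is a parsererror element
        # in the namespace given above.
        doc = minidom.getDOMImplementation().createDocument(
            PARSERERROR_NS, "parsererror", None)
        # Optional description of the error, appended to root.
        doc.documentElement.appendChild(doc.createTextNode(str(err)))
        return doc


ok = parse_xml_from_string("<a><b/></a>", "text/xml")
bad = parse_xml_from_string("<a>", "application/xml")
print(ok.documentElement.tagName)  # a
print(bad.documentElement.localName, bad.documentElement.namespaceURI)
# parsererror http://www.mozilla.org/newlayout/xml/parsererror.xml
```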
XMLSerializer interface
The XMLSerializer() constructor must return a new XMLSerializer object.
[Constructor]
interface XMLSerializer {
DOMString serializeToString (Node root);
};
Element interface
enum insertAdjacentHTMLPosition {
"beforebegin",
"afterbegin",
"beforeend",
"afterend"
};
partial interface Element {
attribute DOMString innerHTML;
attribute DOMString outerHTML;
void insertAdjacentHTML (insertAdjacentHTMLPosition position, DOMString text);
};
innerHTML of type DOMString
The innerHTML IDL attribute represents the markup of the Element's contents.
innerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element's contents.
Can be set to replace the contents of the element with nodes parsed from the given string.
In the case of an XML document, throws a DOMException with name InvalidStateError if the element cannot be serialized to XML, and a DOMException with name SyntaxError if the given string is not well-formed.
On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the context object instead (this might throw an exception instead of returning a string).
On setting, these steps must be run:
outerHTML of type DOMString
The outerHTML IDL attribute represents the markup of the Element and its contents.
outerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element and its contents.
Can be set to replace the element with nodes parsed from the given string.
In the case of an XML document, throws a DOMException with name InvalidStateError if the element cannot be serialized to XML, and a DOMException with name SyntaxError if the given string is not well-formed.
Throws a DOMException with name NoModificationAllowedError if the parent of the element is the Document node.
On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on a fictional node whose only child is context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on that fictional node instead (this might throw an exception instead of returning a string).
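For XML documents, the innerHTML and outerHTML getters differ only in what is handed to the XML fragment serialization algorithm: innerHTML serializes the element's children, while outerHTML serializes a fictional node whose only child is the element. The Python sketch below approximates that difference with `xml.dom.minidom`, treating minidom's `toxml()` as a stand-in for the XML serialization of each child (an assumption; minidom's output rules are not those of this specification). The function names are hypothetical, and a DocumentFragment holding a deep clone plays the fictional node so that the real element is never moved.

```python
from xml.dom import minidom


def xml_fragment_serialize(node: minidom.Node) -> str:
    # Concatenate the serializations of node's children, in order.
    return "".join(child.toxml() for child in node.childNodes)


def get_inner_xml(element: minidom.Element) -> str:
    # innerHTML getter, XML branch: serialize the element's children.
    return xml_fragment_serialize(element)


def get_outer_xml(element: minidom.Element) -> str:
    # outerHTML getter, XML branch: serialize a fictional node whose only
    # child is the element. A fragment holding a deep clone stands in for
    # it, so the original element stays where it is.
    fictional = element.ownerDocument.createDocumentFragment()
    fictional.appendChild(element.cloneNode(True))
    return xml_fragment_serialize(fictional)


doc = minidom.parseString('<root><p class="x">hi<b>!</b></p></root>')
p = doc.getElementsByTagName("p")[0]
print(get_inner_xml(p))  # hi<b>!</b>
print(get_outer_xml(p))  # <p class="x">hi<b>!</b></p>
```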
On setting, the following steps must be run:
If parent is a Document, throw a DOMException with name NoModificationAllowedError and terminate these steps.
If parent is a DocumentFragment, let parent be a new Element with body as its local name,
insertAdjacentHTML
insertAdjacentHTML(position, text)
Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
Throws a TypeError exception if the position argument has an invalid value.
In XML documents, throws a DOMException with name SyntaxError if the given string is not well-formed.
Throws a DOMException with name NoModificationAllowedError if the given position isn't possible (e.g. inserting elements after the root element of a Document).
The insertAdjacentHTML(position, text) method must run these steps:
Let context be the context object's parent.
If context is null or a Document, throw a DOMException with name NoModificationAllowedError and terminate these steps.
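If context is not an Element or the following are all true:
context's node document is an HTML document,
context's local name is "html", and
context's namespace is the HTML namespace;
let context be a new Element with body as its local name,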
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
position | insertAdjacentHTMLPosition | ✘ | ✘ | |
text | DOMString | ✘ | ✘ | |
Return type: void
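The four insertAdjacentHTMLPosition values name where the parsed nodes go, following their definitions in [HTML5]: "beforebegin" before the element itself, "afterbegin" inside it before its first child, "beforeend" inside it after its last child, and "afterend" after the element itself. The Python sketch below models only that placement and the exceptions listed for insertAdjacentHTML. Parsing is out of scope (the nodes are assumed to have been produced already), and the `Node` class, `insert_adjacent`, and the exception class are hypothetical. As an assumption consistent with the "after the root element of a Document" example, NoModificationAllowedError is raised only for "beforebegin" and "afterend", where the nodes become siblings of the element.

```python
from dataclasses import dataclass, field
from typing import List, Optional


class NoModificationAllowedError(Exception):
    """Stand-in for a DOMException named NoModificationAllowedError."""


@dataclass(eq=False)  # identity comparison, so list.index finds this exact node
class Node:
    name: str
    is_document: bool = False
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def _insert_at(parent: Node, index: int, nodes: List[Node]) -> None:
    # Insert nodes in order, starting at index.
    for offset, node in enumerate(nodes):
        node.parent = parent
        parent.children.insert(index + offset, node)


def insert_adjacent(element: Node, position: str, nodes: List[Node]) -> None:
    if position in ("beforebegin", "afterend"):
        parent = element.parent
        # The nodes would become siblings of element, so it needs a
        # parent that is not a Document.
        if parent is None or parent.is_document:
            raise NoModificationAllowedError(position)
        index = parent.children.index(element)
        _insert_at(parent, index if position == "beforebegin" else index + 1, nodes)
    elif position == "afterbegin":
        _insert_at(element, 0, nodes)
    elif position == "beforeend":
        _insert_at(element, len(element.children), nodes)
    else:
        # Web IDL rejects values outside the enum with a TypeError.
        raise TypeError(f"invalid position {position!r}")


body = Node("body")
p = Node("p", parent=body)
body.children.append(p)
insert_adjacent(p, "beforebegin", [Node("a")])
insert_adjacent(p, "afterend", [Node("b"), Node("c")])
insert_adjacent(p, "beforeend", [Node("y")])
insert_adjacent(p, "afterbegin", [Node("x")])
print([n.name for n in body.children])  # ['a', 'p', 'b', 'c']
print([n.name for n in p.children])     # ['x', 'y']
```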
Text interface
partial interface Text {
attribute boolean serializeAsCDATA;
};
serializeAsCDATA of type boolean
serializeAsCDATA [ = value ]
Text nodes have an additional associated flag, the serialize as CDATA flag.
The serializeAsCDATA attribute must return true if the context object has its serialize as CDATA flag set, or false otherwise.
Setting the serializeAsCDATA attribute must, if the new value is true, set the context object's serialize as CDATA flag, or unset it otherwise.
Range interface
partial interface Range {
DocumentFragment createContextualFragment (DOMString fragment);
};
createContextualFragment
createContextualFragment(fragment)
Returns a DocumentFragment, created from the markup string given.
The createContextualFragment(fragment) method must run these steps:
DOMException with name InvalidStateError and terminate these steps.
Let element be as follows, depending on node's interface:
Document
DocumentFragment
Element
Text
Comment
DocumentType
ProcessingInstruction
"html", and let element be a new element with "body" as its local name,
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
fragment | DOMString | ✘ | ✘ | |
Return type: DocumentFragment
Thanks to Ms2ger for maintaining the initial drafts of this specification.
Thanks to Anne van Kesteren, Aryeh Gregor, Henri Sivonen, Simon Pieters and timeless for their useful comments.
Special thanks to Ian Hickson for defining the innerHTML and outerHTML attributes and the insertAdjacentHTML() method in [HTML5], and for his useful comments.