Copyright © 2013 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification defines various APIs for programmatic access to HTML and generic XML parsers by web applications for use in parsing and serializing DOM nodes.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This specification is based on the original work of the DOM Parsing and Serialization Living Specification, though it has diverged in terms of supported features, normative requirements, and algorithm specificity. As appropriate, relevant fixes from the living standard are incorporated into this document.
This document was published by the Web Applications Working Group as a Last Call Working Draft.
This document is intended to become a W3C Recommendation.
If you wish to make comments regarding this document, please send them to www-dom@w3.org (subscribe, archives) with DOM-Parsing at the start of your email's subject.
The Last Call comment period ends 07 January 2014.
All comments are welcome.
Publication as a Last Call Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This is a Last Call Working Draft and thus the Working Group has determined that this document has satisfied the relevant technical requirements and is sufficiently stable to advance through the Technical Recommendation process.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Open issues that appear throughout the remainder of this document will be highlighted like this.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].
Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and terminate these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
User agents may impose implementation-specific limits on otherwise unconstrained inputs, e.g. to prevent denial of service attacks, to guard against running out of memory, or to work around platform-specific limitations.
When a method or an attribute is said to call another method or attribute, the user agent must invoke its internal API for that attribute or method so that e.g. the author can't change the behavior by overriding attributes or methods with custom properties or functions in ECMAScript.
Unless otherwise stated, string comparisons are done in a case-sensitive manner.
If an algorithm calls into another algorithm, any exception that is thrown by the latter (unless it is explicitly caught), must cause the former to terminate, and the exception to be propagated up to its caller.
The IDL fragments in this specification must be interpreted as required for conforming IDL fragments, as described in the Web IDL specification. [WEBIDL]
Some of the terms used in this specification are defined in [DOM4], [HTML5], and [XML10].
Vendor-specific proprietary extensions to this specification are strongly discouraged. Authors must not use such extensions, as doing so reduces interoperability and fragments the user base, allowing only users of specific user agents to access the content in question.
If vendor-specific extensions are needed, the members should be prefixed by vendor-specific strings to prevent clashes with future versions of this specification. Extensions must be defined so that the use of extensions neither contradicts nor causes the non-conformance of functionality defined in the specification.
When vendor-neutral extensions to this specification are needed, either this specification can be updated accordingly, or an extension specification can be written that overrides the requirements in this specification. When someone applying this specification to their activities decides that they will recognise the requirements of such an extension specification, it becomes an applicable specification for the purposes of conformance requirements in this specification.
The term context object means the object on which the method or attribute being discussed was called.
The following steps form the fragment parsing algorithm, whose arguments are a markup string and a context element.
If the context element's node document is an HTML document: let algorithm be the HTML fragment parsing algorithm.
If the context element's node document is an XML document: let algorithm be the XML fragment parsing algorithm.
Let fragment be a new DocumentFragment whose node document is context element's node document.
This ensures the node document for the new nodes is correct.
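The selection step above can be sketched as follows. This is a minimal illustration, not the normative algorithm: `SketchDocument`, `SketchNode`, `SketchFragment` and the `parsers` table are hypothetical stand-ins for the real DOM objects and for the HTML and XML fragment parsing algorithms, and re-pointing each child's node document is an assumption about how the new nodes end up in the right document.

```typescript
// Hypothetical, simplified stand-ins; these are not the real DOM interfaces.
type DocumentKind = "html" | "xml";
interface SketchDocument { kind: DocumentKind; }
interface SketchNode { nodeDocument: SketchDocument; name: string; }
interface SketchFragment { nodeDocument: SketchDocument; children: SketchNode[]; }
type FragmentParser = (markup: string, context: SketchNode) => SketchNode[];

function parseFragment(
  markup: string,
  contextElement: SketchNode,
  parsers: Record<DocumentKind, FragmentParser>,
): SketchFragment {
  // Choose the algorithm from the kind of the context element's node document.
  const algorithm = parsers[contextElement.nodeDocument.kind];
  const newChildren = algorithm(markup, contextElement);

  // The fragment shares the context element's node document.
  const fragment: SketchFragment = {
    nodeDocument: contextElement.nodeDocument,
    children: [],
  };
  for (const child of newChildren) {
    // Assumption: appending moves each new node into the fragment's document.
    child.nodeDocument = fragment.nodeDocument;
    fragment.children.push(child);
  }
  return fragment;
}
```

Passing the two parsers in as a table keeps the sketch independent of any particular parser implementation.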
To serialize a Node node, the user agent must run the following steps:
null.
To produce an HTML serialization of a Node node, the user agent must run the HTML fragment serialization algorithm [HTML5] on node and return the string produced.
To produce an XML serialization of a Node node given a context namespace namespace and prefix list prefixes, the user agent must run the appropriate steps, depending on node's interface:
The following steps for serializing a node belonging to an XML document are designed to produce a serialization that is compatible with the HTML parser. For example, elements in the XHTML namespace that contain no child nodes are serialized with an explicit begin and end tag rather than using the XML self-closing syntax. Exceptions to this rule occur when an XHTML element's equivalent HTML element is a void element that would be auto-closed by the HTML parser.
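As a rough illustration of the rule described above, the sketch below decides how an element's tags are closed. It is a simplified reading rather than the full Element algorithm: attributes and namespace declarations are omitted, `SketchElement` and `serializeTags` are hypothetical names, and the second branch assumes that the self-closing XML syntax applies only to childless elements outside the XHTML namespace. The void-element names are those listed in the Element steps of this section.

```typescript
const XHTML_NS = "http://www.w3.org/1999/xhtml";
const VOID_ELEMENTS = new Set([
  "area", "base", "br", "col", "embed", "hr", "img", "input", "keygen",
  "link", "menuitem", "meta", "param", "source", "track", "wbr",
]);

// Hypothetical stand-in for an Element; `children` holds already-serialized child markup.
interface SketchElement {
  namespaceURI: string | null;
  tagName: string;
  children: string[];
}

function serializeTags(node: SketchElement): string {
  let markup = "<" + node.tagName;
  let skipEndTag = false;
  const isXhtml = node.namespaceURI === XHTML_NS;
  const isEmpty = node.children.length === 0;

  if (isXhtml && isEmpty && VOID_ELEMENTS.has(node.tagName)) {
    // Childless XHTML void element: " /" keeps the output HTML-parser friendly.
    markup += " /";
    skipEndTag = true;
  } else if (!isXhtml && isEmpty) {
    // Assumption: childless elements in other namespaces use XML self-closing syntax.
    markup += "/";
    skipEndTag = true;
  }
  markup += ">";
  if (skipEndTag) return markup;

  // Everything else gets an explicit end tag, even when there are no children.
  return markup + node.children.join("") + "</" + node.tagName + ">";
}
```

Under these assumptions an empty XHTML `p` element serializes as `<p></p>`, while an empty XHTML `br` element serializes as `<br />`.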
Element
Run the following algorithm:
prefix attribute.
namespaceURI attribute.
false.
Append "<" (U+003C LESS-THAN SIGN) to markup.
null then append the following to markup:
":" (U+003A COLON).
localName attribute to markup.
null, then run these sub-steps:
These steps determine whether a namespace prefix is serialized for this node.
"xmlns:" with the value of prefix, abort these sub-steps. The prefix namespace definition will be serialized later as part of the XML serialization of node's attributes.
" " (U+0020 SPACE);
"xmlns:";
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK);
null, then run these sub-steps:
These steps determine whether a default namespace is serialized for this node.
"xmlns", abort these sub-steps. The default namespace will be serialized later as part of the XML serialization of node's attributes.
" " (U+0020 SPACE);
"xmlns";
"="" (U+003D EQUALS SIGN, U+0022 QUOTATION MARK);
""" (U+0022 QUOTATION MARK);
"http://www.w3.org/1999/xhtml", and the node's list of children is empty, and the node's tagName matches any one of the following void elements: "area", "base", "br", "col", "embed", "hr", "img", "input", "keygen", "link", "menuitem", "meta", "param", "source", "track", "wbr"; then append the following to markup, in order:
" " (U+0020 SPACE);
"/" (U+002F SOLIDUS);
true.
"http://www.w3.org/1999/xhtml", and the node's list of children is empty, then append "/" (U+002F SOLIDUS) to markup and set the skip end tag flag to true.
Append ">" (U+003E GREATER-THAN SIGN) to markup.
true, then return the value of markup and skip the remaining steps. The node is a leaf-node.
Append "</" (U+003C LESS-THAN SIGN, U+002F SOLIDUS) to markup.
null, then append the following to markup, in order:
":" (U+003A COLON).
localName attribute to markup.
Append ">" (U+003E GREATER-THAN SIGN) to markup.
Document
Return the result of concatenating the following, in order:
null as the namespace and an empty list as prefixes.
Comment
"<!--", node's data, and "-->".
Comment production, return markup. Otherwise, throw a DOMException with name InvalidStateError.
CDATASection
"<![CDATA[", node's data, and "]]>".
CDATASection objects may be created by the historical document.createCDATASection API, or as a result of parsing an XML document.
Text
data.
Replace "&" in markup by "&amp;".
Replace "<" in markup by "&lt;".
Replace ">" in markup by "&gt;".
DocumentFragment
DocumentType
ProcessingInstruction
"<?", node's data, and "?>".
ProcessingInstruction objects may be created by the historical document.createProcessingInstruction API, or as a result of parsing an XML document.
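The leaf-node cases above (Comment, CDATASection, Text and ProcessingInstruction) can be sketched as a single function. This is an illustrative simplification: `LeafNode`, `serializeLeaf` and the `InvalidStateError` class are hypothetical names standing in for DOM nodes and for a DOMException, and the Comment check covers only the "no `--`, no trailing `-`" part of the XML Comment production, not its character-range restrictions.

```typescript
// Hypothetical tagged union standing in for the four leaf node interfaces.
type LeafNode =
  | { kind: "comment"; data: string }
  | { kind: "cdata"; data: string }
  | { kind: "text"; data: string }
  | { kind: "pi"; data: string };

// Stand-in for a DOMException with name InvalidStateError.
class InvalidStateError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidStateError";
  }
}

function serializeLeaf(node: LeafNode): string {
  switch (node.kind) {
    case "comment":
      // Simplified well-formedness check for the Comment production.
      if (node.data.indexOf("--") !== -1 || node.data.slice(-1) === "-") {
        throw new InvalidStateError("comment data cannot be serialized");
      }
      return "<!--" + node.data + "-->";
    case "cdata":
      return "<![CDATA[" + node.data + "]]>";
    case "text":
      // "&" is replaced first so the entities added afterwards are not re-escaped.
      return node.data
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;");
    case "pi":
      return "<?" + node.data + "?>";
  }
}
```

The order of the three replacements in the Text case matters: escaping `<` before `&` would turn `<` into `&amp;lt;`.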
To produce a DocumentType serialization of a Node node, the user agent must return the result of the following algorithm:
Append "<!DOCTYPE" to markup.
Append " " (U+0020 SPACE) to markup.
name attribute to markup. For a node belonging to an HTML document, the value will be all lowercase.
publicId is not the empty string then append the following, in order, to markup:
" " (U+0020 SPACE);
"PUBLIC";
" " (U+0020 SPACE);
""" (U+0022 QUOTATION MARK);
publicId attribute;
""" (U+0022 QUOTATION MARK);
systemId is not the empty string and the node's publicId is set to the empty string, then append the following, in order, to markup:
" " (U+0020 SPACE);
"SYSTEM";
systemId is not the empty string then append the following, in order, to markup:
" " (U+0020 SPACE);
""" (U+0022 QUOTATION MARK);
systemId attribute;
""" (U+0022 QUOTATION MARK);
internalSubset and the internalSubset attribute's value is a non-empty string, then append the following, in order, to markup:
" " (U+0020 SPACE);
"[" (U+005B LEFT SQUARE BRACKET);
internalSubset attribute;
"]" (U+005D RIGHT SQUARE BRACKET);
A node belonging to an HTML document will never have an internalSubset because any internalSubset markup is ignored by the parser.
Append ">" (U+003E GREATER-THAN SIGN) to markup.
The XML serialization of the attributes of an element element together with a prefix list prefixes is the result of the following algorithm:
"xmlns:", then:
"xmlns:" from the beginning of the value of attr's name.
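Returning to the DocumentType serialization algorithm defined earlier in this section, its steps can be sketched as plain string concatenation. `SketchDoctype` and `serializeDoctype` are hypothetical names, and the object is a simplified stand-in for a DocumentType node whose `internalSubset` may be absent.

```typescript
// Hypothetical stand-in for a DocumentType node.
interface SketchDoctype {
  name: string;
  publicId: string;
  systemId: string;
  internalSubset?: string | null;
}

function serializeDoctype(node: SketchDoctype): string {
  let markup = "<!DOCTYPE";
  markup += " ";
  markup += node.name;

  if (node.publicId !== "") {
    markup += ' PUBLIC "' + node.publicId + '"';
  }
  // The SYSTEM keyword is only written when there is no public identifier.
  if (node.systemId !== "" && node.publicId === "") {
    markup += " SYSTEM";
  }
  if (node.systemId !== "") {
    markup += ' "' + node.systemId + '"';
  }
  // Only a present, non-empty internal subset is written out.
  if (node.internalSubset) {
    markup += " [" + node.internalSubset + "]";
  }
  markup += ">";
  return markup;
}
```

For example, a node with name `html` and empty identifiers yields `<!DOCTYPE html>`, and a node with only a system identifier `note.dtd` and name `note` yields `<!DOCTYPE note SYSTEM "note.dtd">`.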
DOMParser interface
enum SupportedType {
"text/html",
"text/xml",
"application/xml",
"application/xhtml+xml",
"image/svg+xml"
};
The DOMParser() constructor must return a new DOMParser object.
[Constructor]
interface DOMParser {
Document parseFromString (DOMString str, SupportedType type);
};
parseFromString
The parseFromString(str, type) method must run these steps, depending on type:
"text/html"
Parse str with an HTML parser, and return the newly created document.
The scripting flag must be set to "disabled".
meta elements are not taken into account for the encoding used, as a Unicode stream is passed into the parser.
"text/xml"
"application/xml"
"application/xhtml+xml"
"image/svg+xml"
XML parser.
SyntaxError.
Some UAs do not throw an exception, but rather return a minimal well-formed XML document that describes the error. In these cases, the error document's root element will be named parsererror and its namespace will be set to "http://www.mozilla.org/newlayout/xml/parsererror.xml".
In any case, the returned document's content type must be the type argument. Additionally, the document must have a URL value equal to the URL of the active document and a location value of null.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
str | DOMString | ✘ | ✘ | |
type | SupportedType | ✘ | ✘ | |
Return type: Document
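The dispatch on type described for parseFromString can be sketched without a real parser. The names `isSupportedType`, `planParse` and `ParsePlan` are hypothetical; the sketch only records which parser family would be used and which content type the returned document must carry.

```typescript
const SUPPORTED_TYPES = [
  "text/html",
  "text/xml",
  "application/xml",
  "application/xhtml+xml",
  "image/svg+xml",
] as const;
type SupportedType = (typeof SUPPORTED_TYPES)[number];

// Runtime guard for strings that did not come from the enum.
function isSupportedType(value: string): value is SupportedType {
  return (SUPPORTED_TYPES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical summary of what parseFromString would do for a given type.
interface ParsePlan {
  parser: "html" | "xml";
  contentType: SupportedType;
}

function planParse(type: SupportedType): ParsePlan {
  // Only "text/html" goes to the HTML parser; every other supported type is XML.
  return { parser: type === "text/html" ? "html" : "xml", contentType: type };
}
```

Note that "application/xhtml+xml" is routed to the XML parser in this sketch, matching the grouping of types in the steps above.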
XMLSerializer interface
The XMLSerializer() constructor must return a new XMLSerializer object.
[Constructor]
interface XMLSerializer {
DOMString serializeToString (Node root);
};
serializeToString
The serializeToString(root) method must produce an XML serialization of root and return the result.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
root | Node | ✘ | ✘ | |
Return type: DOMString
Element interface
partial interface Element {
[TreatNullAs=EmptyString]
attribute DOMString innerHTML;
[TreatNullAs=EmptyString]
attribute DOMString outerHTML;
void insertAdjacentHTML (DOMString position, DOMString text);
};
innerHTML of type DOMString
The innerHTML IDL attribute represents the markup of the Element's contents.
innerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element's contents.
Can be set, to replace the contents of the element with nodes parsed from the given string.
In the case of an XML document, will throw a DOMException with name InvalidStateError if the Element cannot be serialized to XML, and a DOMException with name SyntaxError if the given string is not well-formed.
On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on the context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on the context object instead (this might throw an exception instead of returning a string).
On setting, these steps must be run:
outerHTML of type DOMString
The outerHTML IDL attribute represents the markup of the Element and its contents.
outerHTML [ = value ]
Returns a fragment of HTML or XML that represents the element and its contents.
Can be set, to replace the element with nodes parsed from the given string.
In the case of an XML document, will throw a DOMException with name InvalidStateError if the element cannot be serialized to XML, and a DOMException with name SyntaxError if the given string is not well-formed.
Throws a DOMException with name NoModificationAllowedError if the parent of the element is the Document node.
On getting, if the context object's node document is an HTML document, then the attribute must return the result of running the HTML fragment serialization algorithm on a fictional node whose only child is context object; otherwise, the context object's node document is an XML document, and the attribute must return the result of running the XML fragment serialization algorithm on that fictional node instead (this might throw an exception instead of returning a string).
On setting, the following steps must be run:
Document, throw a DOMException with name NoModificationAllowedError and terminate these steps.
DocumentFragment, let parent be a new Element with body as its local name,
insertAdjacentHTML
insertAdjacentHTML (position, text)
Parses the given string text as HTML or XML and inserts the resulting nodes into the tree in the position given by the position argument, as follows:
Throws a SyntaxError exception if the arguments have invalid values (e.g., in the case of an XML document, if the given string is not well-formed).
Throws a DOMException with name NoModificationAllowedError if the given position isn't possible (e.g. inserting elements after the root element of a Document).
The insertAdjacentHTML(position, text) method must run these steps:
Let context be the context object's parent.
If context is null or a document, throw a DOMException with name NoModificationAllowedError and terminate these steps.
Throw a SyntaxError exception.
Element or the following are all true:
"html", and
let context be a new Element with body as its local name,
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
position | DOMString | ✘ | ✘ | |
text | DOMString | ✘ | ✘ | |
Return type: void
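The context selection and the two error cases for insertAdjacentHTML can be sketched as below. The four position keywords ("beforebegin", "afterbegin", "beforeend", "afterend") are the ones commonly documented for this method and are stated here as an assumption, since the list is not reproduced in this text. `SketchTarget`, `SketchParent`, `NamedError` and `resolveContext` are hypothetical names, and `toLowerCase` stands in for an ASCII case-insensitive comparison.

```typescript
// Stand-in for a DOMException carrying a specific name.
class NamedError extends Error {
  constructor(name: string, message: string) {
    super(message);
    this.name = name;
  }
}

// Hypothetical, simplified model of the context object and its parent.
interface SketchParent { kind: "element" | "document"; }
interface SketchTarget { parent: SketchParent | null; }

// Returns which node serves as the parsing context: the parent or the element itself.
function resolveContext(target: SketchTarget, position: string): "parent" | "self" {
  switch (position.toLowerCase()) {
    case "beforebegin":
    case "afterend": {
      // Inserting next to the element needs a parent that is not a document.
      const context = target.parent;
      if (context === null || context.kind === "document") {
        throw new NamedError("NoModificationAllowedError", "position not possible");
      }
      return "parent";
    }
    case "afterbegin":
    case "beforeend":
      // Inserting inside the element uses the element itself as context.
      return "self";
    default:
      throw new NamedError("SyntaxError", "unknown position: " + position);
  }
}
```

Validating the position before any parsing takes place means an invalid call fails without touching the tree.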
Range interface
partial interface Range {
DocumentFragment createContextualFragment (DOMString fragment);
};
createContextualFragment
createContextualFragment (fragment)
DocumentFragment, created from the markup string given.
The createContextualFragment(fragment) method must run these steps:
Let element be as follows, depending on node's interface:
Document
DocumentFragment
Element
Text
Comment
DocumentType
ProcessingInstruction
"html", and
let element be a new element with "body" as its local name,
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
fragment | DOMString | ✘ | ✘ | |
Return type: DocumentFragment
Thanks to Ms2ger [Mozilla] for maintaining the initial drafts of this specification and for its continued improvement in the Living Standard.
Thanks to Anne van Kesteren, Aryeh Gregor, Boris Zbarsky, Henri Sivonen, Simon Pieters and timeless for their useful comments.
Special thanks to Ian Hickson for defining the innerHTML and outerHTML attributes, and the insertAdjacentHTML() method in [HTML5] and his useful comments.