The DOM Level 2 TreeWalker, NodeIterator, and Filter interfaces provide simple and efficient traversal of document nodes. These interfaces are optional. A DOM application can use the hasFeature method of the DOMImplementation interface to determine whether they are supported. The feature string for all the interfaces listed in this section is "Traversal".
Iterators and TreeWalkers are two different ways of representing the nodes of a document subtree and a position within that collection. An Iterator flattens a subtree into an ordered list of document nodes, presented in document order. Because this is a flat list, presented without respect to hierarchy, Iterators have methods to move forward and backward, but not to move up and down. A TreeWalker maintains the hierarchical relationships of the subtree, allowing navigation of this hierarchy.
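The feature test described above can be sketched in Java as follows. This is only an illustration, under two assumptions: it uses the org.w3c.dom bindings bundled with the JDK, and it passes the version string "2.0", which this section does not specify. The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;

final class TraversalFeatureCheck {
    /** Creates an empty document with the platform's default parser. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the implementation claims "Traversal" and the document exposes the factory interface. */
    static boolean traversalAvailable() {
        Document doc = newDocument();
        // The version argument "2.0" is an assumption; only the feature name comes from the text.
        boolean claimed = doc.getImplementation().hasFeature("Traversal", "2.0");
        return claimed && doc instanceof DocumentTraversal;
    }

    public static void main(String[] args) {
        System.out.println("Traversal available: " + traversalAvailable());
    }
}
```

Checking the feature string before casting a Document to DocumentTraversal avoids a ClassCastException on implementations that omit these optional interfaces.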
Iterators and TreeWalkers each present a view of a document subtree that may not contain all nodes found in the subtree. In this specification, we refer to this as the logical view to distinguish it from the physical view, which corresponds to the document subtree per se. When an Iterator or TreeWalker is created, it may be associated with a Filter, which examines a node and determines whether it should appear in the logical view. In addition, flags may be used to specify which node types should occur in the logical view. Iterators and TreeWalkers are dynamic: the logical view changes to reflect changes made to the underlying document.
An Iterator allows a list of nodes to be returned sequentially. In the current DOM interfaces, this list will always consist of the nodes of a subtree, presented in document order. When an Iterator is first created, calling nextNode() returns the first node in the logical view of the subtree; in most cases, this is the root of the subtree. When no more nodes are present, nextNode() returns null.
Iterators are created using the createNodeIterator method found in the DocumentTraversal interface. When an Iterator is created, flags can be used to determine which node types will be "visible" and which nodes will be "invisible" while traversing the tree; these flags can be combined using the OR operator. Nodes that are "invisible" are skipped over by the Iterator as though they did not exist. The following code creates an Iterator, then calls a function to print the name of each element:
    NodeIterator iter = document.createNodeIterator(root, SHOW_ELEMENT, null);
    Node n;
    // nextNode() returns null once the list is exhausted
    while ((n = iter.nextNode()) != null)
        printMe(n);
Iterators present nodes as an ordered list, and move forward and backward within this list. The Iterator's position is always between two nodes, before the first node, or after the last node. When an Iterator is first created, the position is set before the first item. The following diagram shows the list view that an Iterator might provide for a particular subtree, with the position indicated by an asterisk '*' :
* A B C D E F G H I
Each call to nextNode() returns the next node and advances the position. For instance, if we start with the above position, the first call to nextNode() returns "A" and advances the Iterator:
[A] * B C D E F G H I
An iterator uses the node last returned to maintain its position. This node is known as the reference node. In these diagrams, we use square brackets to indicate the reference node.
A call to previousNode() returns the previous node and moves the position backward. For instance, if we start with the Iterator between "A" and "B", it would return "A" and move to the position shown below:
* [A] B C D E F G H I
If nextNode() is called at the end of a list, or previousNode() is called at the beginning of a list, it returns null and does not change the position of the Iterator. When an Iterator is first created, the reference node is the first node:
* [A] B C D E F G H I
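The positioning rules above can be observed with a short Java sketch. It assumes the org.w3c.dom.traversal bindings bundled with the JDK, where the SHOW_* constants are defined on NodeFilter and createNodeIterator takes four arguments; the class name and the three-element sample tree are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

final class IteratorPositionDemo {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String name(Node n) {
        return n == null ? "null" : n.getNodeName();
    }

    /** Builds <A><B/><C/></A> and records what each move returns. */
    static List<String> trace() {
        Document doc = newDocument();
        Element a = doc.createElement("A");
        doc.appendChild(a);
        a.appendChild(doc.createElement("B"));
        a.appendChild(doc.createElement("C"));

        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(a, NodeFilter.SHOW_ELEMENT, null, false);
        List<String> out = new ArrayList<>();
        out.add(name(it.nextNode()));      // A: position moves from before A to after A
        out.add(name(it.nextNode()));      // B
        out.add(name(it.previousNode()));  // B again: the position steps back over B
        out.add(name(it.previousNode()));  // A
        out.add(name(it.previousNode()));  // null: already before the first node
        out.add(name(it.nextNode()));      // A: the failed move did not change the position
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

Because the position sits between nodes, reversing direction returns the node that was just returned.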
An Iterator may be active while the data structure it navigates is being edited, so an Iterator must behave gracefully in the face of change. Additions and removals in the underlying data structure do not invalidate an Iterator; in fact, an Iterator is never invalidated. To make this possible, the Iterator uses the reference node to maintain its position. The state of an Iterator also depends on whether the Iterator is positioned before or after the reference node. If the reference node is removed, another node becomes the reference node.
If changes to the iterated list do not remove the reference node, they do not affect the state of the Iterator. For instance, the Iterator's state is not affected by inserting new nodes in the vicinity of the iterator or removing nodes other than the reference node. Suppose we start from the following position:
A B C [D] * E F G H I
Now let's remove "E". The resulting state is:
A B C [D] * F G H I
If a new node is inserted, the iterator stays close to the reference node, so if a node is inserted between "D" and "F", it will occur between the Iterator and "F":
A B C [D] * X F G H I
Moving a node is equivalent to a removal followed by an insertion. If we move "I" to the position before "X" the result is:
A B C [D] * I X F G H
If the reference node is removed, a different node is selected as the reference node. If the reference node is before the Iterator, which is usually the case after nextNode() has been called, the nearest node before the Iterator is chosen as the new reference node. Suppose we remove the "D" node, starting from the following state:
A B C [D] * F G H I
The "C" node becomes the new reference node, since it is the nearest node to the Iterator that is before the Iterator:
A B [C] * F G H I
If the reference node is after the Iterator, which is usually the case after previousNode() has been called, the nearest node after the Iterator is chosen as the new reference node. Suppose we remove "E", starting from the following state:
A B C D * [E] F G H I
The "F" node becomes the new reference node, since it is the nearest node to the Iterator that is after the Iterator:
A B C D * [F] G H I
Moving a node is equivalent to a removal followed by an insertion. Suppose we wish to move the "D" node to the end of the list, starting from the following state:
A B C [D] * F G H I
The resulting state is as follows:
A B [C] * F G H I D
One special case arises when the reference node is the last node in the list and the reference node is removed. Suppose we remove node "C", starting from the following state:
A B * [C]
According to the rules we have given, the new reference node should be the nearest node after the Iterator, but there are no further nodes after "C". If there is no node in the original direction of the reference node, the nearest node in the opposite direction is selected as the reference node:
A [B] *
If the Iterator is part of a block of nodes that is removed, the above rules clearly indicate what is to be done. For instance, suppose "C" is the parent node of "D", "E", and "F", and we remove "C", starting with the following state:
A B C [D] * E F G H I
The resulting state is as follows:
A [B] * G H I
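The reference-node rules for single-node removal and insertion can be mimicked with a toy model. The sketch below is not a DOM implementation: it keeps a flat list of node names, a reference node, and a flag saying whether the position is after the reference node. All names are hypothetical, and removal of a whole block of nodes is not modelled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the list view: a reference node plus a before/after flag. */
final class ReferenceNodeModel {
    private final List<String> nodes;
    private String reference;        // null only when the list is empty
    private boolean afterReference;  // true: the position is just after the reference node

    ReferenceNodeModel(String... names) {
        nodes = new ArrayList<>(Arrays.asList(names));
        reference = nodes.isEmpty() ? null : nodes.get(0);
        afterReference = false;      // freshly created: "* [A] B C ..."
    }

    String nextNode() {
        if (reference == null) return null;
        int next = nodes.indexOf(reference) + (afterReference ? 1 : 0);
        if (next >= nodes.size()) return null;   // at the end: position unchanged
        reference = nodes.get(next);
        afterReference = true;
        return reference;
    }

    String previousNode() {
        if (reference == null) return null;
        int prev = nodes.indexOf(reference) - (afterReference ? 0 : 1);
        if (prev < 0) return null;               // at the start: position unchanged
        reference = nodes.get(prev);
        afterReference = false;
        return reference;
    }

    void remove(String name) {
        int i = nodes.indexOf(name);
        if (i < 0) return;
        if (name.equals(reference)) {
            int before = i - 1, after = i + 1;
            if (afterReference) {
                // Prefer the nearest node before the position; otherwise fall back to the other side.
                if (before >= 0) reference = nodes.get(before);
                else if (after < nodes.size()) { reference = nodes.get(after); afterReference = false; }
                else reference = null;
            } else {
                // Prefer the nearest node after the position; otherwise fall back to the other side.
                if (after < nodes.size()) reference = nodes.get(after);
                else if (before >= 0) { reference = nodes.get(before); afterReference = true; }
                else reference = null;
            }
        }
        nodes.remove(i);
    }

    void insertBefore(String name, String existing) {
        int i = nodes.indexOf(existing);
        nodes.add(i < 0 ? nodes.size() : i, name);
    }

    /** Renders the state in the notation used by the diagrams, e.g. "A B [C] * F". */
    String render() {
        if (reference == null) return "*";
        StringBuilder sb = new StringBuilder();
        for (String n : nodes) {
            if (sb.length() > 0) sb.append(' ');
            if (!n.equals(reference)) sb.append(n);
            else if (afterReference) sb.append('[').append(n).append("] *");
            else sb.append("* [").append(n).append(']');
        }
        return sb.toString();
    }
}
```

Storing the reference node itself, rather than an index, is what keeps the position stable when other nodes are inserted or removed nearby.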
The underlying data structure that is being iterated may contain nodes that are not part of the logical view, and therefore will not be returned by the Iterator. If invisible nodes are present, nextNode() returns the next visible node, skipping over nodes that are to be excluded because of the value of the whatToShow flag. If a filter is present, it is applied before returning a node; if the filter rejects a node, the process is repeated until a node is accepted by the filter. That node is returned. If no visible nodes are encountered, a null is returned and the Iterator is positioned at the end of the list. In this case, the reference node is the last node in the list, whether or not it is visible. The same approach is taken, in the opposite direction, for previousNode().
In the following examples, we will use lower case letters to represent nodes that are in the data structure, but which are not in the logical view. For instance, consider the following list:
A [B] * c d E F G
A call to nextNode() returns E and advances to the following position:
A B c d [E] * F G
Nodes that are not visible may nevertheless be used as reference nodes if a reference node is removed. Suppose node "E" is removed, starting from the state given above. The resulting state is:
A B c [d] * F G
Suppose a new node "X", which is visible, is inserted before "d". The resulting state is:
A B c X [d] * F G
Note that a call to previousNode() now returns node X. It is important not to skip over invisible nodes when the reference node is removed, because there are cases, like the one just given above, where the wrong results will be returned. When "E" was removed, if the new reference node had been "B" rather than "d", calling previousNode() would not return "X".
Filters allow the user to create objects that "filter out" nodes. Each filter contains a user-written function that looks at a node and determines whether or not it should be filtered out. To use a filter, you create an Iterator or a TreeWalker that uses the filter. The Iterator or TreeWalker applies the filter to each node, and if the filter rejects the node, the Iterator or TreeWalker skips over the node as though it were not present in the document. Filters need not know how to navigate the structure that contains the nodes on which they operate.
A Filter contains one method named acceptNode(), which allows an Iterator or TreeWalker to pass a Node to a filter and ask whether it should be present in the logical view. The acceptNode() function returns one of three values to state how the Node should be treated. If acceptNode() returns FILTER_ACCEPT, the Node will be present in the logical view; if it returns FILTER_SKIP, the Node will not be present in the logical view, but the children of the Node may; if it returns FILTER_REJECT, neither the Node nor its descendants will be present in the logical view. Since Iterators present nodes as an ordered list, without hierarchy, FILTER_REJECT and FILTER_SKIP are synonyms for Iterators, skipping only the single current node.
Consider a filter that accepts the named anchors in an HTML document. In HTML, an HREF can refer to any A element that has a NAME attribute. Here is a filter in Java that looks at a node and determines whether it is a named anchor:
    class NamedAnchorFilter implements NodeFilter {
        public short acceptNode(Node n) {
            if (n instanceof Element) {
                Element e = (Element) n;
                // Not an anchor: skip it, but let its children be examined.
                if (!e.getNodeName().equals("A"))
                    return FILTER_SKIP;
                // An anchor with a NAME attribute is a named anchor.
                if (e.getAttributeNode("NAME") != null)
                    return FILTER_ACCEPT;
            }
            return FILTER_SKIP;
        }
    }
If the above Filter were to be used with only Iterators, it could have used FILTER_REJECT wherever FILTER_SKIP is used, and the behavior would not change. For TreeWalker, though, FILTER_REJECT would reject the children of any element that is not a named anchor, and since named anchors are always contained within other elements, this would have meant that no named anchors would be found. FILTER_SKIP rejects the given node, but continues to examine the children; therefore, the above filter will work with either an Iterator or a TreeWalker.
To use this filter, the user would create an instance of the filter and create an Iterator using it:
    NamedAnchorFilter myFilter = new NamedAnchorFilter();
    // Cast the document itself, not the result of the call.
    NodeIterator iter = ((DocumentTraversal) document).createNodeIterator(node, SHOW_ELEMENT, myFilter);
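The difference between FILTER_SKIP and FILTER_REJECT can be checked with a small experiment. The sketch below assumes the org.w3c.dom.traversal bindings bundled with the JDK (constants on NodeFilter, four-argument factory methods) and a hypothetical sample tree in which both named anchors sit inside a P element.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;
import org.w3c.dom.traversal.TreeWalker;

final class SkipVersusReject {
    /** Builds <BODY><P><A NAME="x"/><A NAME="y"/></P></BODY> and returns BODY. */
    static Element sampleBody() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element body = doc.createElement("BODY");
        doc.appendChild(body);
        Element p = doc.createElement("P");
        body.appendChild(p);
        for (String name : new String[] {"x", "y"}) {
            Element a = doc.createElement("A");
            a.setAttribute("NAME", name);
            p.appendChild(a);
        }
        return body;
    }

    /** Accepts named anchors; every other node receives the supplied verdict. */
    static NodeFilter namedAnchors(short otherwise) {
        return n -> {
            if (n instanceof Element && n.getNodeName().equals("A")
                    && ((Element) n).getAttributeNode("NAME") != null) {
                return NodeFilter.FILTER_ACCEPT;
            }
            return otherwise;
        };
    }

    static List<String> viaIterator(short otherwise) {
        Element body = sampleBody();
        NodeIterator it = ((DocumentTraversal) body.getOwnerDocument())
                .createNodeIterator(body, NodeFilter.SHOW_ELEMENT, namedAnchors(otherwise), false);
        List<String> names = new ArrayList<>();
        for (Node n = it.nextNode(); n != null; n = it.nextNode()) {
            names.add(((Element) n).getAttribute("NAME"));
        }
        return names;
    }

    static List<String> viaTreeWalker(short otherwise) {
        Element body = sampleBody();
        TreeWalker tw = ((DocumentTraversal) body.getOwnerDocument())
                .createTreeWalker(body, NodeFilter.SHOW_ELEMENT, namedAnchors(otherwise), false);
        List<String> names = new ArrayList<>();
        for (Node n = tw.nextNode(); n != null; n = tw.nextNode()) {
            names.add(((Element) n).getAttribute("NAME"));
        }
        return names;
    }
}
```

With this sample, the Iterator finds both anchors under either verdict, while the TreeWalker finds none when P is rejected, because rejecting P hides its whole subtree.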
When writing a Filter, users should avoid writing code that can throw an exception. However, because an implementation cannot prevent users from doing so, it is important that the behavior of filters that throw an exception be well-defined. A TreeWalker or Iterator does not catch or alter an exception thrown by a filter, but lets it propagate up to the user's code. The following functions may invoke a Filter, and may therefore propagate an exception if one is thrown by a Filter: the NodeIterator methods nextNode() and previousNode(), and the TreeWalker methods parentNode(), firstChild(), lastChild(), previousSibling(), nextSibling(), previousNode(), and nextNode().
Well-designed Filters do not modify the underlying document structure, but a Filter implementation cannot prevent a user from writing code that does modify the document structure. Filters do not provide any special processing to handle this case. For instance, if a Filter removes a node from a document, it can still accept the node, which means that the node may be returned by the Iterator or TreeWalker even though it is no longer in the document. In general, this may lead to inconsistent, confusing results, so we encourage users to write Filters that make no changes to document structures.
Iterator and TreeWalker apply whatToShow flags before applying Filters. If a node is rejected by the active whatToShow flags, a Filter will not be called to evaluate that node. When a node is rejected by the active whatToShow flags, children of that node will still be considered, and Filters may be called to evaluate them.
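The ordering of whatToShow and Filters can be observed by recording which nodes a filter is asked about. This sketch assumes the org.w3c.dom.traversal bindings bundled with the JDK; the class name and the sample tree are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

final class WhatToShowBeforeFilter {
    /** Iterates <r>text<e/></r> showing only elements; returns the names of nodes the filter saw. */
    static List<String> nodesSeenByFilter() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element r = doc.createElement("r");
        doc.appendChild(r);
        r.appendChild(doc.createTextNode("text"));
        r.appendChild(doc.createElement("e"));

        List<String> seen = new ArrayList<>();
        // The filter accepts everything and only records what it is asked about.
        NodeFilter recording = n -> {
            seen.add(n.getNodeName());
            return NodeFilter.FILTER_ACCEPT;
        };
        NodeIterator it = ((DocumentTraversal) doc)
                .createNodeIterator(r, NodeFilter.SHOW_ELEMENT, recording, false);
        while (it.nextNode() != null) {
            // Drain the iterator; the side effect of interest is in the filter.
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(nodesSeenByFilter());
    }
}
```

The text node never reaches the filter because SHOW_ELEMENT already excludes it.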
The TreeWalker interface provides many of the same benefits as the Iterator interface. The main difference between these two interfaces is that the TreeWalker presents a tree-oriented view of the nodes in a subtree, and an Iterator presents a list-oriented view. In other words, an Iterator allows you to move forward or back, but a TreeWalker allows you to move to the parent of a node, to one of its children, or to a sibling.
Using a TreeWalker is quite similar to navigation using the Node directly, and the navigation methods for the two interfaces are analogous. For instance, here is a function that walks over a tree of nodes in document order, taking separate actions when first entering a node and after processing any children:
    void processMe(Node n) {
        nodeStartActions(n);
        for (Node child = n.getFirstChild(); child != null; child = child.getNextSibling()) {
            processMe(child);
        }
        nodeEndActions(n);
    }
Doing the same thing using a TreeWalker is quite similar. There is one difference: since navigation on the TreeWalker changes the current position, the position at the end of the function has changed. A read/write attribute named currentNode allows the current node for a TreeWalker to be set. We will use this to ensure that the position of the TreeWalker is restored when this function is completed:
    void processMe(TreeWalker tw) {
        Node n = tw.getCurrentNode();   // remember the starting position
        nodeStartActions(tw);
        for (Node child = tw.firstChild(); child != null; child = tw.nextSibling()) {
            processMe(tw);
        }
        tw.setCurrentNode(n);           // restore the starting position
        nodeEndActions(tw);
    }
The advantage of using a TreeWalker instead of direct Node navigation is that the TreeWalker allows the user to choose an appropriate view of the tree. Flags may be used to show or hide comments or processing instructions; entities may be expanded or left as entity references. In addition, Filters may be used to present a custom view of the tree. Suppose a program needs a view of a document that shows which tables occur in each chapter, listed by chapter. In this view, only the chapter elements and the tables that they contain are seen. The first step is to write an appropriate filter:
    class TablesInChapters implements NodeFilter {
        public short acceptNode(Node n) {
            if (n instanceof Element) {
                if (n.getNodeName().equals("CHAPTER"))
                    return FILTER_ACCEPT;
                if (n.getNodeName().equals("TABLE"))
                    return FILTER_ACCEPT;
                // Sections are hidden, but their children are still examined.
                if (n.getNodeName().equals("SECT1")
                    || n.getNodeName().equals("SECT2")
                    || n.getNodeName().equals("SECT3")
                    || n.getNodeName().equals("SECT4")
                    || n.getNodeName().equals("SECT5")
                    || n.getNodeName().equals("SECT6")
                    || n.getNodeName().equals("SECT7"))
                    return FILTER_SKIP;
            }
            // Anything else is hidden together with its descendants.
            return FILTER_REJECT;
        }
    }
Now the program can create an instance of this Filter, create a TreeWalker that uses it, and pass this TreeWalker to our processMe() function:
    TablesInChapters tablesInChapters = new TablesInChapters();
    TreeWalker tw = someDocTraversal.createTreeWalker(root, SHOW_ELEMENT, tablesInChapters);
    processMe(tw);
Without making any changes to the above processMe() function, it now processes only the CHAPTER and TABLE elements. The programmer can write other filters or set other flags to choose different sets of nodes; if functions use TreeWalker to navigate, they will support any view of the document defined with a TreeWalker.
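A runnable variant of the chapter and table view might look like the following. It assumes the org.w3c.dom.traversal bindings bundled with the JDK, a hypothetical sample document, and a processMe() that records an indented outline instead of calling nodeStartActions() and nodeEndActions().

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

final class ChapterTableView {
    /** Same policy as the filter in the text, written as a lambda. */
    static final NodeFilter TABLES_IN_CHAPTERS = n -> {
        String name = n.getNodeName();
        if (name.equals("CHAPTER") || name.equals("TABLE")) return NodeFilter.FILTER_ACCEPT;
        if (name.matches("SECT[1-7]")) return NodeFilter.FILTER_SKIP;
        return NodeFilter.FILTER_REJECT;
    };

    /** Records the current node, recurses into its visible children, then restores the position. */
    static void processMe(TreeWalker tw, String indent, List<String> out) {
        Node n = tw.getCurrentNode();
        out.add(indent + n.getNodeName());
        for (Node c = tw.firstChild(); c != null; c = tw.nextSibling()) {
            processMe(tw, indent + "  ", out);
        }
        tw.setCurrentNode(n);
    }

    static Element child(Element parent, String name) {
        Element e = parent.getOwnerDocument().createElement(name);
        parent.appendChild(e);
        return e;
    }

    /** BOOK > CHAPTER > SECT1 > (TABLE, PARA > TABLE); the second TABLE is hidden with PARA. */
    static List<String> outline() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element book = doc.createElement("BOOK");
        doc.appendChild(book);
        Element sect = child(child(book, "CHAPTER"), "SECT1");
        child(sect, "TABLE");
        child(child(sect, "PARA"), "TABLE");

        TreeWalker tw = ((DocumentTraversal) doc)
                .createTreeWalker(book, NodeFilter.SHOW_ELEMENT, TABLES_IN_CHAPTERS, false);
        List<String> out = new ArrayList<>();
        processMe(tw, "", out);
        return out;
    }

    public static void main(String[] args) {
        outline().forEach(System.out::println);
    }
}
```

In the logical view, the first TABLE appears as a direct child of CHAPTER because SECT1 is skipped rather than rejected.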
NodeIterators are used to step through a set of nodes, e.g. the set of nodes in a NodeList, the document subtree governed by a particular node, the results of a query, or any other set of nodes. The set of nodes to be iterated is determined by the implementation of the NodeIterator. DOM Level 2 specifies a single NodeIterator implementation for document-order traversal of a document subtree. Instances of these iterators are created by calling DocumentTraversal.createNodeIterator().
Any Iterator that returns nodes may implement the NodeIterator interface. Users and vendor libraries may also choose to create Iterators that implement the NodeIterator interface.
// Introduced in DOM Level 2:
interface NodeIterator {
  readonly attribute long whatToShow;
  // Constants for whatToShow
  const unsigned long SHOW_ALL = 0x0000FFFF;
  const unsigned long SHOW_ELEMENT = 0x00000001;
  const unsigned long SHOW_ATTRIBUTE = 0x00000002;
  const unsigned long SHOW_TEXT = 0x00000004;
  const unsigned long SHOW_CDATA_SECTION = 0x00000008;
  const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010;
  const unsigned long SHOW_ENTITY = 0x00000020;
  const unsigned long SHOW_PROCESSING_INSTRUCTION = 0x00000040;
  const unsigned long SHOW_COMMENT = 0x00000080;
  const unsigned long SHOW_DOCUMENT = 0x00000100;
  const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200;
  const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400;
  const unsigned long SHOW_NOTATION = 0x00000800;
  readonly attribute NodeFilter filter;
  readonly attribute boolean expandEntityReferences;
  Node nextNode();
  Node previousNode();
};
whatToShow of type long, readonly
These are the available values for the whatToShow parameter. They are the same as the set of possible types for Node, and their values are derived by using a bit position corresponding to the value of NodeType for the equivalent node type.
SHOW_ALL | Show all nodes. |
SHOW_ELEMENT | Show element nodes. |
SHOW_ATTRIBUTE | Show attribute nodes. This is meaningful only when creating an Iterator with an attribute node as its root; in this case, it means that the attribute node will appear in the first position of the iteration. Since attributes are not part of the document tree, they do not appear when iterating over the document tree. |
SHOW_TEXT | Show text nodes. |
SHOW_CDATA_SECTION | Show CDATASection nodes. |
SHOW_ENTITY_REFERENCE | Show Entity Reference nodes. |
SHOW_ENTITY | Show Entity nodes. This is meaningful only when creating an Iterator with an Entity node as its root; in this case, it means that the Entity node will appear in the first position of the iteration. Since entities are not part of the document tree, they do not appear when iterating over the document tree. |
SHOW_PROCESSING_INSTRUCTION | Show ProcessingInstruction nodes. |
SHOW_COMMENT | Show Comment nodes. |
SHOW_DOCUMENT | Show Document nodes. |
SHOW_DOCUMENT_TYPE | Show DocumentType nodes. |
SHOW_DOCUMENT_FRAGMENT | Show DocumentFragment nodes. |
SHOW_NOTATION | Show Notation nodes. This is meaningful only when creating an Iterator with a Notation node as its root; in this case, it means that the Notation node will appear in the first position of the iteration. Since notations are not part of the document tree, they do not appear when iterating over the document tree. |
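The bit-position rule for these constants can be written out directly. The sketch below assumes the Java bindings bundled with the JDK, where the node type codes are defined on org.w3c.dom.Node and the SHOW_* constants on NodeFilter; the class and method names are hypothetical.

```java
import org.w3c.dom.Node;
import org.w3c.dom.traversal.NodeFilter;

final class WhatToShowBits {
    /** Mask for a single node type: bit (nodeType - 1) is set. */
    static int maskFor(short nodeType) {
        return 1 << (nodeType - 1);
    }

    /** True if a node of the given type is visible under the given whatToShow flags. */
    static boolean isShown(int whatToShow, short nodeType) {
        return (whatToShow & maskFor(nodeType)) != 0;
    }

    public static void main(String[] args) {
        int flags = NodeFilter.SHOW_ELEMENT | NodeFilter.SHOW_COMMENT;  // combined with OR
        System.out.println(isShown(flags, Node.COMMENT_NODE));
        System.out.println(isShown(flags, Node.TEXT_NODE));
    }
}
```

Deriving the mask from the node type lets an implementation test visibility with a single AND, without a lookup table.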
filter of type NodeFilter, readonly
expandEntityReferences of type boolean, readonly
nextNode
previousNode
Filters are objects that know how to "filter out" nodes. If an Iterator or TreeWalker is given a filter, before it returns the next node, it applies the filter. If the filter says to accept the node, the Iterator returns it; otherwise, the Iterator looks for the next node and pretends that the node that was rejected was not there.
The DOM does not provide any filters. Filter is just an interface that users can implement to provide their own filters.
Filters do not need to know how to iterate, nor do they need to know anything about the data structure that is being iterated. This makes it very easy to write filters, since the only thing they have to know how to do is evaluate a single node. One filter may be used with a number of different kinds of Iterators, encouraging code reuse.
// Introduced in DOM Level 2:
interface NodeFilter {
  // Constants returned by acceptNode
  const short FILTER_ACCEPT = 1;
  const short FILTER_REJECT = 2;
  const short FILTER_SKIP = 3;
  short acceptNode(in Node n);
};
The following constants are returned by the acceptNode() method:
FILTER_ACCEPT | Accept the node. Navigation methods defined for Iterator or TreeWalker will return this node. |
FILTER_REJECT | Reject the node. Navigation methods defined for Iterator or TreeWalker will not return this node. For TreeWalker, the children of this node will also be rejected. Iterators treat this as a synonym for FILTER_SKIP. |
FILTER_SKIP | Reject the node. Navigation methods defined for Iterator or TreeWalker will not return this node. For both Iterator and TreeWalker, the children of this node will still be considered. |
acceptNode
n | The node to check to see if it passes the filter or not. |
Return value | A constant to determine whether the node is accepted, rejected, or skipped, as defined above. |
TreeWalker objects are used to navigate a document tree or subtree using the view of the document defined by its whatToShow flags and any filters that are defined for the TreeWalker. Any function which performs navigation using a TreeWalker will automatically support any view defined by a TreeWalker.
Omitting nodes from the logical view of a subtree can result in a structure that is substantially different from the same subtree in the complete, unfiltered document. Nodes that are siblings in the TreeWalker view may be children of different, widely separated nodes in the original view. For instance, consider a Filter that skips all nodes except for Text nodes and the root node of a document. In the logical view that results, all text nodes will be siblings and appear as direct children of the root node, no matter how deeply nested the structure of the original document.
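The flattening effect described above can be reproduced with a filter that accepts only Text nodes and the root. This sketch assumes the org.w3c.dom.traversal bindings bundled with the JDK; the class name and the sample tree are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

final class FlattenedTextView {
    /** Walks <r><a>one<b>two</b></a>three</r> with every element except the root skipped. */
    static List<String> textSiblings() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element r = doc.createElement("r");
        doc.appendChild(r);
        Element a = doc.createElement("a");
        r.appendChild(a);
        a.appendChild(doc.createTextNode("one"));
        Element b = doc.createElement("b");
        a.appendChild(b);
        b.appendChild(doc.createTextNode("two"));
        r.appendChild(doc.createTextNode("three"));

        // Accept Text nodes and the root; skip everything else so children stay reachable.
        NodeFilter textAndRoot = n ->
                (n == r || n.getNodeType() == Node.TEXT_NODE)
                        ? NodeFilter.FILTER_ACCEPT
                        : NodeFilter.FILTER_SKIP;
        TreeWalker tw = ((DocumentTraversal) doc)
                .createTreeWalker(r, NodeFilter.SHOW_ALL, textAndRoot, false);

        List<String> out = new ArrayList<>();
        // In the logical view, all three text nodes are children of the root.
        for (Node n = tw.firstChild(); n != null; n = tw.nextSibling()) {
            out.add(n.getNodeValue());
        }
        // The walker still rests on the last text node; its logical parent is the root.
        out.add("parent=" + tw.parentNode().getNodeName());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(textSiblings());
    }
}
```

The three text nodes sit at different depths in the document, yet nextSibling() moves between them as if they shared one parent.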
// Introduced in DOM Level 2:
interface TreeWalker {
  readonly attribute long whatToShow;
  // Constants for whatToShow
  const unsigned long SHOW_ALL = 0x0000FFFF;
  const unsigned long SHOW_ELEMENT = 0x00000001;
  const unsigned long SHOW_ATTRIBUTE = 0x00000002;
  const unsigned long SHOW_TEXT = 0x00000004;
  const unsigned long SHOW_CDATA_SECTION = 0x00000008;
  const unsigned long SHOW_ENTITY_REFERENCE = 0x00000010;
  const unsigned long SHOW_ENTITY = 0x00000020;
  const unsigned long SHOW_PROCESSING_INSTRUCTION = 0x00000040;
  const unsigned long SHOW_COMMENT = 0x00000080;
  const unsigned long SHOW_DOCUMENT = 0x00000100;
  const unsigned long SHOW_DOCUMENT_TYPE = 0x00000200;
  const unsigned long SHOW_DOCUMENT_FRAGMENT = 0x00000400;
  const unsigned long SHOW_NOTATION = 0x00000800;
  readonly attribute NodeFilter filter;
  readonly attribute boolean expandEntityReferences;
  attribute Node currentNode;
  Node parentNode();
  Node firstChild();
  Node lastChild();
  Node previousSibling();
  Node nextSibling();
  Node previousNode();
  Node nextNode();
};
whatToShow of type long, readonly
These are the available values for the whatToShow parameter. They are the same as the set of possible types for Node, and their values are derived by using a bit position corresponding to the value of NodeType for the equivalent node type.
SHOW_ALL | Show all nodes. |
SHOW_ELEMENT | Show Element nodes. |
SHOW_ATTRIBUTE | Show Attribute nodes. This is meaningful only when creating a TreeWalker with an attribute node as its root; in this case, it means that the attribute node will appear as the root node of the filtered tree. Since attributes are not part of the document tree, they do not appear when iterating over the document tree. |
SHOW_TEXT | Show Text nodes. |
SHOW_CDATA_SECTION | Show CDATASection nodes. |
SHOW_ENTITY_REFERENCE | Show Entity Reference nodes. |
SHOW_ENTITY | Show Entity nodes. This is meaningful only when creating a TreeWalker with an Entity node as its root; in this case, it means that the Entity node will appear as the root node of the filtered tree. Since entities are not part of the document tree, they do not appear when iterating over the document tree. |
SHOW_PROCESSING_INSTRUCTION | Show ProcessingInstruction nodes. |
SHOW_COMMENT | Show Comment nodes. |
SHOW_DOCUMENT | Show Document nodes. |
SHOW_DOCUMENT_TYPE | Show DocumentType nodes. |
SHOW_DOCUMENT_FRAGMENT | Show DocumentFragment nodes. |
SHOW_NOTATION | Show Notation nodes. This is meaningful only when creating a TreeWalker with a Notation node as its root; in this case, it means that the Notation node will appear as the root node of the filtered tree. Since notations are not part of the document tree, they do not appear when iterating over the document tree. |
filter of type NodeFilter, readonly
expandEntityReferences of type boolean, readonly
currentNode of type Node
parentNode
The new parent node, or null if the current node has no parent in the TreeWalker's logical view. |
firstChild
Moves the TreeWalker to the first child of the current node, and returns the new node. If the current node has no children, returns null, and retains the current node.
The new node, or null if the current node has no children. |
lastChild
Moves the TreeWalker to the last child of the current node, and returns the new node. If the current node has no children, returns null, and retains the current node.
The new node, or null if the current node has no children. |
previousSibling
Moves the TreeWalker to the previous sibling of the current node, and returns the new node. If the current node has no previous sibling, returns null, and retains the current node.
The new node, or null if the current node has no previous sibling. |
nextSibling
Moves the TreeWalker to the next sibling of the current node, and returns the new node. If the current node has no next sibling, returns null, and retains the current node.
The new node, or null if the current node has no next sibling. |
previousNode
Moves the TreeWalker to the previous node in document order relative to the current node, and returns the new node. If the current node has no previous node, returns null, and retains the current node.
The new node, or null if the current node has no previous node. |
nextNode
Moves the TreeWalker to the next node in document order relative to the current node, and returns the new node. If the current node has no next node, returns null, and retains the current node.
The new node, or null if the current node has no next node. |
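The rule that a failed move returns null and retains the current node can be checked with a short trace. This sketch assumes the org.w3c.dom.traversal bindings bundled with the JDK; the class name and the sample tree are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.TreeWalker;

final class WalkerRetainsPosition {
    static String name(Node n) {
        return n == null ? "null" : n.getNodeName();
    }

    /** Walks <r><a/><b/></r> and records each result alongside the current node. */
    static List<String> trace() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element r = doc.createElement("r");
        doc.appendChild(r);
        r.appendChild(doc.createElement("a"));
        r.appendChild(doc.createElement("b"));

        TreeWalker tw = ((DocumentTraversal) doc)
                .createTreeWalker(r, NodeFilter.SHOW_ELEMENT, null, false);
        List<String> out = new ArrayList<>();
        out.add(name(tw.firstChild()));      // a
        out.add(name(tw.firstChild()));      // null: a has no children
        out.add(name(tw.getCurrentNode()));  // a: the failed move kept the position
        out.add(name(tw.nextSibling()));     // b
        out.add(name(tw.nextSibling()));     // null: b is the last sibling
        out.add(name(tw.getCurrentNode()));  // b
        out.add(name(tw.parentNode()));      // r
        out.add(name(tw.parentNode()));      // null: r is the root of this walker
        out.add(name(tw.getCurrentNode()));  // r
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace());
    }
}
```

Since a null result never moves the walker, loops such as the one in processMe() can test the return value without saving the position before every call.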
DocumentTraversal contains methods that create Iterators and TreeWalkers to traverse a node and its children in document order (depth first, pre-order traversal, which is equivalent to the order in which the start tags occur in the text representation of the document).
// Introduced in DOM Level 2:
interface DocumentTraversal {
  NodeIterator createNodeIterator(in Node root,
                                  in long whatToShow,
                                  in NodeFilter filter,
                                  in boolean entityReferenceExpansion);
  TreeWalker createTreeWalker(in Node root,
                              in long whatToShow,
                              in NodeFilter filter,
                              in boolean entityReferenceExpansion)
                                  raises(DOMException);
};
createNodeIterator
root | The node which will be iterated together with its children. The iterator is initially positioned just before this node. The whatToShow flags and the filter, if any, are not considered when setting this position. |
whatToShow | This flag specifies which node types may appear in the logical view of the tree presented by the Iterator. See the description of Iterator for the set of possible values. These flags can be combined using OR. |
filter | The Filter to be used with this Iterator, or null to indicate no filter. |
entityReferenceExpansion | The value of this flag determines whether entity reference nodes are expanded. |
Return value | The newly created NodeIterator. |
createTreeWalker
root | The node which will serve as the root for the TreeWalker. |
whatToShow | This flag specifies which node types may appear in the logical view of the tree presented by the TreeWalker. See the description of TreeWalker for the set of possible values. These flags can be combined using OR. |
filter | The Filter to be used with this TreeWalker, or null to indicate no filter. |
entityReferenceExpansion | The value of this flag determines whether entity reference nodes are expanded. |
Return value | The newly created TreeWalker. |
Exceptions | Raises the exception NOT_SUPPORTED_ERR if the specified root node is null. |
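The null-root case can be probed as follows. This is a sketch against the org.w3c.dom.traversal bindings bundled with the JDK; whether another implementation raises the same code is not guaranteed by this example. The class and method names are hypothetical.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;

final class NullRootCheck {
    /** Returns the DOMException code raised for a null root, or -1 if nothing was raised. */
    static int codeForNullRoot() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        try {
            ((DocumentTraversal) doc).createTreeWalker(null, NodeFilter.SHOW_ALL, null, false);
            return -1;
        } catch (DOMException e) {
            return e.code;  // public field carrying the DOM error code
        }
    }

    public static void main(String[] args) {
        System.out.println(codeForNullRoot() == DOMException.NOT_SUPPORTED_ERR);
    }
}
```

Callers that accept a root from elsewhere may prefer to check for null themselves rather than rely on the exception.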