Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification defines the WebDriver API, a platform- and language-neutral interface that allows programs or scripts to introspect into, and control the behaviour of, a web browser. The WebDriver API is primarily intended to allow developers to write tests that automate a browser from a separate controlling process, but may also be implemented in such a way as to allow in-browser scripts to control a browser.
The WebDriver API is defined by a set of interfaces to discover and manipulate DOM elements on a page, and to control the behaviour of the containing browser.
This specification also includes a non-normative reference serialisation (to JSON) of the interface's invocations and responses that may be useful for browser vendors.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
If you wish to make comments regarding this document, please email feedback to public-browser-tools-testing@w3.org. All feedback is welcome, and the editors will read and consider all feedback.
This specification is still under active development and may not be stable. Any implementors who are not actively participating in the preparation of this specification may find unexpected changes occurring. It is suggested that any implementors join the WG for this specification. Despite not being stable, it should be noted that this specification is strongly based on an existing Open Source project — Selenium WebDriver — and the existing implementations of the API defined within that project.
This document was published by the Browser Testing and Tools Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-browser-tools-testing@w3.org (subscribe, archives). All feedback is welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
The WebDriver API aims to provide a synchronous API that can be used for a variety of use cases, though it is primarily designed to support automated testing of web apps.
This specification is intended for implementors of the WebDriver API. It is not intended as light bedtime reading.
Where possible and appropriate, the WebDriver API references existing specifications. For example, the list of boolean attributes for elements is drawn from the HTML5 specification. When references are made, this specification will link to the relevant sections.
The WebDriver API can be thought of as a client/server process. However, implementation details can mean that this terminology becomes confusing. For this reason, the two sides of the API are called the "local" and the "remote" ends.
There is no requirement that the local and remote ends be in different processes.
The WebDriver API is designed to be used both in-process and out-of-process. The IDL given in this specification and summarized in Appendix XXXX should be used as the basis for the user-facing API. When used out-of-process, the WebDriver API defines command/response objects that must be used. How these are encoded and transmitted between the browser being automated and the user of the API is left undefined, but a non-normative implementation of this as JSON over HTTP is given in appendix XXXX.
A command represents a call to the remote end of the WebDriver API.
interface Command {
    attribute string name;
    attribute dictionary parameters;
    attribute string sessionId;
};
name of type string
parameters of type dictionary
sessionId of type string
A response represents the value returned from the remote end of the WebDriver API.
interface Response {
    readonly attribute string sessionId;
    readonly attribute integer status;
    readonly attribute object value;
};
sessionId of type string, readonly
status of type integer, readonly
value of type object, readonly
Command that has been executed. In the specification, each command definition will make clear what the expected return type is.
Any Command or Response may contain fields in addition to those listed above. The content of fields must be maintained, unaltered, by any intermediate processing nodes. There is no requirement to maintain the ordering of fields.
This requirement exists to allow for extension of the protocol, and to allow implementors to decorate Commands and Responses with additional information, perhaps giving context to a series of messages, or providing security information.
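As a rough illustration of the pass-through requirement, the sketch below builds a Command as a plain object and relays it through an intermediate node that adds a field of its own. The helper names `makeCommand` and `relay`, and the extra fields `traceId` and `hop`, are hypothetical and not part of this specification; only `name`, `parameters` and `sessionId` come from the Command interface.

```javascript
// Hypothetical helper: builds a Command as a plain object. Only the fields
// name, parameters and sessionId come from the Command interface.
function makeCommand(name, parameters, sessionId, extras = {}) {
  // Additional fields are permitted and travel with the required ones.
  return { ...extras, name, parameters, sessionId };
}

// Hypothetical intermediate node: it may decorate a message, but it must not
// alter or drop any field that is already present.
function relay(message, decoration = {}) {
  const copy = JSON.parse(JSON.stringify(message)); // copy, content unchanged
  for (const [key, value] of Object.entries(decoration)) {
    if (!(key in copy)) {
      copy[key] = value; // never overwrite an existing field
    }
  }
  return copy;
}

const command = makeCommand('getWindowHandle', {}, 'session-1', { traceId: 'abc' });
const relayed = relay(command, { hop: 'proxy-1', traceId: 'ignored' });
```

Field ordering is deliberately not checked here, since there is no requirement to maintain it.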
The WebDriver API indicates the success or failure of a command invocation via a status code on the Response object. The following values are used and have the following meanings.
Status Code | Summary | Detail |
---|---|---|
0 | Success | The command executed successfully. |
7 | NoSuchElement | An element could not be located on the page using the given search parameters. |
8 | NoSuchFrame | A request to switch to a frame could not be satisfied because the frame could not be found. |
9 | UnknownCommand | The requested resource could not be found, or a request was received using an HTTP method that is not supported by the mapped resource. |
10 | StaleElementReference | An element command failed because the referenced element is no longer attached to the DOM. |
11 | ElementNotVisible | An element command could not be completed because the element is not visible on the page. |
12 | InvalidElementState | An element command could not be completed because the element is in an invalid state (e.g. attempting to click a disabled element). |
13 | UnknownError | An unknown server-side error occurred while processing the command. |
15 | ElementIsNotSelectable | An attempt was made to select an element that cannot be selected. |
17 | JavaScriptError | An error occurred while executing user supplied JavaScript. |
19 | XPathLookupError | An error occurred while searching for an element by XPath. |
21 | Timeout | An operation did not complete before its timeout expired. |
23 | NoSuchWindow | A request to switch to a different window could not be satisfied because the window could not be found. |
24 | InvalidCookieDomain | An illegal attempt was made to set a cookie under a different domain than the current page. |
25 | UnableToSetCookie | A request to set a cookie's value could not be satisfied. |
26 | UnexpectedAlertOpen | A modal dialog was open, blocking this operation. |
27 | NoAlertOpenError | An attempt was made to operate on a modal dialog when one was not open. |
28 | ScriptTimeout | A script did not complete before its timeout expired. |
29 | InvalidElementCoordinates | The coordinates provided to an interactions operation are invalid. |
30 | IMENotAvailable | IME was not available. |
31 | IMEEngineActivationFailed | An IME engine could not be started. |
32 | InvalidSelector | Argument was an invalid selector (e.g. XPath/CSS). |
33 | SessionNotCreatedException | A new session could not be created. |
34 | MoveTargetOutOfBounds | The target for mouse interaction is not on the viewport and cannot be brought into the viewport. |
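A local end might turn the numeric status into a readable summary with a simple lookup. The following is a minimal sketch: the codes and summaries are copied from the table above, while the function name `describeStatus` and the fallback label `UnknownStatus` are assumptions made for illustration.

```javascript
// Status codes and summaries as listed in the table above.
const STATUS_SUMMARIES = Object.freeze({
  0: 'Success', 7: 'NoSuchElement', 8: 'NoSuchFrame', 9: 'UnknownCommand',
  10: 'StaleElementReference', 11: 'ElementNotVisible', 12: 'InvalidElementState',
  13: 'UnknownError', 15: 'ElementIsNotSelectable', 17: 'JavaScriptError',
  19: 'XPathLookupError', 21: 'Timeout', 23: 'NoSuchWindow',
  24: 'InvalidCookieDomain', 25: 'UnableToSetCookie', 26: 'UnexpectedAlertOpen',
  27: 'NoAlertOpenError', 28: 'ScriptTimeout', 29: 'InvalidElementCoordinates',
  30: 'IMENotAvailable', 31: 'IMEEngineActivationFailed', 32: 'InvalidSelector',
  33: 'SessionNotCreatedException', 34: 'MoveTargetOutOfBounds',
});

// Hypothetical helper: reports whether a Response succeeded and names its status.
function describeStatus(response) {
  const summary = STATUS_SUMMARIES[response.status];
  return {
    ok: response.status === 0,
    // 'UnknownStatus' is an illustrative fallback, not a defined status.
    summary: summary === undefined ? 'UnknownStatus' : summary,
  };
}
```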
Different browsers support different levels of various specifications. For example, some support SVG or the CSS Selector API, but only browsers that implement HTML5 will support LocalStorage. The WebDriver API provides a mechanism to query the supported capabilities of a browser. Each broad area of functionality within the WebDriver API has an associated capability string. Whether a particular capability must or may be supported — as well as fallback mechanisms for handling those cases where a capability is not supported — is discussed where the capability string is defined.
interface Capabilities {
    readonly attribute dictionary capabilities;
    boolean has (string capabilityName);
    (string or boolean or number)? get (string capabilityName);
};
capabilities of type dictionary, readonly
boolean, numerical or string.
get
capabilities or null if no value is defined.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
capabilityName | string | ✘ | ✘ | |
Return type: string or boolean or number, nullable
has
capabilities to see whether the value is set. This will return true if the capabilities contain a key with the given capabilityName and the value of that key is defined. If the value is a boolean, this function will return that boolean value. If the value is null, an empty string or a 0 then this method will return false.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
capabilityName | string | ✘ | ✘ | |
Return type: boolean
A Capabilities instance must be immutable. If a mutable Capabilities instance is required, then MutableCapabilities must be used instead.
interface MutableCapabilities : Capabilities {
    void set (string capabilityName, (string or boolean or number)? value);
};
set
capabilityName to the given value. If the value is not a boolean, numerical type or a string, a WebDriverException should be thrown.
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
capabilityName | string | ✘ | ✘ | |
value | string or boolean or number | ✔ | ✘ | |
Return type: void
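The behaviour of has, get and set described above might be sketched as follows. This is an illustration under assumptions rather than a reference implementation: the `WebDriverException` class is a stand-in for the exception named in the text, immutability of the base class is achieved only by omitting a setter, and null is accepted by `set` because the IDL marks the value as nullable.

```javascript
// Stand-in for the WebDriverException named in the text above.
class WebDriverException extends Error {}

class Capabilities {
  constructor(initial = {}) {
    // Copy the input so later changes to `initial` do not leak in.
    this._values = new Map(Object.entries(initial));
  }

  has(capabilityName) {
    if (!this._values.has(capabilityName)) return false;
    const value = this._values.get(capabilityName);
    // A boolean value is returned as-is.
    if (typeof value === 'boolean') return value;
    // null, the empty string and 0 count as "not set".
    return value !== null && value !== undefined && value !== '' && value !== 0;
  }

  get(capabilityName) {
    const value = this._values.get(capabilityName);
    return value === undefined ? null : value;
  }
}

class MutableCapabilities extends Capabilities {
  set(capabilityName, value) {
    // Assumption: null is allowed because the IDL declares the value nullable.
    const allowed = value === null ||
      ['string', 'boolean', 'number'].includes(typeof value);
    if (!allowed) {
      throw new WebDriverException('Unsupported capability value type');
    }
    this._values.set(capabilityName, value);
  }
}
```

Note that, under these rules, `has` returns false for a capability that is present but set to `false`, so callers that need to distinguish "absent" from "false" would use `get`.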
Non-normative summary: A session is equivalent to a single instantiation of a particular browser, including all child windows. The WebDriver API gives each session a UUID stored as a string that can be used to differentiate one session from another, allowing multiple browsers to be controlled on the same machine if needed, and allowing sessions to be routed via a multiplexer. This ID is sent with every Command and returned with every Response and is stored on the sessionId field.
The process for successfully creating a session follows.
SessionNotCreatedException error code. The error message should list all unmet required capabilities though only the first unmet required capability must be given.
SessionNotCreatedException and the value being an explanation that the UUID has previously been used.
SUCCESS error code.
There is no requirement for the local end to validate that some or all of the fields sent on the Capabilities associated with the Command match those returned in the Response.
The following keys are to be used in the Capabilities instances.
These should be named in the style of enums in C-like languages.
In addition "ANY" may be used to indicate the underlying OS is either unknown or does not matter. Implementors may add additional platform names.
The following status codes must be returned by the "newSession" command. Please consult the table in the "commands" section for numerical values:
This section is non-normative.
The suggested order for comparing keys in the Capabilities instance when creating a session is:
For all comparisons, if the key is missing (as determined by a call to Capabilities.has()), that particular criterion shall not factor into the comparison.
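The "missing keys do not factor in" rule can be illustrated with a small matcher. Everything here is a sketch under assumptions: the key names in `COMPARISON_ORDER` are hypothetical placeholders rather than the list this specification defines, capabilities are modelled as plain objects, and key presence is tested directly instead of through a `has` method.

```javascript
// Hypothetical comparison order; the real key list is defined by the specification.
const COMPARISON_ORDER = ['browserName', 'platform'];

// Simplification: "missing" means the key is not an own property of the object.
function hasKey(capabilities, key) {
  return Object.prototype.hasOwnProperty.call(capabilities, key);
}

// Returns true when every key the caller actually supplied matches the offer.
function satisfies(desired, offered, order = COMPARISON_ORDER) {
  for (const key of order) {
    if (!hasKey(desired, key)) continue; // missing keys are skipped
    if (desired[key] !== offered[key]) return false;
  }
  return true;
}
```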
Within this specification, a window equates to anything that would be referred to as "window.top" in javascript. Put another way, within this spec browser tabs are counted as separate windows.
TODO: define "frame"
Each window has a "window handle" associated with it. This is an opaque string which is unique to the window. The suggested implementation is as a UUID. The "getWindowHandle" command can be used to obtain the window handle for the window that commands are currently acting upon:
Command Name | getWindowHandle |
Parameters | "sessionId" {string} The key that identifies which session this request is for. |
Return Value | string |
Command Name | getWindowHandles |
Parameters | "sessionId" {string} The key that identifies which session this request is for. |
Return Value | Array.<string> |
This array of returned strings must contain a handle for every window associated with the browser session and no others. In addition, at the time of collecting the window handles the javascript expression "window.top.closed" must evaluate to false.
The ordering of the keys is not defined, but should be determined by iterating over each top level browser window and returning the tabs within that window before iterating over the tabs of the next top level browser window. For example, in the diagram below, the window handles should be returned as the handles for: win1tab1, win1tab2, win2.
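The suggested ordering amounts to a nested iteration: top level browser windows first, and the tabs of each window before moving on. A minimal sketch follows, using a hypothetical data model (`tabs`, `handle` and `closed` are illustrative names, not part of this specification).

```javascript
// Hypothetical model: each top level browser window holds an ordered list of
// tabs, and each tab has a window handle and a closed flag.
function collectWindowHandles(topLevelWindows) {
  const handles = [];
  for (const browserWindow of topLevelWindows) {
    for (const tab of browserWindow.tabs) {
      // Only windows that are still open may be reported.
      if (!tab.closed) handles.push(tab.handle);
    }
  }
  return handles;
}

const exampleBrowser = [
  { tabs: [{ handle: 'win1tab1', closed: false }, { handle: 'win1tab2', closed: false }] },
  { tabs: [{ handle: 'win2', closed: false }] },
];
const exampleHandles = collectWindowHandles(exampleBrowser);
```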
Command Name | close |
Parameters | "sessionId" {string} The key that identifies which session this request is for. |
Return Value | None |
The close command closes the window that commands are currently being sent to. If this means that a call to get the list of window handles returns an empty list, then this close command must be the equivalent of calling "quit". In all other cases, control must be returned to the calling process once the window has been closed or an alert is displayed by the closing window.
Once the window has closed, future commands must return an error NoSuchWindowException until a new window is selected for receiving commands.
Command Name | setWindowSize |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to resize. "width" {number} The new window width. "height" {number} The new window height. |
Return Value | None |
Errors | UnsupportedOperationException if the window could not be resized. |
Command Name | getWindowSize |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to measure. |
Return Value | An object with two keys: "width" {number} The width of the specified window. "height" {number} The height of the specified window. |
Command Name | maximizeWindow |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to resize. |
Return Value | None |
Errors | UnsupportedOperationException if the window could not be resized. |
Command Name | fullscreenWindow |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "windowHandle" {string} The handle referring to the window to resize. |
Return Value | None |
Errors | UnsupportedOperationException if the window could not be resized. |
Each of these commands accepts the window handles returned by "getWindowHandles" and "getWindowHandle". In addition, the window handle may be "current", in which case the window that commands are currently being handled by must be acted upon.
The "width" and "height" values refer to the "window.outerWidth" and "window.outerHeight" properties. For those browsers that do not support these properties, these represent the width and height of the whole browser window including window chrome and window resizing borders/handles.
After setWindowSize, the whole browser window must be left as if the restore button had been pressed, and must not be in the maximised state.
After maximizeWindow, the whole browser window must be left as if the maximise button had been pressed; it is not sufficient to leave the window "restored", but with the full screen dimensions.
If a request is made to resize a window to a size which cannot be performed (e.g. the browser has a minimum, or fixed window size), an UnsupportedOperationException must be thrown.
TODO
Web applications can be composed of multiple windows and/or frames. For a normal user, the context in which an operation is performed is obvious: it's the window or frame that currently has OS focus and which has just received user input. The WebDriver API does not follow this convention. There is an expectation that many browsers using the WebDriver API may be used at the same time on the same machine. This section describes how WebDriver tracks which window or frame is currently the context in which commands are being executed.
WebDriver's default context is the equivalent of window.top.
Command Name | switchToWindow |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "name" {string} The identifier used for a window. |
Return Value | None |
Errors | NoSuchWindowException if no matching window can be found |
The "switchToWindow" command is used to select which window should currently be accepting commands. In order to determine which window should be used for accepting commands, the "switchToWindow" command will iterate over all windows. For each window, the following will be compared --- in this order --- with the "name" parameter:
If no windows match, then a "NoSuchWindowException" must be thrown, otherwise the "default content" of the first window to match will be selected for accepting commands.
When a new browser session is started by WebDriver and only a single window is present then the default content of that window becomes the "current" window. When more than one window is opened, the "current" window is undefined. Any commands that are executed at this point that require a window must throw an exception (TODO: Which exception? Ideally the same as if a window had just been closed). The correct way for a user to recover from this situation is to obtain the list of window handles and to "switch to" one of these.
Command Name | switchToFrame |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "id" {?(string|number|!WebElement=)} The identifier used for a frame. |
Return Value | None |
Errors | NoSuchFrameException if no matching frame can be found |
The "switchToFrame" command is used to select which frame within a window should be used for handling future commands. All frame switching is relative to the current context from which commands are currently being handled. The "id" parameter can be a string, a number or an element. WebDriver implementations must determine which frame to select using the following algorithm:
In all cases if no match is made a "NoSuchFrameException" must be thrown.
Frame switching must succeed even if doing so would cross a security origin, or javascript executing in window.top's context would otherwise not be able to access the frame being switched to.
All browsers must comply with the focus section of the [HTML5] spec. In particular, the requirement that the element within a top-level browsing context be independent of whether or not the top-level browsing context itself has system focus must be followed.
This requirement is put in place to allow efficient machine utilization when using the WebDriver API to control several browsers independently on the same desktop.
One of the key abstractions of the WebDriver API is the WebElement interface. Each instance of this interface represents an Element as defined in the [DOM4] specification. Because the WebDriver API is designed to allow users to interact with apps as if they were actual users, the capabilities offered by the WebElement interface are somewhat different from those offered by the DOM Element interface.
Each WebElement instance must have an ID, which is distinct from the value of the DOM Element's "id" property. The ID for every WebElement representing the same underlying DOM Element must be the same. The IDs used to refer to different underlying DOM Elements must be unique.
interface WebElement {
    readonly attribute DOMString id;
};
id of type DOMString, readonly
This requirement around WebElement IDs allows for efficient equality checks when the WebDriver API is being used out of process.
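One way a remote end could satisfy the ID requirement is to keep a registry that hands out one ID per underlying DOM Element. The sketch below is an assumption-laden illustration: `ElementRegistry` and the `element-N` ID format are hypothetical, and plain objects stand in for DOM Elements.

```javascript
// Hypothetical registry: the same underlying element always maps to the same
// ID, and different elements never share one.
class ElementRegistry {
  constructor() {
    this._ids = new WeakMap(); // keyed by the underlying element object
    this._next = 0;
  }

  idFor(domElement) {
    let id = this._ids.get(domElement);
    if (id === undefined) {
      id = 'element-' + this._next; // illustrative ID format
      this._next += 1;
      this._ids.set(domElement, id);
    }
    return id;
  }
}

// With stable IDs, the local end can compare WebElements by ID alone.
function sameElement(webElementA, webElementB) {
  return webElementA.id === webElementB.id;
}
```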
This section of the specification covers finding elements. Later sections deal with querying and interacting with these located elements. The primary interface used by the WebDriver API for locating elements is the SearchContext.
The primary grouping of WebElement instances is an array of WebElement instances.
A reference to a WebElement is obtained via a SearchContext. The key interfaces are:
interface SearchContext {
    WebElement[] findElements (Locator locator);
    WebElement findElement (Locator locator);
};
findElement
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
locator | Locator | ✘ | ✘ | |
Return type: WebElement
findElements
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
locator | Locator | ✘ | ✘ | |
Return type: WebElement[]
This section is non-normative: It should be possible to find elements using their ARIA roles. It may be possible to find elements using their ARIA states and properties. All references to "ARIA" refer to [WAI-ARIA]
Capability Name | Type |
---|---|
cssSelectors | boolean |
If a browser supports the CSS Selectors API ([SELECTORS-API]) it must support locating elements by CSS Selector. If the browser does not support the CSS Selectors API it may choose to implement locating by this mechanism. Elements must be returned in the same order as if "querySelectorAll" had been called. Compound selectors are allowed.
Finding elements by ecmascript is covered in the ecmascript part of this spec.
This strategy must be supported by all WebDriver implementations.
The HTML5 specification ([HTML5]) states that element IDs must be unique within their home subtree. Sadly, this uniqueness requirement is not always met. Consequently, this strategy is equally valid for finding a single element, or groups of elements. In the case of finding a single WebElement, this must be functionally identical to a call to "document.getElementById()" from the Web DOM Core specification ([DOM4]). When finding multiple elements, this is equivalent to a CSS query of "#value" where "value" is the ID being searched for with all "'" characters being properly escaped.
This strategy must be supported by all WebDriver implementations.
The following algorithm must be used:
This strategy must be supported by all WebDriver implementations.
The following algorithm must be used:
All WebDriver implementations must support finding elements by XPath 1.0 [XPATH] with the edits from section 3.3 of the [HTML5] specification made. If no native support is present in the browser, a pure JS implementation may be used. When called, the returned values must be equivalent to calling the "evaluate" function from the DOM Level 3 XPath spec [DOM-LEVEL-3-XPATH] with the result type set to "ORDERED_NODE_SNAPSHOT_TYPE" (7).
The following algorithm is used to determine if an element has been displayed.
Command Name | isDisplayed |
Parameters | "id" {string} The ID of the WebElement on which to operate. |
Return Value | {boolean} Whether the element is displayed. |
Errors | StaleElementReferenceException if the element referenced is no longer attached to the DOM |
WebDriver determines whether a WebElement is selected using the following algorithm:
Command Name | isSelected |
Parameters | "id" {string} The ID of the WebElement on which to operate. |
Return Value | {boolean} Whether the element is selected, according to the above algorithm. |
Errors | StaleElementReferenceException if the element referenced is no longer attached to the DOM |
Although the [HTML5] spec is very clear about the difference between the properties and attributes of a DOM element, users frequently confuse the two. Because of this, the WebDriver API offers a single command ("getElementAttribute") which covers the case of returning the value of either a DOM element's property or its attribute. If a user wishes to refer specifically to an attribute or a property, they should evaluate JavaScript in order to be unambiguous. In this section, the "attribute" with name name shall refer to the result of calling the JavaScript "getAttribute" function on the element, with the following exceptions:
name reflects a boolean IDL attribute, as per the HTML specification, the value must be the string 'true' if that IDL attribute's value is true or the null value if the IDL attribute's value is false.
name is "value" and there is no "value" attribute, then the text content of the OPTION element must be returned, in accordance with [HTML401] spec, specifically the section on pre-selected options. The text content must be the result of calling the "getElementText" command on the OPTION element.
name is "selected", or the element is an INPUT element of type "checkbox" or "radio" and name is "checked", return the string 'true' if the element is selected, and the null value otherwise.
name is "style", the value returned must be serialized as defined in the [CSSOM-VIEW] spec. Notably, CSS property names must be cased the same as specified in section 6.5.1 of the [CSSOM-VIEW] spec.
    var style = element.getAttribute('style');
    driver.executeScript('arguments[0].style = arguments[1]', element, style);
    var recovered = element.getAttribute('style');
    assertEquals(style, recovered);
rgba\(\d+, \d+, \d+, (1|0(\.\d+)?)\).
name, i.e. a fully resolved URL:
TODO: This doesn't feel like an exhaustive list
Tag name | "name" value |
---|---|
A | href |
IMG | src |
name
is in the below table, and the above stages have not yielded a defined, non-null value, the value of the aliased attribute in the table below should be returned:
Original property name | Aliased property name |
---|---|
class | className |
readonly | readOnly |
These aliases provide the commonly used names for element properties.
Command Name | getElementAttribute |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "id" {string} The ID of the WebElement on which to operate. "name" {string} The name of the property or attribute to return. |
Return Value | {string|null} The value returned by the above algorithm, coerced to a nullable string, or null if no value is defined. |
Errors | StaleElementReferenceException if the element is no longer attached to the DOM. |
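Two of the rules above, the boolean attribute rule and the alias fallback, can be illustrated with a short sketch. It is deliberately partial and rests on assumptions: `BOOLEAN_ATTRIBUTES` is an illustrative subset rather than the HTML list, the OPTION, style and URL rules are omitted, and `fakeElement` is a hypothetical stand-in for a DOM element so the code can run outside a browser.

```javascript
// Illustrative subset only; the real list comes from the HTML specification.
const BOOLEAN_ATTRIBUTES = new Set(['disabled', 'hidden']);

// Aliases from the table above: original name -> aliased property name.
const ALIASES = new Map([['class', 'className'], ['readonly', 'readOnly']]);

function getElementAttribute(element, name) {
  if (BOOLEAN_ATTRIBUTES.has(name)) {
    // Boolean IDL attributes yield the string 'true' or null.
    return element[name] === true ? 'true' : null;
  }
  let value = element.getAttribute(name);
  if ((value === null || value === undefined) && ALIASES.has(name)) {
    // Nothing found yet: fall back to the aliased property.
    value = element[ALIASES.get(name)];
  }
  // The result is coerced to a nullable string.
  return value === null || value === undefined ? null : String(value);
}

// Hypothetical stand-in for a DOM element, for use outside a browser.
function fakeElement(attributes, properties) {
  return {
    ...properties,
    getAttribute(attributeName) {
      const present = Object.prototype.hasOwnProperty.call(attributes, attributeName);
      return present ? attributes[attributeName] : null;
    },
  };
}
```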
All WebDriver implementations must support getting the visible text of a WebElement, with excess whitespace compressed.
The following definitions are used in this section:
\s.
[^\S\xa0]
[\x20\t\u2028\u2029].
The expected return value is roughly what a text-only browser such as Lynx would display. The algorithm for determining this text is as follows:
Let lines equal an empty array. Then:
child of node, at time of execution, in order:
child is:
Let text equal the nodeValue property of child. Then:
text.
text to a single newline (greedily matching (\r\n|\r|\n) to a single \n)
text with a single space character (\x20). If the parent's effective CSS whitespace style is 'pre' or 'pre-wrap' replace each horizontal whitespace character with a non-breaking space character (\xa0). Otherwise replace each sequence of horizontal whitespace characters except non-breaking spaces (\xa0) with a single space character
last(lines) ends with a space character and text starts with a space character, trim the first character of text.
text to last(lines) in-place
lines and continue
last(lines) is not '', push '' to lines.
child set to the current element
last(lines) does not end with whitespace append a single space character to last(lines) [Note: Most innerText implementations append a \t here]
lines
lines trim any leading and trailing whitespace excluding non-breaking space characters.
Let s be lines.join('\n')
s.
s.
s.
Open questions: What happens if a user's JS triggers a modal dialog? Blocking seems like a reasonable idea, but there is an assumption that WebDriver is not threadsafe. What happens to unhandled JS errors? Caused by a user's JS? Caused by JS on a page? How does a user of the API obtain the list of errors? Is that list cleared upon read?
If a browser supports JavaScript and JavaScript is enabled, it must set the "javascriptEnabled" capability to true, and it must support the execution of arbitrary JavaScript.
Capability Name | Type |
javascriptEnabled | boolean |
The Argument type is defined as being {(number|boolean|DOMString|WebElement|dictionary|Array.<Argument>)?}
interface JavascriptCommandParameters {
    readonly attribute DOMString script;
    readonly attribute Argument[] args;
};
When executing JavaScript, it must be possible to reference the args parameter using the function's arguments object. The arguments must be in the same order as defined in args. Each WebDriver implementation must preprocess the values in args using the following algorithm:
For each index, index in args, if args[index] is...
null, then let args[index] = args[index].
args[index] and assign the result to args[index].
args[index] and assign the result to args[index].
args[index] be the underlying DOMElement.
Command Name | executeScript |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "script" {string} The script to execute. "args" {Array.<Argument>} The script arguments. |
Return Value | {Argument} The value returned by the script, or null. |
Errors |
JavascriptError if the executing script threw an exception. StaleElementReferenceException if a WebElement referenced is no longer attached to the DOM. UnknownError if an argument or the return value is of an unhandled type. |
When executing JavaScript, the WebDriver implementations must use the following algorithm:
Let window be the Window object for WebDriver's current command context.
Let script be the DOMString from the command's script parameter.
Let fn be the Function object created by executing new Function(script);
Let args be the JavaScript array created by the pre-processing algorithm defined above.
fn.apply(window, args);
Let error be the thrown value.
dict.
error is an Error, then set a "message" entry in dict whose value is the DOMString defined by error.message.
dict whose value is the DOMString representation of error.
Let result be the value returned by the function in step #5.
Let value be the result of the following algorithm:
result is:
undefined or null, return null.
result.
result.
result.
value.
Command Name | executeAsyncScript |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "script" {string} The script to execute. "args" {Array.<Argument>} The script arguments. |
Return Value | {Argument} The value returned by the script, or null. |
Errors |
JavascriptError if the executing script threw an exception. StaleElementReferenceException if a WebElement referenced is no longer attached to the DOM. Timeout if the callback is not called within the time specified by the "script" timeout. UnknownError if an argument or the return value is of an unhandled type. |
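The core of the synchronous executeScript algorithm described earlier (build a function from the script text, apply it to the current window with the pre-processed arguments, and report either the result or the thrown value) might be sketched as below. This is a simplified illustration under assumptions: argument pre-processing and conversion of WebElements, arrays and dictionaries in the result are omitted, the response is modelled as a plain object, and a non-Error thrown value is assumed to be reported through its string representation in the "message" entry.

```javascript
// Status codes taken from the status table in this specification.
const SUCCESS = 0;
const JAVASCRIPT_ERROR = 17;

function executeScript(windowObject, script, args) {
  // The script text becomes the body of a new function.
  const fn = new Function(script);
  try {
    // `this` inside the script is the supplied window object, and the
    // arguments are reachable through the function's `arguments` object.
    const result = fn.apply(windowObject, args);
    const value = result === undefined || result === null ? null : result;
    return { status: SUCCESS, value };
  } catch (error) {
    const dict = {};
    // Assumption: non-Error values are reported via their string form.
    dict.message = error instanceof Error ? error.message : String(error);
    return { status: JAVASCRIPT_ERROR, value: dict };
  }
}
```

For example, `executeScript(window, 'return arguments[0] + arguments[1];', [1, 2])` would yield a response whose value is 3.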
When executing asynchronous JavaScript, the WebDriver implementation must use the following algorithm:
Let timeout be the value of the last "script" timeout command, or 0 if no such commands have been received.
Let window be the Window object for WebDriver's current command context.
Let script be the DOMString from the command's script parameter.
Let fn be the Function object created by executing new Function(script);
Let args be the JavaScript array created by the pre-processing algorithm defined above.
Let callback be a Function object pushed to the end of args.
Register a timer on window set to fire timeout milliseconds in the future.
Invoke fn.apply(window, args);
If the invocation throws, then:
Let error be the thrown value.
[...] dict.
If error is an Error, then set a "message" entry in dict whose value is the DOMString defined by error.message.
Otherwise, set a "message" entry in dict whose value is the DOMString representation of error.
If window fires an unload event, the WebDriver implementation must immediately set the command response status to JavascriptError and return with the error message set to "Javascript execution context no longer exists.".
If the callback function is invoked, then:
Let result be the first argument passed to callback.
Let value be the result of the following algorithm:
If result is:
undefined or null, return null.
[...] result.
[...] result. WebDriver implementations should limit the recursion depth.
[...] result.
[...] value.
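The asynchronous script steps above can be sketched in JavaScript as follows. This is a minimal sketch under stated assumptions, not a normative implementation: the names executeAsyncScript, done and timers are illustrative, the "Success" status label is invented for the example, globalThis stands in for the browser's Window object, unload handling is omitted, and the result conversion covers only the undefined or null case.

```javascript
// Illustrative sketch only; executeAsyncScript, done and timers are hypothetical names.
function executeAsyncScript(script, args, timeoutMs, done, timers = globalThis) {
  let finished = false;
  let timer = null;

  // Report exactly one response: callback, timeout or thrown error, whichever is first.
  function finish(response) {
    if (finished) return;
    finished = true;
    if (timer !== null) timers.clearTimeout(timer);
    done(response);
  }

  // The callback is pushed to the end of the argument list.
  const callback = (result) =>
    finish({ status: "Success", value: result === undefined ? null : result });
  const fnArgs = [...args, callback];

  // Timer set to fire timeoutMs milliseconds in the future.
  timer = timers.setTimeout(
    () => finish({ status: "Timeout", value: null }),
    timeoutMs
  );

  try {
    const fn = new Function(script);
    fn.apply(globalThis, fnArgs); // globalThis is a stand-in for the Window object
  } catch (error) {
    // An Error contributes its message; anything else its string representation.
    const message = error instanceof Error ? error.message : String(error);
    finish({ status: "JavascriptError", value: { message } });
  }
}
```

Passing the timer functions in as a parameter is a design choice of the sketch: it lets the timeout path be exercised without waiting for real time to pass.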
This section describes how timeouts and implicit waits are handled within WebDriver.
The "timeouts" command sets the maximum amount of time that a time-limited command is permitted to run.
Command Name | timeouts |
Parameters | "sessionId" {string} The key that identifies which session this request is for. "type" {string} The type of operation to set the timeout for. Valid values are: "implicit", "page load", "script". "ms" {number} The amount of time, in milliseconds, that time-limited commands are permitted to run. |
Return Value | None |
Errors | None |
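As an illustration of the parameters in the table above, the following sketch builds "timeouts" commands and looks up the effective "script" timeout, which is the value of the last such command or 0 if none has been received. The function names and the shape of the command object are assumptions made for this example; they are not defined by the specification.

```javascript
// Hypothetical helpers; only the parameter names come from the table above.
function timeoutsCommand(sessionId, type, ms) {
  return { name: "timeouts", parameters: { sessionId, type, ms } };
}

// Value of the last "script" timeout command, or 0 if none has been received.
function lastScriptTimeout(commands) {
  let value = 0;
  for (const command of commands) {
    if (command.name === "timeouts" && command.parameters.type === "script") {
      value = command.parameters.ms;
    }
  }
  return value;
}

const received = [
  timeoutsCommand("session-1", "implicit", 250),
  timeoutsCommand("session-1", "script", 5000),
  timeoutsCommand("session-1", "script", 1500),
];
const effectiveScriptTimeout = lastScriptTimeout(received);
```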
There are two ways to interact with elements: directly or implicitly. The difference between the two is similar to the difference between "Do what I mean" vs "Do as I say": The commands for directly interacting with elements express explicit intention for the desired outcome. For this kind of interaction, the implementation of this specification should take additional steps to ensure the interaction would happen as the user intended (for example, by scrolling the element into the viewport). Implicit interaction differs by following the user's instructions without additional interpretation. Interaction would be with the currently active element, as defined by the browser. The implementation would maintain the current keyboard and mouse state in order to fulfill the user's instructions.
Some user-input actions require the element to be interactable. The following conditions must be met for the element to be considered interactable:
partial interface WebElement {
    void click ();
};
click
Clicks in the middle of the WebElement instance. The middle of the element is defined as the middle of the box returned by calling getBoundingClientRect on the underlying DOM Element, according to the [CSSOM-VIEW] spec. If the element is outside the viewport (according to the [CSS21] spec), the implementation should bring the element into view first. The implementation may invoke scrollIntoView on the underlying DOM Element. The element must be visible, as defined in section 10.1. See the note below for when the element is obscured by another element.
Exceptions:
The possible errors for this command:
Return type: void
As the goal is to emulate users as closely as possible, the implementation should not allow clicking on elements that are obscured by other elements. The implementation should try to scroll the element into view, but in case it is fully obscured, it should not be clickable.
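The definition of the middle of an element given for click can be illustrated with a small calculation. This is a sketch only: middleOfBox is a hypothetical helper name, and the rectangle is assumed to have the left, top, width and height properties of the box returned by getBoundingClientRect.

```javascript
// Hypothetical helper: middle of a box shaped like a getBoundingClientRect result.
function middleOfBox(rect) {
  return {
    x: rect.left + rect.width / 2,
    y: rect.top + rect.height / 2,
  };
}

// A plain object stands in for the rectangle of a real DOM Element.
const clickPoint = middleOfBox({ left: 10, top: 20, width: 100, height: 40 });
```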
A prerequisite for keys-based interaction with an element is that it is interactable (as defined earlier in the section). Typing into an element can take place if one of the following conditions is met:
[...] activeElement. In addition to focusable elements, this allows typing to the body element.
The element has the contentEditable attribute set or the containing document is in designMode.
Prior to any keyboard interaction, the element should be focused if it does not currently have the focus. This is the case if one of the following holds:
[...] document.activeElement
In case focusing is needed, the implementation must follow the focusing steps as described in the focus management section of the [HTML5] spec. The focus must not leave the element at the end of the interaction, other than as a result of the interaction itself (e.g. when the tab key is sent).
partial interface WebElement {
    void clear ();
    void sendKeys (string[] keysToSend);
};
clear
Return type: void
sendKeys
Caret positioning: If focusing was needed, after following the focusing steps, the caret must be positioned at the end of the text currently in the element. At the end of the interaction, the caret must be positioned at the end of the typed text sequence, unless the keys sent position it otherwise (e.g. using the LEFT key).
There are four different types of keys that are emulated:
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
keysToSend | string[] | ✘ | ✘ | |
Return type: void
When emulating user input, the implementation must generate
the same sequence of events that would have been produced if a real user was
sitting in front of the keyboard and typing the sequence of characters. In
cases where there is more than one way to type this sequence, the
implementation must choose one of the valid ways. For example, typing AB
may be achieved by:
The implementation may use the following algorithm to generate the events. If the implementation is using a different algorithm, it must adhere to the requirements listed below.
For each key, key in keysToSend, do
If key is a lower-case symbol:
[...] key as the character to emulate
Let uppercaseKey be the upper-case character matching key
[...] uppercaseKey as the character to emulate
If key is an upper-case symbol:
[...] key as the character to emulate
[...] key as the character to emulate
If key represents a modifier key:
Let modifier be the modifier key represented by key
If modifier is currently held down:
[...] modifier
[...] modifier
If key represents the NULL key:
If key represents a special key:
[...] key to the special key it represents
Once keyboard input is complete, an implicit NULL key is sent unless the final character is the NULL key.
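One possible reading of the algorithm above is sketched below. Several details are assumptions of this sketch rather than statements of the specification: events are modelled as plain strings, a modifier key toggles between held and released each time it is sent, the NULL key releases every held modifier, only letters are treated as upper-case or lower-case symbols, and special keys are reported by code point. All function and constant names are hypothetical.

```javascript
// Illustrative sketch; names and the string event format are assumptions.
const NULL_KEY = "\uE000";
const MODIFIER_KEYS = new Map([
  ["\uE008", "Shift"],
  ["\uE009", "Control"],
  ["\uE00A", "Alt"],
]);

// Non-text keys live in the Private Use Area, 0xE000-0xF8FF.
function isSpecialKey(key) {
  const code = key.charCodeAt(0);
  return code >= 0xe000 && code <= 0xf8ff;
}

function generateKeyEvents(keysToSend) {
  const events = [];
  const held = new Set(); // modifiers currently held down
  const type = (key) =>
    events.push(`keydown ${key}`, `keypress ${key}`, `keyup ${key}`);
  const releaseAll = () => {
    for (const modifier of held) events.push(`keyup ${modifier}`);
    held.clear();
  };

  const keys = Array.from(keysToSend.join(""));
  for (const key of keys) {
    if (key === NULL_KEY) {
      releaseAll();
    } else if (MODIFIER_KEYS.has(key)) {
      const modifier = MODIFIER_KEYS.get(key);
      if (held.has(modifier)) {
        held.delete(modifier);
        events.push(`keyup ${modifier}`);
      } else {
        held.add(modifier);
        events.push(`keydown ${modifier}`);
      }
    } else if (isSpecialKey(key)) {
      type(`U+${key.charCodeAt(0).toString(16).toUpperCase()}`);
    } else if (key !== key.toUpperCase()) {
      // Lower-case letter: a held Shift turns it into its upper-case form.
      type(held.has("Shift") ? key.toUpperCase() : key);
    } else if (key !== key.toLowerCase() && !held.has("Shift")) {
      // Upper-case letter without Shift held: wrap it in a Shift press.
      events.push("keydown Shift");
      type(key);
      events.push("keyup Shift");
    } else {
      type(key);
    }
  }
  // Implicit NULL key unless the final character already was the NULL key.
  if (keys.length === 0 || keys[keys.length - 1] !== NULL_KEY) releaseAll();
  return events;
}
```

With these assumptions, sending "aB" produces a plain press of a followed by a press of B wrapped in Shift key-down and key-up events.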
Any implementation must comply with these requirements:
[...] sendKeys call and the appropriate key-up events generated
The keysToSend parameter contains a mix of printable characters and pressable keys that aren't text. Pressable keys that aren't text are stored in the Unicode PUA (Private Use Area) code points, 0xE000-0xF8FF. The following table describes the mapping between PUA code points and keys:
Key | Code | Type |
---|---|---|
NULL | \uE000 | NULL |
CANCEL | \uE001 | Special key |
HELP | \uE002 | Special key |
BACK_SPACE | \uE003 | Special key |
TAB | \uE004 | Special key |
CLEAR | \uE005 | Special key |
RETURN | \uE006 | Special key |
ENTER | \uE007 | Special key |
SHIFT | \uE008 | Modifier |
LEFT_SHIFT | \uE008 | Modifier |
CONTROL | \uE009 | Modifier |
LEFT_CONTROL | \uE009 | Modifier |
ALT | \uE00A | Modifier |
LEFT_ALT | \uE00A | Modifier |
PAUSE | \uE00B | Special key |
ESCAPE | \uE00C | Special key |
SPACE | \uE00D | Special key |
PAGE_UP | \uE00E | Special key |
PAGE_DOWN | \uE00F | Special key |
END | \uE010 | Special key |
HOME | \uE011 | Special key |
LEFT | \uE012 | Special key |
ARROW_LEFT | \uE012 | Special key |
UP | \uE013 | Special key |
ARROW_UP | \uE013 | Special key |
RIGHT | \uE014 | Special key |
ARROW_RIGHT | \uE014 | Special key |
DOWN | \uE015 | Special key |
ARROW_DOWN | \uE015 | Special key |
INSERT | \uE016 | Special key |
DELETE | \uE017 | Special key |
SEMICOLON | \uE018 | Special key |
EQUALS | \uE019 | Special key |
NUMPAD0 | \uE01A | Special key |
NUMPAD1 | \uE01B | Special key |
NUMPAD2 | \uE01C | Special key |
NUMPAD3 | \uE01D | Special key |
NUMPAD4 | \uE01E | Special key |
NUMPAD5 | \uE01F | Special key |
NUMPAD6 | \uE020 | Special key |
NUMPAD7 | \uE021 | Special key |
NUMPAD8 | \uE022 | Special key |
NUMPAD9 | \uE023 | Special key |
MULTIPLY | \uE024 | Special key |
ADD | \uE025 | Special key |
SEPARATOR | \uE026 | Special key |
SUBTRACT | \uE027 | Special key |
DECIMAL | \uE028 | Special key |
DIVIDE | \uE029 | Special key |
F1 | \uE031 | Special key |
F2 | \uE032 | Special key |
F3 | \uE033 | Special key |
F4 | \uE034 | Special key |
F5 | \uE035 | Special key |
F6 | \uE036 | Special key |
F7 | \uE037 | Special key |
F8 | \uE038 | Special key |
F9 | \uE039 | Special key |
F10 | \uE03A | Special key |
F11 | \uE03B | Special key |
F12 | \uE03C | Special key |
META | \uE03D | Special key |
COMMAND | \uE03D | Special key |
ZENKAKU_HANKAKU | \uE040 | Special key |
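To show how the table might be used when building a keysToSend value, here is a short sketch. The Keys object holds only a subset of the rows above, and its name, the keyType helper and the "Printable character" label are assumptions of this example. Classifying every other Private Use Area code point as a special key, including code points the table does not list, is also an assumption.

```javascript
// Subset of the key table; the Keys object itself is a hypothetical convenience.
const Keys = Object.freeze({
  NULL: "\uE000",
  BACK_SPACE: "\uE003",
  TAB: "\uE004",
  ENTER: "\uE007",
  SHIFT: "\uE008",
  LEFT_SHIFT: "\uE008", // alias: same code point as SHIFT
  CONTROL: "\uE009",
  ALT: "\uE00A",
  ARROW_LEFT: "\uE012",
  F1: "\uE031",
});

// Classify a single character using the Type column of the table.
function keyType(character) {
  const code = character.charCodeAt(0);
  if (code < 0xe000 || code > 0xf8ff) return "Printable character";
  if (code === 0xe000) return "NULL";
  if (code >= 0xe008 && code <= 0xe00a) return "Modifier";
  return "Special key"; // assumption: also used for PUA code points not in the table
}

// Printable text and non-text keys mixed in one keysToSend entry.
const exampleKeys = ["ab" + Keys.ARROW_LEFT + "c"];
const exampleTypes = Array.from(exampleKeys[0], (ch) => keyType(ch));
```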
The keys considered upper-case symbols are either defined by the current keyboard locale or are derived from the US 104-key Windows keyboard layout, which are:
When the user input is emulated natively (see note below), the implementation should use the current keyboard locale to determine which symbols are upper case. In all other cases, the implementation must use the US 104-key Windows keyboard layout to determine those symbols.
The state of the physical keyboard must not affect emulated user input.
Non-latin symbols: TBD
Complex scripts using Input Method Editor (IME): TBD
User input should be emulated natively: The input events should be indistinguishable from those produced by a real user interacting with the browser through a screen and a keyboard. For that purpose, it is highly recommended that input events not be generated at the DOM level. Instead, emulated input events should originate from the browser's own event queue, just like other user input. This is the order of preference for methods to emulate user input:
These input methods could be used to interact with the browser's chrome. However, the way to do so is not defined, as it is most likely to be implementation-specific.
This section deals with implicit interaction. In this kind of interaction, a user describes a series of input actions which the implementation should fulfill with little to no additional actions. This kind of interaction allows dragging and dropping or combining keyboard and mouse actions.
interface Actions {
    void keyDown (Keys theKey);
    void keyUp (Keys theKey);
    void sendKeys (string[] keysToSend);
    void moveToElement (WebElement toElement);
    void moveByOffset (int xOffset, int yOffset);
    void clickAndHold ();
    void release ();
    void click ();
    void doubleClick ();
    void contextClick ();
};
click
Return type: void
clickAndHold
Return type: void
contextClick
Return type: void
doubleClick
Return type: void
keyDown
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
theKey | Keys | ✘ | ✘ | |
Return type: void
keyUp
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
theKey | Keys | ✘ | ✘ | |
Return type: void
moveByOffset
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
xOffset | int | ✘ | ✘ | |
yOffset | int | ✘ | ✘ | |
Return type: void
moveToElement
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
toElement | WebElement | ✘ | ✘ | |
Return type: void
release
Return type: void
sendKeys
Parameter | Type | Nullable | Optional | Description |
---|---|---|---|---|
keysToSend | string[] | ✘ | ✘ | |
Return type: void
The following conditions must hold for implementation of this high-level API:
Mouse movement and scrolling: Despite the implicit nature of this API, some mouse movement actions still cause implicit scrolling. The alternative, of providing an API to scroll the viewport and forcing the user to do so, would be inconvenient. When moving to an element, the implementation should use the same method of scrolling as is used for WebElement.click(). When moving by an offset, the implementation has a greater freedom to decide how much to scroll. In both cases, the implementation must not scroll if the target coordinates are already in the viewport.
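The rule that no scrolling happens when the target is already in the viewport can be sketched as below. The helper name scrollOffsetFor is hypothetical, coordinates are assumed to be relative to the viewport's top-left corner, and scrolling by the smallest amount that brings the target into view is only one of the choices the text leaves open to the implementation.

```javascript
// Hypothetical helper: how far to scroll so that a target point becomes visible.
function scrollOffsetFor(target, viewport) {
  const axis = (position, size) => {
    if (position < 0) return position; // target lies before the viewport
    if (position >= size) return position - (size - 1); // target lies after it
    return 0; // already visible on this axis: must not scroll
  };
  return {
    dx: axis(target.x, viewport.width),
    dy: axis(target.y, viewport.height),
  };
}

const viewportSize = { width: 800, height: 600 };
const noScroll = scrollOffsetFor({ x: 100, y: 100 }, viewportSize);
const scrollDown = scrollOffsetFor({ x: 100, y: 900 }, viewportSize);
```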
The methods described in this interface are a minimal set. An implementation may add additional helper methods for convenience. Such a method could be dragAndDrop(WebElement, WebElement) as a shorthand form of calling moveToElement(WebElement), clickAndHold(), moveToElement(WebElement), release().
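A helper of this kind could be composed from the minimal set roughly as follows. The RecordingActions class is a hypothetical stand-in that only logs calls, so the sketch can run without a browser; a real implementation would drive the mouse instead. The movement method is written with the name moveToElement, as declared in the Actions interface.

```javascript
// Hypothetical stand-in for an Actions implementation: it only records the calls.
class RecordingActions {
  constructor() {
    this.log = [];
  }
  moveToElement(toElement) {
    this.log.push(`moveToElement(${toElement})`);
  }
  clickAndHold() {
    this.log.push("clickAndHold()");
  }
  release() {
    this.log.push("release()");
  }
}

// Convenience helper expressed purely in terms of the minimal method set.
function dragAndDrop(actions, source, target) {
  actions.moveToElement(source);
  actions.clickAndHold();
  actions.moveToElement(target);
  actions.release();
}

const recorded = new RecordingActions();
dragAndDrop(recorded, "source", "target");
```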
TODO: Describe the commands for basic control of keyboard and mouse.
This entire section should be considered non-normative.
The remote end must fail fast if any command is received while a modal dialog is open, unless that command handles the dialog.
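The fail-fast rule above might look like the following check at the point where commands are dispatched. Everything named here is hypothetical: the dispatch function, the modalDialogOpen flag, the response shapes, and the command names used in the usage lines. The set of commands that handle a dialog is supplied by the caller, because this sketch does not assume which commands those are.

```javascript
// Hypothetical dispatcher: rejects commands immediately while a modal dialog is open,
// unless the command is one of the caller-supplied dialog-handling commands.
function dispatch(command, session, dialogCommands) {
  if (session.modalDialogOpen && !dialogCommands.has(command.name)) {
    return {
      status: "rejected",
      message: `A modal dialog is open; "${command.name}" was not executed.`,
    };
  }
  return { status: "accepted" };
}

// Command names below are invented for illustration only.
const dialogHandling = new Set(["acceptAlert", "dismissAlert"]);
const blocked = dispatch({ name: "click" }, { modalDialogOpen: true }, dialogHandling);
const allowed = dispatch({ name: "acceptAlert" }, { modalDialogOpen: true }, dialogHandling);
```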
This is an exhaustive list of the commands listed in this specification.
This is essentially the content at the start of the JSON wire protocol.
Many thanks to Robin Berjon for making our lives so much easier with his cool tool. Thanks to Jason Leyba and Malcolm Rowe for proofreading and suggesting areas for improvement. Thanks to Jason Leyba, Eran Messeri and Daniel Wagner-Hall for contributing sections to this document.