Copyright © 2003 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document describes the syntax and semantics for the Ink Markup Language for use in the W3C Multimodal Interaction Framework as proposed by the W3C Multimodal Interaction Activity. The Ink Markup Language serves as the data format for representing ink entered with an electronic pen or stylus. The markup allows for the input and processing of handwriting, gestures, sketches, music and other notational languages in Web-based applications. It provides a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a public W3C Working Draft for review by W3C members and other interested parties. Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than "work in progress".
This specification describes the syntax and semantics for ink markup, as a basis for a common format for the exchange of ink data between components such as handwriting and gesture recognizers, signature verifiers, and other ink-aware modules.
This document has been produced as part of the W3C Multimodal Interaction Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Multimodal Interaction Working Group (W3C Members only).
Patent disclosures relevant to this specification may be found on the Working Group's patent disclosure page in conformance with W3C policy.
This document is for public review, and comments and discussion are welcomed on the (archived) public mailing list <www-multimodal@w3.org>.
As more electronic devices with pen interfaces become available for entering and manipulating information, applications need to make more effective use of this method of input. Handwriting is an input modality that is familiar to most users, since everyone learns to write in school. Hence, users will tend to use it as a mode of input and control when available.
A pen-based interface consists of a transducer device and a pen so that the movement of the pen is captured as digital ink. Digital ink can be passed on to recognition software that will convert the pen input into appropriate computer actions. Alternatively, the handwritten input can be organized into ink documents, notes or messages that can be stored for later retrieval or exchanged through telecommunications means. Such ink documents are appealing because they capture information as the user composed it, including text in any mix of languages and drawings such as equations and graphs.
Hardware and software vendors have typically stored and represented digital ink using proprietary or restrictive formats. The lack of a public and comprehensive digital ink format has severely limited the capture, transmission, processing, and presentation of digital ink across heterogeneous devices developed by multiple vendors. In response to this need, the Ink Markup Language (InkML) provides a simple and platform-neutral data format to promote the interchange of digital ink between software applications.
InkML supports a complete and accurate representation of hand-drawn ink. For instance, in addition to the pen position over time, InkML allows recording of information about transducer device characteristics and detailed dynamic behavior to support applications such as handwriting recognition and authentication. For example, there is support for recording additional channels such as pen tilt, or pen tip force (commonly referred to as pressure in manufacturers' documentation).
InkML provides means for extension. By virtue of being an XML-based language, users may easily add application-specific information to ink files to suit the needs of the application at hand.
Note: A media type will be registered for InkML instances. It is expected that this media type will be application/inkml+xml as recommended by RFC3023.
This specification was developed to fulfill the W3C requirements for the Ink Markup Language.
The question of whether this specification will use the term "pressure" or "force" has not been decided yet. The Working Group welcomes feedback from the public on this issue.
With the establishment of a non-proprietary ink standard, a number of applications, old and new, can make use of the pen as a convenient and natural form of input. Here are a few examples.
Two-way transmission of digital ink, possibly wireless, offers mobile-device users a compelling new way to communicate. Users can draw or write with a pen on the device's screen to compose a note in their own handwriting. Such an ink note can then be addressed and delivered to other mobile users, desktop users, or fax machines. The recipient views the message as the sender composed it, including text in any mix of languages and drawings.
A photo taken with a digital camera can be annotated with a pen; the digital ink can be coordinated with a spoken commentary. The ink annotation could be used for indexing the photo (for example, one could assign different handwritten glyphs to different categories of pictures).
A software application may allow users to archive handwritten notes and retrieve them using either the time of creation of the handwritten notes or the tags associated with keywords. The tags are typically text strings created using a handwriting recognition system.
In support of natural and robust data entry for electronic forms on a wide spectrum of keyboardless devices, a handwriting recognition engine developer may define an API that takes InkML as input.
Robust and flexible user interfaces can be created that integrate the pen with other input modalities such as speech. Higher robustness is achievable because cross-modal redundancy can be used to compensate for imperfect recognition on each individual mode. Higher flexibility is possible because users can choose the most appropriate from among various modes for achieving a task or issuing commands. This choice might be based on user preferences, suitability for the task, or external conditions. For instance, when noise in the environment or privacy is a concern, the pen modality is preferred over voice.
The current InkML specification defines a set of primitive elements sufficient for all basic ink applications. Few semantics are attached to these elements. All content of an InkML document is contained within a single <ink> element. The fundamental data element in an InkML file is the <trace>. A trace represents a sequence of contiguous ink points, e.g., the X and Y coordinates of the pen's position. A sequence of traces accumulates to meaningful units, such as characters and words. The <traceFormat> element is used to define the format of data within a trace.
In its simplest form, an InkML file with its enclosed traces looks like this:
<ink>
  <trace>
    10 0 9 14 8 28 7 42 6 56 6 70 8 84 8 98 8 112 9 126 10 140 13 154 14 168 17 182 18 188 23 174 30 160 38 147 49 135 58 124 72 121 77 135 80 149 82 163 84 177 87 191 93 205
  </trace>
  <trace>
    130 155 144 159 158 160 170 154 179 143 179 129 166 125 152 128 140 136 131 149 126 163 124 177 128 190 137 200 150 208 163 210 178 208 192 201 205 192 214 180
  </trace>
  <trace>
    227 50 226 64 225 78 227 92 228 106 228 120 229 134 230 148 234 162 235 176 238 190 241 204
  </trace>
  <trace>
    282 45 281 59 284 73 285 87 287 101 288 115 290 129 291 143 294 157 294 171 294 185 296 199 300 213
  </trace>
  <trace>
    366 130 359 143 354 157 349 171 352 185 359 197 371 204 385 205 398 202 408 191 413 177 413 163 405 150 392 143 378 141 365 150
  </trace>
</ink>
These traces consist simply of alternating X and Y values, and may look like this when rendered:
Figure 1: example trace rendering
Figure 1 shows a trace of a sampled handwriting signal. The dots mark the sampling positions, which were interpolated by the blue line. Green dots represent pen-downs, whereas red dots indicate pen-ups.
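As an illustration of how little is needed to read such a default-format trace, the following minimal Python sketch splits the text content of a trace into (X, Y) pairs. The function name parse_xy_trace is hypothetical, and the sketch assumes plain whitespace-separated explicit values only; it does not handle the prefixed or abbreviated encodings described later in this document.

```python
def parse_xy_trace(text):
    """Split the text of a default-format <trace> into (x, y) tuples.

    Assumes alternating, whitespace-separated explicit X and Y values.
    """
    values = [float(token) for token in text.split()]
    if len(values) % 2:
        raise ValueError("odd number of values: a Y coordinate is missing")
    # Even positions are X values, odd positions are Y values.
    return list(zip(values[0::2], values[1::2]))


# First four sample points of the third trace in the example above.
points = parse_xy_trace("227 50 226 64 225 78 227 92")
```

A renderer would then connect consecutive points of each trace, as the blue line in Figure 1 does.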
Information about the transducer device used to collect the ink (e.g., the sampling rate and resolution) is specified with the <captureDevice> element. The Multimodal Interaction Working Group is currently working with the Device Independence Working Group to make sure that transducer characteristics are also represented as a CC/PP profile that can be included inside an ink document by reference. See "Composite Capability/Preference Profiles (CC/PP): Structure and Vocabularies".
Ink traces can have certain attributes such as color and width. These attributes are captured in the <brush> element. Traces that share the same characteristics, such as being written with the same brush, can be grouped together with the <traceGroup> element.
For applications that require ink sharing, such as collaborative whiteboards, where ink coming from different devices is drawn on a common canvas, the <context> element allows representation and grouping of the pertinent information, such as the trace format, brush, and canvas.
The <traceRefGroup> element is provided as a building block for semantic labelling of groups of traces. It includes a generic contentCategory attribute that can be used by applications to describe at a basic level the category of content that the traces represent (e.g., "handwritten text", "drawing", etc.).
In all appropriate cases, the InkML specification defines default values for elements that are not specified, and rules that establish the scope of a given attribute.
Application-specific elements are expected to be defined to provide a higher-level description of the digital ink captured in the primitive elements. Some application-specific elements would reference the primitive elements. For example, a page tag may be useful in a document management application to indicate groups of traces belonging to a particular page. In a form processing application, a field tag might indicate a group of traces belonging to a particular field. Another example of an application-specific element is <writerInfo>, which could be used to record information about the age and handedness of the writer.
When combining InkML and other XML elements within applications, elements from different namespaces may be disambiguated by use of the namespace qualifier. InkML element names are defined within the InkML namespace, specifically http://www.w3.org/2003/InkML
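The following Python sketch shows one way a consumer might pick InkML elements out of a mixed-namespace document using the standard library's ElementTree. The enclosing note vocabulary and its namespace URI (http://www.example.org/notes) are hypothetical and serve only as an illustration; the InkML namespace URI is the one given above.

```python
import xml.etree.ElementTree as ET

INKML_NS = "http://www.w3.org/2003/InkML"

# Hypothetical host document: the <note> and <title> elements are not InkML.
doc = """<note xmlns="http://www.example.org/notes"
      xmlns:inkml="http://www.w3.org/2003/InkML">
  <title>Shopping list</title>
  <inkml:ink>
    <inkml:trace>10 0 9 14 8 28</inkml:trace>
  </inkml:ink>
</note>"""

root = ET.fromstring(doc)
# ElementTree addresses qualified names as "{namespace-uri}localname".
traces = root.findall(f".//{{{INKML_NS}}}trace")
trace_texts = [t.text.strip() for t in traces]
```

Because the lookup uses the namespace URI rather than the prefix, it would work equally well if the host document bound InkML to a different prefix or made it the default namespace.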
Finally, the InkML specification is currently restricted to fixed Cartesian coordinate systems. Similarly, it does not support detailed timestamp handling, events (although these could be handled via application-specific elements), or sophisticated compression of trace data.
Most ink-related applications fall into two broad categories: Streaming and Archival.
Archival ink applications capture and store digital ink for later processing, such as document storage/retrieval applications and remote on-line forms processing (where forms are filled on electronic tablet computers and processed remotely). In these applications, all primitive elements are written prior to processing. For ease of processing, it is recommended that, in archival mode, referenced elements be defined inside of a declaration block using the <defs> element.
Streaming ink applications, on the other hand, capture and transmit digital ink in essentially real time, such as in the electronic whiteboard example mentioned above. In order to support a streaming style of ink markup generation, the InkML language supports the notion of a "current" state (e.g., the current brush) and allows for incremental changes to this state.
Traces are the basic element used to record the trajectory of the pen as the user writes digital ink. More specifically, these recordings describe sequences of connected points. On most devices, these sequences of points will be bounded by pen contact change events (pen-up and pen-down), although some applications simply record proximity and force data without providing an interpretation of pen-up or pen-down state.
The simplest form of encoding specifies the X and Y coordinates of each sample point. For compactness, it may be desirable to specify absolute coordinates only for the first point in the trace and use delta-x and delta-y values to encode subsequent points. Some devices record acceleration rather than absolute or relative position; some provide additional data that may be encoded in the trace, including Z coordinates or tip force (pressure), or the state of side switches or buttons.
These variations in the information available from different capture devices, or needed by different applications, are supported in InkML through the <traceFormat> and <trace> elements. The <traceFormat> element specifies the encoding format for each sample of a recorded trace, while <trace> elements are used to represent the actual trace data. If no <traceFormat> is specified, a default encoding format of X and Y coordinates is assumed.
Traces generated by differing devices, or used in differing applications, may contain different types of information. InkML defines channels to describe the data that may be encoded in a trace.
A channel can be characterized as either regular--meaning that its value is recorded for every sample point of the trace, or intermittent--meaning that its value may change infrequently and thus will not necessarily be recorded for every sample point. X and Y coordinates are examples of likely regular channels, while the state of a pen button is likely to be an intermittent channel.
The <traceFormat> element describes the format used to encode points within <trace> elements. In particular, it defines the sequence of channel values that occurs within <trace> elements. The order of declaration of channels in the <traceFormat> element determines the order of appearance of their values within <trace> elements. X and Y should be the first two channels of the <traceFormat> if they are used.
Regular channels appear first in the <trace>, followed by any intermittent channels. Correspondingly, the <traceFormat> element contains a <regularChannels> section followed by an <intermittentChannels> section. The <regularChannels> element lists those channels whose value must be recorded for each sample point, while the <intermittentChannels> element lists those channels whose value may optionally be recorded for each sample point. If no channels of either type exist, the corresponding element may be omitted.
Within a <regularChannels> or <intermittentChannels> element, channels are described using the empty element <channel>, with name, type, default, and mapping attributes.
The required name attribute specifies the interpretation of the channel in the trace data. The following channel names, with their specified meanings, are reserved:
channel name | interpretation |
---|---|
X | X coordinate (horizontal pen position) |
Y | Y coordinate (vertical pen position) |
Z | Z coordinate (height of pen above paper/digitizer) |
F | pen tip force (tablet pressure) |
S | tip switch state (touching/not touching the digitizer) |
B1...Bn | side button states |
Tx | tilt along the x-axis |
Ty | tilt along the y-axis |
Az | azimuth angle of the pen (yaw) |
El | elevation angle of the pen (pitch) |
R | rotation (rotation about pen axis - i.e., like the roll axis of an airplane) |
There are 5 channels defined for recording of pen orientation data. Implementers may choose to use either Azimuth and Elevation, or tilt angles. The latter are the angles of projections of the pen axis onto the XZ and YZ planes, measured from the vertical. It is often useful to record the sine of this angle, rather than the angle itself, as this is usually more useful in calculations involving angles. The specification does not yet include a mechanism for distinguishing these two.
The third degree of freedom in orientation is generally defined as the rotation of the pen about its axis. This is potentially useful (in combination with tilt) in applications such as illustration or calligraphy, and signature verification.
Figure 2: (a) azimuth and elevation angles, (b) tilt angles
Figure 3: (a) pen orientation decomposition, (b) pen rotation
Figure 2a displays the pen orientation using Azimuth and Elevation. The origin of the Azimuth is at the Y-axis. Azimuth increases anticlockwise up to 360 degrees. The origin of Elevation is located within the XY-plane. Elevation increases up to 90 degrees, at which point the pen is perpendicular to the XY-plane.
Figure 2b explains the definition of the Tilt-X and the Tilt-Y angles. For both the origin is along the Z-axis. Tilt-X increases up to +90 degrees for inclinations along the positive X-axis and decreases up to -90 degrees for inclinations along the negative X-axis. Respectively, Tilt-Y is defined for pen inclinations along the Y-axis.
Figure 3a displays the pen orientation decomposition as a function of Azimuth/Elevation or, alternatively, as a function of Tilt-X/Tilt-Y. Projecting the pen axis onto the XZ- and YZ-planes yields Tilt-X and Tilt-Y, respectively.
Figure 3b shows the Rotation of the pen along its longitudinal axis.
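The relationship between the two orientation representations can be sketched in Python as follows. This is only an illustration under stated assumptions, none of which are fixed by the text above: a right-handed X, Y, Z frame, Azimuth counted anticlockwise from the +Y axis as seen from +Z, and a pen axis vector pointing from the tip up along the barrel. A given device may need one or both signs flipped. The function name is hypothetical.

```python
import math


def tilt_from_azimuth_elevation(azimuth_deg, elevation_deg):
    """Return (Tilt-X, Tilt-Y) in degrees for a given Azimuth and Elevation.

    Assumes a right-handed frame with Azimuth measured anticlockwise from
    +Y (seen from +Z); these conventions are assumptions of this sketch.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    # Unit vector along the pen axis, from the tip towards the barrel end.
    x = -math.sin(az) * math.cos(el)
    y = math.cos(az) * math.cos(el)
    z = math.sin(el)
    # Tilt angles: projections onto the XZ and YZ planes, measured from +Z.
    tilt_x = math.degrees(math.atan2(x, z))
    tilt_y = math.degrees(math.atan2(y, z))
    return tilt_x, tilt_y
```

With these conventions a pen held perpendicular to the XY-plane (Elevation of 90 degrees) has both tilt angles equal to zero, and a pen leaning along the Y-axis has a Tilt-X of zero.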
In addition, user-defined channels are allowed, although their interpretation is not required by conforming ink markup processors.
The type attribute defines the encoding type for the channel (either boolean, decimal, or integer). If type is not specified, it defaults to decimal.
A default value can be specified for the channel using the default attribute; the use of default values within a trace is described in the next section. If no default is specified, it is assumed to be zero for integer and decimal-valued channels, and false for boolean channels.
Typically, a channel in the <traceFormat> will map directly to a corresponding channel provided by the digitizing device, and its values as recorded in the trace data will be the original channel values recorded by the device. However, for some applications, it may be useful to store normalized channel values instead, or even to remap the channels provided by the digitizing device to different channels in the trace data. This correspondence between the trace data and the device channels is recorded using the mapping attribute of the <channel> element.
The mapping attribute has three forms. Identity mappings from device channels are described using a mapping value of "*". The following example defines a channel in the trace data which records the values obtained directly from the X coordinate channel provided by the device:
<channel name="X" type="decimal" mapping="*"/>
Simple mappings such as scaling, and translation, can be specified using a mapping value of the form "formula(...)", where the expression enclosed in the parentheses contains only channel names (from the device element), integer and decimal values, mathematical operators +, -, *, /, and boolean operators !, &, |. Formulae syntax is defined to be standard ANSI-C expression syntax, including use of integer and decimal values, restricted to the listed operators. The examples below define a channel for Y coordinates which is derived from the original device y-coordinate channel by scaling by 2 and translating by 10 units, and another channel which normalizes the device's tip force values from the range 0..1024 to 0..128:
<channel name="Y" type="decimal" mapping="formula(2*Y+10)"/>
<channel name="F" type="decimal" mapping="formula(F*.125)"/>
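To make the formula form concrete, here is a minimal Python sketch that evaluates such a mapping against a dictionary of device channel values. It is an approximation under stated assumptions: it supports only the arithmetic operators +, -, * and /, it omits the boolean operators !, & and |, and it uses Python's true division rather than ANSI-C integer division. The function name apply_formula is hypothetical.

```python
import ast
import operator

_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub,
           ast.Mult: operator.mul, ast.Div: operator.truediv}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def apply_formula(mapping, device_values):
    """Evaluate a 'formula(...)' mapping using the given device channel values."""
    if not (mapping.startswith("formula(") and mapping.endswith(")")):
        raise ValueError("not a formula mapping")
    expression = mapping[len("formula("):-1]

    def evaluate(node):
        # Walk the syntax tree and allow only a small whitelist of node types.
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name):
            return device_values[node.id]  # channel name from the device
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](evaluate(node.operand))
        raise ValueError("unsupported construct in formula")

    return evaluate(ast.parse(expression, mode="eval"))


scaled_y = apply_formula("formula(2*Y+10)", {"Y": 5})
normalized_f = apply_formula("formula(F*.125)", {"F": 1024})
```

Parsing into a syntax tree and whitelisting node types avoids calling eval on text taken from an ink file, which is the main design choice in this sketch.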
More complex relations can be described using a mapping value of the form "uri(...)", where the URI enclosed within the parentheses can refer to a resource such as a MathML document. The following element defines a force channel in the trace data whose values were obtained by some mapping of device channels specified in a separate resource called fxform:
<channel name="F" type="decimal" mapping="uri('http://www.example.org/fxform')"/>
If no mapping is specified for a channel, it is assumed to be unknown.
The following example defines a <traceFormat> which reports decimal-valued X and Y coordinates for each point, and intermittent boolean values for the states of two buttons B1 and B2, which have default values of "false":
<traceFormat id="xyb1b2">
  <regularChannels>
    <channel name="X" type="decimal" mapping="*"/>
    <channel name="Y" type="decimal" mapping="*"/>
  </regularChannels>
  <intermittentChannels>
    <channel name="B1" type="boolean" default="F" mapping="*"/>
    <channel name="B2" type="boolean" default="F" mapping="*"/>
  </intermittentChannels>
</traceFormat>
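A reader of such a definition needs the channel order and the effective attribute values. The Python sketch below extracts both, applying the defaults stated earlier (type defaults to decimal; default is zero for numeric channels and false for boolean channels; a missing mapping is unknown). The function name and the returned dictionary layout are hypothetical, and the sketch assumes the elements carry no namespace prefix.

```python
import xml.etree.ElementTree as ET

# Default values per channel type, written as they would appear in a trace.
TYPE_DEFAULTS = {"decimal": "0", "integer": "0", "boolean": "F"}

XYB1B2 = """<traceFormat id="xyb1b2">
  <regularChannels>
    <channel name="X" type="decimal" mapping="*"/>
    <channel name="Y" type="decimal" mapping="*"/>
  </regularChannels>
  <intermittentChannels>
    <channel name="B1" type="boolean" default="F" mapping="*"/>
    <channel name="B2" type="boolean" default="F" mapping="*"/>
  </intermittentChannels>
</traceFormat>"""


def read_trace_format(xml_text):
    """Return the regular and intermittent channels in declaration order."""
    root = ET.fromstring(xml_text)
    result = {"regular": [], "intermittent": []}
    sections = (("regular", "regularChannels"),
                ("intermittent", "intermittentChannels"))
    for kind, tag in sections:
        section = root.find(tag)
        if section is None:  # either section may be omitted
            continue
        for channel in section.findall("channel"):
            channel_type = channel.get("type", "decimal")
            result[kind].append({
                "name": channel.get("name"),
                "type": channel_type,
                "default": channel.get("default", TYPE_DEFAULTS.get(channel_type)),
                "mapping": channel.get("mapping"),  # None stands for "unknown"
            })
    return result


fmt = read_trace_format(XYB1B2)
```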
The appearance of a <traceFormat> element in an ink markup file both defines the format and installs it as the current format for subsequent traces (except within a <defs> block, discussed later in section 3.4). The id attribute of a <traceFormat> allows the format to be reused by multiple contexts (section 3.2). If no <traceFormat> is specified, the following default format is assumed for all traces:
<traceFormat id="default">
  <regularChannels>
    <channel name="X" type="decimal"/>
    <channel name="Y" type="decimal"/>
  </regularChannels>
</traceFormat>
Thus, in the simplest case, an ink markup file need only contain traces.
Should a <traceFormat> be allowed to reference another <traceFormat>? If so, what is the nature of the modifications which would be allowed? One possibility is to allow extension only; i.e. the channels defined in the <traceFormat> are added in order after the ones in the referenced <traceFormat>. Another is to allow overriding of the attributes of channels in the referenced <traceFormat>; e.g. any channel whose name matches that of a channel in the referenced <traceFormat> replaces its definition.
Additional detail about formula syntax is still open. Lookup tables, < > == operators, ...
The <trace> element is used to record the data captured by the digitizer. It contains a sequence of points encoded according to the specification given by the <traceFormat> element.
The type attribute of a <trace> indicates the pen contact state (either "pen-up" or "pen-down") during its recording. A value of "indeterminate" is used if the contact-state is neither pen-up nor pen-down, and may be either unknown or variable within the trace. For example, a signature may be captured as a single indeterminate trace containing both the actual writing and the trajectory of the pen between strokes. A value of "continuation" means both that the pen contact state is retained from the previous trace element and that the points of the current trace element are a temporally contiguous continuation of (and thus should be connected to) the previous trace element. This allows a trace to be spread across several elements for purposes such as streaming.
Regular channels may be reported as explicit values, differences, or second differences. Prefix symbols are used to indicate the interpretation of a value. A preceding exclamation point indicates an explicit value, a single quote indicates a single difference, and a double quote prefix indicates a second difference. If there is no prefix, then the channel value is interpreted as explicit, difference, or second difference based on the last prefix for the channel. If there is no last prefix, the value is interpreted as explicit.
A second difference encoding must be preceded by a single difference representation; which, in turn, must be preceded with an explicit encoding.
NOTE: All traces must begin with an explicit value, not with a first or second difference. This is true of continuation traces as well. This allows the location and velocity state information to be discarded at the end of each trace, simplifying parser design.
Intermittent channels are always encoded explicitly, and prefixes are not allowed.
Both regular and intermittent channels may be encoded with a wildcard character *. The wildcard character means either that the value of the channel remains at the previous channel value (if explicit), or that the channel continues integrating the previous velocity and acceleration values.
Booleans are encoded as "T" or "F".
For each point in the trace, regular channel values are reported first in the order given by the <traceFormat>. If any intermittent values are reported for the point, the set of intermittent values is preceded by a colon and ended with a semicolon. Within these delimiters, the intermittent channels are represented in the order given by the <traceFormat>. The list may be terminated early with the semicolon, and the unreported intermittent channels are interpreted with wildcards.
Here is an example of a trace of 11 points, using the following traceFormat:
<traceFormat>
  <regularChannels>
    <channel name="X" type="decimal"/>
    <channel name="Y" type="decimal"/>
  </regularChannels>
  <intermittentChannels>
    <channel name="B1" type="boolean" default="F"/>
    <channel name="B2" type="boolean" default="F"/>
  </intermittentChannels>
</traceFormat>
<trace id = "id4525abc">
  1125 18432'23'43"7"-8 3-5+7 -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;
</trace>
The trace is interpreted as follows:
Trace | X | Y | vx | vy | B1 | B2 | Comments |
---|---|---|---|---|---|---|---|
1125 18432 | 1125 | 18432 | ? | ? | F | F | button default values |
'23'43 | 1148 | 18475 | 23 | 43 | F | F | velocity values |
"7"-8 | 1178 | 18510 | 30 | 35 | F | F | acceleration Values |
3-5 | 1211 | 18540 | 33 | 30 | F | F | implicit acceleration whitespace token sep |
+7 -3 | 1251 | 18567 | 40 | 27 | F | F | optional whitespace |
+6+2 | 1297 | 18596 | 46 | 29 | F | F | |
+6 8 | 1349 | 18633 | 52 | 37 | F | F | space instead of + |
+3+6:T; | 1404 | 18676 | 55 | 43 | T | F | an optional value |
+2+4:*T; | 1461 | 18723 | 57 | 47 | T | T | wildcard |
+3+6 | 1521 | 18776 | 60 | 53 | T | T | optional keep last |
+3-6:FF; | 1584 | 18823 | 63 | 47 | F | F | optionals |
One would not typically see both a "+" and a space used as a separator in the same trace or document, but it is legal.
An ink markup generator might also include additional whitespace formatting for clarity. The following trace specification is identical in meaning to the more compact version shown above:
<trace id = "id4525abc"> 1125 18432 '23 '43 "7 "-8 3 -5 7 -3 6 2 6 8 3 6 :T; 2 4 : *T; 3 6 3-6 :F F; </trace>
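The decoding rules above can be sketched in Python as follows. This is an illustrative decoder under stated assumptions, not a conforming implementation: it handles numeric values, the * wildcard, and the T/F booleans for a format with two regular and two boolean intermittent channels, and it does not handle the alphabetic abbreviations. The names tokenize and decode_trace are hypothetical.

```python
import re

# One token: an optional prefix plus a number, or one of the symbols : ; * T F.
_TOKEN = re.compile(r"""\s*(?:(?P<q>[!'"])?(?P<num>[+-]?\d+(?:\.\d+)?)|(?P<sym>[:;*TF]))""")


def tokenize(text):
    """Yield (prefix, value) pairs; value is a number or a one-character symbol."""
    text, pos = text.rstrip(), 0
    while pos < len(text):
        match = _TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        num = match.group("num")
        if num is not None:
            yield match.group("q"), (float(num) if "." in num else int(num))
        else:
            yield None, match.group("sym")


def decode_trace(text, n_regular=2, intermittent_defaults=(False, False)):
    """Decode a trace into a list of (regular values, intermittent values)."""
    tokens = list(tokenize(text))
    mode = ["!"] * n_regular   # last prefix seen per channel; explicit at start
    value = [0] * n_regular    # current channel value
    vel = [0] * n_regular      # current first difference
    acc = [0] * n_regular      # current second difference
    inter = list(intermittent_defaults)
    points, i = [], 0
    while i < len(tokens):
        for ch in range(n_regular):
            prefix, v = tokens[i]
            i += 1
            if isinstance(v, str) and v != "*":
                raise ValueError(f"unexpected {v!r} among regular values")
            if prefix is not None:
                mode[ch] = prefix
            if v != "*":  # a wildcard keeps the previous state untouched
                if mode[ch] == "!":
                    value[ch] = v
                elif mode[ch] == "'":
                    vel[ch] = v
                else:
                    acc[ch] = v
            if mode[ch] == '"':
                vel[ch] += acc[ch]    # integrate the second difference
            if mode[ch] != "!":
                value[ch] += vel[ch]  # integrate the first difference
        if i < len(tokens) and tokens[i][1] == ":":
            i += 1
            k = 0
            while tokens[i][1] != ";":
                v = tokens[i][1]
                if v != "*":
                    inter[k] = {"T": True, "F": False}.get(v, v)
                k += 1
                i += 1
            i += 1  # skip the closing semicolon
        points.append((tuple(value), tuple(inter)))
    return points


COMPACT = """1125 18432'23'43"7"-8 3-5+7 -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;"""
SPACED = """1125 18432 '23 '43 "7 "-8 3 -5 7 -3 6 2 6 8 3 6 :T; 2 4 : *T; 3 6 3-6 :F F;"""
decoded = decode_trace(COMPACT)
```

Under these assumptions both the compact and the whitespace-formatted versions decode to the same 11 points, ending at X = 1584, Y = 18823 with both buttons false, in agreement with the interpretation table.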
In addition, the alphabetic characters may be used to encode small negative and positive integer values. These may be substituted anywhere for an integer value between -25 and +25.
<trace id="4525BCD"> 1125 18432'W'43"G"hCeGcFBFHCF:T;BD:*T;CFCf:FF; </trace>
Note that the true and false values for the side buttons use symbols that are also used to encode numbers. However, they are unambiguous because of their location.
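The letter-to-number mapping is not spelled out above, so the following Python sketch infers one from the example: the abbreviated trace matches the numeric one if uppercase letters count upward from A = 1 and lowercase letters count downward from a = -1 (for instance, W stands for 23 and h for -8). That mapping is an assumption of this sketch, and the sketch does not address how the stated range of -25 to +25 relates to a 26-letter alphabet. The function name decode_code is hypothetical.

```python
def decode_code(letter):
    """Map a single letter to a small integer (assumed: A=1, B=2, ..., a=-1, b=-2, ...)."""
    if len(letter) == 1 and "A" <= letter <= "Z":
        return ord(letter) - ord("A") + 1
    if len(letter) == 1 and "a" <= letter <= "z":
        return -(ord(letter) - ord("a") + 1)
    raise ValueError(f"not an alphabetic code: {letter!r}")


# The run of regular values that follows the "G prefix in the example above.
decoded_run = [decode_code(c) for c in "hCeGcFBFHCF"]
```

The decoded run corresponds to the values -8 3 -5 7 -3 6 2 6 8 3 6 found at the same position in the numeric version of the trace.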
The grammar for trace encoding, described in Backus-Naur Form (BNF), is as follows:
trace ::= wsp* point+
point ::= regularPart intermittentPart?
regularPart ::= regularValue+
intermittentPart ::= ":" wsp* intermittentValue* ";" wsp*
regularValue ::= qualifier? value wsp*
intermittentValue ::= value wsp*
value ::= integer | decimal | code
integer ::= sign? digit+
decimal ::= sign? digit+ "." digit+
code ::= "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" | "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" | "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z" | "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" | "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" | "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z" | "*"
digit ::= "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
sign ::= "+" | "-"
qualifier ::= "!" | "'" | """
wsp ::= #x20 | #x9 | #xD | #xA
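As a rough syntax check, the grammar can be transcribed production by production into a regular expression. The Python sketch below does this; the constant names and the function is_well_formed_trace are hypothetical. It checks syntax only: it does not verify channel counts against a <traceFormat>, and its nested repetitions could backtrack heavily on long malformed input, so it is meant as an illustration rather than a production validator.

```python
import re

WSP = r"[ \t\r\n]*"                                      # wsp*
VALUE = r"(?:[+-]?[0-9]+\.[0-9]+|[+-]?[0-9]+|[A-Za-z*])"  # decimal | integer | code
REGULAR = rf"(?:[!'\"]?{VALUE}{WSP})"                     # qualifier? value wsp*
INTERMITTENT = rf"(?::{WSP}(?:{VALUE}{WSP})*;{WSP})"      # ":" wsp* intermittentValue* ";" wsp*
POINT = rf"(?:{REGULAR}+{INTERMITTENT}?)"                 # regularPart intermittentPart?
TRACE = re.compile(rf"{WSP}{POINT}+")                     # wsp* point+


def is_well_formed_trace(text):
    """Return True if the whole text matches the trace production."""
    return TRACE.fullmatch(text) is not None


ok = is_well_formed_trace(
    """1125 18432'23'43"7"-8 3-5+7 -3+6+2+6 8+3+6:T;+2+4:*T;+3+6+3-6:FF;""")
```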
The number of regularValue tokens appearing within each point of a trace must match the number of regular channels specified in the <traceFormat>, and the number of intermittentValue tokens must be no greater than the number of intermittent channels.
Whitespace is optional before and after regularValue and intermittentValue tokens (unless required to separate two adjacent positive integer or decimal token values without + signs).
Since many sources of digital ink are temporal, many digital ink records will have significant time information. The "current" or "cumulative" time may be expressed in several ways, depending on what is available at the time of capture. The most explicit expression of time is by the use of a startTime attribute in any element. This is not an ideal solution and should be considered more carefully by the working group.
There is currently some discussion about whether to make continuation a separate attribute, rather than a type. This would allow specification of whether a continuation trace was pen-up, pen-down, or indeterminate in addition to the fact that it is a continuation.
The <traceGroup> element is used to group successive traces which share common characteristics, such as the same <traceFormat>. The brush and context sections describe other contextual values that can be specified for a <traceGroup>. In the following example the two traces enclosed in the <traceGroup> share the same brush (see section 3.2 for a description of brushes).
<traceGroup brushRef="penA">
  <trace>...</trace>
  <trace>...</trace>
</traceGroup>
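A consumer might resolve which brush reference applies to each trace roughly as in the Python sketch below. The function name traces_with_brush and the sample document are hypothetical, and the sketch assumes un-prefixed element names. A result of None only means that no enclosing <traceGroup> named a brush; it does not model any current brush established elsewhere in the document.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<ink>
  <traceGroup brushRef="penA">
    <trace>10 0 9 14</trace>
    <trace>20 5 21 9</trace>
  </traceGroup>
  <trace>30 1 31 2</trace>
</ink>"""


def traces_with_brush(ink_xml):
    """Return (brushRef, trace text) pairs in document order."""
    root = ET.fromstring(ink_xml)
    pairs = []
    for child in root:
        if child.tag == "traceGroup":
            brush = child.get("brushRef")
            # Every trace directly inside the group shares the group's brush.
            pairs.extend((brush, t.text.strip()) for t in child.findall("trace"))
        elif child.tag == "trace":
            pairs.append((None, child.text.strip()))
    return pairs


pairs = traces_with_brush(SAMPLE)
```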
The use of <traceGroup> is reserved for the containment of traces according to their properties at the time of capture. The element may not be nested, and it is not meant to be a generic grouping mechanism for the semantic labelling of traces. For that purpose, InkML provides the <traceRefGroup> element, which is described in section 5.
Trace groups are the primary mechanism for assigning <context> to traces in archival ink markup. For additional details about this usage, see section 4.1.
We recently clarified that <traceGroup> elements may not be nested. <traceRefGroup> provides most of the functionality for which this would be desirable. Is there any use case we have overlooked that would require nested traceGroups?
A number of device, data format, and coordinate system details comprise the context in which ink is written and recorded. These contextual details need to be captured by the ink markup language in order to fully characterize the recorded ink data.
The <context> element (section 3.3) provides various attributes such as canvas and mapping by which InkML addresses this need. In addition, the <captureDevice> element (section 3.1) describes how InkML allows accurate recording of the hardware characteristics relevant during the capture of the ink traces.
Different pen tips (e.g. eraser vs. writing end) or entirely different pens, physical or virtual, may be used on the same input device. These details are captured by the <brush> element (section 3.2).
The following sections describe the elements which are used to capture the context in which the ink data was recorded.
One of the important requirements for the ink format is to allow accurate recording of meta-data about the hardware that was used to acquire the ink contained in a file. This is accomplished in the <captureDevice> block, which may contain either very basic information, or very detailed information about a number of device characteristics.
Some of these characteristics are already commonly used in digitizer specifications, while others are somewhat more esoteric, but nonetheless potentially very useful. Most digitizer manufacturers do not specify them, and many are not able to measure them. However, these device characteristics influence signal fidelity and impose some limits on how the data can be used. By beginning to standardize the recording of these characteristics, we hope to raise awareness and encourage device manufacturers to take them into consideration.
The <captureDevice> block, including <channelList>, will often be specified by reference to a separate XML document, either local or at some remote URI. Ideally, <captureDevice> blocks for common devices will become publicly available.
The <captureDevice>
element will allow specification of:
<captureDevice id="foo" manufacturer="AcmePen" model="FooBar 2000 USB"
               sampleRate="100" uniform="TRUE" latency="50">
  <channelList> ... </channelList>
</captureDevice>
Attribute | Description |
---|---|
id | A unique identifier for this captureDevice element |
manufacturer | String identifying the digitizer device manufacturer |
model | String identifying the digitizer model |
sampleRate | The basic sample rate in samples/sec. May be "unknown" |
uniform | TRUE or FALSE indication of whether sample rate is consistent, with no dropped points |
latency | The basic device latency that applies to all channels, in msec |
The <channelList>
element lists all data channels that the device is capable of
reporting. Channels include:
<channelList id="foo"> <channel name="X"> ... </channel> </channelList>
Attribute | Description |
---|---|
id | A unique identifier for this channelList element |
In addition, each channel may specify any of the following when known and appropriate:
For continuous channels, like X, Y and Z, and Force, these additional characteristics may be specified:
<channel name="X">
  <representation value="INTEGER"/>
  <range min="0" max="8191"/>
  <threshold value="0.1" units="newtons"/>
  <resolution value="0.1" units="mm"/>
  <quantization value="0.01" units="mm"/>
  <noise value="0.05" units="mm"/>
  <accuracy value="0.5" units="mm"/>
  <crossCoupling otherChannel="Tx" value="0.1"/>
  <crossCoupling otherChannel="Ty" value="0.01"/>
  <skew value="2" units="msec"/>
  <minBandwidth value="15.0"/>
  <distortion value=".001"/>
</channel>
Attribute | Description |
---|---|
name | The name of the channel described by this channel element |
<channel name="S">
  <representation value="BOOLEAN"/>
  <threshold value="0.1" units="newtons"/>
  <skew value="5" units="msec"/>
</channel>
<channel name="X">
  <representation value="INTEGER"/>
  <range min="0" max="8191"/>
  <resolution value="0.1" units="mm"/>
  <quantization value="0.01" units="mm"/>
  <noise value="0.05" units="mm"/>
  <accuracy value="0.5" units="mm"/>
  <crossCoupling otherChannel="Tx" value="0.1"/>
  <crossCoupling otherChannel="Ty" value="0.01"/>
  <skew value="2" units="msec"/>
  <minBandwidth value="15.0"/>
  <distortion value=".001"/>
</channel>
This Error Calculations section is informative.
The following are some suggestions for how error estimates might be derived from the basic fidelity information in a spatial channel (x or y):
All errors are subject to additional distortion from a signal exceeding the channel bandwidth.
The attribute for identifying the capture device info block has not been incorporated into the Context section.
There should be a "time" channel. We recently noticed that it is missing, and it will be incorporated in the next draft.
There have been last minute additions to try to flesh out the syntax and examples. These are preliminary, and may be changed.
Along with trace data, it is often necessary to record certain attributes of the pen during ink capture. For example, in a note-taking application, it is important to be able to distinguish between traces captured while writing and those which represent erasures. Because these attributes will often be application-specific, this specification does not attempt to enumerate the brush attributes which can be associated with a trace. It also does not provide a language for describing brush attributes, since it is possible to imagine attributes which are described using complex functions parameterized by time, pressure, or other factors. Instead, the specification allows for capturing the fact that a given trace was recorded in a particular brush context, leaving the details of precisely specifying that context to a higher-level, application-specific layer.
Depending on the application, brush attributes may change frequently. Accordingly, there should be a concise mechanism to assign the attributes for an individual trace. On the other hand, it is likely that many traces will be recorded using the same sets of attributes; therefore, it should not be necessary to explicitly state the attributes of every trace (again, for reasons of conciseness). Furthermore, it should be possible to define entities which encompass these attribute sets and refer to them rather than listing the entire set each time. Since many attribute sets will be similar to one another, it should also be possible to inherit attributes from a prior set while overriding some of the attributes in the set.
In the ink markup, brush attributes are described by the
<brush>
element. This element allows for the definition of
reusable sets of brush attributes which may be associated with traces.
For reference purposes, a brush specifies an identifier which can be
used to refer to the brush. A brush can inherit the attributes of
another <brush>
element by including a brushRef attribute which
contains the referenced brush's id.
Brush attributes are associated with traces using the brushRef attribute. When it appears as an attribute of an individual <trace>, the brushRef specifies the brush attributes for that trace. When it appears as an attribute of a <traceGroup> element, the brushRef specifies the common brush attributes for all traces enclosed in the <traceGroup>. Within the <traceGroup>, an individual trace may still override the traceGroup's brush attributes using a brushRef attribute.
Brush attributes can also be associated with a context by including
the brushRef attribute on a <context>
element. Any traces which
reference the context using a contextRef attribute are assigned the
brush attributes defined by the context. If a trace includes both
brushRef and contextRef attributes, the brushRef overrides any brush
attributes given by the contextRef.
In streaming ink markup, brushes are assigned to a trace according
to the current brush, which can be set using the
<context>
and <brush>
elements. See section 4.2 for a detailed description of streaming
mode.
This section describes the <context> element and its attributes: canvas, mapping, traceFormatRef, and brushRef. The context element both defines the shared context (canvas) and serves as a convenient agglomeration of contextual attributes. It is used by the <traceGroup> (Section 2.3) element to define the complete shared context of a group of traces, or may be referred to as part of a context change in streaming mode. In either mode, individual attributes may be overridden at time of use. Additionally, individual traces may refer to a previously defined context (again optionally overriding its attributes) to describe a context change that persists only for the duration of that trace.
Although the use of the <context>
element and attributes
is strongly encouraged, default interpretations are provided so that
they are not required in an ink markup file if all trace data is
recorded in the same virtual coordinate system, and its relationship
to digitizer coordinates is either not needed or unknown.
A shared context, called a canvas, is needed for the ink markup to support screen sharing amongst multiple devices, each of which might have a different set of capture characteristics. For example, a single ink markup stream or file may contain traces that are captured on a tablet computer, a PDA device, and an opaque graphics tablet attached to a desktop computer. The size of these traces on each capture device and corresponding display might differ, yet it may be necessary to relate these traces to one another. They could represent scribbles on a shared electronic whiteboard, annotations of a common document, or the markings of two players in a distributed tic-tac-toe game.
The trace data for these different ink sessions could be recorded using the same set of virtual coordinates; however, it is often useful, and occasionally may even be necessary, to record the data in the capture device coordinates, in order to more precisely represent the original capture conditions, for compactness, or to avoid round-off errors that might be associated with the use of a common coordinate system. Thus the mapping (section 3.3.2) from trace coordinates to the shared canvas coordinates may vary from device to device. The <traceFormat> (Section 2.1) used to record trace data may also vary; therefore, the <context> element also contains a traceFormatRef attribute.
Finally, the <context>
element provides a
brushRef attribute to record the attributes of the pen
during the capture of the digital ink, for a particular
context.
In order to render data from a participant in a multi-party ink application, it is necessary to know how to transform trace data to screen coordinates.
Each party may have a different coordinate system for their traces. Each party will need a mapping to their display that allows scrolling and zooming. Call this S[k].
Party k still needs to determine the meaning of the traces from party i. This is most simply accomplished by having each party define the relationship between their trace coordinate system and an arbitrary reference coordinate system.
This virtual coordinate system does not have any physical dimensions, because each party will render it differently, and each person will draw onto it differently, with arbitrary zoom and scrolling. Thus the virtual coordinate system is arbitrary.
This virtual coordinate system is provided by the canvas, declared via the canvas attribute. This uniquely identifies a shared virtual coordinate system for cooperating ink applications. Together with the trace-to-canvas coordinate mapping (discussed below), it provides a common frame of reference for ink collected in multiple sessions on different devices. In the example above, trace data collected from the tablet computer can be combined with trace data collected from the PDA by specifying a common canvas and describing the relationships between each device's trace data and the common canvas coordinate system.
In the ink markup, the canvas is an unbounded space oriented so that x and y coordinates increase as one moves to the right and down, respectively. Specifying a standard handedness for the canvas coordinate system allows each device to orient and display ink from every other device.
To collaborate in the multi-party ink exchange, party k needs to know the orientation and handedness of the virtual coordinate system (in order to determine their own local S[k]), and the mapping of each other party's data to that virtual coordinate system. Call these mappings T[i].
To map from trace coordinates to screen coordinates, we compose the transform from party i to virtual space, T[i], with party k's transform from virtual space to screen space, S[k]. This is M = T[i] * S[k]. This matrix is used to transform all points from that traceGroup.
When the display is zoomed or scrolled, S[k] changes, and M is recomputed.
When a new traceGroup with a different T[i] is encountered, it is composed with S[k], and rendering continues.
The S[k] matrix is not part of the InkML file, but is determined locally during capture or rendering.
T and S are the minimum necessary information to be able to render some data. However, in order to determine S or T, it is also necessary to make a decision about the orientation of the virtual space. If everyone makes this determination independently, there is no common virtual space. Consequently, the virtual space, or canvas, is defined to have a specific orientation.
The orientation of this canvas does not affect anyone, as it disappears when T and S are composed. It simply provides a common intermediate space that everyone uses when computing T (which goes into the XML) and S (which is used only to display the data).
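The composition of T and S can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the markup: it assumes each transform is a 2x3 affine matrix stored as six values in the row order xx xy x0 yx yy y0, applied to a point as x' = xx*x + xy*y + x0 and y' = yx*x + yy*y + y0. The helper names `apply` and `then`, and the sample values for T[i] and S[k], are hypothetical.

```python
from typing import Tuple

# A 2x3 affine transform as six values in row order: xx xy x0 yx yy y0.
Affine = Tuple[float, float, float, float, float, float]


def apply(m: Affine, x: float, y: float) -> Tuple[float, float]:
    """Transform one point (assumed convention: x' = xx*x + xy*y + x0)."""
    xx, xy, x0, yx, yy, y0 = m
    return (xx * x + xy * y + x0, yx * x + yy * y + y0)


def then(first: Affine, second: Affine) -> Affine:
    """Return the single transform equal to applying `first`, then `second`."""
    a_xx, a_xy, a_x0, a_yx, a_yy, a_y0 = first
    b_xx, b_xy, b_x0, b_yx, b_yy, b_y0 = second
    return (
        b_xx * a_xx + b_xy * a_yx,
        b_xx * a_xy + b_xy * a_yy,
        b_xx * a_x0 + b_xy * a_y0 + b_x0,
        b_yx * a_xx + b_yy * a_yx,
        b_yx * a_xy + b_yy * a_yy,
        b_yx * a_x0 + b_yy * a_y0 + b_y0,
    )


# Hypothetical values: party i's trace-to-canvas mapping T[i], and
# party k's local canvas-to-screen transform S[k] (zoom 0.5, scrolled).
T_i: Affine = (2.0, 0.0, 50.0, 0.0, 2.0, 50.0)
S_k: Affine = (0.5, 0.0, -10.0, 0.0, 0.5, -20.0)

# M maps party i's trace coordinates directly to party k's screen.
M = then(T_i, S_k)
screen_point = apply(M, 10.0, 20.0)
```

When party k zooms or scrolls, only `S_k` changes and `M` is recomputed with another call to `then`; the stored `T_i` is untouched.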
Since a canvas identifier is a simple string, the id of the default canvas is defined to be "default". This is sufficient to allow simple single-canvas sharing without further action on the part of devices or applications.
The trace-to-canvas coordinate system mapping, declared via the mapping attribute, defines the transformation from trace coordinates to the shared canvas coordinate system.
The trace-to-canvas coordinate system mapping is expressed as a standard 2x3 2D transformation matrix (at this time, we ignore the additional complication of nonlinearity in the digitizing device's coordinate system). The default mapping is the identity matrix (with a zero offset).
The format of the trace data--both the mapping from digitizer to
trace coordinates and the channels and channel formats present in the
data--for a given context is specified via the
traceFormatRef attribute, which refers to a
<traceFormat>
element (Section 3.x).
Note: As it is primarily intended as an input specification, the ink markup language does not provide a mechanism for representing the transformations to screen or view coordinates, which relate to ink display and are typically transient.
The trace format to associate with the context being defined is
specified with a traceFormatRef
attribute, which refers
to a <traceFormat>
element (Section 2.1).
The brush to associate with the context being defined is specified with a brushRef attribute, which refers to a <brush> element (Section 3.2).
The <context>
element consolidates all salient characteristics of one or more ink
traces. It may be specified by declaring all non-default attributes,
or by referring to a previously defined context and overriding
specific attributes.
<context id="" contextRef="" canvas="" mapping="" traceFormatRef="" brushRef=""/>
Attribute | Description |
---|---|
id | A unique identifier for this context. |
contextRef | A previously defined context upon which this context is to be based. |
canvas | The unique identifier of the canvas for this context. |
mapping | The standard 2x3 matrix representation of the transformation from the trace data coordinates to the canvas, expressed as the six values of the transformation matrix in row order: xx xy x0 yx yy y0. |
traceFormatRef | A reference to the traceFormat for this context. |
brushRef | A reference to the brush for this context. |
<context id="context1" canvas="canvas1" traceFormatRef="format1" brushRef="brush1"/>
<context id="context2" contextRef="context1" brushRef="brush2"/>
<context id="context3" canvas="canvas1" mapping="2 0 0 0 2 0" traceFormatRef="format2" brushRef="brush3"/>
The first example is a hypothetical device #1, using a previously defined format1 and brush1, and indicating that it can share trace data using canvas1. Its trace coordinates are mapped to this shared canvas using the default identity matrix with zero offset.
The second example is the same device #1, using a different brush: brush2.
The third example is a hypothetical device #2, using previously defined format2 and brush3, and sharing trace data with the first device by using the common canvas1. Its trace coordinates require a scale factor of 2 to map to the canvas.
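As an informal illustration of how a consumer might read the mapping attribute, the following Python sketch parses the six-value string and maps trace points onto the canvas. It assumes the values are whitespace-separated and applied as x' = xx*x + xy*y + x0, y' = yx*x + yy*y + y0; the function names `parse_mapping` and `to_canvas` and the sample points are hypothetical.

```python
# Default mapping: the identity matrix with a zero offset.
IDENTITY = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


def parse_mapping(text=None):
    """Parse a mapping attribute value ("xx xy x0 yx yy y0").

    A missing or empty attribute yields the default identity mapping.
    """
    if text is None or not text.strip():
        return IDENTITY
    values = tuple(float(v) for v in text.split())
    if len(values) != 6:
        raise ValueError("mapping needs six values, got %d" % len(values))
    return values


def to_canvas(mapping, points):
    """Map (x, y) trace points to canvas coordinates."""
    xx, xy, x0, yx, yy, y0 = mapping
    return [(xx * x + xy * y + x0, yx * x + yy * y + y0) for x, y in points]


# Hypothetical trace points, interpreted under two of the contexts above.
trace = [(10.0, 20.0), (11.0, 22.0)]
on_device1 = to_canvas(parse_mapping(None), trace)           # context1: default
on_device2 = to_canvas(parse_mapping("2 0 0 0 2 0"), trace)  # context3: scale 2
```

Under these assumptions the same trace values land at twice the canvas coordinates when recorded by device #2, which is the scale factor of 2 described for the third example.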
The <defs> element is a container which is used to define reusable content. The definitions within a <defs> block can be referenced by other elements using the appropriate syntax. Content within a <defs> has no impact on the interpretation of traces, unless referenced from outside the <defs>. In order to allow them to be referenced, elements within a <defs> block must include an id attribute. Therefore, an element which is defined inside a <defs> without an id, or that is never referenced, serves no purpose.
The three elements which can be defined inside a <defs> are: <context>, <brush> and <traceFormat>. The attributes which are used to reference these definitions are the associated contextRef, brushRef and traceFormatRef attributes. The following simple example illustrates usage of the <defs> element.
<ink>
  <defs>
    <brush id="redPen"/>
    <brush id="bluePen"/>
    <traceFormat id="normal"/>
    <traceFormat id="noForce"/>
    <context id="context1" brushRef="redPen" traceFormatRef="normal"/>
    <context id="context2" contextRef="context1" brushRef="bluePen"/>
  </defs>
  <context contextRef="context2" traceFormatRef="noForce"/>
  <context id="context3"/>
</ink>
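A tool could check for definitions that serve no purpose, as described above. The following Python sketch is one possible approach, assuming un-namespaced element names and that only direct children of <defs> count as definitions; the function name `unused_definitions` and the sample document (with its "greenPen" brush) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Attributes that refer to definitions inside a <defs> block.
REF_ATTRS = ("contextRef", "brushRef", "traceFormatRef")


def unused_definitions(ink_xml):
    """Return (tags of definitions lacking an id, ids never referenced)."""
    root = ET.fromstring(ink_xml)
    defined, missing_id = set(), []
    for defs in root.iter("defs"):
        for element in defs:  # direct children of <defs> only
            if element.get("id"):
                defined.add(element.get("id"))
            else:
                missing_id.append(element.tag)
    referenced = {
        element.get(attr)
        for element in root.iter()
        for attr in REF_ATTRS
        if element.get(attr)
    }
    return missing_id, sorted(defined - referenced)


# Hypothetical document: one brush has no id, and "greenPen" is never used.
SAMPLE = """
<ink>
  <defs>
    <brush id="redPen"/>
    <brush id="greenPen"/>
    <brush/>
    <context id="context1" brushRef="redPen"/>
  </defs>
  <context contextRef="context1"/>
</ink>
"""

missing_id, never_referenced = unused_definitions(SAMPLE)
```

Here `missing_id` lists the tag of the anonymous brush and `never_referenced` lists "greenPen"; both are legal markup but have no effect on any trace.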
More details on the usage of the <defs>
element are provided in section 4.
The ink markup is expected to be utilized in many different scenarios. Ink markup data may be transmitted in substantially real time while exchanging ink messages, or ink documents may be archived for later retrieval or processing.
These examples illustrate two different styles of ink generation and usage. In the former, the markup must facilitate the incremental transmission of a stream of ink data, while in the latter, the markup should provide the structure necessary for operations such as search and interpretation. In order to support both cases, InkML provides archival and streaming modes of usage.
In archival usage, contextual elements are defined within a <defs>
element and assigned identifiers using the id attribute. References to
defined elements are made using the corresponding brushRef,
traceFormatRef, and contextRef attributes. The following example:
<defs>
  <brush id="penA"/>
  <brush id="penB"/>
  <traceFormat id="fmt1">
    <regularChannels>
      <channel name="X" type="integer"/>
      <channel name="Y" type="integer"/>
      <channel name="Z" type="integer"/>
    </regularChannels>
  </traceFormat>
  <context id="context1" canvas="canvasA" mapping="1 0 0 0 1 0" traceFormatRef="fmt1" brushRef="penA"/>
  <context id="context2" canvas="canvasA" mapping="2 0 0 0 2 0" traceFormatRef="fmt1" brushRef="penB"/>
</defs>
defines two brushes ("penA" and "penB"), a traceFormat ("fmt1"), and two contexts ("context1" and "context2") which both refer to the same canvas ("canvasA") and traceFormat ("fmt1"), but with different mappings and brushes. Note the use of the brushRef and traceFormatRef attributes to refer to the previously defined <brush> and <traceFormat>.
Within the scope of a <defs> element, unspecified attributes of a <context> element are assumed to have their default values. This <defs> block:
<defs>
  <brush id="penA"/>
  <context id="context1" canvas="canvasA" brushRef="penA"/>
</defs>
defines "context1", which consists of "canvasA" with the default mapping and traceFormat (the identity mapping and a traceFormat consisting of decimal X-Y coordinate pairs), and "penA".
A <context>
element can inherit and override the values of a
previously defined context by including a contextRef attribute, so:
<defs>
  <brush id="penA"/>
  <context id="context1" canvas="canvasA" mapping="1 0 0 0 1 0"/>
  <context id="context2" contextRef="context1" mapping="2 0 0 0 2 0" brushRef="penA"/>
</defs>
defines "context2" which shares the same canvas ("canvasA") and traceFormat (the default format) as "context1", but has a different mapping and brush.
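The inheritance rule can be sketched as a small resolver. This Python sketch is only an illustration under stated assumptions: definitions are held in a plain dictionary keyed by id, the empty string stands for the default traceFormat and the default brush, and the names `DEFAULTS` and `resolve` are hypothetical.

```python
# Defaults for a context defined inside <defs>.
DEFAULTS = {
    "canvas": "default",        # id of the default canvas
    "mapping": "1 0 0 0 1 0",   # identity matrix with zero offset
    "traceFormatRef": "",       # "" stands for the default traceFormat
    "brushRef": "",             # "" stands for the default brush
}


def resolve(context_id, defs, _seen=()):
    """Return the full attribute set for a context defined in <defs>.

    Values come from the referenced context (if any), then from defaults,
    and are finally overridden by attributes stated on the context itself.
    """
    if context_id in _seen:
        raise ValueError("circular contextRef chain at " + context_id)
    attrs = defs[context_id]
    base_id = attrs.get("contextRef")
    if base_id:
        resolved = resolve(base_id, defs, _seen + (context_id,))
    else:
        resolved = dict(DEFAULTS)
    for name in DEFAULTS:
        if name in attrs:
            resolved[name] = attrs[name]
    return resolved


# The two contexts from the example above, as attribute dictionaries.
defs = {
    "context1": {"canvas": "canvasA", "mapping": "1 0 0 0 1 0"},
    "context2": {"contextRef": "context1", "mapping": "2 0 0 0 2 0",
                 "brushRef": "penA"},
}
context2 = resolve("context2", defs)
```

The resolved "context2" keeps "canvasA" and the default traceFormat from "context1" while carrying its own mapping and brush.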
Within archival ink markup, traces can either explicitly specify their context through the use of contextRef and brushRef attributes, or they can have their context provided by an enclosing traceGroup. In the following:
<trace id="t001" contextRef="context1">...</trace>
<trace id="t002" brushRef="penA">...</trace>
<traceGroup contextRef="context1">
  <trace id="t003">...</trace>
</traceGroup>
traces "t001" and "t003" have the context defined by "context1", while trace "t002" has a context consisting of the default canvas, mapping and traceFormat, and "penA".
Traces within a <traceGroup>
element can also override the
context or brush specified by the traceGroup. In this example:
<traceGroup contextRef="context1">
  <trace id="t001">...</trace>
  <trace id="t002" brushRef="penA">...</trace>
  <trace id="t003">...</trace>
</traceGroup>
traces "t001" and "t003" have their context specified by "context1" while trace "t002" overrides the default brush of "context1" with "penA".
A trace or traceGroup can both reference a context and override its brush, as in the following:
<trace id="t001" contextRef="context1" brushRef="penA">...</trace>
<traceGroup contextRef="context1" brushRef="penA">
  <trace id="t002">...</trace>
</traceGroup>
which assigns the context specified by "context1" to traces "t001" and "t002", but with "penA" instead of the default brush.
In archival mode, the ink markup processor can straightforwardly
determine the context for a given trace by examining only the
<defs>
blocks within the markup and the enclosing traceGroup for
the trace.
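A processor for archival markup might determine per-trace context along the following lines. This Python sketch assumes the contexts from <defs> have already been resolved into dictionaries, that element names carry no namespace, and that a contextRef on a trace replaces the group's context entirely before the trace's own brushRef is applied (the text above does not spell out that last case). The names `trace_contexts` and `layer`, and the brush "penB" given to "context1", are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULT_CONTEXT = {"canvas": "default", "mapping": "1 0 0 0 1 0",
                   "traceFormat": "", "brush": ""}  # "" means the default


def trace_contexts(ink_xml, contexts):
    """Map each trace id to its context in archival-style markup.

    `contexts` maps context ids to fully resolved context dictionaries.
    """
    def layer(base, element):
        # A contextRef replaces the inherited context (an assumption);
        # a brushRef then overrides only the brush.
        ref = element.get("contextRef")
        ctx = dict(contexts[ref]) if ref else dict(base)
        if element.get("brushRef"):
            ctx["brush"] = element.get("brushRef")
        return ctx

    result = {}
    for element in ET.fromstring(ink_xml):
        if element.tag == "trace":
            result[element.get("id")] = layer(DEFAULT_CONTEXT, element)
        elif element.tag == "traceGroup":
            group_ctx = layer(DEFAULT_CONTEXT, element)
            for trace in element.findall("trace"):
                result[trace.get("id")] = layer(group_ctx, trace)
    return result


# Hypothetical resolved definition and a document combining the cases above.
contexts = {"context1": {"canvas": "canvasA", "mapping": "1 0 0 0 1 0",
                         "traceFormat": "", "brush": "penB"}}
SAMPLE = """
<ink>
  <trace id="t001" contextRef="context1">...</trace>
  <trace id="t002" brushRef="penA">...</trace>
  <traceGroup contextRef="context1">
    <trace id="t003">...</trace>
    <trace id="t004" brushRef="penA">...</trace>
  </traceGroup>
</ink>
"""
resolved = trace_contexts(SAMPLE, contexts)
```

No scan of earlier traces is needed: each result depends only on the definitions, the enclosing traceGroup, and the trace's own attributes.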
In streaming ink markup, changes to trace context are expressed directly using the <brush>, <traceFormat>, and <context> elements. This corresponds to an event-driven model of ink generation, where events which result in contextual changes map directly to elements in the markup.
In the streaming case, the current context consists of the set of canvas, mapping, traceFormat and brush which are associated with subsequent traces in the ink markup. Initially, the current context contains the default canvas, an identity mapping, the default traceFormat, and a brush with no attributes. Each <brush>, <traceFormat>, and <context> element which appears outside of a <defs> element changes the current context accordingly (elements appearing within a <defs> block have no effect on the current context, and behave as described above in the archival section).
The appearance of a <brush>
element in the ink markup sets
the current brush attributes, leaving all other contextual values the
same. Likewise, the appearance of a <traceFormat>
element sets
the current traceFormat, and the appearance of a <context>
element sets the current context.
Outside of a <defs>
block, any values which are not specified
within a <context>
element are taken from the current context.
For instance, the <context>
element in the following example
changes the current brush from "penB" to "penA", leaving the canvas,
mapping, and traceFormat unchanged from trace "t001" to trace
"t002".
<brush id="penA"/>
<brush id="penB"/>
<trace id="t001">...</trace>
<context brushRef="penA"/>
<trace id="t002">...</trace>
In order to change a contextual value back to its default value, its attribute can be specified with the value "". In the following:
<context canvas="canvasA" brushRef="penA"/>
<trace id="t001">...</trace>
<context canvas="" brushRef=""/>
<trace id="t002">...</trace>
trace "t001" is on "canvasA" and has the brush specified by "penA", while trace "t002" is on the default canvas and has the default brush.
Brushes, traceFormats, and contexts which appear outside of a
<defs>
block and contain an id attribute both set the current
context and define contextual elements which can be reused (as shown
above for the brushes "penA" and "penB"). This example:
<context id="context1" canvas="canvasA" mapping="2 0 0 0 2 0" traceFormatRef="fmt1" brushRef="penA"/>
defines a context which can be referred to by its identifier
"context1". It also sets the current context to the values specified
in the <context>
element.
A previously defined context is referenced using the contextRef
attribute of the <context>
element. For example:
<context contextRef="context1"/>
sets the current context to have the values specified by
"context1". A <context>
element can also override values of a
previously defined context by including both a contextRef attribute
and canvas, mapping, traceFormatRef or brushRef attributes. The
following:
<context contextRef="context1" brushRef="penB"/>
sets the current context to the values specified by "context1", except that the current brush is set to "penB" instead of "penA".
A <context>
element which inherits and overrides values from
a previous context can itself be reused, so the element:
<context id="context2" contextRef="context1" brushRef="penB"/>
defines "context2" which has the same context values as "context1" except for the brush.
Finally, a <context>
element with only an id has the effect
of taking a "snapshot" of the current context which can then be
reused. The element:
<context id="context3"/>
defines "context3", whose values consist of the current canvas, mapping, traceFormat, and brush at the point where the element occurs (note that since "context3" does not specify any values, the element has no effect on the current context).
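The streaming rules above amount to a small state machine. The following Python sketch is one possible reading, not a normative algorithm: it assumes a context with an id stores all four values in effect at its point of definition (the "snapshot" reading), handles only contexts defined in the stream itself (definitions inside <defs> are not looked up), and uses "" for the default traceFormat and brush. The function name and the sample stream are hypothetical.

```python
import xml.etree.ElementTree as ET

DEFAULTS = {"canvas": "default", "mapping": "1 0 0 0 1 0",
            "traceFormatRef": "", "brushRef": ""}  # "" means the default


def contexts_per_trace(ink_xml):
    """Walk streaming markup in document order; return context per trace id."""
    current = dict(DEFAULTS)
    saved = {}    # id -> snapshot of a context defined in the stream
    result = {}
    for element in ET.fromstring(ink_xml):
        if element.tag == "brush" and element.get("id"):
            current["brushRef"] = element.get("id")
        elif element.tag == "traceFormat" and element.get("id"):
            current["traceFormatRef"] = element.get("id")
        elif element.tag == "context":
            ref = element.get("contextRef")
            current = dict(saved[ref]) if ref else dict(current)
            for name in DEFAULTS:
                if name in element.attrib:
                    # An empty attribute value resets to the default.
                    current[name] = element.get(name) or DEFAULTS[name]
            if element.get("id"):
                saved[element.get("id")] = dict(current)
        elif element.tag == "trace":
            result[element.get("id")] = dict(current)
    return result


SAMPLE = """
<ink>
  <context id="context1" canvas="canvasA" mapping="2 0 0 0 2 0"
           traceFormatRef="fmt1" brushRef="penA"/>
  <trace id="t001">...</trace>
  <context contextRef="context1" brushRef="penB"/>
  <trace id="t002">...</trace>
  <context canvas="" brushRef=""/>
  <trace id="t003">...</trace>
  <context id="context3"/>
  <context contextRef="context1"/>
  <trace id="t004">...</trace>
  <context contextRef="context3"/>
  <trace id="t005">...</trace>
</ink>
"""
per_trace = contexts_per_trace(SAMPLE)
```

In this sample "t003" falls back to the default canvas and brush, "context3" snapshots that state, and "t005" recovers it after "t004" has returned to "context1".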
An advantage of the streaming style is that it is easier
to express overlapping changes to the individual elements of the
context. However, determining the context for a particular trace can
require more computation from the ink markup processor, since the
entire file may need to be scanned from the beginning in order
to establish the current context at the point of the <trace>
element.
The following examples of archival and streaming ink markup data are equivalent, but they highlight the differences between the two styles:
Archival
<ink>
  ...
  <defs>
    <brush id="penA"/>
    <brush id="penB"/>
    <context id="context1" canvas="canvas1" mapping="1 0 0 0 1 0" traceFormatRef="format1"/>
    <context id="context2" contextRef="context1" mapping="2 0 50 0 2 50"/>
  </defs>
  <traceGroup contextRef="context1">
    <trace>...</trace>
    ...
  </traceGroup>
  <traceGroup contextRef="context2">
    <trace>...</trace>
    ...
  </traceGroup>
  <traceGroup contextRef="context2" brushRef="penB">
    <trace>...</trace>
    ...
  </traceGroup>
  <traceGroup contextRef="context1" brushRef="penB">
    <trace>...</trace>
    ...
  </traceGroup>
  <traceGroup contextRef="context1" brushRef="penA">
    <trace>...</trace>
    ...
  </traceGroup>
</ink>
Streaming
<ink>
  ...
  <defs>
    <brush id="penA"/>
    <brush id="penB"/>
  </defs>
  <context id="context1" canvas="canvas1" mapping="1 0 0 0 1 0" traceFormatRef="format1"/>
  <trace>...</trace>
  ...
  <context id="context2" contextRef="context1" mapping="2 0 50 0 2 50"/>
  <trace>...</trace>
  ...
  <context brushRef="penB"/>
  <trace>...</trace>
  ...
  <context contextRef="context1"/>
  <trace>...</trace>
  ...
  <context brushRef="penA"/>
  <trace>...</trace>
  ...
</ink>
In the archival case, the context for each trace is simply
determined by the <trace>
element, its enclosing traceGroup, and
contextual elements defined in the <defs>
block, while in the
streaming case, the context for a trace can depend on the entire
sequence of context changes up to the point of the <trace>
element.
However, the streaming case more simply expresses the changes of context involving "penB", "context1", and "penA", whereas the archival case requires the restatement of the unchanged values in the successive traceGroups.
The two styles of ink markup are equally expressive, but impose different requirements on the ink markup processor and generator. The working group is considering the usefulness of additional mechanisms for distinguishing between the two forms, such as separate profiles for archival and streaming ink markup. Tools to translate from streaming to archival style might also be of use to applications which work on stored ink markup.
The <traceRefGroup> element provides the basis for most semantic labelling of groups of traces. It should be used as the base class for all application-specific elements that identify collections of traces.
The <traceRefGroup>
element has the
following syntax:
<traceRefGroup id="" contentCategory="">
  <traceRef xpath=""/>
  <traceRef xpath="" from="" to=""/>
  <traceRefGroup id="">
    <!-- a nested traceRefGroup, which has attributes of all parent traceRefGroups -->
    ...
  </traceRefGroup>
</traceRefGroup>
Traces listed within a <traceRefGroup>
are included by
reference only. The xpath attribute of the <traceRef>
element is used to refer to traces within the current document, or
from external documents. The from and to attributes can be used
to reference a (contiguous) subset of the points within a given
trace.
<traceRefGroup> elements may also include other <traceRefGroup> elements by reference. <traceRefGroup> elements may overlap, i.e., a trace may be referenced in multiple groups.
<traceRefGroup>
elements will typically be used
either to tag a group of traces for further processing, to tag a group
of traces with some metadata, or to provide a concise reference to a
group of traces for external use.
One of the common attributes of <traceRefGroup>
will be contentCategory, which describes at a basic level the
category of content that the traces represent; e.g., "Text/English",
"Drawing", "Math", "Music". Such categories are useful for general
data identification purposes, and may be essential for selecting data
to train handwriting recognizers in different problem domains.
A number of likely, common categories are suggested below. However, since this attribute:
it is defined as a general-purpose string, to be used as necessary by applications. If, however, the data fits conveniently into one of the following basic categories, it is recommended that the appropriate suggested category (and optional sub-category) be used.
Suggested categories:
The language specification may be made using any of the language identifiers specified in ISO 639, using 2-letter codes, 3-letter codes, or country names. Some text may also require a script specification (such as Kanji, Katakana, or Hiragana) in addition to the language.
For some applications it may be useful to provide additional sub-categories defining the type of the data.
Suggested sub-categories for Text:
Suggested sub-categories for Drawing: