ruby
elementHTMLElement
.The ruby
element allows one or more spans of
phrasing content to be marked with ruby annotations. Ruby
annotations are short runs of text presented alongside base text,
primarily used in East Asian typography as a guide for
pronunciation or to include other annotations. In Japanese, this
form of typography is also known as furigana.
The content model of ruby
elements consists of one
or more of the following sequences:
ruby
elements and with no ruby
element descendants
ruby
element, but with no further ruby
element descendants
The ruby
and rt
elements can be used
for a variety of kinds of annotations, including in particular those
described below. For more details on Japanese Ruby in particular,
and how to render Ruby for Japanese, see Requirements for
Japanese Text Layout. [JLREQ]
At the time of writing, CSS does not yet provide a
way to fully control the rendering of the HTML ruby
element. It is hoped that CSS will be extended to support the styles
described below in due course.
One or more hiragana or katakana characters (the ruby annotation) are placed with each ideographic character (the base text). This is used to provide readings of kanji characters.
<ruby>B<rt>annotation</ruby>
In this example, notice how each annotation corresponds to a single base character.
<ruby>君<rt>くん</ruby><ruby>子<rt>し</ruby>は<ruby>和<rt>わ</ruby>して<ruby>同<rt>どう</ruby>ぜず。
君子は和して同ぜず。
This is similar to the previous case: each ideographic character in the compound word (the base text) has its reading given in hiragana or katakana characters (the ruby annotation). The difference is that the base text segments form a compound word rather than being separate from each other.
<ruby>B<rt>annotation</rt>B<rt>annotation</ruby>
In this example, notice again how each annotation corresponds to a single base character. In this example, each compound word (jukugo) corresponds to a single ruby
element.
The rendering here is expected to be that each annotation be placed over (or next to, in vertical text) the corresponding base character, with the annotations not overhanging any of the adjacent characters.
<ruby>鬼<rt>き</rt>門<rt>もん</rt></ruby>の<ruby>方<rt>ほう</rt>角<rt>がく</rt></ruby>を<ruby>凝<rt>ぎょう</rt>視<rt>し</rt></ruby>する
鬼門の方角を凝視する
This is semantically identical to the previous case (each individual ideographic character in the base compound word has its reading given in an annotation in hiragana or katakana characters), but the rendering is the more complicated Jukugo Ruby rendering.
This is the same example as above for mono-ruby for compound words. The different rendering is expected to be achieved using different styling (e.g. in CSS), and is not shown here.
<ruby>鬼<rt>き</rt>門<rt>もん</rt></ruby>の<ruby>方<rt>ほう</rt>角<rt>がく</rt></ruby>を<ruby>凝<rt>ぎょう</rt>視<rt>し</rt></ruby>する
For more details on Jukugo Ruby rendering, see Appendix F in the Requirements for Japanese Text Layout. [JLREQ]
The annotation describes the meaning of the base text, rather than (or in addition to) the pronunciation. As such, both the base text and the annotation can be multiple characters long.
<ruby>BASE<rt>annotation</ruby>
Here a compound ideographic word has its corresponding katakana given as an annotation.
<ruby>境界面<rt>インターフェース</ruby>
境界面
Here a compound ideographic word has its translation in English provided as an annotation.
<ruby lang="ja">編集者<rt lang="en">editor</ruby>
編集者
A phonetic reading that corresponds to multiple base characters, because a one-to-one mapping would be difficult. (In English, the words "Colonel" and "Lieutenant" are examples of words where a direct mapping of pronunciation to individual letters is, in some dialects, rather unclear.)
In this example, the name of a species of flowers has a phonetic reading provided using group ruby:
<ruby>紫陽花<rt>あじさい</ruby>
紫陽花
Sometimes, ruby styles described above are combined.
<ruby>BASE<rt>annotation 1<rt>annotation 2</ruby>
<ruby><ruby>B<rt>a</rt>A<rt>n</rt>S<rt>t</rt>E<rt>n</rt></ruby><rt>annotation</ruby>
Here both a phonetic reading and the meaning are given in ruby annotations. The annotation on the nested ruby
element gives a mono-ruby phonetic annotation for each base character, while the annotation in the rt
element that is a child of the outer ruby
element gives the meaning using hiragana.
<ruby><ruby>東<rt>とう</rt>南<rt>なん</rt></ruby><rt>たつみ</rt></ruby>の方角
東南の方角
This is the same example, but the meaning is given in English instead of Japanese:
<ruby><ruby>東<rt>とう</rt>南<rt>なん</rt></ruby><rt lang=en>Southeast</rt></ruby>の方角
東南の方角
Within a ruby
element that does not have a
ruby
element ancestor, content is segmented and
segments are placed into three categories: base text segments,
annotation segments, and ignored segments. Ignored segments do not
form part of the document's semantics (they consist of some
inter-element whitespace and rp
elements,
the latter of which are used for legacy user agents that do not
support ruby at all). Base text segments can overlap (with a limit
of two segments overlapping any one position in the DOM, and with
any segment having an earlier start point than an overlapping
segment also having an equal or later end point, and any segment
have a later end point than an overlapping segment also having an
equal or earlier start point). Annotation
segments correspond to rt
elements. Each annotation
segment can be associated with a base text segment, and each base
text segment can have annotation segments associated with it. (In a
conforming document, each base text segment is associated with at
least one annotation segment, and each annotation segment is
associated with one base text segment.) A ruby
element
represents the union of the segments of base text it
contains, along with the mapping from those base text segments to
annotation segments. Segments are described in terms of DOM ranges;
annotation segment ranges always consist of exactly one element. [DOMCORE]
At any particular time, the segmentation and categorisation of
content of a ruby
element is the result that would be
obtained from running the following algorithm:
Let base text segments be an empty list of base text segments, each potentially with a list of base text subsegments.
Let annotation segments be an empty list of annotation segments, each potentially being associated with a base text segment or subsegment.
Let root be the ruby
element for which the algorithm is being run.
If root has a ruby
element
ancestor, then jump to the step labeled end.
Let current parent be root.
Let index be 0.
Let start index be null.
Let parent start index be null.
Let current base text be null.
Start mode: If index is equal to or greater than the number of child nodes in current parent, then jump to the step labeled end mode.
If the indexth node in current parent is an rt
or
rp
element, jump to the step labeled annotation
mode.
Set start index to the value of index.
Base mode: If the indexth node in
current parent is a ruby
element,
and if current parent is the same element as
root, then push a ruby level and
then jump to the step labeled start mode.
If the indexth node in current parent is an rt
or
rp
element, then set the current base
text and then jump to the step labeled annotation
mode.
Increment index by one.
Base mode post-increment: If index is equal to or greater than the number of child nodes in current parent, then jump to the step labeled end mode.
Jump back to the step labeled base mode.
Annotation mode: If the indexth
node in current parent is an rt
element, then push a ruby annotation and jump to the
step labeled annotation mode increment.
If the indexth node in current parent is an rp
element, jump
to the step labeled annotation mode increment.
If the indexth node in current parent is not a Text
node, or
is a Text
node that is not inter-element
whitespace, then jump to the step labeled base
mode.
Annotation mode increment: Let lookahead index be index plus one.
Annotation mode white-space skipper: If lookahead index is equal to the number of child nodes in current parent then jump to the step labeled end mode.
If the lookahead indexth node in current parent is an rt
element or an
rp
element, then set index to lookahead index and jump to the step labeled
annotation mode.
If the lookahead indexth node in current parent is not a Text
node, or
is a Text
node that is not inter-element
whitespace, then jump to the step labeled base mode
(without further incrementing index, so the
inter-element whitespace seen so far becomes part of
the next base text segment).
Increment lookahead index by one.
Jump to the step labeled annotation mode white-space skipper.
End mode: If current parent is not the same element as root, then pop a ruby level and jump to the step labeled base mode post-increment.
End: Return base text segments
and annotation segments. Any content of the
ruby
element not described by segments in either of
thost lists is implicitly in an ignored segment.
When the steps above say to set the current base text, it means to run the following steps at that point in the algorithm:
Let text range a DOM range whose start is the boundary point (current parent, start index) and whose end is the boundary point (current parent, index).
Let new text segment be a base text segment described by the range annotation range.
Add new text segment to base text segments.
Let current base text be new text segment.
Let start index be null.
When the steps above say to push a ruby level, it means to run the following steps at that point in the algorithm:
Let current parent be the indexth node in current parent.
Let index be 0.
Set saved start index to the value of start index.
Let start index be null.
When the steps above say to pop a ruby level, it means to run the following steps at that point in the algorithm:
Let index be the position of current parent in root.
Let current parent be root.
Increment index by one.
Set start index to the value of saved start index.
Let saved start index be null.
When the steps above say to push a ruby annotation, it means to run the following steps at that point in the algorithm:
Let rt be the rt
element
that is the indexth node of current parent.
Let annotation range a DOM range whose start is the boundary point (current parent, index) and whose end is the boundary point (current parent, index plus one) (i.e. that contains only rt).
Let new annotation segment be an annotation segment described by the range annotation range.
If current base text is not null, associate new annotation segment with current base text.
Add new annotation segment to annotation segments.
In this example, each ideograph in the Japanese text 漢字 is annotated with its reading in hiragana.
...
<ruby>漢<rt>かん</rt>字<rt>じ</rt></ruby>
...
This might be rendered as:
In this example, each ideograph in the traditional Chinese text 漢字 is annotated with its bopomofo reading.
<ruby>漢<rt>ㄏㄢˋ</rt>字<rt>ㄗˋ</rt></ruby>
This might be rendered as:
In this example, each ideograph in the simplified Chinese text 汉字 is annotated with its pinyin reading.
...<ruby>汉<rt>hàn</rt>字<rt>zì</rt></ruby>...
This might be rendered as:
In this more contrived example, the acronym "HTML" has four annotations: one for the whole acronym, briefly describing what it is, one for the letters "HT" expanding them to "Hypertext", one for the letter "M" expanding it to "Markup", and one for the letter "L" expanding it to "Language".
<ruby> <ruby>HT<rt>Hypertext</rt>M<rt>Markup</rt>L<rt>Language</rt></ruby> <rt>An abstract language for describing documents and applications </ruby>