Tibetan script Layout Requirements

Abstract

This document describes or points to requirements for the layout and presentation of text in languages that use the Tibetan script. The target audience is developers of Web standards and technologies, such as HTML, CSS, Mobile Web, Digital Publications, and Unicode, as well as implementers of web browsers, ebook readers, and other applications that need to render Tibetan script text.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document describes the basic requirements for Tibetan script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and digital publications about how to support users of Tibetan script languages. Currently the document focuses on the Tibetan script as used for the Tibetan language. The information here is developed in conjunction with a document that summarises gaps in support on the Web for Tibetan script.

The editor's draft of this document is being developed by the Tibetan Layout Task Force, part of the W3C Internationalization Interest Group. It is published by the Internationalization Working Group. The end target for this document is a Working Group Note.

To make it easier to track comments, please raise separate issues or emails for each comment, and point to the section you are commenting on using a URL.

This document was published by the Internationalization Working Group as a Group Draft Note using the Note track.

Group Draft Notes are not endorsed by W3C nor its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The W3C Patent Policy does not carry any licensing requirements or commitments on this document.

This document is governed by the 03 November 2023 W3C Process Document.

The initial information in this document was provided by Richard Ishida, drawing on the structure and text in Tibetan Orthography Notes.

Some additional information was based on a talk by Jianxin Yin.

胡春明 (Chunming Hu) prepared an early translation of parts of this document (now removed).

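This document has been developed with contributions from participants of the Chinese Layout Requirement Task Force, with kind help from experts from 信标委中文信息处理分技术委员会及藏文信息处理工作组 (the Chinese Information Processing Subcommittee of the National Information Technology Standardization Technical Committee, and its Tibetan Information Processing Working Group).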

See also the GitHub contributors list for the Tibetan Asia Language Enablement project, and the discussions related to Tibetan.

The aim of this document is to describe the basic requirements for Tibetan script layout and text support on the Web and in eBooks. These requirements provide information for Web technologies such as CSS, HTML and digital publications, and for application developers, about how to support users of the Tibetan script.

The document focuses on typographic layout issues. For a deeper understanding of the Tibetan script and how it works, see Tibetan Orthography Notes, which includes topics such as: Phonology, Vowels, Consonants, Encoding choices, and Numbers.

This document should contain no reference to a particular technology. For example, it should not say "CSS does/doesn't do such and such", and it should not describe how a technology, such as CSS, should implement the requirements. It is technology agnostic, so that it will be evergreen, and it simply describes how the script works. The gap analysis document is the appropriate place for all kinds of technology-specific information.

This document should be used alongside a separate document, Tibetan script Gap Analysis, which describes gaps in support for Tibetan script on the Web, and prioritises and describes the impact of those gaps on the user.

Gap reports are brought to the attention of spec and browser implementers, and are tracked via the Gap Analysis Pipeline (filter it for Tibetan).

To complement any content authored specifically for this document, the sections in the document also point to related, external information, tests, GitHub discussions, etc.

The document Language enablement index points to this document and others, and provides a central location for developers and implementers to find information related to various scripts.

The W3C also has a repository with discussion threads related to the Tibetan script, including requests from developers to the user community for information about how scripts/languages work, and a notification system that tracks issues in W3C working groups related to the Tibetan script. See a list of unresolved questions for Tibetan script experts. Each section below points to related discussions. See also the repository home page.

Tibetan can be written in two different styles: དབུ་ཅན (dbu can, "with a head", pronounced u.cen), the block style of the Tibetan script used in print; and དབུ་མེད (dbu med, "headless", pronounced u.me), the cursive style of the Tibetan script used in shorthand and calligraphy. This document concentrates on the former. Pronunciations are based on the central, Lhasa dialect.

Historically, Tibetan text was written on loose-leaf sheets called pechas (དཔེ་ཆ pé.t͡ɕʰá, "book, scripture"). Some of the characters and formatting approaches used in pechas differ from those used in books.

Tibetan text runs left to right in horizontal lines.

Word boundaries are not indicated. However, Tibetan words are made up of one or more units called tsheg-bar, which are basically equivalent to phonological syllables. The tsheg-bar units are separated using ་ U+0F0B TIBETAN MARK INTERSYLLABIC TSHEG.
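As a rough illustration of the separation just described, the following Python sketch splits a string into tsheg-bar units at U+0F0B. The function name split_tsheg_bars is hypothetical, and the sketch assumes the input contains no other punctuation or delimiters; it is not a complete segmentation algorithm.

```python
# Minimal sketch: split Tibetan text into tsheg-bar units at U+0F0B.
# Assumption: U+0F0B is the only delimiter present in the input.

TSHEG = "\u0f0b"  # TIBETAN MARK INTERSYLLABIC TSHEG


def split_tsheg_bars(text):
    """Return the non-empty units found between tsheg marks."""
    return [unit for unit in text.split(TSHEG) if unit]


# "dbu can", written with escapes: DA, BA, VOWEL SIGN U, TSHEG, CA, NA
word = "\u0f51\u0f56\u0f74\u0f0b\u0f45\u0f53"
units = split_tsheg_bars(word)
print(len(units))  # number of tsheg-bar units found in the word
```

Filtering out empty strings means that a trailing tsheg does not produce an empty final unit.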

These tsheg-bar units are composed of structural elements that include vowel signs and consonants used as prefixes, root characters, subscripts, superscripts, suffixes, and secondary suffixes. A common realisation includes a stack and additional consonants to either side of the root consonant. These may indicate syllable-final consonant sounds, but more often than not they qualify or modify the root value, and are not associated with their nominal sound value. The actual pronunciation of Tibetan is usually much simpler than a typical romanisation would suggest. For example, the word བཀོད (kǿː, "to create") is transcribed as bkod.

རྒྱུད་

Figure 1 The single-syllable word cy᷈ː ("string"), with an initial stack of three consonants plus a vowel sign, followed by a suffix consonant (to the right).

To write the sounds of the standard Lhasa dialect, Tibetan uses 28 consonant letters (plus their subjoined forms). 6 more letters are used to write Sanskrit.

A distinguishing feature of Tibetan is the set of separate code points for subjoined consonants, used to create consonant stacks. Of the 77 combining characters in the Tibetan block, 48 represent subjoined consonant forms. Unlike many other Indic scripts, the modern Tibetan orthography doesn't use a virama to create stacks.
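To make the role of the subjoined code points concrete, the sketch below lists the characters in one stacked syllable using Python's standard unicodedata module. The code point sequence given for the syllable rgyud is an assumption about how that word is encoded, and the helper names describe and count_subjoined are hypothetical.

```python
import unicodedata

# Assumed encoding of the syllable "rgyud" (without a trailing tsheg):
# RA, SUBJOINED GA, SUBJOINED YA, VOWEL SIGN U, DA
RGYUD = "\u0f62\u0f92\u0fb1\u0f74\u0f51"


def describe(text):
    """Pair each character's code point with its Unicode character name."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def count_subjoined(text):
    """Count characters whose Unicode name marks them as subjoined letters."""
    return sum(
        unicodedata.name(ch, "").startswith("TIBETAN SUBJOINED LETTER")
        for ch in text
    )


for code_point, name in describe(RGYUD):
    print(code_point, name)
```

In this sequence the stack is built from a base letter followed directly by subjoined letters, with no virama between them.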

Tibetan is an abugida with one inherent vowel. When writing the Lhasa dialect, other post-consonant vowels are represented using 4 vowel signs, all combining marks.

There are no pre-base, circumgraph, or multipart vowels in the Tibetan used to write the Lhasa dialect (though there are when writing in Sanskrit).

Standalone vowels are written by adding vowel signs to either འ U+0F60 TIBETAN LETTER -A or ཨ U+0F68 TIBETAN LETTER A, depending on the tone.
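A minimal sketch of this composition follows. It assumes that the four vowel signs in question are U+0F72, U+0F74, U+0F7A and U+0F7C, and that the inherent vowel is romanised as "a"; neither detail is stated above. The function name standalone_vowel is hypothetical, and the choice of base letter is left to the caller because the tone rule is not spelled out here.

```python
LETTER_MINUS_A = "\u0f60"  # TIBETAN LETTER -A
LETTER_A = "\u0f68"        # TIBETAN LETTER A

# Assumption: these are the four vowel signs used for the Lhasa dialect.
VOWEL_SIGNS = {
    "i": "\u0f72",  # TIBETAN VOWEL SIGN I
    "u": "\u0f74",  # TIBETAN VOWEL SIGN U
    "e": "\u0f7a",  # TIBETAN VOWEL SIGN E
    "o": "\u0f7c",  # TIBETAN VOWEL SIGN O
}


def standalone_vowel(vowel, base=LETTER_A):
    """Build a standalone vowel from a base letter plus an optional vowel sign."""
    if vowel == "a":
        # Assumed inherent vowel: the base letter alone, with no sign added.
        return base
    return base + VOWEL_SIGNS[vowel]


for v in ("a", "i", "u", "e", "o"):
    print(v, standalone_vowel(v), standalone_vowel(v, LETTER_MINUS_A))
```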

Sanskrit vowels written in Tibetan use additional vowel signs and combining marks, some of which represent diphthongs, and some of which form circumgraphs or multipart characters, depending on the encoding.

Tone is indicated by the choice of root character and/or its associated prefixes and superscripts.

Modern Tibetan writing uses few punctuation marks or symbols, but the Tibetan script block in Unicode contains many of these.

Tibetan has its own set of numbers.

The following diagram shows characters in all of the syllabic positions, and lists the characters that can appear in each of the non-root locations. The two-syllable word in the example is འགྲེམས་སྟོན ('grems-ston, ɖɹemton, "exhibition").

[Picture of syllable composition.]

Figure 2 Syllable composition in Tibetan.

See more information about how the various parts of the tsheg-bar work together.

Requirements

List of system fonts

GitHub discussions

Type samples

Tests

Gap analysis

Fonts & font styles

LE Index

Fonts & font styles

Requirements

Tibetan Orthography Notes: Context-based shaping & positioning

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Context-based shaping & positioning

LE Index

Context-based shaping & positioning

Requirements

Tibetan Orthography Notes: Letterform slopes, weights, & italics

GitHub discussions

Type samples

Tests

Gap analysis

Letterform slopes, weights, & italics

LE Index

Letterform slopes, weights, & italics

Requirements

Tibetan Orthography Notes: Vowels • Consonants
Character usage: Tibetan

GitHub discussions

Type samples

W3C type samples
r12a type samples: consonants • vowels

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Characters & encoding

LE Index

Characters & encoding

Requirements

Tibetan Orthography Notes: Word boundaries

GitHub discussions

Type samples

Tools

Grapheme segmenter

Tests

Gap analysis

Grapheme/word segmentation & selection

LE Index

Grapheme/word segmentation & selection

Requirements

Tibetan Orthography Notes: Phrase & section boundaries

GitHub discussions

Type samples

W3C type samples
r12a type samples: phrases • bracketing

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Phrase & section boundaries

LE Index

Phrase & section boundaries

Requirements

Wikipedia: Quotation mark

GitHub discussions

Type samples

Tests

Gap analysis

Quotations & citations

LE Index

Quotations & citations

Requirements

Tibetan Orthography Notes: Emphasis & highlighting

GitHub discussions

Type samples

Tests

Gap analysis

Emphasis & highlighting

LE Index

Emphasis & highlighting

Modern texts tend to use bold text for emphasis.

However, ༵ U+0F35 TIBETAN MARK NGAS BZUNG NYI ZLA may also be used to create a similar effect to underlining or to mark emphasis/honorifics.

Requirements

Tibetan Orthography Notes: Abbreviation, ellipsis & repetition

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Abbreviation, ellipsis & repetition

LE Index

Abbreviation, ellipsis & repetition

Requirements

Tibetan Orthography Notes: Inline notes & annotations

GitHub discussions

Type samples

Tests

Gap analysis

Inline notes & annotations

LE Index

Inline notes & annotations

Requirements

GitHub discussions

Type samples

Tests

Gap analysis

Text decoration & other inline features

LE Index

Text decoration & other inline features

Requirements

Tibetan Orthography Notes: Numbers

GitHub discussions

Type samples

Tests

Gap analysis

Data formats & numbers

LE Index

Data formats & numbers

Requirements

Tibetan Orthography Notes: Line breaking & hyphenation
Approaches to line-breaking

GitHub discussions

Type samples

Tests

Gap analysis

Line breaking & hyphenation

LE Index

Line breaking • Hyphenation

Requirements

Tibetan Orthography Notes: Text alignment & justification
Approaches to full justification

GitHub discussions

Type samples

Tests

Gap analysis

Text alignment & justification

LE Index

Text alignment & justification

Requirements

GitHub discussions

Type samples

Tests

Gap analysis

Text spacing

LE Index

Text spacing

Requirements

Tibetan Orthography Notes: Baselines, line height, etc.

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Baselines, line-height, etc

LE Index

Baselines, line-height, etc

Requirements

Tibetan Orthography Notes: Counters, lists, etc.
Ready-made Counter Styles: Tibetan

GitHub discussions

Type samples

Tests

Gap analysis

Lists, counters, etc

LE Index

Lists, counters, etc

Tibetan numerals can be used for list counters. The Tibetan numbers are used in a simple decimal notation, i.e. in the same way as European numerals; they differ only in shape.

༡ འ་ཞ་མི་རིགས་ཀྱིས་བསྐྲུན་པའི་ཤིང་གི་ཟམ་པ།

༢ ལོ་ངོ་800ཡི་ལོ་རྒྱུས་ལྡན་པའི་དགོན་རྙིང་ཆོས་པོ་དགོ།

༣ ཆི་ཅ་ཞེས་པའི་ཁྱིམ་རྒྱུད་ཀྱི་བང་སོའི་ཚོགས།

Figure 3 Examples of Tibetan counters in a list.

European numerals can also be used for list counters. The European numeral is followed by a period.

1. འ་ཞ་མི་རིགས་ཀྱིས་བསྐྲུན་པའི་ཤིང་གི་ཟམ་པ།

2. ལོ་ངོ་800ཡི་ལོ་རྒྱུས་ལྡན་པའི་དགོན་རྙིང་ཆོས་པོ་དགོ།

3. ཆི་ཅ་ཞེས་པའི་ཁྱིམ་རྒྱུད་ཀྱི་བང་སོའི་ཚོགས།

Figure 4 Examples of European numeral counters in a list.
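Both counter styles described above can be sketched in a few lines of Python. Because Tibetan digits follow plain decimal notation, each European digit maps to the Tibetan digit at the same offset from U+0F20 TIBETAN DIGIT ZERO. The function names to_tibetan_digits and counter_label, and the style names "tibetan" and "european", are hypothetical.

```python
TIBETAN_ZERO = 0x0F20  # TIBETAN DIGIT ZERO; digits one to nine follow it


def to_tibetan_digits(number):
    """Render a non-negative integer with Tibetan digits, one per decimal place."""
    if number < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return "".join(chr(TIBETAN_ZERO + int(digit)) for digit in str(number))


def counter_label(index, style="tibetan"):
    """Return a list counter: bare Tibetan digits, or a European numeral plus a period."""
    if style == "tibetan":
        return to_tibetan_digits(index)
    if style == "european":
        return f"{index}."
    raise ValueError(f"unknown counter style: {style}")


for i in range(1, 4):
    print(counter_label(i), counter_label(i, "european"))
```

The mapping is one digit per decimal place, which is why the conversion needs no script-specific arithmetic.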

Requirements

GitHub discussions

Type samples

Tests

Gap analysis

Styling initials

LE Index

Styling initials

Requirements

Tibetan Orthography Notes: General page layout & progression

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

General page layout & progression

LE Index

General page layout & progression

Requirements

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Grids & tables

LE Index

Grids & tables

Requirements

Tibetan Orthography Notes: Notes, footnotes, etc

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Footnotes, endnotes, etc

LE Index

Footnotes, endnotes, etc

Requirements

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Page headers, footers, etc

LE Index

Page headers, footers, etc

Requirements

GitHub discussions

Type samples

Tests

Exploratory/interactive test results (tbc)
Exploratory/interactive test repo (tbc)

Gap analysis

Forms & user interaction

LE Index

Forms & user interaction

Tibetan script Layout Requirements

Abstract

Status of This Document

1. Introduction

1.1 Contributors

1.2 About this document

1.3 Gap analysis

1.4 Other related resources

2. Tibetan script overview

2.1 Tibetan Syllables

3. All topics

4. Text direction

4.1 Vertical text

5. Glyph shaping & positioning

5.1 Fonts & font styles

5.2 Context-based shaping & positioning

5.3 Letterform slopes, weights, & italics

6. Typographic units

6.1 Characters & encoding

6.2 Grapheme/word segmentation & selection

7. Punctuation & inline features

7.1 Phrase & section boundaries

7.2 Quotations & citations

7.3 Emphasis & highlighting

7.4 Abbreviation, ellipsis & repetition

7.5 Inline notes & annotations

7.6 Text decoration & other inline features

7.7 Data formats & numbers

8. Line & paragraph layout

8.1 Line breaking & hyphenation

8.2 Text alignment & justification

8.3 Text spacing

8.4 Baselines, line height, etc.

8.5 Lists, counters, etc.

8.6 Styling initials

9. Page & book layout

9.1 General page layout & progression

9.2 Grids & tables

9.3 Footnotes, endnotes, etc

9.4 Page headers, footers, etc

9.5 Forms & user interaction

A. Change log