What is the most appropriate way to associate CSS styles with text in a particular language in a multilingual HTML or XML document?
Presentation styles are commonly used to control changes in fonts, font sizes and line heights when language changes occur in the document. This can be particularly useful when dealing with Simplified versus Traditional Chinese, where users tend to prefer different fonts, even though they may be using many of the same characters. It can also be useful to better harmonize the look of mixed, script-specific fonts, such as when mixing Arabic and Latin fonts.
This page looks at available options for doing this most effectively.
The best way to style content by language in HTML is to use the :lang selector in your CSS style sheet. For example:
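A minimal sketch of such a rule, assuming a mostly English page that contains a Traditional Chinese phrase; the font names are illustrative placeholders rather than recommendations:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>:lang example</title>
<style>
  /* Default font for the document (illustrative choice). */
  body { font-family: Georgia, serif; }
  /* Applies to any element whose content language is zh-Hant. */
  :lang(zh-Hant) { font-family: PMingLiU, MingLiU, serif; }
</style>
</head>
<body>
<p>The word for "welcome" is <span lang="zh-Hant">歡迎</span>.</p>
</body>
</html>
```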
The rest of the article adds some detail about :lang, and compares it with two other approaches.
Three CSS selectors are commonly used to apply styles where the language changes in a document.
All match the value of a lang attribute in HTML, and all are supported by major browsers (see the test results).
[lang="..."] selector
Use this selector to style an element where the lang value exactly matches that in the selector.
The following CSS will style the span element below:
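As a hedged sketch of such a pair, assuming a selector value of zh and an arbitrary color declaration (neither is prescribed by the text):

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Exact attribute match</title>
<style>
  /* Matches only when the element's own lang value is exactly "zh". */
  [lang="zh"] { color: green; }
</style>
</head>
<body>
<p>Chinese text: <span lang="zh">中文</span></p>
</body>
</html>
```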
However, it will not match a span element with the lang value of zh-Hans. The attribute value has to match the selector value exactly.
[lang|="..."] selector
Use this selector to style an element where the lang value starts with the value in the selector.
The following CSS will style the span element below:
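A comparable sketch for the prefix-matching selector, again assuming a selector value of zh and an arbitrary color declaration:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Prefix attribute match</title>
<style>
  /* Matches a lang value that is exactly "zh" or that starts with "zh-". */
  [lang|="zh"] { color: green; }
</style>
</head>
<body>
<p>Chinese text: <span lang="zh-Hans">中文</span></p>
</body>
</html>
```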
In fact, it will match any element with a lang value that starts with the zh language subtag, including zh, zh-Hant, zh-TW, zh-Hans-CN, etc.
A significant difference between :lang and the other methods is that it recognizes the language of the content of an element even if the language is declared outside the element in question.
Suppose, for example, that in a future English document containing Japanese text you wanted to style emphasized Japanese text using special Asian CSS3 properties, rather than italicization (which doesn't always work well with the complex characters of Japanese). You might have the following rules in your style sheet:
Now assume that you have the following content, that the user agent supports :lang, and that the html tag states that this is an English document.
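The rules and content just described could be sketched together as follows. This is only an illustration: it assumes the text-emphasis properties from CSS Text Decoration as the emphasis-mark mechanism, and the sample sentence is invented for the example.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Emphasis styled by inherited language</title>
<style>
  /* Default emphasis: italics. */
  em { font-style: italic; }
  /* Emphasis inside Japanese content: upright text with emphasis dots. */
  em:lang(ja) {
    font-style: normal;
    -webkit-text-emphasis: dot; /* prefixed form for older browsers */
    text-emphasis: dot;
    text-emphasis-position: over right;
  }
</style>
</head>
<body>
<p>This is <em>English</em>, but
  <span lang="ja">これは<em>日本語</em>です。</span></p>
</body>
</html>
```

The Japanese em element carries no lang attribute of its own; it matches em:lang(ja) only because its language is inherited from the enclosing span.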
You would expect to see the emphasized English word italicized, but the emphasized Japanese word in regular text with small dots above each character, something like this:
The important point is that this would not be possible using the [lang|="..."] or [lang="..."] selectors. For those to work you would have to declare the language explicitly on each Japanese em tag.
This is a significant difference between the usefulness of these different selectors.
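To make the contrast concrete, here is a hedged sketch of the attribute-selector version of the same scenario, using an invented sample sentence and the text-emphasis property as an assumed stand-in for the emphasis styling. Only the em element that carries its own lang attribute is matched.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Attribute selectors ignore inherited language</title>
<style>
  em { font-style: italic; }
  /* Looks only at the em element's own lang attribute. */
  em[lang|="ja"] { font-style: normal; text-emphasis: dot; }
</style>
</head>
<body>
<p lang="ja">
  これは<em>日本語</em>です。            <!-- not matched: stays italic -->
  これも<em lang="ja">日本語</em>です。  <!-- matched: upright with dots -->
</p>
</body>
</html>
```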
The lang attribute is used to identify the language of text served as HTML. Text served as XML should use the xml:lang attribute.
For XHTML that is served as text/html, it is recommended that you use both attributes, since the HTML parser will pick up on the lang attribute, whereas if you parse the content as XML the xml:lang attribute will be used by your XML parser.
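A minimal sketch of XHTML markup that carries both attributes with the same value on each element; the sample text is invented for the example:

```html
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<meta charset="utf-8" />
<title>Both language attributes</title>
</head>
<body>
<!-- lang is read by an HTML parser, xml:lang by an XML parser. -->
<p>The Greek for "welcome" is
  <span lang="el" xml:lang="el">Καλώς ορίσατε</span>.</p>
</body>
</html>
```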
The article will first discuss the various options for styling by language in HTML, using the lang attribute. There then follows a section about how to style XML documents based on xml:lang.
:lang(...) pseudo-class selector
The HTML fragment:
could have the following styling:
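A sketch of what such a fragment and its styling might look like, shown together in one document. The list of greetings and the font names are assumptions made for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Styling by language with :lang</title>
<style>
  body { font-family: "Times New Roman", serif; }
  :lang(ar) { font-family: "Traditional Arabic", serif; font-size: 1.2em; }
  :lang(zh-Hant) { font-family: PMingLiU, MingLiU, serif; }
  :lang(zh-Hans) { font-family: SimSun, SimHei, serif; }
</style>
</head>
<body>
<p>It is polite to welcome people in their own language:</p>
<ul>
  <li lang="zh-Hans">欢迎</li>
  <li lang="zh-Hant">歡迎</li>
  <li lang="el">Καλώς ορίσατε</li>
  <li lang="ar">أهلاً وسهلاً</li>
  <li lang="ru">Добро пожаловать</li>
</ul>
</body>
</html>
```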
The Greek and Russian use the styling set for the body element.
This is the ideal way to style language fragments, because it is the only selector that can apply styling to the content of an element when the language of that content is declared earlier in a page.
A rule for :lang(zh) would match elements with a language value of zh. It would also match more specific language specifications such as zh-Hant, zh-Hans and zh-TW.
The selector :lang(zh-Hant) will only match elements that have a language value of zh-Hant (or a more specific value beginning with zh-Hant-, such as zh-Hant-TW), or that have inherited such a value. If the CSS rule specified :lang(zh-TW), the rule would not match our sample paragraph.
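The matching behaviour described above can be sketched as follows, assuming two sample paragraphs and arbitrary declarations chosen only to make each match visible:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>How :lang matches language values</title>
<style>
  :lang(zh)      { color: green; }               /* both paragraphs */
  :lang(zh-Hant) { font-weight: bold; }          /* first paragraph only */
  :lang(zh-TW)   { text-decoration: underline; } /* neither paragraph */
</style>
</head>
<body>
<p lang="zh-Hant">歡迎</p>
<p lang="zh-Hans">欢迎</p>
</body>
</html>
```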
[lang|="..."] selector that matches the beginning of a value of an attribute
For the markup example we saw in the previous section, the style sheet could be written as:
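A hedged sketch of such a style sheet, applied to an assumed list of greetings in which every item carries its own lang attribute; the font names are placeholders:

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Styling by language with [lang|=...]</title>
<style>
  body { font-family: "Times New Roman", serif; }
  [lang|="ar"] { font-family: "Traditional Arabic", serif; font-size: 1.2em; }
  [lang|="zh-Hant"] { font-family: PMingLiU, MingLiU, serif; }
  [lang|="zh-Hans"] { font-family: SimSun, SimHei, serif; }
</style>
</head>
<body>
<ul>
  <li lang="zh-Hans">欢迎</li>
  <li lang="zh-Hant">歡迎</li>
  <li lang="ar">أهلاً وسهلاً</li>
  <li lang="ru">Добро пожаловать</li>
</ul>
</body>
</html>
```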
Unlike :lang, this selector will only work for elements which carry a lang attribute (see Inheritance of language values).
There is a significant difference between this selector and [lang="..."]. Whereas [lang="..."] will only match elements when the selector value and the attribute value are identical, this selector will also match a language attribute value that has additional hyphen-separated values. Therefore the selector [lang|="sl"] would match sl-IT, sl-nedis or sl-IT-nedis, and the selector [lang|="zh-Hans"] would also match zh-Hans-CN.
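The difference can be sketched side by side. The paragraph text is a placeholder word reused for every language tag, and the declarations are arbitrary.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Prefix match versus exact match</title>
<style>
  [lang|="sl"] { color: green; }      /* all four paragraphs */
  [lang="sl"]  { font-weight: bold; } /* first paragraph only */
</style>
</head>
<body>
<p lang="sl">Dobrodošli</p>
<p lang="sl-IT">Dobrodošli</p>
<p lang="sl-nedis">Dobrodošli</p>
<p lang="sl-IT-nedis">Dobrodošli</p>
</body>
</html>
```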
This method avoids the need to match the language declarations at all, and relies on class or id attribute markup. Using an ordinary CSS class or id selector works with most browsers that support CSS. The disadvantage is that adding the attributes takes up time and bandwidth.
For the markup example above, this would require us to change the HTML code by adding class attributes as follows:
We could then have the following styling:
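A sketch of the class-based variant, with markup and styling shown together. The class names (ar, zht, zhs) are hypothetical, and the list of greetings and the font names are assumed for illustration.

```html
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<title>Styling by language with class selectors</title>
<style>
  body { font-family: "Times New Roman", serif; }
  .ar  { font-family: "Traditional Arabic", serif; font-size: 1.2em; }
  .zht { font-family: PMingLiU, MingLiU, serif; }
  .zhs { font-family: SimSun, SimHei, serif; }
</style>
</head>
<body>
<ul>
  <!-- The class drives the styling; lang still identifies the language. -->
  <li class="zhs" lang="zh-Hans">欢迎</li>
  <li class="zht" lang="zh-Hant">歡迎</li>
  <li class="ar" lang="ar">أهلاً وسهلاً</li>
  <li lang="ru">Добро пожаловать</li>
</ul>
</body>
</html>
```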
xml:lang
As mentioned earlier, in a document that is parsed as XML you need to use the xml:lang attribute (rather than the lang attribute) to express language information.
Using :lang
Use of :lang is straightforward. If the document is parsed as HTML, the :lang selector will match content where the language was defined using a lang attribute value. However, if the document is parsed as XML, the :lang selector will match content labeled with an xml:lang attribute value and ignore any lang attribute value.
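A minimal sketch of the XML case, assuming an XHTML document that is served as application/xhtml+xml so that it is parsed as XML; the sample text and the color declaration are arbitrary:

```html
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>:lang in a document parsed as XML</title>
<style>
  /* The selector is written exactly as it would be for HTML. */
  :lang(el) { color: green; }
</style>
</head>
<body>
<p>The Greek for "welcome" is <span xml:lang="el">Καλώς ορίσατε</span>.</p>
</body>
</html>
```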
Using attr= and attr|=
Use of these selectors involves some additional considerations.
The xml: part of the xml:lang attribute indicates that this is the lang attribute used in the XML namespace. CSS3 Namespaces describes how to handle xml:lang as an attribute in a namespace. Basically you need to declare the namespace and then replace the colon with a vertical bar. For example:
or:
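A hedged sketch of both forms, the exact match and the prefix match, in an XHTML document assumed to be parsed as XML. The prefix name xml, the values zh-Hans and zh, and the declarations are chosen for illustration.

```html
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Namespaced attribute selectors</title>
<style>
  /* Bind the prefix "xml" to the XML namespace URI. */
  @namespace xml "http://www.w3.org/XML/1998/namespace";
  /* Exact match on the xml:lang value. */
  [xml|lang="zh-Hans"] { color: green; }
  /* Prefix match on the xml:lang value. */
  [xml|lang|="zh"] { font-weight: bold; }
</style>
</head>
<body>
<p>Chinese text: <span xml:lang="zh-Hans">中文</span></p>
</body>
</html>
```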
Any @namespace rules must follow all @charset and @import rules and precede all other non-ignored at-rules and rule sets in a style sheet. Note, also, that the URI for the namespace declaration must be exactly correct.
Fallbacks
For browsers that are not namespace aware, you can fall back to escaped characters. For this you need no @namespace declaration, just one of the following:
or:
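A sketch of the escaped forms, assuming the same illustrative values (zh-Hans and zh) and arbitrary declarations. The backslash makes the colon part of a plain attribute name, so no namespace is involved.

```html
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Escaped-colon attribute selectors</title>
<style>
  /* Exact match on an attribute literally named "xml:lang". */
  [xml\:lang="zh-Hans"] { color: green; }
  /* Prefix match on the same attribute name. */
  [xml\:lang|="zh"] { font-weight: bold; }
</style>
</head>
<body>
<p>Chinese text: <span xml:lang="zh-Hans">中文</span></p>
</body>
</html>
```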
Note, however, that if you try to use this approach with a namespace-aware browser (i.e. most recent major browsers), it will not work, so if you feel it is needed, you should use this approach in addition to the namespace-based selectors.
I have used the language codes zh-Hant and zh-Hans. These language codes do not represent specific languages. zh-Hant would indicate Chinese written in Traditional Chinese script. Similarly zh-Hans represents Chinese written in Simplified Chinese script. This could refer to Mandarin or many other Chinese languages.
Until the zh-Hans and zh-Hant language tags were available, the codes zh-TW and zh-CN were used to indicate Traditional and Simplified versions of Chinese writing, respectively. This is not actually appropriate because zh-TW indicates the Chinese language spoken in Taiwan, although more than one Chinese language is spoken there. Similarly zh-CN really indicates a generic Chinese spoken language used in China (PRC), rather than Simplified Chinese writing. It could refer to Mandarin or any other Chinese language. The same code was also used incorrectly for the Simplified Chinese written in Singapore.
If you need to use language tags to differentiate between Chinese languages, the IANA language subtag registry has more precise language codes for a range of Chinese languages. For more information see Language tags in HTML and XML.
Getting started? Language on the Web http://www.w3.org/International/getting-started/language