How should I set the language of the content in my HTML page?
This page describes how to mark up an HTML page so that it gives information about the language of the page. It begins with an overall summary, then provides additional details in subsequent sections.
Always use a language attribute on the html tag to declare the default language of the text in the page. This is inherited by all other elements. For example:
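A minimal sketch of such a declaration, assuming an English-language page. The title and paragraph text are placeholders rather than content from the original article:

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <!-- The title is covered by lang="en" because it sits inside html -->
  <title>Example page</title>
</head>
<body>
  <p>This paragraph inherits the English declaration from the html element.</p>
</body>
</html>
```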
Note that you should use the html element rather than the body element, since the body element doesn't cover the text inside the document's head element.
When the page contains content in another language, add a language attribute to an element surrounding that content. This allows you to style or process it differently. For example:
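As a sketch, assuming a page whose default language is English and which contains one French paragraph (the sentences are illustrative only):

```html
<p>The following sentence is in French.</p>
<!-- lang="fr" overrides the inherited default for this element and its descendants -->
<p lang="fr">Ceci est un paragraphe en français.</p>
```

A stylesheet or script can then select the French content, for instance with the CSS selector :lang(fr).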
Use the lang attribute for pages served as HTML. (For pages served as XML, including XHTML 1.x and HTML5 polyglot documents, see Choosing the right attribute.)
Use language tags from the IANA Language Subtag Registry. You can find subtags using the unofficial Language Subtag Lookup tool.
Some parts of a page cannot carry language markup. If you have multilingual text in the title element, you cannot mark up parts of the text for different languages because the title element only allows characters, not markup. The same goes for multiple languages in attribute values. There is no good solution for this at the moment.
Use nested elements to take care of content and attribute values on the same element that are in different languages.
You should never use a meta element with the http-equiv attribute set to Content-Language to indicate the language of a page. In certain circumstances, however, you may want to serve language information with the HTTP header to indicate the intended audience of your page. Whether or not you use the HTTP header, you should always declare the language of the text in a page using a language attribute on the html tag. For more information see the companion article, HTTP headers, meta elements and language information.
This section provides more detailed information on a variety of topics related to declaring language in HTML.
Occasionally the text in an attribute and the element content are in different languages. For example, at the top right corner of this article there are links to translated versions of this page. The link text shows the language of the target page using the language of the target page, but an associated title attribute contains a hint in the language of the current page:
If your code looks as follows, the language attribute would actually indicate that not only the content but also the title attribute text is in Spanish.
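For illustration, the problematic markup could look like this sketch. The href value is a hypothetical placeholder:

```html
<!-- lang="es" applies to the title attribute as well as to the link text -->
<a lang="es" title="Spanish" href="page.es.html">Español</a>
```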
This is obviously incorrect.
Instead, move the attribute containing text in a different language to another element, as shown in this example, where the a element inherits the default en setting of the html element.
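One way to sketch the corrected markup, again with a hypothetical href value, is to keep the title attribute on the a element and move the language declaration to a nested span:

```html
<!-- The a element, and so its title attribute, inherits en from the html element -->
<!-- Only the link text inside the span is declared as Spanish -->
<a title="Spanish" href="page.es.html"><span lang="es">Español</span></a>
```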
If you want to specify the language of some content but there is no markup around it, use an element such as span, bdi or div around the content.
Here is an example:
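A minimal sketch, assuming an English sentence that quotes a single French word:

```html
<!-- The span exists only to carry the language declaration for the French word -->
<p>The French word for cat is <span lang="fr">chat</span>.</p>
```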
To be sure that all user agents recognize which language you mean, you need to follow a standard approach when providing language attribute values. You also need to consider how to refer in a standard way to regional and dialectal differences within a language, such as the difference between US English and British English, which diverge significantly in terms of spelling and pronunciation.
The rules for creating language attribute values are described by an IETF specification called BCP 47. In addition to specifying how to use simple language tags, such as en for English or fr for French, BCP 47 describes how to compose language tags that allow you to specify regional dialects, scripts and other variants related to that language.
BCP 47 incorporates, but goes beyond, the ISO sets of language and country codes. To find relevant codes you should consult the IANA Language Subtag Registry.
For a gentle but fairly thorough introduction to the syntax of BCP 47 tags, read Language tags in HTML and XML. For help in choosing the right language tag out of the many possible tags and combinations, see Choosing a language tag.
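The following sketch shows a few composed tags in use. The tags are valid BCP 47 combinations, but the sample text is illustrative only, and the right tag for real content depends on what you need to distinguish:

```html
<!-- Language plus region: British versus US English -->
<p lang="en-GB">The colour of the theatre programme</p>
<p lang="en-US">The color of the theater program</p>

<!-- Language plus script: Chinese written in Simplified characters -->
<p lang="zh-Hans">简体中文</p>

<!-- Language plus numeric region code: Spanish as used in Latin America and the Caribbean -->
<p lang="es-419">Español de América Latina</p>
```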
If your document is HTML (i.e. served as text/html), use the lang attribute to set the language of the document or a range of text. For example, the following sets the default language to French:
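A minimal sketch of the opening tag for such a page:

```html
<html lang="fr">
```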
When serving XHTML 1.x or polyglot pages as text/html, use both the lang attribute and the xml:lang attribute together every time you want to set the language. The xml:lang attribute is the standard way to identify language information in XML. Ensure that the values for both attributes are identical.
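As a sketch, assuming a French XHTML 1.x or polyglot page, the opening tag carries both attributes with the same value:

```html
<!-- lang is used when the page is parsed as HTML, xml:lang when it is parsed as XML -->
<html lang="fr" xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">
```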
The xml:lang attribute is not actually useful for handling the file as HTML, but takes over from the lang attribute any time you process or serve the document as XML. The lang attribute is allowed by the syntax of XHTML, and may also be recognized by browsers. When using other XML processors, however (such as the lang() function in XSLT), you can't rely on the lang attribute being recognized.
If you are serving your page as XML (i.e. using a MIME type such as application/xhtml+xml), you do not need the lang attribute. The xml:lang attribute alone will suffice.
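A sketch of the opening tag for a French page served as application/xhtml+xml:

```html
<!-- xml:lang alone is enough when the document is only ever processed as XML -->
<html xml:lang="fr" xmlns="http://www.w3.org/1999/xhtml">
```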
The information in this section is less likely to be useful, but is provided for completeness.
In addition to including an in-page language attribute on the html tag (which you should always do), you may also have come across language declarations in the HTTP header (which is served with the page), or as meta elements.
Importantly, the in-page language declaration always overrides the HTTP information when it comes to determining the actual language of the text, but the HTTP information may provide more general information about the intended use of the resource. Use of meta elements in the HTML page for declaring language is not recommended.
For information about Content-Language in HTTP and in meta elements see HTTP headers, meta elements and language information.
For the sake of thoroughness, it is worth mentioning a few mechanisms that are sometimes mistaken for language declarations but do not declare the language of the text.
Firstly, it is not possible to declare the language of text using CSS.
Secondly, the DOCTYPE that should start any HTML file may contain what looks to some people like a language declaration. The DOCTYPE in the example below contains the text EN, which stands for 'English'. This, however, indicates the language of the schema associated with this document – it has nothing to do with the language of the document itself.
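One DOCTYPE of this kind is the HTML 4.01 Transitional declaration, shown here as an illustration:

```html
<!-- The EN here describes the language of the DTD, not of the page content -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
```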
Thirdly, sometimes people assume that information about natural language could be inferred from the character encoding. However, a character encoding does not enable unambiguous identification of a natural language: there must be a one-to-one mapping between encoding and language for this inference to work, and there isn't one. For example, a single character encoding could be used for many languages, e.g. Latin 1 (ISO-8859-1) could encode both French and English, as well as a great many other languages. In addition, the character encoding can vary for a single language: Arabic, for example, could use encodings such as Windows-1256, ISO-8859-6 or UTF-8.
All these encoding examples, however, are nowadays moot, since all content should be authored in UTF-8, which covers all but the rarest of languages in a single character encoding.
The same goes for text direction. As with encodings and language, there is not always a one-to-one mapping between language and script, and therefore directionality. For example, Azerbaijani can be written using both right-to-left (Arabic) and left-to-right (Latin or Cyrillic) scripts, and the language code az can be relevant for either. In addition, text direction markup used with inline text applies a range of different values to the text, whereas language is a simple switch that is not up to the tasks required.
Getting started? Language on the Web
Tutorial, Working with language in HTML
Related links, Authoring HTML & CSS