Warning: This page is out of date and may contain information that is now incorrect, so it has been discontinued. Note that you should use UTF-8 for all your content.
You may want to try the following page:
How do I set character encoding in my web authoring application?
Content on the web can be authored using a variety of software applications. Even within a single site, the content may have been created using multiple authoring tools. For example, a web site that was created using Macromedia Dreamweaver might also include a page created using Microsoft Access' data access page feature, as well as a dynamic Flash movie that allows for language selection. In order for all of these files to serve the correct text, they need to be properly encoded.
The purpose of this article is to identify where some of the key functionality for encoding exists within some of the more popular web authoring applications.
Specific options for setting character encodings often vary depending on the user's version, and so these are not discussed in detail for each application. For more detailed information, refer to the specific application's help content or user manuals. Common index and search keywords include Character Encoding, Internationalization, Multilingual, Unicode, and UTF.
There are two main points to remember when creating properly encoded files:
The file must be saved in the proper encoding.
The encoding must be declared in the document (for example, charset=iso-8859-1 in an XHTML/HTML meta tag, or encoding="UTF-8" in an XML declaration).
Following are a few points to consider when using these applications:
Most of these applications will save the file in the proper format, but might not automatically apply the proper markup within the document.
Some applications automatically insert a byte order mark (BOM), while others do not. The presence of a BOM in UTF-8 encoded pages can create problems in older browsers.
While an application might be Unicode-enabled and be able to save a file in the proper format and with the proper encoding, it might not be able to handle characters beyond, for example, the Latin-1 character set.
We recommend the use of Unicode encoding because it greatly simplifies the effort of creating multilingual Web sites.
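The two requirements above (save the bytes in one encoding, and declare that same encoding in the markup) can be checked mechanically. The following is a minimal Python sketch, not a feature of any of the applications discussed here. The function names `build_page` and `declared_charset` and the sample text are hypothetical, and the regular expression only handles simple `charset=...` declarations.

```python
import re

def build_page(body: str, charset: str = "utf-8") -> bytes:
    """Return an HTML page whose bytes and meta declaration use the same encoding."""
    html = (
        "<html><head>"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">'
        "<title>Sample</title></head>\n"
        f"<body><p>{body}</p></body></html>\n"
    )
    # Encoding with the declared charset keeps the bytes and the markup in agreement.
    return html.encode(charset)

def declared_charset(raw: bytes):
    """Find the charset named in the markup, or None if none is declared."""
    # The declaration itself is ASCII, so it can be located in the raw bytes.
    match = re.search(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', raw[:1024], re.IGNORECASE)
    return match.group(1).decode("ascii").lower() if match else None

page = build_page("Grüße, 你好, привет")
charset = declared_charset(page)
text = page.decode(charset)  # decodes cleanly because bytes and declaration agree
```

Calling `build_page` with `charset="iso-8859-1"` on this sample raises `UnicodeEncodeError`, because the text contains characters outside Latin-1. This is one practical reason a Unicode encoding is easier to work with on multilingual sites.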
Encoding options exist within the HTML Options Table.
To specify the character encoding for new pages, go to: Edit > Preferences > Encodings category. In GoLive CS (Mac), the Preferences menu is located in the main Application menu (according to Apple Spec).
To change the encoding of a page, go to File > Document Encoding. In GoLive CS you can use Edit > Document Content > Change Encoding..., too.
You will need to enter the proper encoding markup into the XHTML/HTML file. Files are natively saved as UTF-8.
The proper markup for encoding will need to be entered into the file. When saving the document, the proper file format can be selected here: File > Save As > Encoding dropdown menu.
To properly configure a ColdFusion application, become familiar with the various encoding-related commands and functions (a few of which include setEncoding, cfcontent, and the form attribute enctype).
To specify the character encoding for your pages, go to Modify > Page Properties. Select the proper encoding from the Document Encoding dropdown menu.
You might also need to specify the character encoding for viewing pages while editing. Go to Edit > Preferences > Fonts category (Dreamweaver > Preferences > Fonts category on Mac).
When efficiently designed, multilingual Flash movies often store the text for each language in separate include files (#include), reducing the time needed to download a Flash movie by only sending the selected language data. UTF-8 text can be stored in an include file. The include file should start with //!-- UTF8 and must be saved in UTF-8 format.
Unicode characters can also be written as escape sequences in Flash's ActionScript environment. For example, U+0065 would be written as \u0065 within the ActionScript code.
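Python string literals happen to accept the same \uXXXX escape notation, so the idea can be illustrated outside Flash. This is a Python sketch for illustration only, not ActionScript, and the variable names are arbitrary.

```python
# "\u0065" is the escape sequence for U+0065, LATIN SMALL LETTER E.
letter = "\u0065"

# Escapes let non-ASCII text live in a source file that contains only ASCII bytes.
greeting = "\u4f60\u597d"  # U+4F60 U+597D

# Recover the code points from the characters to confirm the round trip.
code_points = [f"U+{ord(ch):04X}" for ch in greeting]
```

The escaped form is useful when the source file itself cannot be saved in a Unicode encoding.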
Another setting worth noting is the encoding setting for the end user's Flash Player. This defaults to false (system.useCodepage = false;), which means UTF-8 is used. If the setting has been changed for some special purpose, it must be set back to false before displaying UTF-8 text again, by placing the appropriate ActionScript in the timeline before calling any new text.
You need to enter the encoding markup into the file. When saving the file, select File > Save As and select the proper encoding using the Encoding dropdown menu.
There is also an HTML Tidy feature that validates your code as you type. When using this feature, be sure to set it to the same encoding. Go to: Options > Settings > CodeSweeper category > HTML Tidy CodeSweeper subcategory > Macromedia HTML subcategory > Char encoding dropdown menu.
The encoding options are under Language (character set). Go to: Tools > Page Options > Default Font tab (or Unicode (UTF-8) tab). You will notice an option that says “Multilingual (UTF-8).”
Notepad on Windows 2000/XP offers four choices: 'ANSI' (the code page corresponding to the default system locale), 'Unicode' (meaning UTF-16LE on ix86), 'Unicode Big endian', and UTF-8.
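To see how these four choices differ on disk, the sketch below encodes one character with each of the corresponding Python codecs. It assumes that 'ANSI' means Windows-1252, which is true only for some system locales. The Python codecs used here do not write the signature bytes that Notepad may add.

```python
sample = "é"  # U+00E9

# Map each Notepad choice to a Python codec name.
encodings = {
    "ANSI (assumed here to be Windows-1252)": "cp1252",
    "Unicode (UTF-16LE)": "utf-16-le",
    "Unicode big endian (UTF-16BE)": "utf-16-be",
    "UTF-8": "utf-8",
}

# Encode the same character four ways and show the resulting bytes in hex.
encoded = {label: sample.encode(codec) for label, codec in encodings.items()}
for label, data in encoded.items():
    print(f"{label}: {data.hex(' ')}")
```

The same character occupies one byte in Windows-1252 and two bytes in the other three, with different values in each, which is why the declared encoding has to match the one used when saving.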
You will need to specify the character encoding and language when you write the markup code. When you save the document, select File > Save as and select the proper encoding from the Encoding dropdown menu.
Be aware that Notepad adds a signature (byte order mark) to the beginning of the file when saving as UTF-8. This can lead to issues when viewing the page in older browsers.
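A UTF-8 signature is the three bytes EF BB BF at the start of the file, so it is easy to detect and remove after the fact. The following Python sketch shows one way to do this. The helper name `strip_utf8_bom` is hypothetical, and the sample bytes are generated with Python's `utf-8-sig` codec rather than taken from a real Notepad file.

```python
import codecs

def strip_utf8_bom(raw: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 byte order mark, if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw

# The "utf-8-sig" codec prepends the signature when encoding.
with_bom = "<p>héllo</p>".encode("utf-8-sig")
without_bom = strip_utf8_bom(with_bom)

has_signature = with_bom.startswith(codecs.BOM_UTF8)
```

Stripping is safe to repeat: data that has no signature is returned unchanged.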
You will need to specify the character encoding and language when you write the markup code. When you save the document, select File > Save as and select the proper encoding from the Save as type dropdown menu.
Note that WordPad does not allow you to save as UTF-8, only as UTF-16 LE.
Character encoding for a document can be set here: View > Character Coding menu. A file can be saved using a different character encoding here: File > Save As Charset.
Encoding can be set in command mode with the command :set encoding=utf-8. “utf-8” can be replaced by any supported character encoding.
When saving the file, go to File > Save as. Amaya will make sure that the encoding is correct in the XML declaration (for XHTML) and the meta statement. Amaya also uses the appropriate encoding (charset) in the HTTP headers when it saves a document remotely using PUT. Amaya also understands several other encodings when loading a document, but is not able to save in any of these.
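The charset sent in HTTP headers travels as a parameter of the Content-Type header. As a rough illustration of how a receiving program might read it, the Python sketch below parses that parameter with the standard library. The helper name `charset_from_content_type` is hypothetical, and this says nothing about how Amaya itself builds or reads the header.

```python
from email.message import Message

def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type value in lower case, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() reads the charset parameter and lower-cases it.
    return msg.get_content_charset()

declared = charset_from_content_type("text/html; charset=UTF-8")
missing = charset_from_content_type("text/html")
```

When the header carries no charset parameter, the result is `None`, and the declaration inside the document becomes the main source of encoding information.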
XPP can receive Unicode files. Character encoding options exist within preprocessing and postprocessing controls. Specific character encoding for XHTML/HTML output is usually performed by XyChange or the HTML Toolkit.
Another key element in the markup is the language indicator. Many of the applications listed here combine the encoding and language in the user-selectable options. If the language is not included by the application, it is good practice to also include that in the markup manually. Also, some applications may acquire the regional settings of your operating system to create a locale tag.
Keep in mind that the end user can select both the encoding to use and the font to use for each encoding. Another user-selectable option is “Always send URLs as UTF-8.” In Microsoft Internet Explorer, for example, this can be found here: Tools > Internet Options > Advanced tab > Browsing category. If your site requires options that might not be standard, it may be helpful to include viewing requirements for the site, which direct the user to the encoding and font settings needed to view the site in the intended manner.
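The effect of a setting such as “Always send URLs as UTF-8” is easiest to see in the percent-encoded form of a URL path. The Python sketch below encodes the same path in two ways. The path is an invented example, and the code illustrates percent-encoding in general rather than the behavior of any particular browser.

```python
from urllib.parse import quote, unquote

path = "/café"

# Percent-encode the path using UTF-8 bytes (the default for quote()).
as_utf8 = quote(path)

# Percent-encode the same path using ISO-8859-1 bytes instead.
as_latin1 = quote(path, encoding="iso-8859-1")

# A server has to decode with the same encoding the client used.
round_trip = unquote(as_utf8, encoding="utf-8")
```

Here `as_utf8` is `/caf%C3%A9` and `as_latin1` is `/caf%E9`. A server that expects one form and receives the other will not find the resource.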
When content is ready to be published, it is always good practice to validate your content using the W3C validation tool.
Getting started? Introducing Character Sets and Encodings
Related links, Authoring HTML & CSS