This document is a Draft.
This document has been produced as part of the W3C Web Accessibility Initiative. The goal of the Web Content Guidelines Working Group is discussed in the Working Group charter.
Last Updated: $Date: 2001/07/22 03:24:20 $ by: Katie Haritos-Shea
Please send comments on this document to w3c-wai-gl@w3.org.
This paper is one of a series of techniques documents for authoring accessible Web content. For information about the other documents in the series, please refer to "Techniques for Web Content Accessibility Guidelines 1.0" (WCAG 1.0).
In this PDF Techniques document we describe how to create accessible Adobe Portable Document Format (PDF) content (refer to PDF Reference Second Edition, Version 1.3). This document is also intended to demonstrate:
Because PDF is a page description language, it is usually not intended to be edited directly by authors. These techniques are therefore intended particularly for developers of authoring tools that generate PDF as an output format. Developers should also consult the Authoring Tool Accessibility Guidelines.
However, since authors routinely deliver PDF documents as web content, it is important that they too understand what constitutes an accessible PDF file. Webmasters and other web authors should also see the Web Content Accessibility Guidelines. At this time, accessible PDF documents are best achieved through the use of Tagged PDF.
Tagged PDF is a stylized use of PDF that allows reliable recovery of the text, graphics, and images in PDF documents, with no ambiguity about the contents or their order. A Tagged PDF file is page oriented: for each page, the text, graphics, and images can be recovered in the reading order determined by the authoring application. A Tagged PDF file is a logically structured PDF file. Logical structure carries the information needed to support tagging for access and content extraction, as well as the styling properties needed for access, reflow, and content extraction. It also identifies article flows that cross page boundaries, again for access and content extraction.
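To make the idea of logical structure concrete, the following is a minimal sketch, not the API of any PDF library: the names StructElem and reading_order are hypothetical. It assumes, as in Tagged PDF, that structure elements refer to page content through marked-content identifiers (MCIDs); a plain dictionary stands in for the marked content of one page. Walking the structure tree depth-first, left to right, recovers the logical reading order even when the content stream painted the page in a different order.

```python
# Hypothetical model of a Tagged PDF structure tree (illustration only).
# Leaf children are MCIDs that point into the page's marked content.
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class StructElem:
    tag: str  # structure type, e.g. "Document", "H1", "P"
    kids: List[Union["StructElem", int]] = field(default_factory=list)  # child elements or MCIDs


def reading_order(elem: StructElem, content: Dict[int, str]) -> List[str]:
    """Return page content in logical order via a depth-first, left-to-right walk."""
    out: List[str] = []
    for kid in elem.kids:
        if isinstance(kid, StructElem):
            out.extend(reading_order(kid, content))
        else:
            out.append(content[kid])  # kid is an MCID into the page's marked content
    return out


# MCIDs follow painting order: the footnote happens to be painted first.
page_content = {0: "Footnote.", 1: "Title", 2: "Body text."}
tree = StructElem("Document", [StructElem("H1", [1]), StructElem("P", [2]), StructElem("Note", [0])])
print(reading_order(tree, page_content))  # ['Title', 'Body text.', 'Footnote.']
```

The design point is that the order comes from the tree, not from the sequence in which content was painted, which is what lets an assistive technology read the page correctly.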
@@ A short paragraph is needed here making PDF relevant to people and their different devices. Examples of such devices include text-to-speech and voice access (for example, phones reading a PDF file aloud online) and text reflow for small PDAs. William, in support of device independence, would you be willing to draft this for us? @@
To promote continuity across WAI documents and to assist in understanding accessibility principles, we have chosen to place each PDF checkpoint under the most appropriate one of the four basic WCAG Guidelines. KHS
For each technique, we identify the version of PDF in which the required feature first became available. Where no version is specified, the technique can be applied in all versions of PDF.
How This Draft is Organised NEW 06-03-2001
WCAG Guidelines NEW 07-21-2001
PDF Glossary NEW 06-03-2001
PDF References NEW 07-21-2001
PDF Checkpoints NEW 06-03-2001
Even if a rectangle's color matches the background color, the rectangle does not change when the user changes the background color, which may cause contrast problems with the new foreground color.
Images are not affected by the background and foreground color settings, so when text is placed on top of images........@ ?? @
Does the document avoid using color-coding as the only means of
conveying information, indicating an action, prompting a response, or
distinguishing a visual element?
@@New checkpoint: Use WCAG
wording here. KHS and LGR@@
Character Codes NEW 00-12-14 (a la Loretta)
A show string is the encoded representation of a sequence of non-negative integers. Each of those integers is a character code. The interpretation of a show string depends on the associated font: some fonts imply a one-byte representation, while others imply a more complicated (for example, multiple-byte) representation.

A coded character set is a mapping from a set of integers to a set of characters. This mapping is generally one-to-one (bijective): for example, code position 65 in ASCII maps only to "A", and it is the only position that maps to "A". There are several standard coded character sets. The most widely used are ASCII and its 8-bit extension ISO Latin-1 (both encoded directly as single-byte values), with Unicode (often encoded as UTF-8, an 8-bit transformation format) slowly becoming more common; EBCDIC and Baudot survive only in legacy systems. A coded character set may include letters, digits, punctuation, control codes, various mathematical and typographic symbols, and other characters. Each character in the set is represented by a unique character code (or "code position").

Column headers NEW 00-12-14 @@

CMap NEW 01-01-08
A CMap (character map) specifies the mapping from character codes to character selectors (CIDs, character names, or character codes) in one or more associated fonts or CIDFonts. It serves a function analogous to the Encoding dictionary of a simple font: for composite (Type 0) fonts, it is the equivalent of an encoding. A CMap also specifies the writing mode, horizontal or vertical, for any CIDFont with which it is combined. A CMap can describe a mapping from multiple-byte codes to thousands of characters in a large CID-keyed font, where each character is identified by a CID (character identifier) number.
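The path from a show string to glyph selectors can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a PDF library API: the function names and both lookup tables are hypothetical. It assumes a simple font reads one byte per character code, and that a composite font's CMap reads fixed two-byte codes (high byte first); real CMaps declare codespace ranges and may mix code lengths, which this sketch does not model. By convention CID 0 is the .notdef glyph, so the sketch uses it as the fallback for unmapped codes.

```python
# Hypothetical helpers illustrating show string -> character codes -> selectors.
from typing import Dict, List


def one_byte_codes(show_string: bytes) -> List[int]:
    """Simple font: each byte of the show string is one character code."""
    return list(show_string)


def two_byte_codes(show_string: bytes) -> List[int]:
    """Assumed fixed two-byte CMap: each pair of bytes (high byte first) is one code."""
    if len(show_string) % 2:
        raise ValueError("show string length must be even for two-byte codes")
    return [(show_string[i] << 8) | show_string[i + 1] for i in range(0, len(show_string), 2)]


def apply_map(codes: List[int], table: Dict[int, object], default: object) -> List[object]:
    """Look each code up in an encoding or CMap table, using `default` for unmapped codes."""
    return [table.get(code, default) for code in codes]


# Toy encoding for a simple font: code 65 -> glyph name "A", as in ASCII.
TOY_ENCODING = {65: "A", 66: "B"}
# Toy CMap for a CID-keyed font: two-byte codes -> CIDs (values are made up).
TOY_CMAP = {0x0101: 17, 0x0102: 18}

print(apply_map(one_byte_codes(b"AB?"), TOY_ENCODING, ".notdef"))  # ['A', 'B', '.notdef']
print(apply_map(two_byte_codes(b"\x01\x01\x01\x02"), TOY_CMAP, 0))  # [17, 18]
```

The same bytes yield different character codes depending on the font, which is why a show string cannot be interpreted without its associated font.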
Concatenate NEW 00-12-14
To combine character strings; to join two or more files or lists to form one larger one. Example: the Unix cat command can be used to concatenate files.

Crop box NEW 01-01-08
The crop box defines the region to which the contents of the page are to be clipped (cropped) when displayed or printed.

Data tables NEW 00-12-14 @@

Expansion NEW 00-12-14 @@

Form fields NEW 01-01-08 @@

Glyph NEW 00-12-14
An image used in the visual representation of characters; roughly speaking, how a character looks. A font is a set of glyphs. In the simple case, for a given font (typeface and size), each character corresponds to a single glyph, but this is not always so, especially in a language with a large alphabet, where one character may correspond to several glyphs, or several characters to one glyph (as in a ligature such as "fi"). A glyph can depict a letter, a digit, or some other symbol that pictures an encoded character. The following quote is from a document written as background for the Unicode character set standard:

An ideal characterization of characters and glyphs and their relationship may be stated as follows: A character conveys distinctions in meaning or sounds. A character has no intrinsic appearance. A glyph conveys distinctions in form. A glyph has no intrinsic meaning. One or more characters may be depicted by one or more glyph representations (instances of an abstract glyph) in a possibly context-dependent fashion.

Glyph is from a Greek word for "carving."

Indexing value NEW 00-12-14 @@

Line-break hyphen NEW 00-12-14
Hyphens that you add explicitly by typing the dash character are called line-break or hard hyphens. A hard hyphen is always set; for example, the hyphen in "cost-effective." A soft hyphen, by contrast, is set only when a word that is not normally hyphenated falls at the end of a line and must be broken for proper type spacing. Word processors use two basic techniques to perform hyphenation.
The first employs an internal dictionary of words that indicates where hyphens may be inserted. The second uses a set of logical formulas to make hyphenation decisions. The dictionary method is more accurate but is usually slower. The most sophisticated programs use a combination of both methods. Most word processors allow you to override their own hyphenation rules and define where a word should be divided.

Link text NEW 00-12-14
The rendered text content of a link.

MacRomanEncoding, MacExpertEncoding, or WinAnsiEncoding NEW 01-01-08
The regular font encodings used for Latin-text fonts on Mac OS and Windows systems are named MacRomanEncoding and WinAnsiEncoding, respectively. Additionally, an encoding named MacExpertEncoding is used with "expert" fonts that contain additional characters useful for sophisticated typography. Complete details of these encodings and of the characters present in typical fonts are found in Appendix D of the PDF Reference, Version 1.3.

Map, mapped NEW 00-12-14 @@

Markup NEW 00-12-14 @@

Objects NEW 00-12-14
An object is an identifiable, encapsulated entity that provides one or more services requested by a client. Objects can refer to the objects in OOP (object-oriented programming) or the objects in OLE (Object Linking and Embedding). In object-oriented programming, objects are the things you think about first in designing a program, and they are also the units of code that are eventually derived from the process. In between, each object is made into a generic class of object, and even more generic classes are defined so that objects can share models and reuse the class definitions in their code. Each object is an instance of a particular class or subclass, with the class's own methods or procedures and data variables. An object is what actually runs in the computer. An object can be a spell checker or a piece of a graphics program used to draw squares or circles.
Do you remember the crazy story people used to tell about a word processor where you could pick all of your favorite pieces (favorite spell checker, grammar checker, text editor, font manager, etc.) and piece them together to form the ultimate customizable word processor? Well, those pieces are objects. In OLE, an object is a piece of a document, a graphic, or some multimedia. In general multimedia terms, an object is a stored data element, such as a video clip, an audio file, or a graphic.

Page-content stream NEW 01-01-08
A page's content stream contains operands and operators used to place "paint" on selected areas of a page. By executing the operators in the page content stream, an application builds up the image of the page described by the stream.

Pagination NEW 00-12-14 @@

ReverseChars NEW 00-12-14
Font characteristics may suggest that right-to-left text be typeset left-to-right. The ReverseChars marked content indicates that the show strings within the marked content are individually reversed in reading order.

Running headers NEW 00-12-14 @@

Showstring NEW 00-12-14 (a la Loretta)
The strings that are the arguments to the PDF and PostScript text-showing operators that show text on a page. The show string is interpreted as a sequence of character codes identifying the glyphs to be painted.

Soft hyphen NEW 00-12-14 (a la Loretta)
A character used to mark conditional hyphenation points; its Unicode and ISO Latin-1 code point is 0xAD. A soft hyphen is set only if the word falls at the end of a line that is too long and has to be broken. Hyphens inserted automatically by a hyphenation utility are called discretionary or soft hyphens. Word processors use two basic techniques to perform hyphenation. The first employs an internal dictionary of words that indicates where hyphens may be inserted. The second uses a set of logical formulas to make hyphenation decisions.
The dictionary method is more accurate but is usually slower. The most sophisticated programs use a combination of both methods. Most word processors allow you to override their own hyphenation rules and define where a word should be divided.

Trailing space character NEW 01-01-08
A white-space character inserted into the text of a page after the last word on a line. A trailing space character is not needed to produce the correct page image, but it is important for determining word breaks in the text of the page.

Type 0 font, Type 1 font NEW 01-01-08
Type 0 font: a composite font, that is, a font composed of other fonts, organized hierarchically. Type 1 font: a font represented using the Adobe Type 1 Font Format. A Type 1 font program is a stylized PostScript program that describes glyph shapes.

Typographic style NEW 00-12-14 @@

Unicode NEW 00-12-14
A character coding scheme designed to use 16 bits for each character, extending the capabilities of ASCII, which uses seven bits. Nearly all letters and symbols in all languages can be represented in a standard way with Unicode. The first 128 characters of Unicode are identical to those in standard ASCII. Unicode is an entirely new approach to assigning binary codes to text or script characters. Officially called the Unicode Worldwide Character Standard, it is a system for "the interchange, processing, and display of the written texts of the diverse languages of the modern world." It also supports many classical and historical texts in a number of languages. Currently, the Unicode standard contains 57709 distinct coded characters derived from 24 supported language scripts. These characters cover the principal written languages of the world.
Originally Unicode was designed to be universal, unique, and uniform; that is, the code was to cover all major modern written languages (universal), each character was to have exactly one encoding (unique), and each character was to be represented by a fixed width in bits (uniform). In parallel with the development of Unicode, an ISO/IEC standard was being developed that put great emphasis on compatibility with existing character codes such as ASCII and ISO Latin-1. To avoid having two competing 16-bit standards, in 1992 the two teams compromised on a common character code, known as Unicode and, in the ISO/IEC standard, as the Basic Multilingual Plane (BMP). Since the merger, the character codes are the same, but the two standards are not identical: the ISO/IEC standard covers only the coding, while Unicode includes additional specifications that help implementation. Unicode is not a glyph encoding. The same character can be displayed as a variety of glyphs, depending not only on the font and style but also on the adjacent characters. A sequence of characters can be displayed as a single glyph, or a single character can be displayed as a sequence of glyphs. Which is the case is often font dependent.

Unicode value NEW 00-12-14 (a la Loretta)
Unicode value or code point: the Unicode Consortium defined a set of sixteen-bit code points, 57709 of which are currently assigned and named Unicode characters. The lowest 65536 code points in ISO 10646-1:1993 are identical to the Unicode Standard and are known as the Basic Multilingual Plane. See http://www.unicode.org

User name (/TU key) NEW 01-01-08
Any interactive form field may contain the optional /TU entry in its dictionary. This entry, known as the user name or short description, is used to identify the field when generating an error message or when naming the field to a screen reader.

Word breaks NEW 01-01-08
Applications divide the text of a page into words; word breaks are the points in the text stream that separate adjoining words.
Different applications may use different rules for defining words; for example, one application may consider everything between white-space characters to be a word, while another may not include leading or trailing punctuation as part of a word.
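The glossary entries for soft hyphens, trailing space characters, and word breaks interact during text extraction. The following is a minimal sketch of that interaction; join_lines and the two word-splitting functions are hypothetical helpers, not part of any PDF library. It assumes each line's text has already been extracted, that a line ending in a soft hyphen (0xAD) continues its word on the next line, and that a trailing space character marks a word break at the end of a line. The two splitting functions implement the two example rules from the "Word breaks" entry.

```python
# Hypothetical helpers illustrating soft hyphens, trailing spaces, and word-break rules.
import string
from typing import List

SOFT_HYPHEN = "\u00ad"  # Unicode / ISO Latin-1 code point 0xAD


def join_lines(lines: List[str]) -> str:
    """Join extracted line texts into one string."""
    parts = []
    for line in lines:
        if line.endswith(SOFT_HYPHEN):
            parts.append(line[:-1])  # conditional hyphen: the word continues on the next line
        else:
            parts.append(line)  # a trailing space, if present, already marks the word break
    return "".join(parts)


def words_by_whitespace(text: str) -> List[str]:
    """Rule 1: everything between white-space characters is a word."""
    return text.split()


def words_without_edge_punctuation(text: str) -> List[str]:
    """Rule 2: like rule 1, but leading and trailing punctuation is not part of a word."""
    stripped = (w.strip(string.punctuation) for w in text.split())
    return [w for w in stripped if w]


lines = ["Accessible docu" + SOFT_HYPHEN, "ments need word ", "breaks, too."]
text = join_lines(lines)
print(text)                                  # Accessible documents need word breaks, too.
print(words_by_whitespace(text))             # ['Accessible', 'documents', 'need', 'word', 'breaks,', 'too.']
print(words_without_edge_punctuation(text))  # ['Accessible', 'documents', 'need', 'word', 'breaks', 'too']
```

If the second line had no trailing space, this sketch would run "word" and "breaks," together into one word; that failure is exactly why a trailing space character, although invisible on the printed page, matters for determining word breaks.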
For corrections and updates, please contact Katie Haritos-Shea (at home or at work), Paradigm Solutions Corporation and National Technical Information Service (NTIS), United States Department of Commerce.