Copyright © 2019 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This specification defines a general manifest format for expressing information about a digital publication. It uses [schema.org] metadata augmented to include various structural properties about publications, serialized in [JSON-LD], to enable interoperability between publishing formats while accommodating variances in the information that needs to be expressed.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Publishing Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-publ-wg@w3.org (archives).
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 March 2019 W3C Process Document.
This specification defines a general manifest format to describe publications. It does not attempt to constrain the nature of the publications that use the manifest. Rather, it is designed to be adaptable to the needs of specific areas of publishing, such as audiobook production, by specifying a modular approach for creating specializations.
This specification is also intended to facilitate different user agent architectures. While it is expected that traditional Web user agents (browsers) will be able to consume a publication manifest, this should not limit the capabilities of any other possible type of user agent (e.g., applications, whether standalone or running within a user agent, or even publications that include their own user interface).
This specification does not define how user agents are expected to render publications that use the manifest format.
This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [publishing-linking], including, in particular, user, user agent, browser, and address.
The term digital publication is used to refer to a publication authored in a format that uses the manifest format. These formats can differ in their structural and content requirements.
A manifest represents structured information about a publication, such as informative metadata, a list of resources, and a default reading order.
For the purposes of this specification, non-empty is used to refer to an element, attribute or property whose text content or value consists of one or more characters after whitespace normalization, where whitespace normalization rules are defined per the host format.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, MUST NOT, OPTIONAL, RECOMMENDED, REQUIRED, SHOULD, and SHOULD NOT in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section is non-normative.
A digital publication is described by its manifest, which provides a set of properties expressed using the JSON-LD [json-ld] format (a variant of JSON [ecma-404] for linked data).
The manifest includes both descriptive properties about the publication, such as its title and author, and information about the nature and structure of the publication.
This section describes the construction requirements for manifests, and outlines the general set of properties for use with them.
All implementations of a digital publication manifest MUST set the following:
Not all of these properties have to be serialized in the authored manifest. Refer to each property's definition to determine if and how it is compiled into the canonical manifest from other information.
All other properties and resource relations are OPTIONAL, although implementations of the manifest format MAY change their requirement levels.
Although a digital publication's manifest is authored as [json-ld], a user agent processes this information into an internal data structure, which can be in any language, in order to utilize the properties. The exact manner in which this processing occurs, and how the data is used internally, is user agent-dependent and not defined in this specification.
To simplify the understanding of the manifest format for developers, this specification defines an abstract representation of the data structures employed by the manifest using the Web Interface Definition Language (Web IDL) [webidl-1]: the PublicationManifest dictionary.
This definition expresses the expected names, datatypes, and possible restrictions for each member of the manifest. Unlike a typical Web IDL definition, however, user agents are not expected to expose the information in the manifest as an API. The Web IDL language is chosen solely to provide an abstraction of the data model.
It is not necessary to understand the Web IDL definition in order to create digital publications. Authoring requirements are defined in the following sections.
dictionary PublicationManifest {
    required sequence<DOMString> type;
    sequence<DOMString> id;
    sequence<DOMString> accessMode;
    sequence<DOMString> accessModeSufficient;
    sequence<DOMString> accessibilityFeature;
    sequence<DOMString> accessibilityHazard;
    LocalizableString accessibilitySummary;
    sequence<CreatorInfo> artist;
    sequence<CreatorInfo> author;
    sequence<CreatorInfo> colorist;
    sequence<CreatorInfo> contributor;
    sequence<CreatorInfo> creator;
    sequence<CreatorInfo> editor;
    sequence<CreatorInfo> illustrator;
    sequence<CreatorInfo> inker;
    sequence<CreatorInfo> letterer;
    sequence<CreatorInfo> penciler;
    sequence<CreatorInfo> publisher;
    sequence<CreatorInfo> readBy;
    sequence<CreatorInfo> translator;
    sequence<DOMString> url;
    DOMString duration;
    DOMString inLanguage;
    TextDirection inDirection;
    DOMString dateModified;
    DOMString datePublished;
    ProgressionDirection readingProgression = "ltr";
    required sequence<LocalizableString> name;
    required sequence<LinkedResource> readingOrder;
    sequence<LinkedResource> resources = [];
    sequence<LinkedResource> links = [];
};

dictionary CreatorInfo {
    sequence<DOMString> type;
    required sequence<LocalizableString> name;
    DOMString id;
    DOMString url;
};

enum TextDirection {
    "ltr",
    "rtl",
    "auto"
};

dictionary LocalizableString {
    required DOMString value;
    DOMString language;
};

enum ProgressionDirection {
    "ltr",
    "rtl"
};
A digital publication's manifest starts by setting the JSON-LD context [json-ld]. The context has the following two major components:
https://schema.org
https://www.w3.org/ns/pub-context
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
…
}
The publication context document adds features to the properties defined in Schema.org (e.g., the requirement for the creator property to be order preserving).
As part of the ongoing contacts with Schema.org, the additional features defined in the publication context file could migrate to the core Schema.org vocabulary.
Although Schema.org is often referenced using the http URI scheme, the vocabulary is being migrated to use the secure https scheme as its default. This specification requires the use of https when referencing Schema.org in the manifest.
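As an illustration only, the following Python sketch shows how a processor might check the context requirements above after parsing the manifest with the standard json module. The helper name check_context is hypothetical, it assumes both context URLs are expected, and it performs no actual JSON-LD processing.

```python
import json

SCHEMA_ORG = "https://schema.org"
PUB_CONTEXT = "https://www.w3.org/ns/pub-context"


def check_context(manifest: dict) -> list:
    """Return a list of problems with the manifest's @context (empty if none)."""
    context = manifest.get("@context")
    # A single context string is treated as a one-item array.
    if isinstance(context, str):
        context = [context]
    if not isinstance(context, list):
        return ["@context is missing or not a string/array"]
    problems = []
    if SCHEMA_ORG not in context:
        if "http://schema.org" in context:
            problems.append("Schema.org must be referenced with https, not http")
        else:
            problems.append("Schema.org context is missing")
    if PUB_CONTEXT not in context:
        problems.append("publication context is missing")
    return problems


authored = json.loads('{"@context": ["http://schema.org", "https://www.w3.org/ns/pub-context"]}')
print(check_context(authored))  # ['Schema.org must be referenced with https, not http']
```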
A digital publication's manifest defines its Publication Type using the type term [json-ld]. The type MAY be mapped onto CreativeWork [schema.org].
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type" : "CreativeWork",
…
}
Schema.org also includes a number of more specific subtypes of CreativeWork, such as Article, Book, TechArticle, and Course. These MAY be used instead of, or in addition to, CreativeWork.
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
}
Each Schema.org type defines a set of properties that are valid for use with it. To ensure that the manifest can be validated and processed by Schema.org-aware processors, the manifest SHOULD contain only the properties associated with the selected type.
If properties from more than one type are needed, the manifest MAY include multiple type declarations.
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type" : ["Book", "VisualArtwork"],
…
}
User agents SHOULD NOT fail to process manifests that are not valid to their declared Schema.org type(s).
Refer to the Schema.org site for the complete list of CreativeWork subtypes.
This section is non-normative.
A digital publication's manifest is defined by a set of properties that describe the basic information a user agent requires to process and render the publication. These properties are categorized as follows:
Descriptive properties describe aspects of a digital publication, such as its title, creator, and language. These properties are primarily drawn from Schema.org and its hosted extensions [schema.org], so they map to one or several Schema.org properties and inherit their syntax and semantics. (The following property categories typically do not have Schema.org equivalents, so are defined specifically for publications.)
Resource categorization properties describe or identify common sets of resources, such as the resource list and default reading order. These properties refer to one or more resources, such as HTML documents, images, script files, and separate metadata files.
The categorization of properties exists only to simplify comprehension of their purpose; the groupings have no relevance outside this specification (i.e., the properties are not actually grouped together in the manifest).
Each manifest item drawn from schema.org identifies the property it maps to and includes its defining type in parentheses. Properties are often available in many types, however, as a result of the schema.org inheritance model. Refer to each property definition for more detailed information about where it is valid to use.
Schema.org additionally includes a large number of properties that, though relevant for publishing, are not mentioned in this specification — publication authors can use any of these properties. This document defines only the minimal set of manifest items.
There are discussions on whether a best practices document, referring to more schema.org terms, will be created. If so, it should be linked from here.
This section describes the categories of values that can be used with properties of the Publication Manifest.
Some manifest properties expect a literal text string as their value — one that is not language-dependent, such as a code value or date. These values are expressed as [json] strings.
Literal values are not changed during canonicalization of the manifest, unlike other values which might be, for example, converted to objects.
Some manifest properties expect a number as their value. These values are expressed as [json] numbers.
Various manifest properties are expected to be expressed as [json] objects. Although the use of objects is usually recommended, it is also acceptable to use string values that are interpreted as objects depending on the context. The exact mapping of text values to objects is part of the property or object definitions.
Some manifest properties expect a localizable text string as their value. These values are expressed either as a single string, or as an object with a value property containing the property's text and a language property that identifies the language of the text. In the case of single string values, these represent an implied object whose value property is the string's text and whose language will be determined from other information in the manifest.
A common case of implied objects in the Publication Manifest properties set is for creators. The entities responsible for the various aspects of creation are expressed as [schema.org] Person and/or Organization objects. To simplify authoring, however, a simple string value can be used for the entity's name. In this case, the entity is assumed to represent a Person.
With the exception of the descriptive properties, manifest properties typically link to one or more resources. When a property requires a link value, the link MUST be expressed in one of the following two ways: as a string encoding the URL of the resource; or as an instance of a LinkedResource object that can be used to express the URL, the media type, and other characteristics of the target resource. In the case of single string values, these represent an implied LinkedResource object whose url property is set to that string value.
{
…
"resources" : [
"datatypes.svg",
{
"type" : "LinkedResource",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv",
"name" : "Test Results",
"description" : "CSV file containing the full data set used."
},
{
"type" : "LinkedResource",
"url" : "terminology.html",
"encodingFormat" : "text/html",
"rel" : "glossary"
}
]
}
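A user agent compiling the canonical manifest might expand the string shorthand into an explicit object. The sketch below is a minimal, assumed approach: the helper normalize_link is hypothetical, and giving the expanded string form an explicit type of LinkedResource mirrors the objects in the example above rather than a rule stated here.

```python
def normalize_link(value) -> dict:
    """Expand one link value into an explicit LinkedResource-style dict."""
    if isinstance(value, str):
        # A bare string is shorthand for an object whose url is that string.
        return {"type": "LinkedResource", "url": value}
    if isinstance(value, dict) and "url" in value:
        return dict(value)  # copy, so the authored manifest is left untouched
    raise TypeError(f"unsupported link value: {value!r}")


resources = [
    "datatypes.svg",
    {"type": "LinkedResource", "url": "terminology.html",
     "encodingFormat": "text/html", "rel": "glossary"},
]
# Every entry now exposes the same keys, whichever form was authored.
for link in map(normalize_link, resources):
    print(link["url"], link.get("encodingFormat"))
```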
URLs are used to identify resources associated with a digital publication. They MUST be valid URL strings [url].
Manifest URLs are restricted to only the http and https schemes [url]. URLs MUST dereference to a resource, although user agents are not required to dereference all URLs in the manifest.
In the case of relative-URL strings, these are resolved to absolute-URL strings using a base URL [url].
The base URL for relative-URL strings is determined as follows:
Consequently, relative-URL strings in embedded manifests are resolved against the URL of the document that references the manifest, unless the document declares a base URL (i.e., in a <base> element in its header).
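Assuming the base URL has already been determined by the rules above, resolution and the scheme restriction could be sketched in Python as follows. Note that urllib.parse follows RFC 3986 rather than the URL Standard referenced by [url], so edge cases may differ; resolve_manifest_url is a hypothetical name.

```python
from urllib.parse import urljoin, urlparse

ALLOWED_SCHEMES = {"http", "https"}


def resolve_manifest_url(url: str, base_url: str) -> str:
    """Resolve a (possibly relative) URL string and enforce the http/https restriction."""
    absolute = urljoin(base_url, url)
    scheme = urlparse(absolute).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported URL scheme {scheme!r} in {absolute!r}")
    return absolute


# Assumed: for a separate manifest file, the base URL is the manifest's own URL.
print(resolve_manifest_url("terminology.html",
                           "https://publisher.example.org/mobydick/manifest.json"))
# -> https://publisher.example.org/mobydick/terminology.html
```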
URLs allow for the usage of characters from Unicode following [rfc3987]. See the note in the HTML5 specification for further details.
Identifiers are URL records [url] that can be used to refer to Web Content in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.
Some manifest properties allow one or more values of their respective type (literal, object, or URL). As a general rule, these values can be expressed as [json] arrays. When the property value is an array with a single element, however, the array syntax MAY be omitted.
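A processor will typically want every multi-valued property in one shape, regardless of how it was authored. The sketch below wraps single values in a list; ARRAY_PROPERTIES is an illustrative subset of the sequence-typed members in the Web IDL definition, not an exhaustive list, and the helper names are hypothetical.

```python
# Illustrative subset of members declared as sequences in the Web IDL definition.
ARRAY_PROPERTIES = ("type", "url", "name", "author", "editor",
                    "accessMode", "readingOrder", "resources", "links")


def as_array(value) -> list:
    """Wrap a single value in a list; pass arrays through; map a missing value to []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def normalize_arrays(manifest: dict) -> dict:
    """Return a shallow copy of the manifest with array-valued properties as lists."""
    result = dict(manifest)
    for key in ARRAY_PROPERTIES:
        if key in result:
            result[key] = as_array(result[key])
    return result


print(normalize_arrays({"type": "Book", "url": "https://publisher.example.org/mobydick"}))
# -> {'type': ['Book'], 'url': ['https://publisher.example.org/mobydick']}
```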
The accessibility properties provide information about the suitability of a digital publication for consumption by users with varying preferred reading modalities. These properties typically supplement an evaluation against established accessibility criteria, such as those provided in [WCAG20]. (For linking to a detailed accessibility report, see § 2.8.2.1 Accessibility Report.)
The following properties are categorized as accessibility properties:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
accessMode | The human sensory perceptual system or cognitive faculty through which a person may process or perceive information. | One or more text(s). | Array of Literals | accessMode (CreativeWork) |
accessModeSufficient | A list of single or combined accessModes that are sufficient to understand all the intellectual content of a resource. | One or more ItemList. | Array of Literals | accessModeSufficient (CreativeWork) |
accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility. | One or more text(s). | Array of Literals | accessibilityFeature (CreativeWork) |
accessibilityHazard | A characteristic of the described resource that is physiologically dangerous to some users. | One or more text(s). | Array of Literals | accessibilityHazard (CreativeWork) |
accessibilitySummary | A human-readable summary of specific accessibility features or deficiencies, consistent with the other accessibility metadata but expressing subtleties such as “short descriptions are present but long descriptions will be needed for non-visual users” or “short descriptions are present and no long descriptions are needed.” | Text. | Localizable String | accessibilitySummary (CreativeWork) |
Detailed descriptions of these properties, including the expected values to use with them, are available at [webschemas-a11y].
The author can also provide a reference to a detailed Accessibility Report if more information is needed than can be expressed by these properties.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "CreativeWork",
…
"accessMode" : ["textual", "visual"],
"accessModeSufficient" : [
{
"type" : "ItemList",
"itemListElement": ["textual", "visual"]
},
{
"type" : "ItemList",
"itemListElement": ["textual"]
}
],
…
}
A digital publication's address is a URL that identifies its source location. It is expressed using the url property.
Term | Description | Required Value | Value Type | [schema.org] Mapping |
---|---|---|---|---|
url | URL of the publication. | A valid URL string [url]. | Array of URLs | url (Thing) |
A digital publication MAY have more than one address, but all the addresses MUST resolve to the same document.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
…
}
A digital publication's canonical identifier property provides a unique identifier for the publication. It is expressed using the id property.
Term | Description | Required Value | Value Type | [schema.org] Mapping |
---|---|---|---|---|
id | Preferred version of the publication. | A URL record [url]. | Identifier | (None) |
Ensuring uniqueness of canonical identifiers is outside the scope of this specification. The actual achievable uniqueness depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.
If a canonical identifier is not provided in the manifest, or the value is an invalid URL, the digital publication does not have a canonical identifier. User agents MUST NOT attempt to construct a canonical identifier from any other identifiers provided in the manifest for the canonical manifest.
The specification of the canonical identifier MAY be complemented by the inclusion of additional types of identifiers using the identifier property [schema.org] and/or its subtypes.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"id" : "urn:isbn:9780123456789",
"url" : "https://publisher.example.org/mobydick",
…
}
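The canonical identifier rule can be sketched as a function that returns either the id value or nothing. It assumes a simplified validity check based on urllib.parse.urlparse (a scheme is present, and http(s) URLs have a host), which is looser than full URL parsing; canonical_identifier is a hypothetical name.

```python
from urllib.parse import urlparse


def canonical_identifier(manifest: dict):
    """Return the canonical identifier, or None when it is absent or invalid.

    Other identifiers (for example, schema.org "identifier" values) are ignored on
    purpose: they must not be used to construct a canonical identifier.
    """
    value = manifest.get("id")
    if not isinstance(value, str):
        return None
    parsed = urlparse(value)
    # Simplified validity check, not a full URL parser.
    if not parsed.scheme:
        return None
    if parsed.scheme in ("http", "https") and not parsed.netloc:
        return None
    return value


print(canonical_identifier({"id": "urn:isbn:9780123456789"}))  # urn:isbn:9780123456789
print(canonical_identifier({"identifier": "urn:isbn:9780123456789"}))  # None
```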
A creator is an individual or entity responsible for the creation of the digital publication.
The following properties are categorized as creators:
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
artist | The primary artist for the publication, in a medium other than pencils or digital line art. | One or more Person. | Array of Entities | artist (VisualArtwork) |
author | The author of the publication. | One or more Person and/or Organization. | Array of Entities | author (CreativeWork) |
colorist | The individual who adds color to inked drawings. | One or more Person. | Array of Entities | colorist (VisualArtwork) |
contributor | Contributor whose role does not fit one of the other roles in this table. | One or more Person and/or Organization. | Array of Entities | contributor (CreativeWork) |
creator | The creator of the publication. | One or more Person and/or Organization. | Array of Entities | creator (CreativeWork) |
editor | The editor of the publication. | One or more Person. | Array of Entities | editor (CreativeWork) |
illustrator | The illustrator of the publication. | One or more Person. | Array of Entities | illustrator (Book) |
inker | The individual who traces over the pencil drawings in ink. | One or more Person. | Array of Entities | inker (VisualArtwork) |
letterer | The individual who adds lettering, including speech balloons and sound effects, to artwork. | One or more Person. | Array of Entities | letterer (VisualArtwork) |
penciler | The individual who draws the primary narrative artwork. | One or more Person. | Array of Entities | penciler (VisualArtwork) |
publisher | The publisher of the publication. | One or more Person and/or Organization. | Array of Entities | publisher (CreativeWork) |
readBy | A person who reads (performs) the publication (for audiobooks). | One or more Person. | Array of Entities | readBy (Audiobook) |
translator | The translator of the publication. | One or more Person and/or Organization. | Array of Entities | translator (CreativeWork) |
Creators are represented in one of the following two ways: as a string encoding the name of a Person [schema.org]; or as an instance of a Person or Organization [schema.org]. In other words, a single string value is a shorthand for a [schema.org] Person whose name property is set to that string value. (See also § 2.7.2.3.2 Entities.)
When compiling each set of creator information from a [schema.org] Person or Organization type, user agents MUST retain the following information when available: type, name, id, and url.
Note that user agents MAY interpret a wider range of creator properties defined by Schema.org than the ones in the preceding list.
The manifest MAY include more than one of each type of creator.
{
"type" : "Book",
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
…
"url" : "https://publisher.example.org/mobydick",
"author" : {
"type" : "Person",
"name" : "Herman Melville"
}
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"author" : [
"Jeni Tennison",
{
"type" : "Person",
"name" : "Gregg Kellogg"
},{
"type" : "Person",
"name" : "Ivan Herman",
"id" : "https://www.w3.org/People/Ivan/"
}
],
"editor" : [
"Jeni Tennison",
{
"type" : "Person",
"name" : "Gregg Kellogg"
}
],
"publisher" : {
"type" : "Organization",
"name" : "World Wide Web Consortium",
"id" : "https://www.w3.org/"
},
…
}
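The compilation of creator information described above could look like the following Python sketch. It keeps only the four members user agents MUST retain (a real user agent MAY keep more), and it wraps type and name in lists to match the CreatorInfo dictionary; the helper names are hypothetical.

```python
RETAINED_KEYS = ("type", "name", "id", "url")


def normalize_creator(value) -> dict:
    """Compile one creator entry into a CreatorInfo-like dict."""
    if isinstance(value, str):
        # A bare string is shorthand for a Person with that name.
        return {"type": ["Person"], "name": [value]}
    if isinstance(value, dict):
        creator = {key: value[key] for key in RETAINED_KEYS if key in value}
        # type and name are sequences in the CreatorInfo dictionary.
        for key in ("type", "name"):
            if key in creator and not isinstance(creator[key], list):
                creator[key] = [creator[key]]
        return creator
    raise TypeError(f"unsupported creator value: {value!r}")


def normalize_creators(value) -> list:
    """Normalize a creator property that may hold one entry or an array of them."""
    items = value if isinstance(value, list) else [value]
    return [normalize_creator(item) for item in items]


authors = normalize_creators([
    "Jeni Tennison",
    {"type": "Person", "name": "Ivan Herman", "id": "https://www.w3.org/People/Ivan/"},
])
print(authors[0])  # {'type': ['Person'], 'name': ['Jeni Tennison']}
```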
The global duration indicates the overall length of a time-based digital publication (e.g., an audiobook, a book consisting of a series of video clips, etc.). It is expressed as a "Duration" value as defined by [iso8601].
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
duration | Overall duration of a time-based publication. | Duration value as defined by [iso8601]. | Literal | duration (Property) |
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type" : "Audiobook",
"id" : "https://example.org/flatland-a-romance-of-many-dimensions/",
"url" : "https://w3c.github.io/pub-manifest/experiments/audiobook/",
"name" : "Flatland: A Romance of Many Dimensions",
…
"duration" : "PT15153S",
…
}
The relevant Wikipedia page gives a concise description of the ISO duration syntax.
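For user agents that need the duration as a number, a simplified parser is sketched below. It handles only a subset of ISO 8601: days, hours, minutes, and seconds (with a decimal point, not a comma). It treats a day as 86,400 seconds and rejects years and months because their length is not fixed. duration_seconds is a hypothetical name.

```python
import re

# Simplified subset of ISO 8601 durations: days, hours, minutes, seconds.
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def duration_seconds(value: str) -> float:
    """Convert a duration such as "PT15153S" to a number of seconds."""
    match = _DURATION.match(value)
    # Reject non-matching input, "P"/"PT" with no components, and a dangling "T".
    if match is None or value.endswith("T") or not any(match.groupdict().values()):
        raise ValueError(f"unsupported or invalid duration: {value!r}")
    parts = {name: float(text) for name, text in match.groupdict().items() if text is not None}
    return (parts.get("days", 0.0) * 86400
            + parts.get("hours", 0.0) * 3600
            + parts.get("minutes", 0.0) * 60
            + parts.get("seconds", 0.0))


total = int(duration_seconds("PT15153S"))
hours, rest = divmod(total, 3600)
minutes, seconds = divmod(rest, 60)
print(f"{hours}:{minutes:02}:{seconds:02}")  # 4:12:33
```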
A digital publication has at least one natural language, which is the language that the content is expressed in (e.g., English, French, Chinese). It also has a natural base direction in which it is written — the display direction, either left-to-right or right-to-left.
The digital publication manifest includes entries to set both these concepts, which can influence, for example, the behavior of a user agent (e.g., it might place a pop-up for a table of contents on the right-hand side for publications whose natural base direction is right-to-left).
It is important to differentiate the language of the publication from the language and the base direction of the individual resources that compose it. If such resources are, for example, in HTML, the language and direction need to be set in those resources, too. The language and base direction of the publication are not inherited.
Similarly, each natural language property value in the manifest (e.g., title, creators) is a localizable string.
For more information about localized strings on the Web, refer to [string-meta].
The natural language and base direction can be set both for the publication and for the natural language property values of the manifest.
If a user agent requires the language and one is not available in the authored manifest (either globally or specifically for that property), or the obtained value is invalid, the user agent MAY attempt to determine the language when generating the canonical manifest. This specification does not mandate how such a language tag is created. The user agent might:
No default values are specified for the language or the default base direction.
The manifest MAY include global language and base direction declarations for the publication using the following properties.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
inLanguage | Default language for the publication as well as the textual manifest values | Language code as defined in [bcp47] | Literal | inLanguage (Property) |
inDirection | Default base direction for the publication as well as the textual manifest values | ltr, rtl, or auto | Literal | (None) |
The natural language MUST be a tag that conforms to [bcp47], while the base direction MUST have one of the following values:
ltr: indicates that the textual values are explicitly directionally set to left-to-right text;
rtl: indicates that the textual values are explicitly directionally set to right-to-left text;
auto: indicates that the textual values are explicitly directionally set to the direction of the first character with a strong directionality, following the rules of the Unicode Bidirectional Algorithm [bidi].
When specified, these properties are also used as defaults for textual values in the manifest.
The global language information MAY be overridden by individual values.
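A user agent reading these global settings might accept only well-formed values and otherwise leave them unset, since no defaults are defined. The sketch below uses a rough shape check for language tags that is not a BCP 47 validator (it rejects, for instance, private-use tags such as x-whatever). global_language_settings is a hypothetical name.

```python
import re

# Rough shape check only (2-3 letter primary subtag, then 1-8 character subtags).
# This is NOT a full BCP 47 validator.
_LANG_TAG = re.compile(r"^[A-Za-z]{2,3}(?:-[A-Za-z0-9]{1,8})*$")
TEXT_DIRECTIONS = ("ltr", "rtl", "auto")


def global_language_settings(manifest: dict) -> dict:
    """Return the usable inLanguage/inDirection values; invalid ones stay unset."""
    settings = {}
    language = manifest.get("inLanguage")
    if isinstance(language, str) and _LANG_TAG.match(language):
        settings["inLanguage"] = language
    direction = manifest.get("inDirection")
    if direction in TEXT_DIRECTIONS:
        settings["inDirection"] = direction
    # No defaults are applied: missing values are simply absent from the result.
    return settings


print(global_language_settings({"inLanguage": "zh-Hant-TW", "inDirection": "sideways"}))
# -> {'inLanguage': 'zh-Hant-TW'}
```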
If authors intend to use a manifest, or a manifest template, both as an embedded manifest and as a separate resource, they are strongly encouraged to set these properties explicitly to avoid interference from the containing script element when embedded.
It is possible to set the language for any textual value in the manifest. This information MUST be set as a localizable string, i.e., using the value and language terms (instead of a simple string) [json-ld]:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"author" : {
"type" : "Person",
"name" : {
"value" : "Marcel Proust",
"language" : "fr"
}
}
}
The value of the language term MUST be set to a language code as defined in [bcp47].
When used in a context of localizable texts, a simple string value is a shorthand for a localizable string, with the value set to the string value, and the language set to the value of the inLanguage property, if applicable, and unset otherwise. In other words, the previous example is equivalent to:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
"inLanguage" : "fr",
…
"author" : "Marcel Proust",
…
}
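The equivalence between the two examples can be made concrete by expanding every textual value into explicit localizable strings. In this sketch, default_language stands for the manifest's inLanguage value. Applying it to explicit objects that lack a language follows the rule that the global settings act as defaults. The function name to_localizable is hypothetical.

```python
def to_localizable(value, default_language=None) -> list:
    """Expand a textual value into a list of explicit localizable-string dicts.

    default_language stands for the manifest's inLanguage value, if any.
    """
    items = value if isinstance(value, list) else [value]
    result = []
    for item in items:
        if isinstance(item, str):
            entry = {"value": item}
            language = default_language
        elif isinstance(item, dict) and "value" in item:
            entry = {"value": item["value"]}
            language = item.get("language", default_language)
        else:
            raise TypeError(f"unsupported text value: {item!r}")
        if language is not None:
            entry["language"] = language
        result.append(entry)
    return result


# The two Marcel Proust examples yield the same author name.
explicit = to_localizable({"value": "Marcel Proust", "language": "fr"})
implied = to_localizable("Marcel Proust", default_language="fr")
print(explicit == implied)  # True
```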
(See also § 2.7.2.3 Explicit and Implied Objects.)
It is not possible to set the direction explicitly for a value.
Setting the direction for a natural text value is currently not possible in JSON-LD [json-ld]. In case the JSON-LD community, as well as the schema.org community, introduces such a feature, future versions of this specification may extend the ability of manifests to include this.
In order to correctly handle manifest entries containing right-to-left or bidirectional text, user agents SHOULD identify the base direction of any given natural language value by scanning the text for the first strong directional character.
In situations where the first-strong heuristics will produce the wrong result (e.g., a string in the Arabic or Hebrew script that begins with a Latin acronym), content developers may want to prepend a Unicode formatting character to the string. This would then produce the necessary base direction when the heuristics are applied. They should use one of the following formatting characters: U+200E LEFT-TO-RIGHT MARK, or U+200F RIGHT-TO-LEFT MARK. (See § D. Examples for bidirectional texts.)
Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm [bidi]. This could require wrapping additional markup or Unicode formatting characters around the string prior to display, in order to apply the base direction.
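The first-strong heuristic can be approximated with Python's unicodedata.bidirectional, which returns a character's bidirectional class. This sketch treats class L as left-to-right and classes R and AL as right-to-left. Unlike the full Unicode Bidirectional Algorithm [bidi], it does not skip text inside directional isolates. first_strong_direction is a hypothetical name.

```python
import unicodedata


def first_strong_direction(text: str):
    """Return "ltr" or "rtl" for the first strong character, or None if there is none."""
    for char in text:
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    # Only weak or neutral characters (digits, punctuation, spaces) were found.
    return None


title = "W3C \u0645\u0631\u062d\u0628\u0627"  # Arabic text starting with a Latin acronym
print(first_strong_direction(title))             # ltr (the heuristic guesses wrong)
print(first_strong_direction("\u200f" + title))  # rtl (RIGHT-TO-LEFT MARK prepended)
```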
The last modification date is the date when the digital publication was last updated (i.e., whenever changes were last made to any of the resources of the publication, including the manifest). It is expressed using the dateModified property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
dateModified | Last modification date of the publication. | A Date or DateTime value [schema.org], both expressed in ISO 8601 Date or Date Time formats, respectively [iso8601]. | Literal | dateModified (CreativeWork) |
The last modification date does not necessarily reflect all changes to the publication (e.g., third-party content could change without the author being aware). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"dateModified" : "2015-12-17",
…
}
The publication date is the date on which the digital publication was originally published. It represents a static event in the lifecycle of a publication and allows subsequent revisions to be identified and compared. It is expressed using the datePublished property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
datePublished | Creation date of the publication. | A Date or DateTime, both expressed in ISO 8601 Date or Date Time formats, respectively [iso8601]. | Literal | datePublished (CreativeWork) |
The exact moment of publication is intentionally left open to interpretation: it could be when the publication is first made available online or could be a point in time before publication when the publication is considered final.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"datePublished" : "2015-12-17",
"dateModified" : "2016-01-30",
…
}
The reading progression
establishes the reading direction from one resource to the next within a digital
publication. It is expressed using the readingProgression
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
readingProgression | Reading direction from one resource to the other. | ltr or rtl | Literal | (None) |
The value of this property MUST be either:
ltr: left-to-right;
rtl: right-to-left.
The default value is ltr.
This property has no effect on the rendering of the individual primary resources; it is only relevant for the progression direction from one resource to the other.
The reading progression of a publication is used to adapt publication-level interactions such as menu position, swipe direction, tap zones that lead the user to the next and previous pages, and touch gestures.
If the readingProgression
is not set, user agents MUST use the default value ltr
when generating the
canonical manifest.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"readingProgression" : "ltr"
}
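The defaulting rule above, and one way a user agent might use the result, can be sketched as follows. Both helper names are hypothetical. The sketch also assumes that an invalid value falls back to the default, which goes beyond what this specification states. The mapping to a screen edge is only an illustration of a publication-level interaction.

```javascript
// Hypothetical helper: applies the default when readingProgression is absent.
// Assumption: any value other than "ltr" or "rtl" is also replaced by the
// default; this specification only states that the value MUST be one of them.
function canonicalReadingProgression(manifest) {
  const value = manifest.readingProgression;
  return value === "ltr" || value === "rtl" ? value : "ltr";
}

// Hypothetical UI helper: the screen edge whose tap zone moves to the next
// resource in the reading order.
function nextTapZone(manifest) {
  return canonicalReadingProgression(manifest) === "rtl" ? "left" : "right";
}
```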
The title provides the human-readable name of the digital publication. It is expressed using
the name
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
name | Human-readable title of the publication. | One or more text items for the title. | Array of Localizable Strings | name (Thing) |
If a title is not included in the authored manifest, and a digital publication does not define alternative rules for obtaining one, the user agent MUST create one. This specification does not specify what heuristics to use to generate such a title.
A user agent is not expected to produce a meaningful title [wcag20] for a publication when one is not specified.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick"
}
Publication resources are specified via the default reading order, the resource list, and the links list, as defined in this section. These lists contain references to informative resources, like the privacy policy, and structural resources, like the table of contents.
Note that a particular resource's URL MUST NOT appear in more than one of these lists, and a URL MUST NOT be repeated within a list.
The manifest MUST NOT include a reference to itself within any of these lists.
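A user agent or authoring tool might check the two constraints above with a sketch like the following. The function name `checkResourceLists` is hypothetical. The sketch assumes two things: entries are either URL strings or objects with a `url` member, and URLs are compared only after resolution against a base URL supplied by the caller.

```javascript
// Hypothetical checker: reports URLs repeated within or across the three
// lists, and any entry that points back at the manifest itself.
function checkResourceLists(manifest, manifestUrl, base) {
  const problems = [];
  const seen = new Map(); // absolute URL -> name of the list where it first appeared
  const self = new URL(manifestUrl, base).href;
  for (const listName of ["readingOrder", "resources", "links"]) {
    // Accept a single entry or an array of entries.
    for (const entry of [].concat(manifest[listName] ?? [])) {
      const url = new URL(typeof entry === "string" ? entry : entry.url, base).href;
      if (url === self) problems.push(`${listName}: references the manifest itself`);
      if (seen.has(url)) {
        problems.push(`${listName}: ${url} is already listed in ${seen.get(url)}`);
      } else {
        seen.set(url, listName);
      }
    }
  }
  return problems;
}
```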
The default reading order is a specific progression through a set of digital publication resources. A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.
The default reading order is expressed using the readingOrder
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
readingOrder
|
One or more of:
The order of items is significant. The URLs
MUST NOT include fragment
identifiers. Non-HTML resources SHOULD be expressed as |
Array of Links | (None) |
The default reading order MUST include at least one resource.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"readingOrder" : [
"html/title.html",
"html/copyright.html",
"html/introduction.html",
"html/epigraph.html",
"html/c001.html",
…
]
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"readingOrder" : [{
"type" : "LinkedResource",
"url" : "html/title.html",
"encodingFormat" : "text/html",
"name" : "Title page"
},{
"type" : "LinkedResource",
"url" : "html/copyright.html",
"encodingFormat" : "text/html",
"name" : "Copyright page"
},{
…
}]
}
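The two constraints on the default reading order can be validated as follows: it must contain at least one resource, and its URLs must not contain fragment identifiers. This is a hypothetical, non-normative validator. Entries may be URL strings or `LinkedResource`-like objects, and `base` is assumed to be an absolute URL provided by the caller.

```javascript
// Hypothetical validator for the readingOrder constraints.
function validateReadingOrder(manifest, base) {
  const errors = [];
  const entries = [].concat(manifest.readingOrder ?? []);
  if (entries.length === 0) errors.push("readingOrder must include at least one resource");
  for (const entry of entries) {
    const raw = typeof entry === "string" ? entry : entry.url;
    // A serialized URL contains "#" only when it has a fragment.
    if (new URL(raw, base).href.includes("#")) errors.push(`fragment identifier not allowed: ${raw}`);
  }
  return errors;
}
```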
The resource list enumerates any
additional resources used in the processing and rendering of a digital
publication that are not already listed in the default reading order. It is expressed
using the resources
property.
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
resources
|
One or more of:
The order of items is not significant. The URLs
MUST NOT include fragment
identifiers. It is RECOMMENDED
to use |
Array of Links | (None) |
The completeness of the resource list can affect the usability of a digital publication in certain reading scenarios (e.g., the ability to read it offline). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the publication's constituent resources beyond those listed in the default reading order.
In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a publication even if some of these resources are not identified as belonging to the publication (e.g., if it is taken offline without them).
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
"resources" : [
"datatypes.html",
"datatypes.svg",
"datatypes.png",
"diff.html",
{
"type" : "LinkedResource",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv"
},{
"type" : "LinkedResource",
"url" : "test-utf8-bom.csv",
"encodingFormat" : "text/csv"
},{
…
}
],
…
}
The links
property provides a list
of resources that are not required for the processing and rendering of a digital
publication (i.e., the content of the publication remains unaffected even if these
resources are not available).
Term | Description | Required Value | Value Category | [schema.org] Mapping |
---|---|---|---|---|
links
|
One or more of:
The order of items is not significant. It is RECOMMENDED to use |
Array of Links | (None) |
Linked resources are typically made available to user agents to augment or enhance the processing or rendering, such as:
Links can also be used to identify resources used in the online rendering of a publication, but that are not essential to include when the publication is taken offline or packaged (e.g., to minimize the size). These include:
The links
list SHOULD include resources
necessary to render a linked resource (e.g., scripts, images, style sheets).
Resources listed in the links
list MUST
NOT be listed in the default reading order or
resource list.
User agents MAY ignore linked resources, and are not required to take them offline with a publication. These resources SHOULD NOT be included when packaging a publication.
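The packaging guidance above amounts to bundling the default reading order and the resource list while leaving out linked resources. A minimal sketch, using the hypothetical helper name `urlsToPackage`, could look like this. It works on authored URLs as written and does not resolve them.

```javascript
// Hypothetical helper: the (authored) URLs a packaging tool would bundle.
// Resources in "links" are excluded, since they SHOULD NOT be packaged.
function urlsToPackage(manifest) {
  // Accept a single entry or an array, and URL strings or objects with "url".
  const urlsOf = (list) =>
    [].concat(list ?? []).map((entry) => (typeof entry === "string" ? entry : entry.url));
  const linked = new Set(urlsOf(manifest.links));
  const wanted = [...urlsOf(manifest.readingOrder), ...urlsOf(manifest.resources)];
  // Drop duplicates and anything that is also listed as a link.
  return [...new Set(wanted)].filter((url) => !linked.has(url));
}
```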
The manifest is designed to provide a basic set of properties for use by user agents in presenting and rendering a digital publication, but MAY be extended in the following ways:
Although both methods are valid, the use of linked records is RECOMMENDED.
This specification does not define how such additional properties are compiled, stored or exposed by user agents in their internal representation of the manifest. A user agent MAY ignore some or all extended properties.
Extending the manifest through links to a record, such as an ONIX [onix] or BibTeX [bibtex] file, MUST be expressed using a LinkedResource object, where:
the rel value of the LinkedResource SHOULD include a relevant identifier defined by IANA or by other organizations; if the linked record contains descriptive metadata, it MUST include the describedby (IANA) identifier;
the encodingFormat in the link MUST use the MIME media type [rfc2046] defined for that particular type of record, if applicable.
Linked records MUST be included in the resource list when they are part of the publication (i.e., are needed for more than just manifest extensibility). Otherwise, they MUST be included in the links list.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"links" : [{
"type" : "LinkedResource",
"url" : "https://www.publisher.example.org/mobydick-onix.xml",
"encodingFormat" : "application/onix+xml",
"rel" : "describedby"
},{
…
}],
…
}
The application/onix+xml
MIME type has not yet been registered by
IANA at the time of writing this document, and is included in the example for
illustrative purposes only.
Additional properties can be included directly in the manifest. It is RECOMMENDED that these properties be taken from public schemes like [schema.org] or [dcterms] and use values from controlled vocabularies whenever possible. Proprietary terms MAY be used, but it is RECOMMENDED that such terms be included using Compact IRIs [json-ld], with prefixes defined as part of the context.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"copyrightYear" : "2015",
"copyrightHolder" : "World Wide Web Consortium",
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "CreativeWork",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"dc:subject" : ["Web data description languages","Data integration","Data Exchange"],
…
}
A prefix definition dc
for [dcterms] is included in the context file of [schema.org]. This means that it is not necessary to add the prefix
explicitly. The same is true for a number of other public vocabularies; see the schema.org context file for
further details.
This section is non-normative.
The manifest identifies key
resources of a digital publication through the use of link relations. These relations are applied to
the rel
property of LinkedResource
objects (e.g., the links found in
the table of contents and resource
list).
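Locating a key resource, such as the accessibility report, the cover, or the table of contents, comes down to finding the entries whose `rel` includes a given relation. The following is a sketch under stated assumptions. The name `findByRel` is hypothetical. Entries may still be in authored form, so `rel` can be a single string or an array. Bare URL strings cannot carry a relation and are skipped.

```javascript
// Hypothetical helper: collect LinkedResource objects carrying `relation`
// from the default reading order, the resource list and the links list.
function findByRel(manifest, relation) {
  const matches = [];
  for (const listName of ["readingOrder", "resources", "links"]) {
    for (const entry of [].concat(manifest[listName] ?? [])) {
      if (entry === null || typeof entry !== "object") continue;
      if ([].concat(entry.rel ?? []).includes(relation)) matches.push(entry);
    }
  }
  return matches;
}
```

For example, `findByRel(manifest, "accessibility-report")` returns the linked report, if any.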
The types of resources these relations identify are categorized as follows:
Informative resources are resources that contain additional information about the publication, such as its privacy policy, accessibility report, or preview.
Structural resources are key meta structures of the publication, such as the cover image, table of contents, and page list.
An accessibility report provides information about the suitability of a digital publication for consumption by users with varying preferred reading modalities. These reports typically identify the result of an evaluation against established accessibility criteria, such as those provided in [WCAG21], and are an important source of information in determining the usability of a publication.
An accessibility report is identified using the accessibility-report
link
relation.
The accessibility-report
term is not currently registered in the
IANA link relations but the Working Group expects to add it.
The manifest SHOULD include a link to an accessibility report when one is available for a publication. It is RECOMMENDED that the report be included as a resource of the publication.
It is also RECOMMENDED that the accessibility report be provided in a human-readable format, such as [html]. Augmenting these reports with machine-processable metadata, such as provided in Schema.org [schema.org], is also RECOMMENDED.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"links" : [{
"type" : "LinkedResource",
"url" : "https://www.publisher.example.org/mobydick-accessibility.html",
"rel" : "accessibility-report"
},{
…
}],
…
}
Not all digital publications will be available to all users (e.g., they might be restricted to registered users of a site). In such cases, the publisher might wish to provide a preview of the content in order to entice users to access the full version.
A preview is
identified using the preview
link relation [iana-link-relations].
Previews MAY be located externally or included as resources of digital publications.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"links" : [{
"type" : "LinkedResource",
"url" : "preview.mp3",
"encodingFormat" : "audio/mpeg",
"rel" : "preview"
},{
…
}],
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"links" : [{
"type" : "LinkedResource",
"url" : "https://publisher.example.org/mobydickpreview.html",
"encodingFormat" : "text/html",
"rel" : "preview"
},{
…
}],
…
}
Users often have the legal right to know and control what information is collected about them, how such information is stored and for how long, whether it is personally identifiable, and how it can be expunged. Including a statement that addresses such privacy concerns is consequently an important part of publishing digital publications. Even if no information is collected, such a declaration increases the trust users have in the content.
A link to a privacy policy can be included in the manifest for this purpose. It is RECOMMENDED that the privacy policy be included as a resource of the publication.
A privacy policy is identified using the privacy-policy
link
relation [iana-link-relations].
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
…
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
…
"links" : [{
"type" : "LinkedResource",
"url" : "https://www.w3.org/Consortium/Legal/privacy-statement-20140324",
"encodingFormat" : "text/html",
"rel" : "privacy-policy"
},{
…
}],
…
}
The cover is a resource that user agents can use to present the digital publication (e.g., in a library or bookshelf, or when initially loading the publication).
The cover is identified by the cover
link relation. The URL expressed in the url
term MUST NOT include a fragment identifier.
The cover
term is not currently registered in the IANA link
relations but the Working Group expects to add it.
If the cover is in an image format, a name
and description
SHOULD be provided. User agents can use these
properties to provide alternative text and descriptions when necessary for
accessibility.
More than one cover MAY be referenced from the manifest (e.g., to provide alternative formats and sizes for different device screens). If multiple covers are specified, each instance MUST define at least one unique property to allow user agents to determine its usability (e.g., a different format, height, width or relation).
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/donquixote",
"name" : "Don Quixote",
"resources" : [{
"type" : "LinkedResource",
"url" : "cover.html",
"encodingFormat" : "text/html",
"rel" : "cover"
},{
…
}],
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"resources" : [{
"type" : "LinkedResource",
"url" : "whale-image.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "cover",
"name" : "Moby Dick attacking hunters",
"description" : "A white whale is seen surfacing from the water to attack a small whaling boat"
},{
…
}],
…
}
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/gulliverstravels",
"name" : "Gulliver's Travels",
"resources" : [{
"type" : "LinkedResource",
"url" : "lilliput.jpg",
"encodingFormat" : "image/jpeg",
"rel" : "cover"
},{
"type" : "LinkedResource",
"url" : "lilliput.svg",
"encodingFormat" : "image/svg+xml",
"rel" : "cover"
},{
…
}],
…
}
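When several covers are listed, as in the last example, a user agent has to select one. The following sketch shows one possible strategy; it is not one defined by this specification. The name `pickCover` is hypothetical, and `preferredTypes` is an assumed, device-specific ordered list of media types.

```javascript
// Hypothetical helper: choose one cover among possibly several entries with
// the "cover" relation, preferring media types in the given order.
function pickCover(manifest, preferredTypes) {
  const covers = [].concat(manifest.readingOrder ?? [], manifest.resources ?? [])
    .filter((e) => e !== null && typeof e === "object" &&
                   [].concat(e.rel ?? []).includes("cover"));
  for (const type of preferredTypes) {
    const match = covers.find((c) => c.encodingFormat === type);
    if (match) return match;
  }
  // Fall back to the first cover listed, if any.
  return covers[0] ?? null;
}
```

For example, a device that renders vector images well might call `pickCover(manifest, ["image/svg+xml", "image/jpeg"])` and receive `lilliput.svg`.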
The page list is a navigational aid that contains a list of static page demarcation points within a digital publication.
The page list is identified by the pagelist
link relation. The URL expressed in the url
term MUST NOT include a fragment identifier.
The pagelist
term is not currently registered in the IANA link
relations but the Working Group expects to add it.
The link to the page list MAY be specified in either the default reading order or the resource list, but MUST NOT be specified in both.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"resources" : [{
"type" : "LinkedResource",
"url" : "toc_file.html",
"rel" : "pagelist"
},{
…
}],
…
}
The table of contents is a navigational aid that provides links to the major structural sections of a digital publication.
The table of
contents is identified by the contents
link relation [iana-link-relations]. The URL expressed in the url
term MUST NOT include a fragment identifier.
The link to the table of contents MAY be specified in either the default reading order or the resource list, but MUST NOT be specified in both.
The RECOMMENDED structure and processing model for the table of contents is defined in § B. Machine-Processable Table of Contents.
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"type" : "Book",
…
"url" : "https://publisher.example.org/mobydick",
"name" : "Moby Dick",
"resources" : [{
"type" : "LinkedResource",
"url" : "toc_file.html",
"rel" : "contents"
},{
…
}],
…
}
If additional relations beyond those defined in this specification need to be expressed, the rel
property can be extended in one of the
following ways:
Use of relations from [iana-link-relations] is RECOMMENDED.
If a digital publication format uses links to discover a manifest, the links MUST take one or both of the following forms:
An HTTP Link
header field [rfc5988] with its
rel
parameter set to the value "publication
".
Link: <https://example.com/webpub/manifest>; rel=publication
A link
element [html] with its
rel
attribute set to the value "publication
".
<link href="https://example.com/webpub/manifest" rel="publication"/>
When a manifest is embedded within an HTML document, the link MUST include a fragment identifier that references the
script
element that contains the manifest (see § 3.2 Embedding).
<link href="#example_manifest" rel="publication">
…
<script id="example_manifest" type="application/ld+json">
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
…
}
</script>
The exact value of rel
is still to be agreed upon and should be registered
by IANA.
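For the HTTP form of discovery, a user agent needs to pick the `publication` link out of a `Link` header. The following is a deliberately naive sketch. The name `manifestUrlsFromLinkHeader` is hypothetical. The sketch splits on commas and semicolons, so it assumes neither appears inside a link target or a quoted parameter value. A robust implementation should follow the full Link header grammar.

```javascript
// Naive sketch: returns the absolute URLs of all links whose rel includes
// "publication", resolved against `base`.
function manifestUrlsFromLinkHeader(header, base) {
  const urls = [];
  for (const linkValue of header.split(",")) {
    const m = linkValue.match(/^\s*<([^>]*)>\s*(.*)$/);
    if (!m) continue;
    for (const param of m[2].split(";")) {
      const [name, ...rest] = param.split("=");
      if (name.trim().toLowerCase() !== "rel") continue;
      // Strip optional quotes; rel may hold several space-separated types.
      const value = rest.join("=").trim().replace(/^"(.*)"$/, "$1");
      if (value.toLowerCase().split(/\s+/).includes("publication")) {
        urls.push(new URL(m[1], base).href);
      }
    }
  }
  return urls;
}
```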
When a digital
publication format allows manifests to be embedded within an HTML document, the manifest MUST be included in a script
element [html] whose type
attribute is set to application/ld+json
[json-ld].
<script type="application/ld+json">
{
…
}
</script>
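Resolving a same-document `publication` link to an embedded manifest combines the fragment-identifier rule with the `script` type requirement. The following is a sketch under stated assumptions. The name `embeddedManifest` is hypothetical. `doc` is assumed to offer `getElementById()`, and elements are assumed to expose `localName`, `getAttribute()` and `textContent`, as DOM elements do in browsers. Only `#id` references are handled.

```javascript
// Hypothetical helper: returns the parsed embedded manifest, or null.
function embeddedManifest(doc, href) {
  if (!href.startsWith("#")) return null; // not a same-document reference
  const el = doc.getElementById(decodeURIComponent(href.slice(1)));
  if (!el || el.localName !== "script") return null;
  const type = (el.getAttribute("type") || "").trim().toLowerCase();
  if (type !== "application/ld+json") return null;
  try {
    return JSON.parse(el.textContent);
  } catch {
    return null; // malformed JSON is treated as "no manifest"
  }
}
```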
Digital publication formats MAY define alternative methods of discovering a manifest that do not involve linking to, or embedding, a manifest (e.g., the manifest could be discovered through the use of a restricted name and/or location). This specification does not add any restrictions on such methods.
This section is non-normative.
This section describes the steps a user agent follows to process an authored manifest into an internal representation of the data structure it contains.
The first step in this process is to obtain the manifest; the exact steps for doing so are defined by each digital publication format.
The process then involves generating a canonical form of the manifest, which is a
representation that adds any missing data not explicitly authored (e.g., information could be
gleaned from a containing HTML document if the manifest is embedded inside a script
tag).
After a canonical manifest is generated, the data is put through a final set of post-processing steps to check its validity, ultimately resulting in a data structure that the user agent can use.
Within this process are various extension points that allow digital publication formats to enhance the basic requirements for their own specialized needs and audiences.
The steps for processing a manifest are given by the following algorithm. The algorithm, if successful, returns a processed manifest; otherwise, it terminates prematurely and returns nothing. In the case of nothing being returned, the user agent MUST ignore the manifest declaration.
The algorithm takes the following arguments:
Object
, terminate this algorithm. document
as input to the algorithm
described in § 4.3
Generating a Canonical Manifest. Check whether the canonical manifest fulfills the minimal requirements for a Publication Manifest, namely:
If any of these requirements are not met, terminate the algorithm.
The algorithm does not describe how error and warning messages should be reported; this is implementation dependent.
The steps to convert a Publication Manifest into a Canonical Manifest are given by the following algorithm. The algorithm takes the following arguments:
The steps of the algorithm are described below. The algorithm varies from strict JavaScript notation
in that P["term"] refers to the value in the object P for the label
"term", where P is either manifest or an object appearing
within manifest (e.g., a Person). The algorithm replaces or adds some terms to
manifest; the replacement terms are expressed in JSON syntax as {"term":"value"}
.
(§ 2.7.3.10 Title) if
manifest["name"] is undefined, locate the title
element [html] using document (when set). If
that element exists and is non-empty, let t be its text content, and add to
manifest:
if the language of the title
element is explicitly set to the value of l, then add "name": [{"value": t, "language": l}];
otherwise, add "name": [t].
This step adds the content of the title
element of document when
the name
property is not specified in the manifest. For example:
<html>
<head>
<title>Moby Dick</title>
…
<script type="application/ld+json">
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
…
}
</script>
yields:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : ["Moby Dick"],
…
}
(§ 2.7.4.1
Default Reading Order) if manifest["readingOrder"] is
undefined
, let u be the value of document.URL, and add
"readingOrder": [{"type": ["LinkedResource"], "url": u}]
to
the manifest
If the Digital Publication consists only of the referencing document, the default reading order can be omitted; it will consist, automatically, of that single resource.
(§ 2.7.2.6 Arrays)
for each value v of P["term"] that is a single string or an object,
and where term expects an array: change the relevant
term/value pair to
"term": [v]
A number of terms require their values to be arrays but, for the sake of convenience, authors are allowed to use a single value instead of a one element array. For example,
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : "Moby Dick",
"author" : "Herman Melville",
"resources" : [{
"type" : "LinkedResource",
"rel" : "cover",
"url" : "images/cover.jpg",
"encodingFormat" : "image/jpeg"
},{
…
}],
…
}
yields:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : ["Moby Dick"],
"author" : ["Herman Melville"],
"resources" : [{
"type" : ["LinkedResource"],
"rel" : ["cover"],
"url" : "images/cover.jpg",
"encodingFormat" : "image/jpeg"
},{
…
}],
…
}
(§ 2.7.3.4 Creators)
for each value v in a manifest["term"] array that is a simple string
or a localizable string, and where term expects an entity: replace that element in the array with
{"type": ["Person"], "name": [v]}
An author, editor, etc., should be explicitly designated as an object of type
Person
but, for the sake of convenience, authors are allowed to just
give their name. For example,
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : ["Moby Dick"],
"author" : ["Herman Melville"],
…
}
yields:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : ["Moby Dick"],
"author" : [{
"type" : ["Person"],
"name" : ["Herman Melville"]
}],
…
}
(§ 2.7.2.3.3 Links)
for each value v in a manifest["term"] array that is a simple string,
and where term is one of the resource categorization properties: replace that element in the array with
{"type": ["LinkedResource"], "url": v}
Resource links should be explicitly designated as objects of type
LinkedResource
but, for the sake of convenience, authors are allowed to
just give their absolute or relative URL. For example,
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
…
"resources" : [
"css/mobydick.css",
…
],
…
}
yields:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
…
"resources" : [{
"type" : ["LinkedResource"],
"url" : "css/mobydick.css"
},
…
],
…
}
(§ 2.7.2.3.1 Localizable Strings) for each value v that is a simple string, either P["term"] itself or an element of P["term"] when the latter is an array, and where term expects a localizable string: if a language l has been set for the manifest (via inLanguage), change the relevant term/value to
"term": {"value": v,"language": l}
otherwise, change it to
"term": {"value": v}
Natural language text values should be explicitly designated as localizable string objects
but, for the sake of convenience, authors are allowed to just use a simple string. That is,
if no language information has been provided (via inLanguage
) in the
manifest then, for example,
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : ["Moby Dick"],
"author" : ["Herman Melville"],
…
}
yields:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"name" : [{
"value" : "Moby Dick"
}],
"author" : [{
"value" : "Herman Melville"
}],
…
}
If an explicit language has also been provided in the manifest, that language is also added to the localizable string object. For example,
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"inLanguage" : "en",
"name" : ["Moby Dick"],
"author" : ["Herman Melville"],
…
}
yields:
{
"@context" : ["https://schema.org","https://www.w3.org/ns/pub-context"],
"inLanguage": "en",
"name" : [{
"value" : "Moby Dick",
"language" : "en"
}],
"author" : [{
"value" : "Herman Melville",
"language" : "en"
}],
…
}
(§ 2.7.2.4 URLs) for each value v of
P["term"] that is not an absolute URL string, and
where term expects a URL: resolve this value, considered
to be a relative URL, using the value of base and yielding the value of
au, and replace the term/value pair by
"term": au
All relative URLs in the Publication Manifest must be resolved against the base value to yield absolute URLs.
(§ 5.2 Compatibility Requirements, extension point) if a profile defines its own canonicalization steps for profile specific terms, those steps are executed at this point.
Return the (transformed) manifest
.
See the diagram in the appendix for a visual representation of the algorithm. To help in understanding the result of the algorithm, § C. Manifest Examples also links to the corresponding canonical manifest for each example.
At the moment, step 11 of the manifest canonicalization means that all relative URLs are resolved at this
step using base. This may be a problem with respect to the value of
base in the case of packaged publications; see w3c/pwpub#45.
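The canonicalization steps described above can be combined in a non-normative JavaScript sketch. It is not a complete or conforming implementation. The term sets are illustrative subsets chosen for this example. `doc` is a hypothetical plain object `{ url, title, titleLang }` standing in for the containing HTML document. `inLanguage` is assumed to be a single string. The profile extension point is omitted.

```javascript
// Illustrative subsets of the terms each step applies to (assumptions);
// the authoritative lists come from the property definitions.
const ARRAY_TERMS = new Set(["type", "name", "author", "editor", "readingOrder", "resources", "links", "rel"]);
const ENTITY_TERMS = new Set(["author", "editor"]);
const LINK_TERMS = new Set(["readingOrder", "resources", "links"]);
const LOCALIZABLE_TERMS = new Set(["name", "description"]);
const URL_TERMS = new Set(["url"]);

const isObject = (v) => v !== null && typeof v === "object" && !Array.isArray(v);

// Resolves a relative URL against `base` (which must be absolute);
// absolute URLs are kept exactly as authored.
function resolve(value, base) {
  try {
    new URL(value); // throws for relative URLs
    return value;
  } catch {
    return new URL(value, base).href;
  }
}

function canonicalize(manifest, doc, base) {
  const m = JSON.parse(JSON.stringify(manifest)); // work on a copy
  // Title: fall back to the text of the HTML title element.
  if (m.name === undefined && doc && doc.title) {
    m.name = doc.titleLang ? [{ value: doc.title, language: doc.titleLang }] : [doc.title];
  }
  // Default reading order: the containing document itself.
  if (m.readingOrder === undefined && doc && doc.url) {
    m.readingOrder = [{ type: ["LinkedResource"], url: doc.url }];
  }
  return transform(m, m.inLanguage, base, true);
}

function transform(node, lang, base, topLevel) {
  const out = {};
  for (const [term, authored] of Object.entries(node)) {
    let v = authored;
    // Arrays: wrap a single string or object.
    if (ARRAY_TERMS.has(term) && (typeof v === "string" || isObject(v))) v = [v];
    // Creators (manifest level): names and localizable strings become Persons.
    if (topLevel && ENTITY_TERMS.has(term) && Array.isArray(v)) {
      v = v.map((x) => (typeof x === "string" || (isObject(x) && "value" in x)
        ? { type: ["Person"], name: [x] } : x));
    }
    // Links (manifest level): bare URL strings become LinkedResource objects.
    if (topLevel && LINK_TERMS.has(term) && Array.isArray(v)) {
      v = v.map((x) => (typeof x === "string" ? { type: ["LinkedResource"], url: x } : x));
    }
    // Localizable strings: add the manifest language when one is known.
    if (LOCALIZABLE_TERMS.has(term)) {
      const ls = (x) => (typeof x !== "string" ? x : lang ? { value: x, language: lang } : { value: x });
      v = Array.isArray(v) ? v.map(ls) : ls(v);
    }
    // URLs: resolve relative values against base.
    if (URL_TERMS.has(term) && typeof v === "string") v = resolve(v, base);
    // Apply the same steps to nested objects (Person, LinkedResource, ...).
    if (Array.isArray(v)) v = v.map((x) => (isObject(x) ? transform(x, lang, base, false) : x));
    else if (isObject(v)) v = transform(v, lang, base, false);
    out[term] = v;
  }
  return out;
}
```

Processing each term through the steps in order mirrors the sequence of the algorithm, because every later step only looks at the output of the earlier ones for the same value.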
The steps for post-processing a canonical manifest are given by the following algorithm. The algorithm takes a json object representing a canonical manifest and outputs a processed manifest. The goal of the algorithm is to ensure that the data represented in json abides by the minimal requirements on the data, removing non-conformant data where applicable.
As an abuse of notation, P["term"] refers to the value in the object P for the label "term", where P is either manifest, or an object appearing within manifest (e.g., a LinkedResource).
Let manifest be the result of converting
json to a PublicationManifest
dictionary.
Perform data cleanup operations on manifest, possibly removing data, as well as raising warnings.
CreativeWork
" and issue a warning.
LinkedResource
, if the value of
P["length"] is set, check whether this value is a valid number. If the
check fails, issue a warning.
Extension point: process any proprietary, profile specific, and/or other supported members at this point in the algorithm.
Return manifest.
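The `length` check in the cleanup step can be sketched as a recursive walk over the canonical manifest. The name `checkLengths` is hypothetical. The sketch relies on three assumptions. First, the input is canonical, so `type` is an array. Second, a "valid number" means a finite JSON number, so a numeric string such as `"12.5"` triggers a warning. Third, offending values are reported but not removed.

```javascript
// Hypothetical helper: returns warning messages for LinkedResource objects
// whose "length" is set but is not a finite number.
function checkLengths(node, path = "manifest", warnings = []) {
  if (Array.isArray(node)) {
    node.forEach((item, i) => checkLengths(item, `${path}[${i}]`, warnings));
  } else if (node !== null && typeof node === "object") {
    const types = [].concat(node.type ?? []);
    if (types.includes("LinkedResource") && node.length !== undefined &&
        !(typeof node.length === "number" && Number.isFinite(node.length))) {
      warnings.push(`${path}.length is not a valid number: ${JSON.stringify(node.length)}`);
    }
    // Continue into nested members (e.g., readingOrder, resources, links).
    for (const [key, value] of Object.entries(node)) checkLengths(value, `${path}.${key}`, warnings);
  }
  return warnings;
}
```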
This section is non-normative.
Publishing communities (e.g., audio books, scholarly publications) are encouraged to define extensions, also known as profiles, by extending the core manifest with module-specific terms and possibly adding new requirements.
In order for a digital publication format to be compatible with this specification, the following conditions MUST be met:
Adding an example of a term added by, e.g., the audiobook profile would be a good idea, when available.
LinkedResource Definition
This specification defines a new type for links called LinkedResource. It consists of the following properties:
Term | Description | Required Value | Value Type | [schema.org] Mapping |
---|---|---|---|---|
url | Location of the resource. REQUIRED. | A valid URL string [url]. Refer to the property definitions that accept this type for additional restrictions. | URL | url |
encodingFormat | Media type of the resource (e.g., text/html). OPTIONAL. | MIME Media Type [rfc2046]. | Literal | encodingFormat |
name | Name of the item. OPTIONAL. | One or more Text items. | Array of Localizable Strings | name |
description | Description of the item. OPTIONAL. | Text. | Localizable String | description |
rel | The relation of the resource to the publication. OPTIONAL. | One or more relations. The values are either the relevant relation terms of the IANA link registry [iana-link-relations], or specially-defined URLs if no suitable link registry item exists. | Array of Literals | (None) |
integrity | A cryptographic hashing of the resource that allows its integrity to be verified. OPTIONAL. | One or more whitespace-separated sets of integrity metadata [sri]. The value MUST conform to the metadata definition [sri]. Refer to [sri] for the list of cryptographic hashing functions that user agents are expected to support. | Literal | (None) |
length | The total length of a time-based media resource in (possibly fractional) seconds. OPTIONAL. | Number | Number | (None) |
Although user agent support for the integrity
property is OPTIONAL, user agents that support cryptographic hashing comparisons using
this property MUST do so in accordance with [sri].
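The following is a simplified, non-normative sketch of an integrity comparison using Node's `crypto` module. It is not a complete [sri] implementation. The names `parseIntegrity` and `matchesIntegrity` are hypothetical. Only `sha256`, `sha384` and `sha512` are recognized, option expressions after `?` are ignored, and only the strongest algorithm present in the metadata is compared, following the SRI matching approach.

```javascript
const crypto = require("crypto");

// Relative strength of the recognized hash algorithms.
const STRENGTH = { sha256: 1, sha384: 2, sha512: 3 };

// Parses whitespace-separated "alg-base64digest[?options]" tokens,
// silently skipping any token this sketch does not recognize.
function parseIntegrity(metadata) {
  const items = [];
  for (const token of metadata.trim().split(/\s+/)) {
    const m = token.match(/^(sha256|sha384|sha512)-([A-Za-z0-9+\/]+={0,2})(\?.*)?$/);
    if (m) items.push({ alg: m[1], digest: m[2] });
  }
  return items;
}

// `bytes` is the fetched resource body (a Buffer or string).
function matchesIntegrity(bytes, metadata) {
  const items = parseIntegrity(metadata);
  if (items.length === 0) return true; // no usable metadata: nothing to verify
  const strongest = Math.max(...items.map((i) => STRENGTH[i.alg]));
  return items
    .filter((i) => STRENGTH[i.alg] === strongest)
    .some((i) => crypto.createHash(i.alg).update(bytes).digest("base64") === i.digest);
}
```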
dictionary LinkedResource {
  required DOMString url;
  DOMString encodingFormat;
  sequence<LocalizableString> name;
  LocalizableString description;
  sequence<DOMString> rel;
  DOMString integrity;
  double length;
};
This section is non-normative.
To facilitate navigation within pages and across sites, HTML uses the nav
element
[html] to express lists of links. Although generic in nature by default, the
purpose of a nav
element can be more specifically identified by use of the role
attribute
[html]. In particular, the doc-toc
role from the [dpub-aria-1.0] vocabulary identifies the nav
element as the digital
publication's table of contents.
Including an identifiable table of contents is an accessible way to produce any digital publication, but due to the flexibility of HTML markup, it also presents challenges for user agents trying to extract a meaningful hierarchy of links (e.g., to provide a custom view available from any page). To avoid duplicating the tables of contents for different uses, this section defines a syntax that is both human friendly and commonly used while still providing enough structure for user agent extraction.
Authors have a choice of lists (ordered or unordered) to construct their table of contents. By
wrapping each link within these lists in an anchor tag (an a
element), user agents can easily differentiate the information they need from any
peripheral content (asides) or stylistic tagging that has also been added. The table of contents can
consist of both active links (with an href
attribute) and inactive links (excluding the
href
attribute), providing additional flexibility in how the table of contents is
constructed (e.g., to omit links to certain headings or only link to certain content in a
preview).
The table of contents is expressed via an [html] element (typically a nav element). This element MUST be identified by the role attribute [html] value "doc-toc" [dpub-aria-1.0], and MUST be the first element in the document in document tree order [dom] with that role value.
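The following sketch locates the table of contents element using Python's standard `html.parser` module. Start tags are reported in document order, so the first match is the element that the requirement above identifies. The sketch assumes that the role attribute can hold several space-separated tokens, and it checks only for a doc-toc token. The helper names `_DocTocFinder` and `find_toc_element` are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Optional, Tuple


class _DocTocFinder(HTMLParser):
    """Records the first start tag whose role attribute contains doc-toc."""

    def __init__(self):
        super().__init__()
        self.found: Optional[Tuple[str, Dict[str, Optional[str]]]] = None

    def handle_starttag(self, tag, attrs):
        if self.found is not None:
            return  # only the first match in document order counts
        attributes = dict(attrs)
        if "doc-toc" in (attributes.get("role") or "").split():
            self.found = (tag, attributes)


def find_toc_element(html: str) -> Optional[Tuple[str, Dict[str, Optional[str]]]]:
    """Return (tag name, attributes) of the doc-toc element, or None."""
    finder = _DocTocFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```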
The manifest SHOULD identify the resource that contains the table of contents.
Although the content model of the nav element is not restricted, user agents will only be able to extract a usable table of contents when the following markup guidelines are followed:
A title for the table of contents is optional, but authors are advised to add one. Otherwise, a user agent has to generate a placeholder title when one is needed. Titles are specified using any of the [html] h1 through h6 elements. Only the first such element is recognized as the title. If no heading element is found before the list of links, user agents will assume that a title has not been specified.
The first [html] ol or ul list element encountered in the nav element is assumed to contain the list that defines the links into the content. This list will be found even if it is nested inside other elements, such as div elements, because the algorithm ignores elements that are not relevant to its processing. The list cannot, however, occur inside any skipped elements, since their contents are not evaluated.
If the nav element does not contain one of these elements, user agents will not register the digital publication as containing a usable table of contents (e.g., a machine-rendered option will not be available).
If the table of contents is considered as a tree of links, each list item (li element) inside the list of links represents one branch. Each branch has to have a name and an optional destination in order to be presented to users. This information is obtained from the first a element found within the list item, wherever it is nested (again, excluding any a elements inside skipped elements).
The link destination for the branch is obtained from the a element's href attribute, when specified. This attribute can be omitted if a link is not available (e.g., in a preview) or not relevant (e.g., a grouping header). When providing a link into the content, it is also possible to specify the relation of the linked document (in a rel attribute) and the media type of the linked resource (in a type attribute).
After finding the a element that labels the branch, user agents continue to inspect the markup for another list element (i.e., sub-branches). If a list is found, it is processed in the same way to extract its links, and so on, until there are no more nested branches left to process.
A small set of elements is skipped when parsing the table of contents to avoid misinterpretation. These are the [html] sectioning content elements and sectioning root elements. They are skipped because they can define their own outlines (i.e., they can represent embedded content that is self-contained and not necessarily related to the structure of content links).
Any element that has its hidden attribute set is also skipped, since hidden elements are not intended to be directly accessed by users.
Although these elements can be included in the nav element, care has to be taken not to embed important content within them (e.g., do not wrap a section element around the list item that contains all the links into the content).
All elements that are not relevant to extracting the table of contents, and are not skipped, are ignored. Unlike skipped elements, ignored elements are still searched: user agents continue to look inside them for relevant content, which allows greater flexibility in the tagging that can be used.
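The distinction between relevant, skipped, and ignored elements can be captured in a small classification function. This is a sketch with the following assumptions:

- The skipped elements are the sectioning content elements (article, aside, nav, section) and sectioning root elements (blockquote, body, details, dialog, fieldset, figure, td) listed in [html].
- Only h1 through h6 are treated as headings.

The cases are tested in the same order as the steps of the extraction algorithm, so the skip case applies only to elements that are not headings, lists, list items, or anchors. The walk starts at the doc-toc element itself, so that root element is flagged separately rather than skipped. The `classify` function and its return labels are hypothetical.

```python
from typing import Dict, Optional

SECTIONING_CONTENT = {"article", "aside", "nav", "section"}
SECTIONING_ROOTS = {"blockquote", "body", "details", "dialog", "fieldset", "figure", "td"}
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


def classify(tag: str, attrs: Dict[str, Optional[str]], is_root: bool = False) -> str:
    """Return how the table of contents walk treats an element."""
    tag = tag.lower()
    if is_root:
        return "root"       # the doc-toc element itself is never skipped
    if tag in HEADINGS:
        return "heading"
    if tag in ("ol", "ul"):
        return "list"
    if tag == "li":
        return "item"
    if tag == "a":
        return "anchor"
    if tag in SECTIONING_CONTENT or tag in SECTIONING_ROOTS or "hidden" in attrs:
        return "skip"       # contents are not evaluated
    return "ignore"         # contents are still searched
```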
This section is non-normative.
This section defines an algorithm for extracting a table of contents from a nav element. It is defined in terms of a walk over the nodes of a DOM tree, in tree order. During the walk, each node is visited once when it is entered and once when it is exited, and each visit can be seen as triggering an enter or exit event. Some steps give user agents a choice in how to process the content, to provide flexibility for different presentation models.
For illustrative purposes, the examples in this section show the structure of the table of contents as JavaScript objects. User agents can process and internalize the resulting structure in whatever language and form is appropriate.
For the purposes of this algorithm, a list element is defined as either an [html] ol or ul element.
The following algorithm MUST be applied to a walk of a DOM subtree rooted at the first nav element in document order with the role attribute value doc-toc. All explanations are informative.
Let toc be an object that represents the table of contents and initialize it as follows:
Create a name property for toc that represents the title of the table of contents and set it to an empty string.
Create an entries property for toc that represents all the branches of the table of contents and set it to an empty array.
This step initializes the toc object that will store the title and the branches of the table of contents.
Initialize a stack.
The stack is used to hold branches that are not yet complete. As a new sub-branch is encountered, the parent gets pushed onto the stack so it can be retrieved later.
Let current toc branch be a variable set to null.
current toc branch is used to hold the object that represents the branch of the table of contents that is currently being processed.
Walk over the DOM in tree order, starting with the nav element the table of contents is being built from, and trigger the first relevant step below for each element as the walk enters and exits it.
When entering a heading content element:
Run these steps:
If the stack is empty, and the name property of toc is an empty string, set the name property to one of the following:
the descendant content of the heading element; or
a text string generated from the descendant content of the heading element.
If the resulting value of name is an empty string (e.g., after removing any presentational elements and trimming all leading and trailing whitespace), set the name property either to a placeholder value or to null.
This step identifies the heading for the table of contents. A heading is only processed if the value of the toc name property is an empty string (i.e., no headings have yet been encountered).
Whether a user agent sets the name to the descendant content of the heading element, or generates a text string from it, depends on whether it will re-use any descendant tagging in the presentation (e.g., to retain images, MathML, ruby and other content that does not translate to text easily).
If the name is not an empty string, or is null, then either a previous heading has already been encountered, or content has been encountered that indicates the nav element does not have a heading (e.g., a list has already been processed, since the heading would not follow the list of links).
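The heading step can be sketched in Python as follows. The sketch assumes the following:

- toc is a dict with name and entries keys, and the stack is a list.
- The caller passes the heading's text content. This is the text-string option, not the option that retains descendant markup.

The function name `enter_heading` and its `placeholder` parameter are hypothetical. When no placeholder is passed, name is set to None, which stands in for null.

```python
from typing import List, Optional


def enter_heading(toc: dict, stack: List[dict], heading_text: str,
                  placeholder: Optional[str] = None) -> None:
    """Apply the 'entering a heading content element' step to toc."""
    if stack or toc["name"] != "":
        return  # nested heading, or a title or list was already seen
    name = heading_text.strip()
    toc["name"] = name if name else placeholder  # None plays the role of null
```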
When entering a list element:
Run these steps:
If the name property of toc is an empty string, set name to null.
If current toc branch is not null:
If the entries property of current toc branch is null or a non-empty array, exit the element and continue processing with the next element.
Otherwise, push current toc branch onto the stack and set current toc branch to null.
Otherwise, if the stack is empty:
If the entries property of toc is null or a non-empty array, exit the element and continue processing with the next element.
This algorithm does not process multiple lists in a single branch or at the root of the nav element. If a list has already been encountered (i.e., the entries property contains one or more branches or is set to null), this list is skipped.
If a list is encountered and the table of contents (toc) still does not have a name (i.e., no heading element has been encountered), the table of contents is assumed to not have a heading, because the heading for the table of contents cannot appear after the first list of entries. The value of the name property is changed from an empty string to null, since no heading encountered afterward can apply either.
When exiting a list element:
If the stack is not empty, pop the top object off the stack and set current toc branch to it.
This resets current toc branch back to the parent object after all of its child branches have been processed.
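The entering and exiting list element steps might look like the following Python sketch. The `WalkState` class, which bundles toc, the stack, and current toc branch, is a hypothetical convenience. When `enter_list` returns False, that stands for "exit the element and continue processing with the next element".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WalkState:  # hypothetical container: toc, the stack, and current toc branch
    toc: dict = field(default_factory=lambda: {"name": "", "entries": []})
    stack: List[dict] = field(default_factory=list)
    current: Optional[dict] = None


def enter_list(state: WalkState) -> bool:
    """Return False when the list must be skipped (exit the element)."""
    if state.toc["name"] == "":
        state.toc["name"] = None  # no heading can follow the first list
    if state.current is not None:
        entries = state.current["entries"]
        if entries is None or entries:
            return False  # this branch already has its list
        state.stack.append(state.current)  # remember the parent branch
        state.current = None
    elif not state.stack:
        entries = state.toc["entries"]
        if entries is None or entries:
            return False  # the root list was already processed
    return True


def exit_list(state: WalkState) -> None:
    if state.stack:
        state.current = state.stack.pop()  # back to the parent branch
```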
When entering a list item element:
Run these steps:
Set current toc branch to a new object.
Create name, url, type, and rel properties for the object and set them to empty strings.
Create an entries property for the object and set it to an empty array.
Each list item represents a possible new branch in the table of contents, so whenever one is encountered, a new blank object is created in current toc branch. This object is populated with information as a descendant a element and list are encountered.
When exiting a list item element:
Run these steps:
If the entries property of current toc branch contains an empty array, set its value to null.
If the stack contains one or more entries:
If the entries property of current toc branch contains a non-empty array, and its name property is an empty string, set its name to a placeholder value or null.
If the entries property of current toc branch is null, and its name property is an empty string, set current toc branch to null and exit this processing step.
Add current toc branch to the array in the entries property of the object at the top of the stack.
Otherwise, add the object in current toc branch to the entries array of toc.
Set current toc branch to null.
Exiting a list item indicates that processing of the current branch is complete. Before this branch is added to its parent's entries array, it needs to be tested to see whether it has a name and/or any sub-branches:

- If it has sub-branches but no name, the branch is kept. The user agent can either supply a placeholder name of its own creation or set the name to null.
- If it has neither a name nor any sub-branches, it is invalid and is discarded.

To determine where to merge the branch, the stack is checked. If there are no objects in the stack, the branch is added to the entries property of the root toc object (i.e., it is a top-level branch). Otherwise, it is added to the entries property of the object at the top of the stack.
As a final step, current toc branch is reset to null.
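The following is a sketch of the list item steps. It uses the same kind of hypothetical state container as plain Python objects. The first exit step replaces an empty entries array with None. This sketch therefore tests for None (no sub-branches) when deciding whether to discard a nameless branch, which matches the explanation above. The `placeholder` parameter is a hypothetical stand-in for the user agent's own placeholder value.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class WalkState:  # hypothetical container: toc, the stack, and current toc branch
    toc: dict = field(default_factory=lambda: {"name": "", "entries": []})
    stack: List[dict] = field(default_factory=list)
    current: Optional[dict] = None


def enter_list_item(state: WalkState) -> None:
    state.current = {"name": "", "url": "", "type": "", "rel": "", "entries": []}


def exit_list_item(state: WalkState, placeholder: Optional[str] = None) -> None:
    branch = state.current
    if branch is None:
        return
    if branch["entries"] == []:
        branch["entries"] = None
    if state.stack:
        if branch["entries"] and branch["name"] == "":
            branch["name"] = placeholder        # keep a nameless parent branch
        elif branch["entries"] is None and branch["name"] == "":
            state.current = None                # no name, no sub-branches: discard
            return
        state.stack[-1]["entries"].append(branch)
    else:
        state.toc["entries"].append(branch)     # a top-level branch
    state.current = None
```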
When entering an anchor element and current toc branch is not null:
Run these steps:
If the name property of current toc branch is not an empty string, do nothing.
Otherwise:
Set the name property of current toc branch to one of the following:
the descendant content of the a element; or
a text string generated from the descendant content of the a element.
If the resulting value of name is an empty string (e.g., after removing any presentational elements and trimming all leading and trailing whitespace), set the name property to null.
If the element has an href attribute and the URL in the attribute resolves to a resource in the default reading order or resource list, set the url property of current toc branch to the value. Otherwise, set the property to null.
If the element has a type attribute, and the value of the attribute is not an empty string after trimming leading and trailing white space, set the type property of current toc branch to its value. Otherwise, set the property to null.
If the element has a rel attribute, and the value of the attribute is not an empty string after trimming leading and trailing white space, set the rel property of current toc branch to its value. Otherwise, set the property to null.
Exit the element and continue processing with the next element.
This step processes anchor tags to obtain values for the name and url properties of a branch.
If the name of the current branch is already defined, then processing of this element is terminated (i.e., to avoid processing multiple links for a single branch).
Whether a user agent sets the name of the entry to the descendant content of the a element, or generates a text string from it, depends on whether it will re-use any descendant tagging in the presentation (e.g., to retain images, MathML, ruby and other content that does not translate to text easily).
To meet the requirements of this specification, the link needs an href attribute, and the URL in it must also resolve to a resource that belongs to the digital publication. If it does not, the branch is retained but the entry will not be linkable.
Additional information about the target of the link (the type of resource and its relation) is also retained.
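The anchor step could be sketched as below. The sketch makes the following assumptions:

- The caller passes the a element's text content.
- `publication_urls` is a hypothetical set of absolute URLs for the resources in the default reading order and resource list.
- The href is resolved against a `base_url` with `urllib.parse.urljoin`.
- Fragment identifiers are ignored when testing whether the URL belongs to the publication.

As in the step, the attribute value itself is stored in url. The helper name `enter_anchor` is hypothetical.

```python
from typing import Dict, Optional, Set
from urllib.parse import urldefrag, urljoin


def enter_anchor(branch: dict, text: str, attrs: Dict[str, Optional[str]],
                 base_url: str, publication_urls: Set[str]) -> None:
    """Apply the anchor step to current toc branch (hypothetical helper)."""
    if branch["name"] != "":
        return  # an earlier link already labelled this branch
    branch["name"] = text.strip() or None
    href = attrs.get("href")
    # Assumption: compare absolute URLs with the fragment removed.
    if href is not None and urldefrag(urljoin(base_url, href))[0] in publication_urls:
        branch["url"] = href
    else:
        branch["url"] = None
    for key in ("type", "rel"):
        value = attrs.get(key)
        branch[key] = value if value is not None and value.strip() else None
```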
When entering a sectioning content element, a sectioning root element, or an element with a hidden attribute:
Exit the element and continue processing with the next element.
As sectioning and sectioning root elements can define their own outlines, descending into them poses problems for generating the table of contents (i.e., they may contain content that is not directly related). As a result, they are skipped over when encountered to prevent their child content from being processed.
Otherwise: do nothing.
For all other elements, this step allows their descendant elements to continue to be processed.
After completing the DOM walk, if the entries property of toc contains a non-empty array, toc represents the machine-processed table of contents.
Otherwise, the digital publication does not have a table of contents that can be used for machine rendering purposes.
If the entries array in the root toc object does not contain any branches (either because no list was found in the nav element or because the list did not contain any conforming list items), then the algorithm did not produce a usable table of contents.
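The following Python sketch puts the steps together and runs the whole extraction over an HTML string, using only the standard library. It is not a conforming implementation, for the following reasons:

- It builds a minimal tree with `html.parser`, so it assumes explicit end tags.
- It uses the text-string option for names, and null instead of placeholders.
- It stores href values without checking that they resolve to resources of the publication.

`Node`, `TreeBuilder`, and `extract_toc` are hypothetical names. `extract_toc` returns None when no usable table of contents is found.

```python
from html.parser import HTMLParser
from typing import List, Optional

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "track", "wbr"}
SKIP = {"article", "aside", "nav", "section",  # sectioning content
        "blockquote", "body", "details", "dialog", "fieldset", "figure", "td"}  # sectioning roots
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class Node:
    def __init__(self, tag: str, attrs: dict):
        self.tag, self.attrs, self.children = tag, attrs, []  # children: Node or str
    def text(self) -> str:
        return "".join(c if isinstance(c, str) else c.text() for c in self.children)


class TreeBuilder(HTMLParser):
    # Minimal tree builder: assumes explicit end tags; remembers the first doc-toc element.
    def __init__(self):
        super().__init__()
        self.open, self.toc_root = [Node("#root", {})], None
    def handle_starttag(self, tag, attrs):
        node = Node(tag, dict(attrs))
        self.open[-1].children.append(node)
        if self.toc_root is None and "doc-toc" in (node.attrs.get("role") or "").split():
            self.toc_root = node
        if tag not in VOID:
            self.open.append(node)
    def handle_endtag(self, tag):
        for i in range(len(self.open) - 1, 0, -1):  # close the nearest matching element
            if self.open[i].tag == tag:
                del self.open[i:]
                break
    def handle_data(self, data):
        self.open[-1].children.append(data)


def extract_toc(html: str) -> Optional[dict]:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    if builder.toc_root is None:
        return None
    toc: dict = {"name": "", "entries": []}
    stack: List[dict] = []
    current: Optional[dict] = None
    def walk_children(node: Node) -> None:
        for child in node.children:
            if isinstance(child, Node):
                walk(child)
    def walk(node: Node) -> None:
        nonlocal current
        tag, attrs = node.tag, node.attrs
        if tag in HEADINGS:
            if not stack and toc["name"] == "":
                toc["name"] = node.text().strip() or None  # text-string option
        elif tag in ("ol", "ul"):
            if toc["name"] == "":
                toc["name"] = None
            if current is not None:
                if current["entries"] is None or current["entries"]:
                    return  # only one list per branch
                stack.append(current)
                current = None
            elif not stack and toc["entries"]:
                return  # only one list at the root
            walk_children(node)
            if stack:  # exiting the list restores the parent branch
                current = stack.pop()
            return
        elif tag == "li":
            current = {"name": "", "url": "", "type": "", "rel": "", "entries": []}
            walk_children(node)
            branch, current = current, None  # exiting the list item
            if branch is None:
                return
            branch["entries"] = branch["entries"] or None
            if stack:
                if branch["name"] == "" and branch["entries"] is None:
                    return  # no name and no sub-branches: discarded
                if branch["name"] == "":
                    branch["name"] = None  # null rather than a placeholder
                stack[-1]["entries"].append(branch)
            else:
                toc["entries"].append(branch)
            return
        elif tag == "a" and current is not None:
            if current["name"] == "":
                current["name"] = node.text().strip() or None
                current["url"] = attrs.get("href")  # not checked against the publication
                for key in ("type", "rel"):
                    value = attrs.get(key)
                    current[key] = value if value is not None and value.strip() else None
            return  # exit the element
        elif tag in SKIP or "hidden" in attrs:
            return  # skipped: contents are not evaluated
        walk_children(node)
    walk_children(builder.toc_root)
    return toc if toc["entries"] else None
```

The walk starts with the children of the doc-toc element. Starting from the element itself would trigger the skip case, since nav is sectioning content.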
This section is non-normative.
A manifest for a simple book. The canonical version of this manifest is also available.
{
"@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type": "Book",
"url": "https://publisher.example.org/mobydick",
"author": "Herman Melville",
"dateModified": "2018-02-10T17:00:00Z",
"readingOrder": [
"html/title.html",
"html/copyright.html",
"html/introduction.html",
"html/epigraph.html",
"html/c001.html",
"html/c002.html",
"html/c003.html",
"html/c004.html",
"html/c005.html",
"html/c006.html"
],
"resources": [
"css/mobydick.css",
{
"type": "LinkedResource",
"rel": "cover",
"url": "images/cover.jpg",
"encodingFormat": "image/jpeg"
},{
"type": "LinkedResource",
"url": "html/toc.html",
"rel": "contents"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneral.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneralBol.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneralBolIta.otf",
"encodingFormat": "application/vnd.ms-opentype"
},{
"type": "LinkedResource",
"url": "fonts/STIXGeneralItalic.otf",
"encodingFormat": "application/vnd.ms-opentype"
}
]
}
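A user agent can find the resource that carries the table of contents by looking for the contents relation among the manifest's linked resources. The sketch below makes the following assumptions:

- The manifest has already been parsed into a Python dict (for example, with `json.loads`).
- Relative URLs are resolved against `base_url`, which is taken here to be the manifest's own address.

Plain string entries are skipped because they cannot carry a rel value. `find_toc_resource` is a hypothetical helper.

```python
from typing import Optional
from urllib.parse import urljoin


def find_toc_resource(manifest: dict, base_url: str) -> Optional[str]:
    """Return the absolute URL of the first linked resource with rel "contents"."""
    for entry in manifest.get("readingOrder", []) + manifest.get("resources", []):
        if not isinstance(entry, dict):
            continue  # a bare URL string has no rel
        rel = entry.get("rel", [])
        rels = [rel] if isinstance(rel, str) else rel
        if "contents" in rels and "url" in entry:
            return urljoin(base_url, entry["url"])
    return None
```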
An embedded manifest example. The canonical version of the manifest, as well as a more elaborate version for the same document, are also available.
<!DOCTYPE html>
<html lang="en-US">
<head>
<title>Model for Tabular Data and Metadata on the Web</title>
<link href="#wpm" rel="publication" />
...
<script id="wpm" type="application/ld+json">
{
"@context" : ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type" : "TechArticle",
"id" : "http://www.w3.org/TR/tabular-data-model/",
"url" : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
"copyrightYear" : "2015",
"copyrightHolder" : "World Wide Web Consortium",
"creator" : ["Jeni Tennison", "Gregg Kellogg", "Ivan Herman"],
"publisher" : {
"type" : "Organization",
"name" : "World Wide Web Consortium",
"id" : "https://www.w3.org/"
},
"datePublished" : "2015-12-17",
"resources" : [
"datatypes.html",
"datatypes.svg",
"datatypes.png",
"diff.html",
{
"type" : "LinkedResource",
"url" : "test-utf8.csv",
"encodingFormat" : "text/csv"
},
{
"type" : "LinkedResource",
"url" : "test.xlsx",
"encodingFormat" : "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
}
]
}
</script>
</head>
<body>
....
<section id="toc" role="doc-toc">
<h2 resource="#h-toc" id="h-toc" class="introductory">Table of Contents</h2>
<ul class="toc">
<li class="tocline"><a class="tocxref" href="#intro">
<span class="secno">1. </span>Introduction</a>
</li>
...
</ul>
</section>
...
</body>
</html>
A manifest for an audiobook. The canonical version of this manifest is also available.
{
"@context": ["https://schema.org", "https://www.w3.org/ns/pub-context"],
"type": "Audiobook",
"id": "https://librivox.org/flatland-a-romance-of-many-dimensions-by-edwin-abbott-abbott/",
"url": "https://w3c.github.io/wpub/experiments/audiobook/",
"name": "Flatland: A Romance of Many Dimensions",
"author": "Edwin Abbott Abbott",
"readBy": "Ruth Golding",
"publisher": "Librivox",
"inLanguage": "en",
"dateModified": "2018-06-14T19:32:18Z",
"datePublished": "2008-10-12",
"duration": "PT15153S",
"license": "https://creativecommons.org/publicdomain/zero/1.0/",
"resources": [
{
"rel": "cover",
"url": "http://ia800704.us.archive.org/9/items/LibrivoxCdCoverArt12/Flatland_1109.jpg",
"encodingFormat": "image/jpeg"
},{
"rel": "contents",
"url": "toc.html",
"encodingFormat": "text/html"
}
],
"readingOrder": [
{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_1_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1371,
"name": "Part 1, Sections 1 - 3"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_2_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1669,
"name": "Part 1, Sections 4 - 5"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_3_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1506,
"name": "Part 1, Sections 6 - 7"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_4_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1669,
"name": "Part 1, Sections 8 - 10"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_5_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1506,
"name": "Part 1, Sections 11 - 12"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_6_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1798,
"name": "Part 2, Sections 13 - 14"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_7_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1225,
"name": "Part 2, Sections 15 - 17"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_8_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1371,
"name": "Part 2, Sections 18 - 20"
},{
"url": "http://www.archive.org/download/flatland_rg_librivox/flatland_9_abbott.mp3",
"encodingFormat": "audio/mpeg",
"length": 1659,
"name": "Part 2, Sections 21 - 22"
}
]
}
This section is non-normative.
This section illustrates how Unicode formatting characters can be applied to bidirectional strings, where necessary, to help a consumer produce the expected display. Where the first-strong heuristics would otherwise produce the wrong result, prepending a formatting character to the string causes the same heuristics to produce the correct base direction for the string as a whole.
A right-to-left string that begins with a Latin script character should have U+200F RIGHT-TO-LEFT MARK prepended.
Character order in memory: | HTML היא שפת סימון. |
Gives incorrect display: | HTML היא שפת סימון. |
Source code with formatting character: | "\u200FHTML היא שפת סימון." |
Gives expected display: | HTML היא שפת סימון. |
A left-to-right string that begins with an Arabic script character should have U+200E LEFT-TO-RIGHT MARK prepended.
Character order in memory: | 'سلام' is hello in Persian. |
Gives incorrect display: | 'سلام' is hello in Persian. |
Source code with formatting character: | "\u200E'سلام' is hello in Persian." |
Gives expected display: | 'سلام' is hello in Persian. |
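The first-strong check itself can be approximated in Python with `unicodedata.bidirectional`. This sketch is a simplification: it looks for the first character with bidi class L, R, or AL, and it ignores the isolate handling of the full Unicode Bidirectional Algorithm. `first_strong_direction` and `with_direction_mark` are hypothetical names. The mark is prepended only when the detected direction differs from the intended one.

```python
import unicodedata
from typing import Optional

RLM, LRM = "\u200f", "\u200e"  # RIGHT-TO-LEFT MARK, LEFT-TO-RIGHT MARK


def first_strong_direction(s: str) -> Optional[str]:
    """Direction of the first strongly typed character, or None if there is none."""
    for ch in s:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":
            return "ltr"
        if bidi in ("R", "AL"):
            return "rtl"
    return None


def with_direction_mark(s: str, direction: str) -> str:
    """Prepend RLM or LRM when first-strong detection would pick the wrong direction."""
    if first_strong_direction(s) == direction:
        return s
    return (RLM if direction == "rtl" else LRM) + s
```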
This section is non-normative.
These diagrams provide a visual view of the lifecycle steps, as specified in § 4. Manifest Lifecycle.
This section is non-normative.
The following table identifies where manifest properties are defined and extended.
This section is non-normative.
The following table identifies where the use of resource relations is defined.
Name | Publication Manifest |
---|---|
accessibility-report | § 2.8.2.1 Accessibility Report |
contents | § 2.8.3.3 Table of Contents |
cover | § 2.8.3.1 Cover |
pagelist | § 2.8.3.2 Page List |
privacy-policy | § 2.8.2.3 Privacy Policy |
preview | § 2.8.2.2 Preview |
This section is non-normative.
The editors would like to thank the members of the Publishing Working Group for their contributions to this specification:
The Working Group would also like to thank the members of the Digital Publishing Interest Group for all the hard work they did paving the road for this specification.