This document is also available in these non-normative formats: XML.
Copyright © 2004 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a W3C Working Group Note, made available by the W3C Internationalization Working Group (Web Services Internationalization Task Force) as part of the W3C Internationalization Activity. It describes internationalization usage scenarios for Web services and is intended for review by W3C Members and other interested parties. It is also intended to serve as a basis for future work on Web service internationalization.
The Internationalization Working Group (Web Services Internationalization Task Force) thinks that this document has reached a sufficient level of maturity to be published as a Working Group Note, and does not intend to issue new versions in the near future. This does not exclude that the document may be updated at a later stage, after more experience has been gained with the internationalization of Web services.
The Internationalization Working Group or its successor will keep track of any further comments and discussion relating to this document, and invites any such comments or discussion. Discussion of this document takes place on the public mailing list public-i18n-ws@w3.org. To contribute, please subscribe by sending mail to public-i18n-ws-request@w3.org with subscribe as the subject. The archive of this list can be read by the general public. Please send comments on this document to the www-i18n-comments@w3.org mailing list (public archive). Please use [Web Services] or [WSUS] in the subject.
Publication as a Working Group Note does not imply endorsement by the W3C, including the Team and Membership.
1 Introduction
1.1 Audience for This Document
1.2 Scope
2 Introduction to Web Services
2.1 Basic Framework: Anatomy of a Web Service Interaction
2.1.1 Discovery
2.1.2 Request
2.1.3 Response
3 Introduction to Internationalization: Definitions for a Discussion of Web Services
3.1 What are Internationalization and Localization?
3.1.1 Relationship of Locale to Natural Language
3.1.2 I-025: Specifying and Exchanging International Preferences in Web Services
3.1.3 Locales in Web Service Descriptions
3.1.4 Locales in SOAP
3.1.5 Faults, Errors, and Human Readable Text
3.2 Locale Independent vs. Locale Dependent Data
3.2.1 Textual vs. Binary Representations
3.2.2 Locale-Dependent XML Schema Datatypes
3.2.3 Examples
4 Basic Web Service Internationalization Scenarios
4.1 Locale Patterns in Web Services
4.1.1 The Travel Application
4.1.2 Locale Neutral
4.1.2.1 Example: 'GetArrivalTime' Returns Flight Arrival Time
4.1.3 Client Influenced
4.1.3.1 Example: 'getItinerary' Get Flight Information in the Requester's Language
4.1.3.2 Service Description
4.1.4 Service Determined
4.1.4.1 Example: 'flightCheck' Service
4.1.5 Data Driven
4.1.5.1 Example: 'getWeightRestrictions' Gets Flight Luggage Restrictions
4.1.5.2 Example: Stored User Preferences
4.1.5.3 Example: Data from Service to Service
4.2 Locale and Language Dependency in Message Exchange Patterns
4.2.1 I-009: One Way Messages
4.2.2 I-018: Data Associated with a Default Attribute
4.2.3 I-013: Conflicts Between Requester's Expectations and Service's Locale
4.3 Fault Handling
4.3.1 I-004: Producing Fault Reasons in All Available Languages
4.3.2 I-005: Language Matching for Fault Reason Messages
4.3.3 I-008: Locale Sensitive Formatted Data in SOAP Fault Messages
4.4 Legacy Issues
4.4.1 Pandora's box: Using Non-internationalized Data Structures
4.4.2 I-019: Locale Dependent Datatypes
4.4.3 Existing Web Services
4.5 Character Encodings and Web Services
4.5.1 SOAP Documents and the MIME Charset Parameter
4.5.2 Character Encoding of Attachments
4.5.3 Unsupported Charset in Request Scenario
4.5.4 Unsupported Charset in Response Scenario
4.5.5 Unsupported Characters
4.5.6 Legacy Application Use of Non-Unicode Character Encodings
4.5.6.1 Calling the Service Requires Transcoding
4.5.6.2 Service's Internal Implementation Performs Transcoding
4.5.7 Variability in Transcoding Scenario
4.6 Passing or Matching International Preferences
4.7 Intermediaries and Internationalization
4.7.1 I-020: Correlation of Data Between Services in Different Languages
4.7.2 I-007: Interaction of Optional Locale and Language Negotiation and Chained Services
4.7.3 I-012: Caching
4.7.4 Caching with Locale Information in SOAP Headers
4.8 SOAP Header Structures
4.8.1 Character Encoding Conversion Scenario
4.9 Service Discovery
4.9.1 Searching for Web Services Using UDDI
4.9.2 I-026 Searching for Service Descriptions Using My Language
4.9.3 I-027: Searching for Services Specific to a Region
4.10 Introspection of Services When Generating WSDL
4.11 Ordering, Grouping, and Collation
4.12 Natural Language Text Search
4.12.1 Language-Neutral Natural Language Text Search
4.12.1.1 Unicode Normalization
4.12.1.2 Catalog or Index in Multiple Languages
4.12.2 Language-Specific Natural Language Text Search
4.12.2.1 Keyword Searching
4.12.2.2 Gender and plural variants
4.12.2.3 Orthographic Variation in Searching ('like' clauses)
4.12.2.4 Use of Intermediary Translation and Dictionary Look-Up Service
4.12.2.5 Phonetic Searches
4.13 Locale Sensitive Presentation and Human Readable Text
4.13.1 I-021: Data Formatting for End User on Receiver Side
4.13.2 I-022: Data Formatting on Sender Side
4.13.3 Enumerated Values and Object Names
4.13.3.1 Use of Default English-like Names
4.13.3.2 Types of Names
4.14 Data Structures
4.14.1 Times and Time Zones
4.14.2 Calendars and Dates
4.15 Legal and Regulatory Goobers
4.15.1 Modeling Tax, Customs, Legal, and Other Cross-Border and Cultural Considerations
4.16 Transports
4.16.1 HTTP
4.16.2 FTP
4.16.3 SMTP
4.16.3.1 MIME Tags
4.16.4 IRIs, URIs, and fun stuff
4.17 Orchestration and Choreography
A References (Non-Normative)
B Acknowledgements (Non-Normative)
C Heisei (Non-Normative)
The goal of the Internationalization Web Services Task Force is to ensure that Web services have robust support for global use, including all of the world's languages and cultures.
The goal of this document is to examine the different ways that language, culture, and related issues interact with Web services architecture and technology. Ultimately this will allow us to develop standards and best practices for implementing internationalized Web services. We may also discover latent international considerations in the various Web services standards and propose solutions to the responsible groups working in these areas.
There are three basic parts to a Web services interaction. The first part is discovery and configuration. The second part is the request. The third part is the optional response. In the diagram above, the purple arrows are the discovery, the blue arrows are the request, and the red arrows are the response.
It is important to distinguish between the actual service and the Web service provider or agent. The service is the function, method, or other logic that actually is "the service". The provider is the process that receives and emits SOAP messages. In the diagram above, we show the client process and the requester agent as being in a single machine and process, while the provider agent and the actual service are in separate processes. Neither of these is necessarily the case: the provider agent may host the service inside its process, just as the client process and requester agent might be in separate processes or on separate machines.
[Definition: International Preferences]The specification of the particular set of cultural conventions that software or Web services must employ to correctly process information exchanged with a user.
[Definition: Internationalization]The process of designing, creating, and maintaining software that can serve the needs of users with differing language, cultural, or geographic requirements and expectations.
There are many kinds of international preferences that a Web service may need to offer, to be considered usable and acceptable by users around the world. Some of these preferences might include:
Natural language for text processing: parsing, spell checking, and grammar checking are examples of this
User interface language, which may include items like images, colors, sounds, formats, and navigational elements
Presentation (human-oriented formatting) of dates, times, numbers, lists, and other values
Collation and sorting
Alternate calendars, which may include holidays, work rules, weekday/weekend distinctions, the number and organization of months, the numbering of years, and so forth
Tax or regulatory regime
Currency
... and many more
Because there are a large number of preferences, software systems (operating environments and programming languages) often use an identifier based on language and location as a shorthand indicator for collections of preferences that typify categories of users.
HTML, for example, uses the lang attribute to indicate the language of segments of content. XML uses the xml:lang attribute for the same purpose.
Java, POSIX, .NET and other software development technologies use a similar-looking (but not identical) construct known as a locale to activate certain internationalized capabilities in software.
[Definition: Locale] A collection of international preferences, generally related to a geographic region, that a certain category of users requires. These are usually identified by a shorthand identifier or token that is passed from the environment to various processes to get culturally affected behavior.
Generally, systems that are internationalized can support a wide variety of languages and behaviors to meet the international preferences of many kinds of users. When a particular set of content and preferences is operationally available (often called "enabled"), then the system is referred to as localized.
[Definition: Localization] The tailoring of a system to the individual cultural expectations for a specific target market or group of individuals. The target group is often indicated by the locale identifier.
Localized systems often need to perform matching between end user preferences represented by the locale and localized resources. This process is called language (or locale) negotiation.
[Definition: Language Negotiation] The process of matching a user's preferences to available localized resources. The system searches for matching content or logic "falling-back" from more-specific to more-general following a deterministic pattern.
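The fall-back pattern in this definition can be sketched as follows. This is a minimal illustration in Python, assuming hyphen-separated tags in the style of [RFC3066]. The function names fallback_chain and negotiate, and the list of available resources, are hypothetical and are not defined by any specification.

```python
def fallback_chain(tag):
    """Return the tag followed by successively more general tags.

    One trailing subtag is removed at each step,
    for example "en-US" -> ["en-US", "en"].
    """
    subtags = tag.split("-")
    return ["-".join(subtags[:n]) for n in range(len(subtags), 0, -1)]


def negotiate(requested, available, default):
    """Pick the first available resource along the fallback chain.

    Comparison is case-insensitive; `default` is returned when
    nothing along the chain is available.
    """
    by_lower = {tag.lower(): tag for tag in available}
    for candidate in fallback_chain(requested):
        match = by_lower.get(candidate.lower())
        if match is not None:
            return match
    return default


# Hypothetical set of localized resources offered by a service.
resources = ["en", "fr", "fr-CA", "ja"]

print(negotiate("fr-CA", resources, "en"))  # exact match: fr-CA
print(negotiate("fr-BE", resources, "en"))  # falls back to: fr
print(negotiate("de-DE", resources, "en"))  # no match, default: en
```

The pattern is deterministic, but the result still depends on which resources a given provider happens to have installed, so two providers can legitimately answer the same request differently.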
However, it is important to note that many of the international preferences do not correlate strongly with locale identifiers based solely on language and location. For example, a system might define a locale of "en-US" (English, United States). This locale encompasses several time zones, so the user's preferred time zone cannot be deduced by the locale identifier alone. Many cultures have more than one way of collating text, and so the appropriate sort ordering cannot always be inferred from the locale. For example, Japanese applications may use different orderings known as radical-stroke and stroke-radical. Germany and other parts of the world may use different sort orderings known as dictionary versus phonebook.
Distinguishing these situations requires forethought in the design of the service and the setting of reasonable default values.
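To make the German example concrete, the sketch below sorts the same names under two deliberately simplified key functions. The assumption, labeled here because it is not stated in this document, is that the "dictionary" ordering treats an umlauted vowel like its base vowel and the "phonebook" ordering treats it like the vowel followed by "e". Real collation requires a complete algorithm with language-specific tailoring; the key functions and the sample names are illustrative only.

```python
# Simplified, hypothetical sort keys; not a complete collation algorithm.
DICTIONARY = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})
PHONEBOOK = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})


def dictionary_key(word):
    """Treat umlauted vowels like their base vowels."""
    return word.lower().translate(DICTIONARY)


def phonebook_key(word):
    """Treat umlauted vowels like the base vowel followed by 'e'."""
    return word.lower().translate(PHONEBOOK)


names = ["Muff", "Müller", "Mudra"]
dictionary_order = sorted(names, key=dictionary_key)
phonebook_order = sorted(names, key=phonebook_key)

print(dictionary_order)  # ['Mudra', 'Muff', 'Müller']
print(phonebook_order)   # ['Mudra', 'Müller', 'Muff']
```

Both orderings are plausible for a "de" locale identifier, which is why the identifier alone cannot tell a service which one the requester expects.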
Each user or system in a Web services interaction may have its own default locale settings. The interplay between the requester, provider, service host, intermediaries, and other entities may have complex implications.
There is not yet an Internet standard for locale identifiers. However, there is one for natural language identifiers, [RFC3066]. Since these language identifiers can imply a locale and in the absence of a standard for locale interchange, language identifiers are often used by software as the source for locale identification. Language and locale are distinct properties and should not be used interchangeably, but there is a relationship between these parameters in the area of resource selection and localization.
The danger of using one for the other lies in the distinction between them. A language preference controls only the language of the textual content, while locale objects are used to control culturally affected (software) behavior within the system. For example, making the assumption that the language parameter ja (Japanese) means the data should be presented in the locale-determined format for Japan could be a mistake if the requester actually lives and works in Australia.
The language parameter may be available in several places. In HTTP, there is an Accept-Language header field which can be used (see the HTTP Accept-Language section for more information). MIME has a Content-Language header which contains a language identifier (see the MIME Tags section for more information). In XML, there is an attribute which can be defined for elements called xml:lang. xml:lang marks all the contents and attribute values of the corresponding element as belonging to the language identified. What that means for processing those contents varies from application to application.
Here are some examples:
<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
  <l>Habe nun, ach! Philosophie,</l>
  <l>Juristerei, und Medizin</l>
  <l>und leider auch Theologie</l>
  <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>
For more detailed information on the behavior of xml:lang, see the XML specification.
Web service and provider implementations, like Web based applications, face the problem of language and locale negotiation.
Most Web based application environments have established proprietary standards for performing language and locale negotiation and provide greater or lesser support for managing this form of personalization and content management.
Web services, in contrast, must allow disparate systems to interoperate in a consistent, non-proprietary manner. This design allows systems to invoke each other without regard to the internal architecture of any part of the system. It is helpful to think of a Web service as a remote procedure call ("RPC"), even though many Web services do not use the SOAP-RPC pattern. Unlike Web applications, which can store user preferences in a session-like object hidden from the requester, Web service interoperability requires a shared model if processing is to produce results consistent with expectations.
Some of the problems inherent in dealing with locale negotiation and identifiers in Web services include:
Web Service Description Scenario A: A method is implemented in the Java programming language which takes a java.util.Locale argument. A Web service description is generated from this method via reflection of the Java class so that the method can be deployed as a Web service. The implementation of the Java java.util.Locale class is exposed in the Web service description and requests must be submitted with field values appropriate for Java, which may be difficult or impossible for non-Java clients to provide.
Description Scenario B: The same method is implemented taking a single string argument instead. The programmer creating the method writes logic to translate the string into the appropriate internal locale object. This logic may be substantial and must be repeated or shared for each locale-affected method. There is no way to associate the string argument with locale functionality in the provider, locale or language identifiers available in the transport, or to describe the parameter fully and consistently in directories. A system invoking the service might not be able to create a string in the expected format. The provider may not be able to validate the information appropriately.
Description Scenario C: An existing or "legacy" function or method which obtains its locale information from the runtime environment is deployed as a Web service. Existing locale negotiation mechanisms, such as Accept-Language in many application servers, rely on the container (formerly an application server, but in this case the service provider) to populate this information. The service provider cannot know that this information is needed. The Web service description doesn't have a mechanism for describing this environment setting, and the results from the service are limited to the runtime default locale of the provider or service host.
Scenario A, Different Locale Identifiers: Sender sends a request to a provider and wants a specific locale and uses its identifier for that. The provider is running on a different platform and doesn't produce the same result as the sender expects.
Scenario A1, Different Locale Semantics: Sender sends a request to a provider, expecting a result in a specific locale-affected format. The provider has a locale with the same ID, but the specific operation is different from the sender's implementation and the results don't match. These differences are generally subtle, but may vary widely depending on the specifics of the implementation. For example, collation and the formatting of dates as strings often show subtle variation from one platform to another.
Scenario A2, Fallback Produces Different Results: Sender requests a specific locale. Provider's fallback produces wildly different results. For example, zh-Hant, the RFC3066 language tag for Chinese written in the Traditional Han script, might fall back to zh, which represents generic Chinese and, on many systems, implies the use of the Simplified Han script.
The following graphics show some Chinese language tags and the resulting locale object in various systems. Note the differences in interpretation:
Here are two additional examples, one for Serbian and another for Azerbaijani:
Scenario B: Sender sends a request to a provider, expecting results in a specific locale-affected format. The sender uses its own locale identifiers. The provider and/or service is on an incompatible platform and cannot interpret the request. For example, converting Microsoft Windows's LCID identifier to Java's java.util.Locale.
Scenario C: Sender wants a specific format or set of processing rules for a specific item or set of items. The provider is running on a different platform, so the semantics differ. For example, the sender expects the Java SHORT date format, but the provider is written in the C# language.
Scenario D: Sender wants a specific format and sends a picture string or other very specific identifier. The provider and sender must agree on picture string semantics. For example, they must agree on what the picture string symbols stand for. Even in the presence of such an agreement, the underlying data in the different locale models may not match, such as the particular abbreviation for a month name.
Scenario E: Sender wants a specific locale and the provider doesn't support it. This isn't fatal to or detected by the receiving process, which returns data in an unexpected format or with unexpected results. For example, the date May 6, 2004 might be returned in a locale-formatted string as 06/05/2004 and be interpreted by a U.S. English end user as June 5, 2004.
Scenario F: The same as Scenario E, except that the problem is detected by or fatal to the service. It may be difficult to interpret why the service failed. For example, the date returned in Scenario E might have been 13/05/2004, which is clearly in the wrong format for a U.S. user, but the receiving service may not be able to correct for the problem.
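Scenarios E and F can be reproduced with a few lines of code. The sketch below uses Python's standard datetime module; the helper names parse_us and parse_day_first are hypothetical, and the two format strings stand in for the conventions assumed by a U.S. English receiver and by a day-first sender.

```python
from datetime import datetime


def parse_us(text):
    """Interpret a numeric date as month/day/year."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def parse_day_first(text):
    """Interpret a numeric date as day/month/year."""
    return datetime.strptime(text, "%d/%m/%Y").date()


# Scenario E: the string parses under both conventions, so the
# mismatch goes undetected and the receiver gets the wrong date.
ambiguous = "06/05/2004"
print(parse_day_first(ambiguous))  # 2004-05-06 (what the sender meant)
print(parse_us(ambiguous))         # 2004-06-05 (what the receiver reads)

# Scenario F: the string is invalid under the receiver's convention,
# so the mismatch is detected, but only as a generic parsing failure.
try:
    parse_us("13/05/2004")
except ValueError as error:
    print("rejected:", error)
```

In neither case does the receiver learn which locale the sender actually used, which is the underlying problem in both scenarios.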
Scenario G: Sender requests results that contain human readable text. The provider returns all languages available.
Web service descriptions should consider how to communicate language or locale choices in a consistent manner. In the sections that follow, specific patterns are recommended as good canonical references. However experience shows that a specific implementation may require additional contextual information not conveyed with a simple language tag. Generally this type of additional information should be encoded into the message body (that is, as part of the application's design, not as part of the Web services infrastructure). This expresses specific implementation decisions as part of the service's signature: you might require additional or different data in future versions. Some of the examples below show this type of information exchanged in headers and some of the complications that may arise from this.
In the examples below, adoption of a generic method for exchanging "international contextual information" would allow implementations to better model the natural language and locale processing choices offered by the services.
Implementers should consider adding a language tag to any operation fault elements to show what language to expect fault messages to be generated in.
In all cases, descriptive text should be tagged with its actual content language using the xml:lang attribute (where permitted). Consideration should be given to providing documentation within services in alternate languages when the service is expected to be utilized by users such as those in other countries or who speak other languages.
Some applications of Web services require a locale in order to meet end user expectations. An example of this is any process that returns human readable text messages (many more examples exist and some are given below).
Software developers generally get their messages from language resources using an API provided by the programming environment. This functionality is implemented in many ways, but the pattern for writing the logic is always similar: the language and locale preferences are not included in the parameter list of the service itself because the processing environment (JVM, OS, .NET framework, etc.) maintains this information as metadata about the process or user.
A SOAP Processor implementation might provide accessible natural language or locale preference information, received either in the transport (such as HTTP Accept-Language) or in SOAP headers defined for a particular binding of a service.
For example, a .NET SOAP Processor might set the service's thread default CultureInfo using a language tag. A J2EE implementation might populate the javax.servlet.ServletRequest class's Locale property with a java.util.Locale constructed from the ISO639 and ISO3166 fields embedded in a language tag. And so forth.
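The extraction of language and country fields from a language tag might look like the following sketch. It is written in Python rather than as any particular SOAP processor's code, and it assumes the [RFC3066] convention that a two-letter second subtag is an ISO 3166 country code. The function name language_and_region is hypothetical.

```python
def language_and_region(tag):
    """Split a tag such as "en-US" into (language, region).

    Assumes a two-letter second subtag is an ISO 3166 country code.
    Any other second subtag (for example a script such as "Hant")
    is ignored by this simplified sketch.
    """
    subtags = tag.split("-")
    language = subtags[0].lower()
    region = ""
    if len(subtags) > 1 and len(subtags[1]) == 2 and subtags[1].isalpha():
        region = subtags[1].upper()
    return language, region


print(language_and_region("en-US"))    # ('en', 'US')
print(language_and_region("ja"))       # ('ja', '')
print(language_and_region("zh-Hant"))  # ('zh', '')
```

The last call shows why this kind of mapping is lossy: the script information in zh-Hant has no place in a language-plus-country locale model and is silently dropped.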
An interesting, informative paper describing late localization is available here: [JITXL]
The use of XML Schema in Web services helps promote locale-independent data because most of the XML Schema datatypes [XMLS-2] have been designed to be locale-independent.
As an example, the XML Schema datatype date uses the format YYYY-MM-DD from [ISO8601]. This format is similar (and in some cases even identical) to some actual formats used in some locales. The format is unambiguous and can be understood by a human reading the XML file. Although it is the appropriate format in some locales and not in others, it can be understood to be a locale-independent format. By contrast, if XML Schema had chosen a format that is not used in any locale, such as just numbering days since a well-defined day, it would have made the format much more difficult for humans to work with, without any benefits.
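The difference between the locale-independent lexical form and locale-style presentation can be seen in a short sketch. It uses only Python's standard datetime module; the two strftime patterns are illustrative stand-ins for month-first and day-first display conventions and are not taken from any locale database.

```python
from datetime import date

d = date(2004, 5, 6)

# Locale-independent interchange form, as used by the XML Schema date type.
lexical = d.isoformat()
print(lexical)  # 2004-05-06

# The interchange form round-trips without any locale information.
assert date.fromisoformat(lexical) == d

# Presentation forms differ by convention and are ambiguous on the wire.
print(d.strftime("%m/%d/%Y"))  # 05/06/2004 (month first)
print(d.strftime("%d/%m/%Y"))  # 06/05/2004 (day first)

# A day count is also unambiguous, but is opaque to a human reader.
ordinal = d.toordinal()
print(ordinal)
```

The day count in the last line corresponds to the hypothetical alternative mentioned above: it carries the same information as the ISO 8601 form but cannot be checked by eye.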
This section uses examples from the travel industry to illustrate all four patterns.
A service to get a flight's arrival time, 'GetArrivalTime', can be written in a locale-neutral way. For example, the service response would contain a single value, arrivalTime, expressed in UTC (Coordinated Universal Time) using the ISO 8601 [ISO8601] format YYYY-MM-DDThh:mm:ss.sss, as mandated by the dateTime datatype in XML Schema Part 2: Datatypes [XMLS-2].
Any requester can transform the result into a local time format, including shifting the time into the local time zone. This way the requester agent, service provider, the service, and the result are entirely independent of the locale of the client, the host, and the implementer. Hence the service is locale-neutral.
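The requester-side transformation can be sketched as follows. The arrival value and the two fixed UTC offsets are assumptions made for illustration; a real requester would normally obtain time zone rules from a time zone database rather than hard-coding offsets.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical arrivalTime value returned by the service, in UTC.
arrival_utc = datetime.fromisoformat("2004-05-06T23:30:00+00:00")

# Fixed offsets assumed for this sketch only.
tokyo = timezone(timedelta(hours=9), "JST")
new_york = timezone(timedelta(hours=-4), "EDT")

# Each requester shifts the same instant into its own time zone.
print(arrival_utc.astimezone(tokyo).isoformat())     # 2004-05-07T08:30:00+09:00
print(arrival_utc.astimezone(new_york).isoformat())  # 2004-05-06T19:30:00-04:00
```

Both results denote the same instant; only the presentation differs, and that difference is decided entirely on the requester's side.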
By contrast, a service that returns a locally formatted string containing the arrival date and time would be dependent on the locale and language preferences of the requester.
In the Client Influenced pattern, the service provides a specific set of localized behaviors which are tailored according to the locale preferences of the requester.
The service must provide a way for the requester to communicate the preferences, and, if there is a response, it should communicate the actual value used to perform the processing.
This pattern's name uses the term 'influenced' because the provider and service may not have all possible languages, locales, or sets of preferences available as resources. The service might perform language negotiation and 'fall-back' to a more general set of preferences or use its own preferences if the preferences requested cannot be satisfied.
For example, the service might use the default locale of the system where the provider is running.
As an example, a service dealing with flight schedules will use the time zones of the respective departure and arrival locations for departure and arrival times, rather than some server-related time zone or the time zone preference of the client. The ISO 8601 format used by XML Schema might express times using the [RFC822] UTC offset for each time zone, rather than attempting to use a single time zone in the messages.
As another example, a service that queries a database might return data sorted or selected based on the database's configuration, rather than an external setting (such as the requester or service locale).
One-way messages that do not have a response may still have language-related issues.
Adding parameters to the SOAP body requires design changes to the service interface and possibly to the implementation. Adding default values into SOAP headers does not affect the service interface and often can be done statically for a particular resource. This may be an acceptable solution when presenting data from legacy systems through Web services. For example, this could be used for adapting a legacy retail or banking system which conducts all transactions in a single currency to provide data to an international system, however there are many potential issues with this design (see section 4.4.1 Pandora's box: Using Non-internationalized Data Structures).
SOAP Version 1.2 allows the provider to send fault messages that provide a description of the reason the service failed in multiple languages. SOAP Version 1.2 Part 0: Primer [SOAP-0] explains the <Reason> element as follows:
"It must have one or more env:Text sub-elements, each with a unique xml:lang attribute, which allows applications to make the fault reason available in multiple languages. (Applications could negotiate the language of the fault text using a mechanism built using SOAP headers; however this is outside the scope of the SOAP specifications.)"
This mechanism is suitable for returning faults in an environment in which the number of languages is relatively small and the range of languages to be returned is known in advance.
SOAP implementations are often localized into many languages simultaneously. To prevent faults from becoming overly large and difficult to manage, implementations should include some strategy that reduces the set of languages returned to those of interest to client(s). This requires a mechanism to match the language of the fault as closely as possible to the client's preferences.
Internationalization best practice is to perform late localization, in which messages are formatted or resolved to strings as late as is reasonable in a process. This preserves language independence and flexibility in responding to multiple users with different language or cultural needs.
Future versions of SOAP should probably consider allowing additional structured information in a Fault so that suitably internationalized clients can perform the localization and formatting themselves.
The service requester needs to select a matching language from the list of fault reasons returned by the service provider. Language tag matching and language ranges are described by RFC3066 [RFC3066]. Since the xml:lang value associated with the Reason Text element may not be empty, the requester may be unable to match any of the returned text elements to its current end user language.
RFC3066 language tag matching and SOAP Reason Text elements do not provide for a default message: there is only a list of different language messages. So the requester must choose some reasonable default from the list of messages provided.
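One way a requester might make that choice is sketched below. The fault content is invented for illustration, the function name pick_reason is hypothetical, and the policy of using the first Text element when nothing matches is an assumption of this sketch, not something defined by SOAP or [RFC3066].

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented fault used only for this example.
FAULT = """<env:Fault xmlns:env="http://www.w3.org/2003/05/soap-envelope">
  <env:Code><env:Value>env:Sender</env:Value></env:Code>
  <env:Reason>
    <env:Text xml:lang="en">Processing error</env:Text>
    <env:Text xml:lang="fr">Erreur de traitement</env:Text>
    <env:Text xml:lang="ja">処理エラー</env:Text>
  </env:Reason>
</env:Fault>"""


def pick_reason(fault_xml, preferred):
    """Return (language, text) for the best-matching fault reason.

    `preferred` is the requester's ordered list of language tags.
    Each tag is tried as given, then with trailing subtags removed.
    """
    root = ET.fromstring(fault_xml)
    texts = root.findall(f"{{{ENV}}}Reason/{{{ENV}}}Text")
    by_lang = {t.get(XML_LANG, "").lower(): (t.text or "") for t in texts}
    for tag in preferred:
        subtags = tag.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lang:
                return candidate, by_lang[candidate]
            subtags.pop()
    # Assumed policy: no default is defined, so use the first Text element.
    first = texts[0]
    return first.get(XML_LANG, "").lower(), first.text or ""


print(pick_reason(FAULT, ["fr-CA"]))     # ('fr', 'Erreur de traitement')
print(pick_reason(FAULT, ["de", "ja"]))  # ('ja', '処理エラー')
print(pick_reason(FAULT, ["de"]))        # ('en', 'Processing error')
```

The last call shows the weakness described above: a German-speaking end user receives English text only because it happened to be listed first.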
This is an example of a user's daily activity presented in the Japanese 12-hour time scheme.
As an example, if a Japanese sender sends dates to a Japanese receiver, the Japanese sender may wish to send the data in a Japanese date format as required for government records, such as H13-5-31 (H stands for the Heisei era; see Appendix C Heisei).
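A receiver that needs a locale-neutral value has to convert such a string itself. The sketch below handles only the Heisei prefix "H" and relies on the fact that Heisei 1 corresponds to 1989 in the Gregorian calendar. The function name heisei_to_date is hypothetical, and the sketch does not check that the date actually falls within the era.

```python
from datetime import date

# Heisei 1 corresponds to Gregorian 1989.
HEISEI_OFFSET = 1988


def heisei_to_date(text):
    """Convert a string such as "H13-5-31" to a Gregorian date.

    Simplified sketch: only the "H" prefix is recognized and era
    boundaries are not validated.
    """
    if not text.startswith("H"):
        raise ValueError("only the Heisei era prefix 'H' is handled")
    year, month, day = (int(part) for part in text[1:].split("-"))
    return date(year + HEISEI_OFFSET, month, day)


print(heisei_to_date("H13-5-31").isoformat())  # 2001-05-31
```

A receiver without this knowledge cannot interpret the value at all, which is the risk of exchanging locale-dependent formats even between parties that share a locale today.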
WSDL and SOAP can be used to constrain locale- or region-specific data fields.
SOAP interactions rely on being able to exchange data in a consistent, mutually understandable way. The character encoding of the SOAP message and the communication of the encoding between senders and receivers enable this to occur reliably. Because all XML [XML] processors must be able to read entities in both the UTF-8 [RFC3629] and UTF-16 [RFC2781] encodings, using UTF-8 or UTF-16 guarantees character encoding interoperability on the SOAP layer. The Character Model for the World Wide Web [CHARMOD] document describes these considerations and guidelines.
The charset parameter must be supplied in order to ensure correct interoperability.
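The sketch below illustrates why the declared charset matters. It encodes an invented SOAP envelope as UTF-8, labels it with a Content-Type value, and then decodes it on the receiving side using the declared charset. The envelope content is made up for the example; application/soap+xml is the media type used for SOAP 1.2, and Python's standard email.message module is used here only as a convenient header parser.

```python
from email.message import Message

# Invented envelope containing non-ASCII text.
envelope = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
    "<env:Body><city>東京</city></env:Body></env:Envelope>"
)

# Sender: encode the message and declare the encoding it actually used.
payload = envelope.encode("utf-8")
content_type = "application/soap+xml; charset=utf-8"

# Receiver: read the charset parameter and decode accordingly.
header = Message()
header["Content-Type"] = content_type
charset = header.get_content_charset()
decoded = payload.decode(charset)
print(decoded == envelope)  # True

# A receiver that guesses a different encoding gets corrupted text.
guessed = payload.decode("latin-1")
print(guessed == envelope)  # False
```

The wrong guess does not raise an error here, so without the charset parameter the corruption can pass unnoticed until a human reads the text.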
Note that the XML Japanese Profile [XML-JP] states that using legacy encodings such as Shift_JIS cannot provide complete interoperability in information interchange; there are differences among platforms in the mapping tables they use for this and similar encodings.
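The kind of mapping difference described in this note can be observed directly. In the sketch below, two codecs shipped with Python, shift_jis and cp932, stand in for the mapping tables of two different platforms; this pairing is an illustration chosen for this example, not one taken from [XML-JP].

```python
# The same two bytes, decoded with two Shift_JIS-family mapping tables.
wave = b"\x81\x60"

as_shift_jis = wave.decode("shift_jis")
as_cp932 = wave.decode("cp932")

print(f"U+{ord(as_shift_jis):04X}")  # U+301C WAVE DASH
print(f"U+{ord(as_cp932):04X}")      # U+FF5E FULLWIDTH TILDE

# The results are different Unicode characters, so text that is
# round-tripped across the two tables does not compare equal.
print(as_shift_jis == as_cp932)  # False
```

Exchanging the data as UTF-8 or UTF-16 avoids the problem, because the choice of mapping table is then made once, at the edge of the legacy system, rather than by every receiver.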
Scenarios in this section deal with issues that arise when services employ intermediaries, such as those discussed in "Service Oriented Architecture Derivative Patterns Intermediary" (in the Web Services Architecture document [WSA]).
As the diagram indicates, one or more providers offer services. An intermediary provider can deploy a service that makes requests from these providers and uses the results to satisfy the requests coming from its clients. The intermediary service may process and/or integrate the results from different providers to create a new kind of service or it may simply pass results along. The intermediary service may also cache either the contents it sends to clients, or the results returned to it by its providers, for reuse with subsequent requests. In these scenarios it is important to consider that the providers may return results formulated for certain international preferences. Clients may also be expecting results formulated according to their specific requirements. The intermediaries may be expected to apply appropriate matching between client and provider, or to bridge gaps.
With respect to internationalization, there are four primary scenarios that will be discussed below:
I-026.1 Searching for Service Descriptions using my language
The UDDI Version 3.0.1 [UDDI] specification states, in its section on Introduction to Internationalization:
"1.8.4 Use of Multiple Languages and Multiple Scripts
Multinational businesses or businesses involved in international trading at times require the use of possibly several languages or multiple scripts of the same language for describing their business. The UDDI specification supports this requirement through two means, first by specifying the use of XML with its underlying Unicode representation, and second by permitting the use of the xml:lang attribute for various items such as names, addresses, and document descriptions to designate the language in which they are expressed."
Using xml:lang and multiple entries, a service provider can publish text information about their service in multiple languages. The name, description, address, and personName UDDI elements may have an associated xml:lang attribute to indicate the language in which their content is expressed. The policyDescription element contains a description of the effect of the policy implementation. This element can also have an xml:lang attribute and can appear multiple times to allow for localized versions of the policy description. Providers are encouraged to do this for target language markets that their service may support.
Entity names in UDDI can also provide an Alternate Name in the RFC 2277 default language, readable in English. This provides a fallback mechanism that allows a search to identify services even if the named contents are in a script that the entity doing the search cannot read.
The scenario would be as follows:
Service provider publishes service information using UDDI in the provider's default language. The first entity name in a list is considered to be the primary name and language.
Service provider, or other entity, adds localized duplicate content to the UDDI entries for the service.
Service requester makes a request for service listings, first setting the primary language for searching using the UDDI Subscription API. The language is indicated by setting the xml:lang attribute on query key entities.
The UDDI application returns services that match the query in the given xml:lang language, matching languages according to the language matching rules defined in [RFC3066].
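The language matching step in this scenario can be sketched in a few lines of Python. This is a minimal illustration of the language-range matching rule in [RFC3066] (a range matches a tag if it equals the tag, or equals a prefix of the tag that is followed by "-"), not code from the UDDI specification. The function names and the sample localized names are hypothetical.

```python
def matches(language_range, language_tag):
    """Return True if language_range matches language_tag.

    Follows the RFC 3066 rule: exact match, or the range is a prefix of
    the tag and the next character in the tag is "-". Comparison is
    case-insensitive, and the range "*" matches any tag.
    """
    rng = language_range.lower()
    tag = language_tag.lower()
    if rng == "*":
        return True
    return tag == rng or tag.startswith(rng + "-")


# Hypothetical localized service names, keyed by their xml:lang values.
localized_names = {
    "en": "Example Travel",
    "fr-CA": "Voyages Exemple",
    "de": "Beispielreisen",
}


def names_for(language_range):
    """Return the names whose xml:lang value matches the requested range."""
    return [name for lang, name in localized_names.items()
            if matches(language_range, lang)]


print(names_for("fr"))     # the "fr-CA" entry matches the range "fr"
print(names_for("fr-FR"))  # no entry matches; a fallback would be needed
```

Note that matching is not symmetric: the range "fr" matches the tag "fr-CA", but the range "fr-CA" does not match the tag "fr".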
Here are some examples from the UDDI Version 3.0.1 specification.
A brief list of these collation issues is given here. An important reference is the Unicode Collation Algorithm (UCA), described in [UTR10]. Although the UCA is a mature standard, it should be noted that there is wide variance in the implementation of collation algorithms; that few of these implementations are based on UCA; and that there is little or no general agreement on identifiers for collation preferences.
Collation rules cannot be inferred solely from a language identifier or a locale, as the identifiers do not indicate which sort ordering should be used within a locale. A language identifier may be suggestive as to whether a requester expects a particular sort ordering (as with Traditional or Modern ordering in Spanish, for example) but it may not be definitive.
Some examples of sort orderings include: telephone, dictionary, phonetic, binary, stroke-radical or radical-stroke. In the latter two cases, the reference (source standard) for stroke count may also need to be cited.
Different components or subsystems which are used by a software process may employ different sort orderings. For example, a User Agent may provide a drop-down list which sorts the elements of the list at run-time differently from the other components of the agent. Information retrieved from a database may be ordered by an index which has no correlation to the requester's requirements. When different components or subsystems of a Web Service use different collation rules, then errors can occur. They are not always hard errors (i.e. those that generate faults) but the resulting data, operations, or events, may be incorrect or inconsistent with expectations.
In the case of services that might use a binary collation (ordering by the code points of text data) there can be differences in ordering introduced by different components using UTF-8 vs. UTF-16 internally.
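The difference can be reproduced with two characters: one near the top of the Basic Multilingual Plane and one supplementary character. The following Python sketch sorts the same pair by UTF-8 bytes and by UTF-16 code units (big-endian, so that byte order equals code unit order). The sample characters are chosen only for illustration.

```python
# U+FF5E FULLWIDTH TILDE (BMP) and U+10000 (first supplementary code point).
strings = ["\uFF5E", "\U00010000"]

# UTF-8 byte order agrees with code point order:
# U+FF5E -> EF BD 9E, U+10000 -> F0 90 80 80.
by_utf8 = sorted(strings, key=lambda s: s.encode("utf-8"))

# UTF-16 code unit order does not: U+10000 is encoded as the surrogate
# pair D800 DC00, which sorts before the single code unit FF5E.
by_utf16 = sorted(strings, key=lambda s: s.encode("utf-16-be"))

print([hex(ord(s)) for s in by_utf8])   # ['0xff5e', '0x10000']
print([hex(ord(s)) for s in by_utf16])  # ['0x10000', '0xff5e']
```

Two components that both claim "binary" ordering can therefore disagree whenever the data mixes supplementary characters with characters in the range U+E000 to U+FFFF.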
Knowing the language of the requester does not prescribe how sensitive the collation should be. Should text elements that are different by case or accent be treated as distinct? Should certain characters be ignored? For example, hyphens are often ignored so that "e-mail" and "email" sort together.
Where case is considered distinct, it may be important to describe whether all lowercase characters precede all uppercase characters, vice versa, or whether they should be intermixed.
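These sensitivity choices can be made concrete with sort keys. The Python sketch below is only an illustration using ASCII sample data; it does not implement the UCA, and the key function names are hypothetical. It shows a key that ignores hyphens and case, and two tie-breaking policies for case.

```python
def ignore_hyphens_key(text):
    # Hyphens are dropped and case is folded before comparison,
    # so "e-mail" and "email" receive the same key.
    return text.replace("-", "").casefold()


def lower_first_key(text):
    # Primary comparison ignores case; ties put lowercase first.
    # swapcase() is a simple trick that is only valid for ASCII letters.
    return (text.casefold(), text.swapcase())


def upper_first_key(text):
    # Primary comparison ignores case; ties put uppercase first.
    return (text.casefold(), text)


words = ["email", "e-mail", "eagle", "emu"]
names = ["apple", "Banana", "Apple", "banana"]

print(sorted(words))                          # ['e-mail', 'eagle', 'email', 'emu']
print(sorted(words, key=ignore_hyphens_key))  # ['eagle', 'email', 'e-mail', 'emu']
print(sorted(names))                          # ['Apple', 'Banana', 'apple', 'banana']
print(sorted(names, key=lower_first_key))     # ['apple', 'Apple', 'banana', 'Banana']
print(sorted(names, key=upper_first_key))     # ['Apple', 'apple', 'Banana', 'banana']
```

The plain code point sort places all uppercase names before all lowercase ones, while the two keyed sorts intermix them; which of these a requester expects cannot be derived from the language identifier alone.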
Often the performance of an application is impacted by collation. For example, if a service returns results in an unknown ordering, the requester may have to sort the results using its local collation rules. This can consume resources and delay the further use of the results until the entire set can be collated. Alternatively, if results are returned in the order needed by the requester, then the requester can begin to process the first records returned without waiting for the remaining records to arrive.
Of course, collation can be performed at different stages of data processing and timing can be an important consideration. Database indexes are updated as the data is added to the database, not at the time a request arrives. Requests that can use the preordained collation of the index have a significant performance advantage over requests that either cannot use indexes or must re-sort the results.
For language neutral applications, text should be normalized to only one form (such as base+combining character or all precomposed) according to Unicode Standard Annex #15 [UTR15] before comparisons are made. For more information, please see [CharModNorm].
Note that this kind of normalization is different from and in addition to other forms of normalization such as case-folding.
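As a minimal sketch of why normalization must precede comparison, the following Python example compares a precomposed and a decomposed spelling of the same character using the standard unicodedata module. The choice of NFC here is an assumption made for illustration; the point is only that both sides use the same form. The helper name nfc_equal is hypothetical.

```python
import unicodedata

precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE as one code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def nfc_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed == decomposed)          # False: the code points differ
print(nfc_equal(precomposed, decomposed)) # True: same text after normalization

# Case folding is a separate step: NFC alone keeps case distinctions.
upper = "\u00C9"  # LATIN CAPITAL LETTER E WITH ACUTE
print(nfc_equal(upper, decomposed))                        # False
print(nfc_equal(upper.casefold(), decomposed.casefold()))  # True
```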
As an example, the address "422 St. Jerome St." could also be represented as:
(From http://www.opentravel.org/2004A/xsd/3/simpletype/PaymentCardCodeType.htm)
Scenario A: A Web service returns the current time of a city listed as part of the request. The requester sends the name of a city (with an xml:lang attribute value) and the provider returns the current time in that city formatted in [ISO8601] format (hh:mm:ss).
Scenario B: A Web service takes a date/time value in ISO 8601 format (yyyymmddThhmm+hhmm) and the name of a city with an xml:lang attribute value, and returns the value converted to the specified city's time zone.
Scenario C: As a sub-process of a "meeting manager" service, a Web service inspects multiple appointment books looking for mutually available time slots. The requester provides a span of time in ISO 8601 format (yyyymmddThhmm+hhmm) using a start time and an end time. The inspected appointment books store information about their time zones. The service returns a series of time spans in the ISO 8601 format.
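The core of Scenario C, comparing time spans that carry different UTC offsets, can be sketched as follows. This is a simplified illustration for two spans only, assuming every value carries an explicit offset in the yyyymmddThhmm+hhmm form; the function names and sample values are hypothetical, and a real service would also need time zone rules (such as daylight saving transitions) that a bare offset does not convey.

```python
from datetime import datetime, timezone

ISO_BASIC = "%Y%m%dT%H%M%z"  # yyyymmddThhmm+hhmm


def parse(value):
    """Parse an ISO 8601 basic-format value into an offset-aware datetime."""
    return datetime.strptime(value, ISO_BASIC)


def overlap(span_a, span_b):
    """Return the overlap of two (start, end) spans in UTC, or None.

    Offset-aware datetimes compare as instants, so spans expressed with
    different offsets can be compared directly.
    """
    start = max(parse(span_a[0]), parse(span_b[0]))
    end = min(parse(span_a[1]), parse(span_b[1]))
    if start >= end:
        return None
    return (start.astimezone(timezone.utc).strftime(ISO_BASIC),
            end.astimezone(timezone.utc).strftime(ISO_BASIC))


# Free time in an appointment book kept at UTC+09:00 ...
book_a = ("20040315T0900+0900", "20040315T1800+0900")
# ... and in one kept at UTC+01:00.
book_b = ("20040315T0800+0100", "20040315T1200+0100")

print(overlap(book_a, book_b))
# ('20040315T0700+0000', '20040315T0900+0000')
```

Returning the result in UTC is a design choice for this sketch; a service could equally return each span in the offset the requester used, since the format carries the offset explicitly.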
Internationalized Resource Identifiers (IRIs, see [IRI]) should be used wherever URIs would be used, to allow the use of non-ASCII characters in a natural way. This can be done automatically by using the anyURI data type from XML Schema [XMLS-2].
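When a component that only understands URIs must process an IRI, the non-ASCII characters are converted to percent-encoded UTF-8. The Python sketch below illustrates that step for the path and query parts only. It is an assumption-laden simplification: the function name is hypothetical, and non-ASCII host names need separate treatment that is not shown here.

```python
def iri_to_uri(iri):
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    ASCII characters, including existing percent-escapes, are left
    untouched. Host names containing non-ASCII characters are NOT
    handled correctly by this sketch.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


print(iri_to_uri("http://example.org/r\u00E9sum\u00E9"))
# http://example.org/r%C3%A9sum%C3%A9
```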
The example in I-022 was taken directly from an example by Mark Davis, IBM, and is used with his permission.