i18n ITS WG F2F -- 21 Sep 2005

Presentation by Goutam

Goutam presents his three-layer scheme

CL: a comment
... we have covered the content domain level already in our requirement documents
... the second level, the sentence level
... information you suggest, like "this is a question"
... that information would ensure accurate translation
... that level can be covered by a host schema
... for example the TEI has elements which could cover the sentence level
... as for the third level, the word level, that is related to terminology work
... e.g. you might say "that term is the expression of a concept 'bank'"

<SebastianR> an example of TEI word-level markup:

CL: we addressed the terminology realm in the requirement document

<SebastianR> <w type="EX0">there </w>

<SebastianR> <w type="AJ0">extra </w>

<SebastianR> <w type="CRD">twenty </w>

<SebastianR> <w type="CRD">thousand </w>

<SebastianR> <w type="NN1">beginning</w>

FS: the categories of Goutam's schema can also be expressed as attributes
... e.g. <s type="praying">
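A minimal sketch of what FS's suggestion might look like in practice: the three levels (domain, sentence category, word category) carried as plain attributes and read back with Python's standard library. The element names follow the TEI examples above; the `domain` attribute and all sample values are assumptions made for illustration, not part of Goutam's proposal.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: domain on <text>, category on <s>, word class on <w>.
SAMPLE = """<text domain="banking">
  <s type="question">
    <w type="DTQ">which </w>
    <w type="NN1">bank</w>
  </s>
</text>"""

def read_levels(xml_text):
    """Return (domain, [(sentence type, [(word, word type), ...]), ...])."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Strip the trailing space that word-level markup keeps inside <w>.
        words = [(w.text.strip(), w.get("type")) for w in s.iter("w")]
        sentences.append((s.get("type"), words))
    return root.get("domain"), sentences

domain, sentences = read_levels(SAMPLE)
print(domain, sentences)
```

A tool that only needs one level (say, the domain for a terminology lookup) can ignore the other two attributes.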

RI: So you want to use three attributes which should be available everywhere?

GO: No, I will explain

Goutam explains the proposed scheme

CL: We agree that these three levels of information will give us a lot of benefit

GO: You will get meaningful output

CL: I do not agree that it will solve every problem of translation
... e.g. dialogue systems and machine translation systems
... they do not just know about POS, sentence category, domain
... but really about complex transfer conditions
... you need that information to do accurate work
... our scope is not to tell people who build parsers how to do that
... what you propose can be a part of our guidelines
... which show that you cannot do an accurate translation without such information
... so as a guideline for schema authors: please provide that information
... then people like the TEI people can see if they have covered that topic
... or people who develop new schemas will read the ITS guidelines and create their schemes in that way

FS: would anybody disagree with putting this into the guidelines?

CL: I would put it into the guidelines and add a fourth level
... the guidelines should say
... please provide as much context as possible (i.e. "context" as a fourth level)
... "please don't give every sentence to a translator separately, but give the translator the context"
... e.g. the translator should see not only the content of a XUL element, but also the other parts of the XUL document

YS: how about dialect specification?
... would that be part of the requirement for lang / locale specification?

RI: we should mark it up for language
... a question on the purpose of the three-layer scheme:
... do you expect content authors to mark up those layers?

GO: It might be, but not necessarily

RI: so this markup would be used by a linguistic person?

GO: Maybe even a "simple" person
... e.g. students who study grammar, first language / second language grammar

YS: I would not use such markup because I'm bad at grammar ...

CL: I share Yves' feelings
... some authors have difficulty providing this information

RI: Is this for use by machines?
... if that is the case, the tokens have to be machine recognizable
... it seems to be difficult for an ordinary person to use such information

CL: on RI's question whether this is for humans or for machines
... I think information about the domain, sentence type or specific words
... it will help translators to do better quality work or to do the work quickly
... if they know that a word belongs to a specific domain
... they can go to a terminology database and check the word
... so even for human translators this might be helpful
... e.g. "this is a computer interface string" is helpful information
... in my understanding, the human use scenario is not only for the translators
... but also for authors or for quality assurance

RI: That is a different topic for quotation
... the example you give with term databases
... a machine, not a human, will look up the database

CL: As for terminology
... the translator has to be made aware of the fact that s.t. is a term

FS: that is then the terminology requirement

YS: I propose to have an action item to work on the document Goutam started

<scribe> ACTION: Goutam to continue work on the document he started to see if we should put that into the guidelines, including the aspect of language / dialect identification [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action01]

Continuation of the mapping requirement

YS: we talked about pointing to XSLT
... I think there are different kinds of mapping
... i.e. a 1:1 mapping
... sometimes we have to map elements to attributes
... so how could we address that
... e.g. s.t. like the DITA translate would be easy to map

FS: how about the question when the mapping takes place?
... e.g. "on the fly" during processing or before / after processing?

YS: We don't have to specify that

FS: Yes, we can leave that to the people who use the mapping

YS: let's collect examples of what kinds of mapping we need

SR: If you say that the translate attribute of ITS maps to DITA translate
... that could only be a clue for a human

CL: One of our requirements is to mark up terminology

DITA localization aids: http://www-306.ibm.com/software/globalization/topics/dita/localization.jsp

CL: I saw simple mappings of proprietary language identifiers to official ones
... e.g. people would use numeric values to identify languages

RI: so you map values, right?

CL: yes
... we need a list of the data categories that are to be mapped
... e.g. translatability, constraints
... and then a list of the mappings
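The value mapping CL describes could be sketched as a table keyed first by data category and then by proprietary value. Everything in the table below is hypothetical (the numeric identifiers, the category names and the target values); it only illustrates the "list of data categories, then a list of mappings" shape.

```python
# Hypothetical mapping tables: data category -> {proprietary value: standard value}.
MAPPINGS = {
    "language": {"1": "en", "2": "de", "3": "ja"},
    "translatability": {"0": "no", "1": "yes"},
}

def map_value(category, value, tables=MAPPINGS):
    """Return the standard value for a proprietary one, or None if unmapped."""
    table = tables.get(category, {})
    return table.get(str(value).strip())

print(map_value("language", 2))           # numeric identifier for a language
print(map_value("translatability", "0"))  # flag for "do not translate"
```

Returning `None` for unmapped values leaves it to the calling tool to decide whether an unknown identifier is an error.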

YS: we don't want to force people to use ITS if they already have the information

RI: But that is a different use case, right?
... if DITA does exactly the same thing, what do we have to do?

SR: DITA attributes are not in a namespace
... so it would be no problem if we use that
... in DTDs, you could hard wire prefixes

RI: hard wired means "changing the schema"?

YS: if we are stepping out of the namespace realm
... we might run into clashes

SR: by proposing the automatic mapping
... that is a burden for the processing application

YS: true, but the tools can be very generic

FS: would it be a possibility to approach the DITA people and make an agreement with them on what one should use for "translate"?

YS: it would be mainly the case in terminology

FS: example with architectural forms: http://www.w3.org/People/fsasaki/EML2005sasa0411.html section 4.3.1

CL: If we establish what the indicator of translatability is
... that would be very helpful
... the "equiv" would be helpful for people who are in the process and the people who use this
... of course we might have problems which RI and SR mentioned
... so we could provide a container for mapping
... and have suggestions how to fill the container
... e.g. with xslt

FS: so an "extensible" container?

RI: that sounds like localization property stuff

YS: to some degree
... the problem is: we have schemas
... which we cannot process
... because there is no generic way of applying their l10n-related information
... to the ITS-sensitive tools

RI: somebody has to do that at some point

YS: yes
... and the schema is the best place to have that information

RI: Another issue
... if you put that into our schema, the dita schema might change

<YvesS> ..FS to show the example with Architectural forms.

RI: Would that not be localization properties work?

YS: In some way
... if there is a W3C way of mapping
... we could just adopt it
... I want to say
... "img" is a graphic
... as the tool processes "img"
... it should be processed like a graphic

RI: That is not a tag set again, that is localization properties

YS: yes, but we have that existing requirement
... we thought we had a common goal, but maybe not

Summary of the mapping discussion

CL: We will solve the need to provide information about correspondences
... we recommend that to people and have an element / attribute that points to a mapping
... then we say that people can consider different things like xslt or architectural forms

YS: That is one part
... in addition we need to look at the type of mapping we need
... I want to know what will be mapped
... I want a pointer to xslt
... and s.t. that says "what is mapped"

<its:mapping>
  <its:mappingdesc>some desc</its:mappingdesc>
  <its:map>
    here some xslt
  </its:map>
</its:mapping>
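To make the simplest case concrete (a 1:1 mapping such as the DITA `translate` attribute), here is a rough Python sketch of what a generic tool might do once it knows the correspondence. The minutes leave the realization open (XSLT, architectural forms, or something else), so this is only one possible reading; the namespace URI is a placeholder and `its:translate` is an assumed attribute name.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://example.org/its"  # placeholder, not an official namespace
ET.register_namespace("its", ITS_NS)

def map_translate(root):
    """Copy each host-vocabulary translate="yes|no" value to its:translate."""
    count = 0
    for el in root.iter():
        value = el.get("translate")
        if value in ("yes", "no"):
            el.set(f"{{{ITS_NS}}}translate", value)
            count += 1
    return count

doc = ET.fromstring('<topic><p translate="no">strs(s)</p><p>Some text</p></topic>')
mapped = map_translate(doc)
print(mapped, ET.tostring(doc, encoding="unicode"))
```

The host attribute is left in place, so the document stays valid against its own schema while the added attribute is what an ITS-aware tool would read.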

RI: If you have xslt stylesheets you would tie that to a specific version of DITA

CL: should that be another section of the WD?
... we have discussed this far enough
... we should be able to make a statement about the mapping
... we should not prescribe how the mapping is realized

YS: just a place holder that the mapping exists

<scribe> ACTION: CL and FS to decide who will edit the mapping section of the ITS implementation WD [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action02]

The deliverables

YS: we have three documents

requirements, ITS guidelines, ITS specification

YS: no editor for the guidelines yet
... AZ mentioned he would do some editing work
... I will do some editing as well of the ITS guidelines

<scribe> ACTION: YS as the initial editor for the ITS guidelines, Diane helping [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action03]

RI: before you do any editing, please read:

http://www.w3.org/International/xmlspec/002/documentation/styleguide.html

http://www.w3.org/International/xmlspec/002/documentation/xmlspec-i18n-dtd.html

http://www.w3.org/International/xmlspec/002/documentation/i18n-docs-processing.html

RI: and follow the guidelines

FS: don't spend much time on the status section

RI: that is important only before publication

timing of publication

YS: In September, we would like to publish the first WD of the ITS specification
... the second publication of the req. document is November
... the first publication of the "ITS techniques" (previously called "ITS guidelines")
... so we don't have to change our deadlines now

SR: You would need to write an "ODD2XMLSPEC.xsl"

xmlspec i18n dtd from http://www.w3.org/International/xmlspec/002/xmlspec-i18n.dtd

i18n specific elements: http://www.w3.org/International/xmlspec/002/i18n-elements.mod

http://www.w3.org/International/xmlspec/002/i18n-extensions.mod

<r12a-sophia> http://www.w3.org/International/xmlspec/002/documentation/i18n-docs-processing.html#xmlspeci18n-files

YS: let's continue that discussion by email

http://www.w3.org/TR/2003/WD-xquery-full-text-requirements-20030502/

http://www.w3.org/TR/xquery-full-text/

FS: proposal to have a focus on the ITS specification and ITS techniques
... the req document should only be updated from time to time

YS: how about the wiki editing?
... how do we keep track of the changes if we publish a new WD?
... do we have to change everything in the wiki?
... in the document with div, del, ins?
... that takes a lot of time

CL: does everybody need to modify the req documents in the wiki?
... maybe we could say we move away from the wiki

RI: If you have a contentious subject
... there is a lot of mail discussion
... it is difficult to summarize discussions
... as for the wiki, you can see what is being talked about

FS: how to handle the ITS techniques and the ITS specification?
... also handling in the wiki? i.e. converting ODD (possibly ODD) into the wiki

YS: that is a general problem for all three documents

RI: what would you do with an image?

bugzilla example: http://www.w3.org/Bugs/Public/show_bug.cgi?id=1334

http://cgi.w3.org/cgi-bin/html2txt?url=http://www.w3.org/International/Overview.html

<SebastianR> Christian/Felix: grab http://users.ox.ac.uk/~rahtz/its.zip and see the Makefile

<SebastianR> (that is the ODD demo to see if you can reproduce)

<YvesS> YS: we will discuss requirements

<YvesS> .. and does anybody have other requests?

<YvesS> ACTION: For YS to post message about meeting f2f Dec-14 to 16 (noon). [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action04]

classification of requirements

classification parameters:

1) should the req be in the techniques doc / in the specification doc?

2) is the req sensitive to the scope problem we discussed at the f2f?

http://esw.w3.org/topic/its0506ReqConstraints in spec, sensitive to scope

http://esw.w3.org/topic/its0503ReqSpan in spec, not sensitive to scope

http://esw.w3.org/topic/its0503ReqEntities part of techniques doc

http://esw.w3.org/topic/its0503ReqLangLocale part of the techniques document

http://esw.w3.org/topic/its0503ReqTermIdentification probably techniques doc, depends on how we develop it

http://esw.w3.org/topic/its0504ReqPurposeSpecMap we don't know yet

<scribe> ACTION: Felix to ask W3C if there is a methodology for mapping existing / under development [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action05]

http://esw.w3.org/topic/its0908LinguisticMarkup we don't know

http://esw.w3.org/topic/its0504ReqCulturalAspects maybe a technique, but we don't know yet

http://esw.w3.org/topic/its0504ReqLinkedText

YS: part of the techniques, with a "?"
... good practice would be to provide an attribute to give feedback to the translator

SR: like an alt tag on a link which is specific for ITS

FS: so part of the specification?

http://www.w3.org/People/fsasaki/EML2005sasa0411.html

example for such a link:

<para> If you create a typing error like "strs(s)",

you will get the message

<subst>

</subst>

</xref>.</para>

discussion about the linked text requirement

http://esw.w3.org/topic/its0505ReqBidi specification

RI: this is one driver of the original ITS work
... originally we said to SSML folks that they need bidi markup for accessibility
... they asked us for a coherent way of doing that
... so we started this effort: ITS (initially)
... it would be nice to have this as part of the xml ns, but that is not likely to happen

YS: so this is part of the spec and the techniques doc

FS: And this is not part of the scope issue

http://esw.w3.org/topic/its0505Translatability part of the spec and the techniques

YS: and we do need scope

http://esw.w3.org/topic/its0505WordCount

SR: things like bidi are more part of i18n, most of the other stuff we talked about is part of l10n

YS: this would be a guideline / technique
... SR said that we need to make the difference between universal things (like bidi) and l10n specific things

RI: for some things we might say "please use these tags" ..
... there might be s.t. like "please don't do this" like translatable text in attributes
... and the third category would be "here is s.t. you could use"

YS: like the ITS tag set?

RI: yes
... and we would make clear what aspect would be important

YS: back to metrics: what should it be?
... metrics does not enhance the localizer, I think

http://esw.w3.org/topic/its0505ReqAttrAndTrans

YS: this is a guideline

SR: a guideline of good practice and an instruction

http://esw.w3.org/topic/its0505ReqNamingScheme

YS: please avoid s.t. like: <Message001>Cannot open the file.</Message001>
... more and more the names are the same as the content
... or s.t. generic because they use non-xml tools for the generation of xml

RI: they should use IDs for ids, and not the name of the element

this is guidelines
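A small sketch of the cleanup RI's advice implies, for documents that already use names like `<Message001>`: the identifier moves into an attribute and the element gets a generic name. The target element name `message` and the attribute name `id` are assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

def normalize_names(root, new_name="message"):
    """Rename elements such as <Message001> to <message id="Message001">."""
    for el in root.iter():
        # Letters followed by digits is taken as "identifier used as a name".
        if re.fullmatch(r"[A-Za-z]+\d+", el.tag):
            el.set("id", el.tag)  # keep the old name as the identifier
            el.tag = new_name
    return root

catalog = ET.fromstring(
    "<messages><Message001>Cannot open the file.</Message001></messages>"
)
normalize_names(catalog)
print(ET.tostring(catalog, encoding="unicode"))
```

With one element name for all messages, a single rule (for example about translatability) can cover every message instead of one rule per name.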

http://esw.w3.org/topic/its0505ReqLocNotes

SR: this is like its:info
... you might want to say "who said that"?

YS: so that means specification, and it has to do with scoping

http://esw.w3.org/topic/its0505ReqWhiteSpaces

YS: explains the req

from the xml rec:

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space.

YS: so this would be a guideline
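As a rough illustration of the quoted rule, the sketch below collapses white space in element content unless `xml:space="preserve"` is in effect, and lets `xml:space="default"` switch collapsing back on for a subtree. Treating "default" processing as "collapse runs of white space" is an assumption about one particular tool, not something the XML Recommendation prescribes.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"

def collapse_whitespace(el, preserve=False):
    """Collapse white space in text unless xml:space="preserve" is in effect."""
    setting = el.get(XML_SPACE)
    if setting == "preserve":
        preserve = True
    elif setting == "default":
        preserve = False
    if not preserve and el.text:
        el.text = " ".join(el.text.split())
    for child in el:
        collapse_whitespace(child, preserve)
        # A child's tail is content of this element, so this element's setting applies.
        if not preserve and child.tail:
            child.tail = " ".join(child.tail.split())

sample = ET.fromstring(
    '<doc><p>a   b\n c</p><pre xml:space="preserve">a   b</pre></doc>'
)
collapse_whitespace(sample)
print(ET.tostring(sample, encoding="unicode"))
```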

http://esw.w3.org/topic/its0506ReqMultilingualDoc

YS: it is an issue for the localization process
... and a guideline

SR: It depends on how you manage the process

http://esw.w3.org/topic/its0506ReqRuby

YS: part of the specification

RI: Steve wants to have a different ruby spec
... which is not so presentation oriented
... I want to have a different level of conformance

<YvesS> .. three levels would be better

<YvesS> RI: wonder if we should separate attribute and element in scoping (even for translatability).

RI: we don't want to provide a tag set for bad practice
... but we can show them how to get out of trouble

YS: We don't have a solution for attributes, so we can only have the element content case in the spec

http://esw.w3.org/topic/its0506ReqDateTime

SR: what is the value of knowing it is a date?
... you can just use the data type "date"
... is it different from marking up technical terms as terms?

RI: it gives you the date itself
... i.e. a machine could transform it into a specific calendar etc.

YS: I put that as a guideline, and we see what will happen
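RI's point can be sketched briefly: once a date is marked up with a machine-readable value, a tool can re-render it per target convention. The two style names below are hypothetical, and the sketch only reorders day, month and year; conversion to another calendar system, which RI also mentions, is not attempted here.

```python
from datetime import date

def render_date(iso_value, style):
    """Render an ISO 8601 date (YYYY-MM-DD) in a simple target convention."""
    d = date.fromisoformat(iso_value)
    if style == "dmy":   # day first, dot separated
        return f"{d.day:02d}.{d.month:02d}.{d.year}"
    if style == "mdy":   # month first, slash separated
        return f"{d.month:02d}/{d.day:02d}/{d.year}"
    return d.isoformat()  # fall back to the unambiguous form

print(render_date("2005-09-21", "dmy"))
print(render_date("2005-09-21", "mdy"))
```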

<scribe> ACTION: Sebastian to introduce to the wg the l10n / i18n aspects of the TEI [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action06]

http://esw.w3.org/topic/its0509ReqNestedElements

YS: goes to the guidelines

<scribe> ACTION: SR to put a comment on http://esw.w3.org/topic/its0509ReqNestedElements in the wiki [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action07]

other business?

Summary

http://www.w3.org/TR/2005/WD-ws-i18n-20050914/

<scribe> ACTION: Felix to make proposals by mail for a shortcut for the namespace of the ITS spec wd [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action08]

action items for monday: http://www.w3.org/2005/09/19-i18n-minutes.html#ActionSummary

<scribe> ACTION: to contact Deborah A. Lapeyre (DITA committee) about the relation between ITS / DITA [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action09]

action item for tuesday: http://www.w3.org/2005/09/20-i18nts-minutes.html#ActionSummary

<scribe> ACTION: RI to check for hosting the f2f near Oxford (December, 14-16 (noon)) [recorded in http://www.w3.org/2005/09/21-i18nts-minutes.html#action10]

YS: drop http://www.w3.org/2005/09/19-i18n-minutes.html#action04 and http://www.w3.org/2005/09/19-i18n-minutes.html#action03
... these are not necessary anymore

GO: a different topic: computational or "semantic" linguistic markup

close the meeting

YS: Thanks to everybody

GO: Thanks to you all
... I was happy to be able to come
