07:04:45 <RRSAgent> RRSAgent has joined #mlw-lt
07:04:45 <RRSAgent> logging to http://www.w3.org/2012/09/25-mlw-lt-irc
07:04:49 <Zakim> Zakim has joined #mlw-lt
07:04:56 <fsasaki> meeting: MLW-LT f2f
07:04:58 <fsasaki> chair: various
07:05:06 <fsasaki> scribe: variousToo
07:05:13 <fsasaki> agenda: http://www.w3.org/International/multilingualweb/lt/wiki/PragueSep2012#25_Sept:_MLW-LT_WG_meeting_agenda
07:06:32 <daveL> daveL has joined #mlw-lt
07:13:38 <daveL> chair: felix
07:13:52 <Jirka> Jirka has joined #mlw-lt
07:14:07 <daveL> scribe: daveL
07:14:10 <tadej> tadej has joined #mlw-lt
07:14:57 <daveL> Meeting: MLW-LT face to face, Prague, 25 Sep 2012, 09.00 CET
07:15:31 <Arle> Arle has joined #mlw-lt
07:16:29 <daveL> agenda: http://www.w3.org/International/multilingualweb/lt/wiki/PragueSep2012#25_Sept:_MLW-LT_WG_meeting_agenda
07:17:31 <daveL> felix: one change to the demo schedule: the continuation of session 1 will be a breakout between today's coffee break and lunch
07:21:42 <omstefanov> omstefanov has joined #mlw-lt
07:22:44 <fsasaki> topic: introduction
07:22:50 <Des> Des has joined #mlw-lt
07:22:54 <dF> dF has joined #mlw-lt
07:22:54 <declan> declan has joined #mlw-lt
07:23:03 <Yves_> Yves_ has joined #mlw-lt
07:23:11 <daveL> felix: this morning we will go through some basic parts of the document
07:23:37 <micha> micha has joined #mlw-lt
07:24:17 <leroy> leroy has joined #mlw-lt
07:24:19 <fsasaki> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#introduction
07:24:32 <daveL> ... starting with introduction to specification
07:24:45 <shaunm> shaunm has joined #mlw-lt
07:24:55 <mhellwig> mhellwig has joined #mlw-lt
07:25:54 <daveL> ... to look for changes that are needed 
07:26:03 <daveL> ... read intro to section 1
07:26:18 <daveL> ... need to add reference for HTML5
07:26:23 <mdelolmo> mdelolmo has joined #mlw-lt
07:26:26 <Pnietoca> Pnietoca has joined #mlw-lt
07:26:28 <Pedro> Pedro has joined #mlw-lt
07:27:20 <daveL> felix: the introduction references the ITS requirements and the localizable DTD document, which influenced this document
07:27:52 <daveL> ... and references a potentially unwritten best practices document
07:27:54 <fsasaki> http://www.w3.org/2011/12/mlw-lt-charter.html
07:28:00 <daveL> ... but what does this mean?
07:28:44 <daveL> ... in the context of the work plan, after feature freeze we have time to add a best practices document
07:29:22 <daveL> ... change the reference to a stable wiki page for best practices.
07:29:46 <Milan> Milan has joined #mlw-lt
07:30:44 <daveL> felix: section 1.1, relation to its1.0 and new principles
07:31:20 <daveL> ... outlines what the principles need
07:34:47 <Arle_> Arle_ has joined #mlw-lt
07:35:02 <fsasaki> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
07:35:06 <daveL> ... notes that the additional horizontal features need not be implemented for ITS 1.0 data categories
07:35:22 <giuseppe> giuseppe has joined #mlw-lt
07:36:01 <daveL> Yves: asks if we therefore still need a test suite for ITS 1.0 data categories
07:37:16 <daveL> daveL: yes, for completeness, for those not referencing ITS 1.0
07:37:47 <philr> philr has joined #mlw-lt
07:38:05 <Ankit> Ankit has joined #mlw-lt
07:38:12 <daveL> felix: give brief outline of what it means to be conformant to ITS, with reference to test suite
07:38:15 <Tatiana> Tatiana has joined #mlw-lt
07:38:29 <fsasaki> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
07:38:50 <fsasaki> http://phaedrus.scss.tcd.ie/its2.0/its-testsuite.html#translate-local-host
07:39:03 <fsasaki> http://phaedrus.scss.tcd.ie/its2.0/expected/translate/xml/translate4XmlOutput.txt
07:40:12 <daveL> felix: tomorrow we need to look at this in more detail
07:41:48 <daveL> omstefanov: it seems the ITS1.0 requirement may be redundant
07:42:13 <daveL> shaunm: this indicates that ITS2.0 encompasses ITS1.0
07:42:50 <fsasaki> "Where ITS 1.0 data categories are implemented in XML, the implementation must be conformant with the ITS 1.0 approach to XML to claim conformance to ITS 2.0."
07:43:02 <daveL> pedro: HTML5 adds new features
07:43:21 <fsasaki> "ITS 2.0 is backwards compatibly with ITS 1.0 in terms of ITS mechanisms"
07:45:29 <omstefanov> suggest rephrasing that to "ITS 2.0 is backwards compatible with ITS 1.0 in terms of ITS mechanisms"
07:45:35 <daveL> felix: so the last bullet of 1.1.1 will be updated to this
07:45:48 <daveL> felix: section 1.1.2, new principles
07:47:07 <daveL> felix: in the first bullet, drop the reference to RDFa and NIF, since these are not the formats for conformance
07:47:48 <daveL> ... RDFa and NIF status are correctly referenced in second bullet, they are a possible output option
07:49:00 <daveL> felix: the third bullet clarifies the need for XPath 1.0, with new mechanisms for other query languages, i.e. CSS selectors and later XPath versions
07:49:37 <daveL> ... but there seems no interest in CSS as a selector language, so we might drop it
07:49:56 <daveL> phil: we may be using CSS selectors in our implementation
07:50:16 <daveL> felix: so we may keep it, as it is optional
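Sketch (not from the meeting): how a processor might apply an ITS global rule whose selector is an XPath 1.0 expression. To stay within the Python standard library it supports only selectors of the form //localname, a tiny subset of XPath 1.0; the function name apply_translate_rules is hypothetical. Rules are handled in document order, so a later rule selecting the same element wins.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"


def apply_translate_rules(doc, rules):
    """Map each element selected by an its:translateRule to its translate value.

    Assumption: selectors must look like //localname. A real ITS processor
    evaluates full XPath 1.0 (or an optional query language such as CSS).
    """
    result = {}
    # Document order: when two rules select the same element, the later
    # rule overwrites the earlier decision.
    for rule in rules.iter(f"{{{ITS_NS}}}translateRule"):
        selector = rule.get("selector", "")
        name = selector[2:]
        if not selector.startswith("//") or not name.isidentifier():
            raise ValueError(f"unsupported selector: {selector!r}")
        for element in doc.iter(name):
            result[element] = rule.get("translate")
    return result


rules = ET.fromstring(
    '<its:rules xmlns:its="http://www.w3.org/2005/11/its" version="1.0">'
    '<its:translateRule selector="//code" translate="no"/>'
    "</its:rules>"
)
doc = ET.fromstring("<doc><p>Run <code>ls -l</code> first.</p></doc>")
decisions = apply_translate_rules(doc, rules)
print([(el.text, value) for el, value in decisions.items()])  # [('ls -l', 'no')]
```

Rejecting unsupported selectors outright, rather than guessing, keeps a partial implementation honest about what it can evaluate.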
07:50:36 <fsasaki> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
07:50:53 <daveL> felix: the list of new data categories needs to be updated, with a reference to the table
07:54:50 <daveL> felix: now review text in section 1.2
07:55:40 <fsasaki> "The increasing usage of XML as a medium for documentation-related content (e.g. DocBook and DITA as formats for writing structured documentation, well suited to computer hardware and software manuals)": should mention also HTML5
07:56:09 <daveL> jirka: need to review the last paragraph related to XML
07:56:27 <daveL> felix: agree, this needs a rewrite
07:57:45 <daveL> olaf: can we continue refining this after the meeting?
07:58:15 <daveL> jan: would be helpful to reference other documents
07:58:59 <daveL> jan: a question about directionality: is vertical text being discussed?
07:59:43 <daveL> felix: this is being discussed elsewhere, in CSS for Asian layout
08:00:14 <daveL> olaf: suggests covering vertical layout by adding Japanese to the list of example languages
08:00:34 <daveL> felix: agrees, and will add a reference to the best practices document on Japanese
08:02:07 <daveL> felix: discusses examples
08:03:13 <daveL> ... but it would be good to have some html examples as well as XML in this section
08:03:43 <daveL> shaun: it seems harder to come up with examples with both human- and machine-readable aspects
08:05:28 <daveL> dave: it would be good to have some real industrial content for examples
08:05:46 <daveL> des: there is no mention of XLIFF; is that deliberate?
08:06:07 <daveL> dF: XLIFF isn't a source format in the same way that XML and HTML5 are
08:06:30 <daveL> felix: but, for example, Yves processes many XML formats via XLIFF
08:07:15 <daveL> Yves: agrees
08:08:12 <daveL> df: we need to be careful defining an XLIFF binding, since this may impinge on the scope of the XLIFF TC
08:10:23 <daveL> daveL: suggest mentioning multilanguage and bitext files
08:10:36 <daveL> df: this would be better in the usage section
08:11:17 <daveL> felix:agrees - we can have a section in 1.3 focussed on XLIFF
08:12:13 <daveL> ... currently we have users identified as schema developers, schema managers, vendors of tools.
08:12:31 <daveL> ... need to add localisation workflow managers
08:12:34 <fsasaki> "1.3.1.5" workflow process manager
08:12:51 <daveL> action: dF to add section 1.3.5 on usage by localisation workflow managers
08:13:03 <trackbot> Created ACTION-222 - Add section 1.3.5 on usage by localisation workflow managers [on David Filip - due 2012-10-02].
08:13:45 <daveL> felix: another group is on the table but not mentioned: people working with terminology and language technology
08:14:51 <daveL> dF: there might be two, one for terminology and one for language technology
08:15:42 <leroy> leroy has joined #mlw-lt
08:15:44 <daveL> ... so there is a bridge to open data and ontologies and also terminologists
08:17:01 <daveL> jan: are we regarding these text analytics as separate services?
08:19:13 <daveL> action: Tatiana to draft text for terminology user with Tadej
08:19:13 <trackbot> Sorry, couldn't find user - Tatiana
08:19:57 <daveL> df: we should look at the use of data categories in terminology lifecycle
08:23:58 <daveL> action: tadej to provide section on text analytics
08:23:58 <trackbot> Created ACTION-223 - Provide section on text analytics [on Tadej Štajner - due 2012-10-02].
08:24:43 <daveL> action: pedro to provide a section of MT service provider as user
08:24:43 <trackbot> Created ACTION-224 - Provide a section of MT service provider as user [on Pedro Luis Díez Orzas - due 2012-10-02].
08:25:58 <Tatiana> Tilde could also contribute to the MT service part as the consumer of ITS
08:25:59 <daveL> felix: section 1.3.2, explains the use of global and local selectors
08:26:28 <Tatiana> I mean, as a support to Pedro's paragraph ;)
08:28:09 <daveL> pedro: this section should explain a bit more clearly how meta data can be produced and consumed by different actors or processes
08:30:58 <daveL> felix: perhaps revise example from the use cases being shown today
08:31:33 <daveL> felix: 1.3.2 ways to use ITS 
08:32:52 <daveL> ... still needs to address how to extend a schema, but also how to work with existing formats
08:33:02 <daveL> ... in particular with HTML5
08:33:20 <daveL> rrsagent, generate minutes
08:33:20 <RRSAgent> I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html daveL
09:05:00 <daveL> topic: continuation of session 1
09:07:12 <Zakim> Zakim has left #mlw-lt
09:09:44 <daveL> felix: now we will review specific data categories
09:10:50 <daveL> tadej: summarises the changes to disambiguation
09:12:02 <daveL> ... concerned with superfluous information and also the lack of RDF bindings for several existing lexical repositories
09:12:24 <daveL> ... but encouraging this behaviour in repositories is a big issue.
09:13:01 <daveL> ... Also added disambiguation level. 
09:14:59 <daveL> ... Also generalised entity type to more general target type
09:16:14 <daveL> Tadej: current issues discussed on mailing list.
09:16:44 <Milan> Milan has joined #mlw-lt
09:16:49 <daveL> ... one is that the type can be inferred from the link
09:17:59 <daveL> ... but keep the disambiguation level optional, and allow it also to be inferred from the disambiguation identifier
09:18:38 <daveL> ... also make 'target' more specific by naming to 'disambiguation target'
09:19:55 <daveL> ... Also, wording needs some work, to make it both accessible and also accurate.
09:21:32 <Declan> Declan has joined #mlw-lt
09:22:18 <mdelolmo> mdelolmo has joined #mlw-lt
09:22:43 <daveL> jirka: comment on the example: disambiguation level values should just be literals, so they don't need the 'its:' prefix
09:24:00 <DomJones> DomJones has joined #mlw-lt
09:24:36 <daveL> action: Tadej to update disambiguation to change name of target type and to remove level value prefix 
09:24:56 <trackbot> Created ACTION-225 - Update disambiguation to change name of target type and to remove level value prefix  [on Tadej Štajner - due 2012-10-02].
09:28:27 <daveL> arle: suggests an alternative to 'target', using 'category' or 'class' instead, i.e. its-disambig-class-ref
09:29:38 <daveL> daveL: does 'level' make sense
09:30:13 <daveL> Tadej: yes, well understood in language processing circles
09:30:36 <daveL> phil: perhaps use category or type
09:30:56 <daveL> tadej: perhaps use 'granularity'
09:34:08 <daveL> felix: suggests discussing these changes, and also the descriptive text, in the breakout session tomorrow with Arle
09:35:34 <daveL> daveL: suggest to supplement introductory description with an example
09:35:40 <daveL> tadej: agrees
09:36:29 <daveL> felix: there is no time now for the breakout, so perhaps introduce some other topics
09:37:32 <daveL> ... tool identification is one issue, yves to summarise
09:38:37 <daveL> Yves: we have some data categories where some data is at the document level and some is local, e.g. on every segment
09:39:16 <daveL> ... so we agreed that an override is always complete, but we still want this orthogonal tool ID feature
09:40:00 <daveL> ... felix suggested a separate format based on OLIF for this
09:40:28 <fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0160.html
09:40:36 <tadej> tadej has joined #mlw-lt
09:40:53 <daveL> ... but his own opinion is that this might be a bit complex, and an in-document way of identifying the tool would be attractive
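Sketch (our reading of the log, not an agreed design): "override is always complete" means a local annotation replaces the inherited document-level annotation as a whole instead of being merged field by field. The dict-based annotation and the field names "tool" and "score" are hypothetical illustrations, not ITS 2.0 attributes.

```python
from typing import Dict, Optional

# Hypothetical annotation: a plain dict of field -> value.
Annotation = Dict[str, object]


def complete_override(document_level: Annotation, local: Optional[Annotation]) -> Annotation:
    """Agreed behaviour as we read it: a local annotation replaces the inherited one entirely."""
    return dict(local) if local is not None else dict(document_level)


def field_merge(document_level: Annotation, local: Optional[Annotation]) -> Annotation:
    """The alternative that was not chosen: local fields are merged over inherited ones."""
    return {**document_level, **(local or {})}


doc_level = {"tool": "MT engine A", "score": 0.8}
segment = {"tool": "Reviewer B"}
print(complete_override(doc_level, segment))  # {'tool': 'Reviewer B'}
print(field_merge(doc_level, segment))        # {'tool': 'Reviewer B', 'score': 0.8}
```

The merged result would attribute engine A's score to reviewer B. That is one way to see why tool identification needs its own orthogonal mechanism once overrides are complete.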
09:51:25 <SebastianSkl> SebastianSkl has joined #mlw-lt
09:52:50 <daveL> felix: this definitely needs a breakout session
09:53:05 <daveL> dF: indicates he will lead this breakout
09:54:25 <daveL> felix: examples in the spec - this needs some work and shaun volunteered to look at that
09:55:11 <daveL> felix: we also need schema fragments to integrate into XML and HTML5 (Jirka's action)
09:59:52 <daveL> felix: we will have a breakout session on provenance tomorrow, led by Dave. Later this topic will be handed over to Phil, though he is leaving early 
09:59:52 <leroy_> leroy_ has joined #mlw-lt
10:01:06 <daveL> pedro: presents a quick overview of use of readiness
10:01:49 <Yves_> proposal is attached here: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0025.html
10:03:29 <daveL> ... the advantage of this is that client is more independent from providers
10:05:38 <daveL> pedro: there is a concrete need for this, but nowhere to put it
10:09:20 <daveL> jan: invites us to look at microsoft translator API that offers some potential for this
10:27:29 <DomJones> DomJones has left #mlw-lt
11:29:36 <Arle> Arle has joined #mlw-lt
11:32:22 <Arle> Scribe: Arle
11:32:40 <Arle> Felix: This next section is to show the project officer that we are making progress.
11:33:14 <Arle> ..Arle will fill in templates to show what we are doing.
11:34:16 <giuseppe> giuseppe has joined #mlw-lt
11:34:22 <mdelolmo> mdelolmo has joined #mlw-lt
11:35:29 <fsasaki> rrsagent, make log public
11:36:29 <fsasaki> topic: implementation enlaso
11:36:35 <fsasaki> presentation from yves
11:39:47 <omstefanov> omstefanov has joined #mlw-lt
11:45:09 <Arle> Yves: Question about what to do with multiple keywords.
11:45:32 <Milan> Milan has joined #mlw-lt
11:45:40 <Arle> ..Conducted a demo showing that non-translatable content was in fact not translated.
11:46:25 <Arle> ..Showed slide on Translation Package Creation
11:46:52 <mhellwig> mhellwig has joined #mlw-lt
11:47:32 <leroy> leroy has joined #mlw-lt
11:47:57 <Tatiana> Tatiana has joined #mlw-lt
11:48:57 <Des> Des has joined #mlw-lt
11:52:07 <Arle> ..its:storageSizeEncoding provides information not otherwise available in XLIFF 1.2 concerning the encoding.
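Sketch of why the encoding has to travel with a storage size: the same string needs a different number of bytes in different encodings. The function name fits_storage and the UTF-8 default are assumptions for illustration, not the Okapi implementation.

```python
def fits_storage(text: str, size: int, encoding: str = "UTF-8") -> bool:
    """Return True if `text` encoded with `encoding` needs at most `size` bytes.

    Raises UnicodeEncodeError if the text cannot be represented in the encoding.
    """
    return len(text.encode(encoding)) <= size


word = "Größe"  # "ö" and "ß" take two bytes each in UTF-8, one in ISO-8859-1
print(len(word.encode("UTF-8")), len(word.encode("ISO-8859-1")))  # 7 5
```

A 5-byte limit accepts this word in ISO-8859-1 but rejects it in UTF-8, so a checker that only knows the size cannot decide.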
11:53:44 <Arle> ..Third use case: Moses Translation (M4Loc). Essentially identical to the case with Microsoft Translator.
11:54:03 <Arle> ..(Used imitation of M4Loc in the demo)
11:56:34 <Arle> ..Last use case is a bit different. It uses the categories after extraction, not to make a kit, but to use them directly, to validate things. I hope to add locQuality later.
11:57:08 <tadej> tadej has joined #mlw-lt
11:57:16 <Arle> ..This is quality check. It uses the same extraction mechanism and preserve space is important. Need id value.
11:58:48 <Arle> ..Finds problems in source as well as target.
11:59:24 <Arle> ..The UI of CheckMate lets you decide whether to use the ITS categories in some cases.
12:00:19 <Declan> Declan has joined #mlw-lt
12:00:21 <Arle> Felix: Question: The M4Loc bit was made up, didn't actually use Moses. Is it something we could leverage since this is a workflow that does half the job?
12:00:42 <Arle> ..I'm just wondering if we can use this with Moses.
12:00:59 <Arle> Milan: I think we could change the M4Loc process to use ITS and it will be very helpful.
12:01:17 <Arle> Des: Storage Size was an example. Does it get propagated through to the translator?
12:01:37 <Arle> Yves: Yes. CheckMate doesn't modify the file. We could allow that.
12:02:17 <Arle> Yves: For allowed characters, we don't use the schema. We use a subset in Java Regex. I don't intend to support the entire XML regex. It's a dependency we don't want.
12:02:42 <Arle> ..We do everything else with it, but if you use more of a regex than what we can handle, you will get an error.
12:02:58 <Arle> Jirka: I think there is a Saxon library that might convert this. You should look into it.
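Sketch of the approach Yves describes: support only a subset of the allowed-characters pattern syntax and fail loudly on anything else. The accepted subset here (a single bracketed character class without nested classes) and the name is_allowed are assumptions; Python's re stands in for Java regex, and the two differ from XML Schema regex in places.

```python
import re

# Accept only a single bracketed class such as "[a-zA-Z0-9 ]" or "[^<>]".
# Class subtraction ("[a-z-[aeiou]]"), \p{...} categories, etc. are rejected
# up front instead of being silently misinterpreted.
_SIMPLE_CLASS = re.compile(r"\[\^?(?:[^\\\[\]]|\\.)+\]")


def is_allowed(text: str, char_class: str) -> bool:
    """Return True if every character of `text` matches `char_class`."""
    if not _SIMPLE_CLASS.fullmatch(char_class):
        raise ValueError(f"unsupported allowed-characters pattern: {char_class!r}")
    try:
        compiled = re.compile(f"(?:{char_class})*")
    except re.error as exc:  # e.g. \p{L} is not understood by Python's re
        raise ValueError(f"unsupported allowed-characters pattern: {char_class!r}") from exc
    return compiled.fullmatch(text) is not None
```

Usage: is_allowed("hello world", "[a-z ]") accepts the string, while is_allowed("x", "[a-z-[aeiou]]") raises ValueError, which matches "if you use more of a regex than what we can handle, you will get an error".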
12:03:14 <Arle> Felix: Is there a concrete action following for M4Loc from this?
12:03:46 <Arle> Milan: It looks much easier now, so we should analyze the new version of these tools.
12:03:56 <Arle> Yves: You'll get HTML5 support by going this route.
12:04:16 <Arle> ..We can also add information about the domain. It might be useful for choosing the process in MT.
12:05:25 <Arle> David: There is a potential to expand what M4Loc parses. Not just inlines, but the domain would be an obvious thing. Property bugs could be another thing. It depends on the MT consumer.
12:05:55 <Arle> ..Asia Online could consume property bugs. It would be nice to add terminology and entity markup in M4Loc.
12:06:30 <Arle> Declan: We might be able to releverage some of the M4Loc stuff in what we are doing to avoid duplication of effort.
12:06:38 <Arle> David: It would be great if you could consume it.
12:07:06 <Arle> Felix: You don't need a separate filter for translate from Okapi as long as you can consume it.
12:07:50 <Arle> David: Yves is working on the XLIFF 2.0 library, which will make switching easy when the time comes for it.
12:08:45 <Arle> Yves: We do have some XLIFF 2.0 stuff done. But we don't want to fall back on everyone using Okapi because we need several implementations. It helps make the standard better by seeing what problems they run into. It is important to have multiple implementations.
12:09:18 <Arle> Felix: That's not a W3C process question: We can have "fake" implementations, but we need real ones.
12:10:36 <Arle> Felix: We didn't address the keyword mapping topic. Let's put that down for later.
12:10:49 <fsasaki> action: felix to come back to keyword mapping issue in domain
12:10:49 <trackbot> Created ACTION-226 - Come back to keyword mapping issue in domain [on Felix Sasaki - due 2012-10-02].
12:11:19 <fsasaki> topic: HTML5+ITS to XHTML+ITS convertor
12:12:02 <Arle> Milan: Is there a new version of Okapi with this?
12:12:15 <Arle> Yves: The HTML5 branch in the Git repository has it.
12:12:23 <Arle> Des: Will it move into the dev branch?
12:12:29 <Arle> Yves: Later on.
12:16:03 <Jirka> https://github.com/kosek/html5-its-tools
12:17:23 <Arle> Topic: Jirka's demo
12:17:38 <Arle> Arle: can it convert back from XHTML to HTML5?
12:17:55 <Arle> Jirka: Not currently, but it wouldn't be hard.
12:18:05 <Arle> Felix: It might be useful to Pedro if it did.
12:19:10 <Arle> Shaun: If there is no ITS target information in the target file, do you have to convert back?
12:19:29 <Arle> ..It should take only a few lines of XSLT. It's not difficult.
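Sketch of the attribute-renaming part of such a back-conversion, in Python rather than XSLT. It assumes the naming convention suggested by names like its-disambig-class-ref: camelCase local names in XML (its:locNote) and "its-" plus lowercase hyphenated words in HTML5 (its-loc-note), with Translate using HTML5's native translate attribute. This is not Jirka's converter.

```python
import re


def xml_to_html_attr(local_name: str) -> str:
    """locNote -> its-loc-note (assumed convention)."""
    if local_name == "translate":
        return "translate"  # HTML5 has a native translate attribute
    return "its-" + re.sub(r"([A-Z])", lambda m: "-" + m.group(1).lower(), local_name)


def html_to_xml_attr(attr: str) -> str:
    """its-loc-note -> locNote (assumed convention)."""
    if attr == "translate":
        return "translate"
    if not attr.startswith("its-"):
        raise ValueError(f"not an ITS attribute: {attr!r}")
    head, *rest = attr[len("its-"):].split("-")
    return head + "".join(part.capitalize() for part in rest)
```

Because the mapping is mechanical in both directions, a round trip such as html_to_xml_attr(xml_to_html_attr("storageSize")) returns the original name.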
12:20:06 <Sebastian> Sebastian has joined #mlw-lt
12:20:09 <Arle> Pedro: The transition to HTML5 will take some time and this will help.
12:20:37 <Arle> Yves: This was *extremely* useful to me. If you are working with Java, using validator.nu is the natural way.
12:20:58 <Arle> Felix: This validator.nu is used by the W3C's own validator.
12:26:55 <Arle> Jirka: For HTML5+ITS there are web and command-line versions. If there is interest, I can make it accessible through the university website when it is stable.
12:27:12 <Arle> Felix: This will become part of the W3C validator once stable.
12:27:26 <Arle> Jirka: Before that, I can find a server and make it available.
12:27:50 <Arle> ..It will help us catch typos in examples.
12:29:17 <dF> dF has joined #mlw-lt
12:30:26 <fsasaki> http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-schematron-constraints
12:30:28 <Arle> Felix: For ITS 1.0 you made Schematron rules to check all sorts of things. I'm not sure if people are familiar with that.
12:30:54 <Arle> ..See the link I posted. These are checks that go well beyond schema checks.
12:31:04 <Arle> ..E.g., cooccurrence constraints, etc.
12:31:38 <Arle> ..Could the Schematron be integrated into the W3C validator?
12:32:29 <Arle> Jirka: I'll need to check on that.
12:32:57 <fsasaki> topic: CMS-to-TMS and Online MT System Readiness prototype
12:33:26 <Arle> Topic: CMS to TMS and Online MT System
12:35:22 <fsasaki> s/topic: CMS-to-TMS and Online MT System Readiness prototype//
12:37:31 <Arle> Pedro: This features Drupal integration with Cocomore for the showcase.
12:39:49 <Milan> Milan has joined #mlw-lt
12:48:31 <Arle> Felix: These are hand-made examples for now, right?
12:48:38 <Arle> Mauricio: Yes.
12:52:09 <Arle> ..The implementation of translate allows CAT tool users to see the content, but not to change it.
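Sketch of how an extractor could mark content as visible but read-only: local its:translate values are inherited by descendant elements unless overridden, and text under translate="no" is emitted as locked. The name segments is hypothetical; global rules are ignored and whitespace-only text is skipped to keep it short.

```python
import xml.etree.ElementTree as ET

ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"


def segments(element, translate=True):
    """Yield (text, locked) pairs; locked text is shown to the translator but read-only."""
    flag = element.get(ITS_TRANSLATE)
    if flag is not None:
        translate = flag == "yes"  # local value overrides the inherited one
    if element.text and element.text.strip():
        yield element.text, not translate
    for child in element:
        yield from segments(child, translate)  # children inherit the current value
        # A child's tail text belongs to the parent's translate context.
        if child.tail and child.tail.strip():
            yield child.tail, not translate


doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<p>Hello</p><p its:translate="no">ACME <b>Corp</b></p></doc>'
)
print(list(segments(doc)))  # [('Hello', False), ('ACME ', True), ('Corp', True)]
```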
12:54:19 <Ankit> Ankit has joined #mlw-lt
12:56:02 <Arle> s/Pablo:/Mauricio:/
12:59:50 <Arle> Felix: When will there be a prototype?
13:00:26 <Arle> Pedro: Here there are three parts. The first is the Drupal connection. We have checked our web service. Today or tomorrow I hope that we can ramp up but it has been tested.
13:00:41 <Arle> ..The second is the engine for normalization. That will be done in October, in a couple of weeks.
13:00:57 <Arle> ..The third part is the effects in the localization platform. Everything has to be ready before the end of December.
13:01:36 <Arle> Felix: If you look at the description of work we have until next year. But see how Yves is implementing while we are defining and providing feedback.
13:02:33 <Arle> ..You are working in a waterfall mode, waiting for the definition to be complete. For example, the <meta> tag has content, so it wouldn't work. The waterfall model wouldn't catch that early on, otherwise you don't see the errors until later on.
13:03:35 <Arle> ..I hope you can move towards Yves' model to catch errors early on.
13:04:33 <philr> ITS 2.0 Specification says that Provenance category will be updated in next version of the spec. Is this still the case? Has Provenance category been dropped?
13:05:45 <Arle> Felix: It is really useful to use a feature prototype model.
13:07:14 <Pnietoca> https://www.w3.org/International/multilingualweb/lt/wiki/Online_MT_System_Internationalization_Project_Information_Metadata
13:15:50 <Arle> Felix: We need to start contributing test cases.
13:16:02 <Arle> Jan: That will help those interested in that to start getting involved.
13:16:38 <Arle> Des: In the first use case, why did you go first to XML, then to XLIFF, then to HTML5? HTML5 doesn't seem to be an optimized interchange format?
13:17:02 <Arle> ..When you don't have a CMS, there are valid reasons to use HTML5. But when you do, why not go straight to XML?
13:17:33 <Arle> ..You obviously have a reason since you considered them.
13:18:13 <Arle> Moritz: We started with XML, moved to XLIFF, and that was hard. And then Felix asked for more HTML5 implementations, so we thought we'd try that. We found XLIFF was a pain, so we could move back to XML.
13:19:29 <Arle> Felix: While authors may want to work with HTML5, internally use what works best. I don't think corporations are using HTML5-based workflows right now.
13:19:52 <Arle> ..I've seen examples of XLIFF, but see what works for you. Make sure it is useful for you internally.
13:20:51 <Arle> Des: It seems to me that this is going Publishing → Localization → Publishing by using HTML5. It may work for you though.
13:21:17 <Arle> Dave: Jan told us yesterday, however, that more authoring is in HTML5.
13:21:49 <Arle> Pedro: Normally we have discussion between integrators, the client, and us. Perhaps in that case someone would have asked why we use HTML for a roundtrip like this.
13:23:23 <Arle> Felix: It wouldn't work without HTML5 support, but we didn't discuss any specific HTML5 application. Yves showed how HTML5 could enter the chain, be converted to XLIFF, etc.
13:23:52 <Arle> ..But I'm not sure if HTML5 should serve for the whole chain. XLIFF would seem to make more sense.
13:24:07 <Arle> David: HTML5 lacks the mechanism for bitext translations.
13:24:41 <Arle> ..I thought Tektronix donated their XLIFF-to-Drupal extractor to an open-source project, so this was taken care of.
13:25:05 <Arle> Felix: You don't have to use HTML5, so please look at it and do what you need to that makes sense.
13:25:36 <Arle> Des: I think that we need to distinguish between authoring and publication formats on the one hand and interchange formats on the other. We need to consider what is best practice.
13:26:02 <Arle> ..There is a lot that isn't possible in HTML5. I think we need to consider what is best practice and what we should promote.
13:26:39 <Arle> Dave: Smaller clients running their own websites might have only an off-the-shelf Drupal and don't want to set up XLIFF and so forth.
13:26:57 <Arle> ..So that is one market, different from the enterprise client.
13:27:35 <Arle> Felix: You can consider using XLIFF in your process, or might continue as you are and make it clear where your workflow applies with a good description.
13:27:51 <Arle> Dave: We need clear business cases.
13:28:18 <Arle> Moritz: Mauricio and I should knock this out tonight.
13:28:29 <Arle> Felix: Include David F. in this discussion.
13:29:16 <Arle> Pedro: Concerning readiness, there are a few of us who see this as very useful (Dave, Yves, Cocomore, and us). Where to put that information, when you can choose, is more a political than a technical question.
13:29:37 <Arle> ..In the use case of HTML with no API, wrapper, etc., you might put it right in the HTML material.
13:29:50 <Arle> ..We need to push this hard right now since it needs to be ready by November.
13:30:48 <Arle> Felix: Let me point to what Yves and Shaun did: they implemented features they liked and discussed them in the ITS discussion forum. Some of their ideas are now being implemented.
13:31:16 <Arle> ..Implement things, but not privately, even if they don't make it into ITS 2.0, so that others can see them.
13:32:46 <Arle> Felix: One reason for an implementation-driven approach is that it allows people to see what is being thought of and tried.
13:33:09 <Arle> David: I see why you want readiness in HTML5, but most clients don't want that information published.
13:33:28 <Arle> Dave: One thing we haven't discussed much is the need to strip information.
13:36:07 <Arle> Shaun: For ITS 2.0 we use DocBook and Mallard. Before we had tools, the translators had to work directly in those files.
13:36:16 <Arle> ..Our translators use PO files.
13:36:29 <fsasaki> action: phil to move provenance forward (off-line discussion at prague f2f) 
13:36:39 <trackbot> Created ACTION-227 - Move provenance forward (off-line discussion at prague f2f)  [on Phil Ritchie - due 2012-10-02].
13:38:03 <Arle> ..A colleague created XML2PO, but it created problems for us in some ways (despite being a step forward). There were issues for us concerning how to map the XML structure to PO.
13:38:27 <Arle> ..I redid this as ITS Tool when I discovered it.
13:40:22 <Arle> ..ITS couldn't provide all the information needed by PO. We added a number of extensions, some of which have now gone into the ITS 2.0.
13:41:51 <fsasaki> topic: ITS Tool
13:45:12 <Arle> Shaun: ITS tool ships with a set of rules and uses them to parse files.
13:46:04 <fsasaki> Arle, maybe for the slides: ITS tool ships with a set of default rules for various formats and uses these for PO file generation
14:16:01 <Pnietoca> Pnietoca has joined #mlw-lt
14:19:11 <Declan> Declan has joined #mlw-lt
14:19:35 <Arle> Phil: I'm going to show our work on review.
14:24:44 <Arle> Phil: Our system is something like CheckMate, doing automated checks. We added a browser client that works both off and online, using AJAX to post back to a server, capturing provenance.
14:25:17 <Arle> ..Allowed use of audit trails to find quality problems in other documents.
14:26:23 <Milan> Milan has joined #mlw-lt
14:26:57 <Arle> ..Tool focuses on sentences where we expect there may be problems.
14:27:45 <Arle> ..Allow tagging error types in the UI. The process alters the DOM in HTML and puts the errors into stand-off markup.
14:28:12 <Milan> Milan has joined #mlw-lt
14:29:06 <Arle> ..By editing the DOM, we can save the file with the markup.
14:29:15 <Arle> ..It doesn't require copying and pasting.
14:29:59 <Arle> Des: What are the constraints? Can you use any HTML file?
14:30:45 <Arle> Phil: It's browser-independent. It doesn't have any dependencies because when we do the transformation from XLIFF everything is wired into the file, and all you have to do is reference some standard jQuery/JavaScript libraries.
14:31:15 <tadej> tadej has joined #mlw-lt
14:31:30 <Arle> ..Everything is embedded in the HTML5 when it is converted from XLIFF.
14:32:47 <Arle> Dave: Will discuss simple MT.
14:32:48 <fsasaki> topic: Simple Segment Machine Translation Use Case
14:48:58 <fsasaki> http://about.validator.nu/htmlparser/
14:49:25 <fsasaki> "The jar file contains sample main() entry points:"
14:51:05 <Arle> Pedro: With MT there should also be CAT tools and humans at the segment level. What strategy did you take to address metadata that applies to more than one segment/level?
14:51:25 <fsasaki> above library can be used not only for validation, but also for parsing and e.g. creating various serializations
14:51:31 <Arle> Dave: Before we call the service, we have to do a full parse down to the segment level.
14:51:58 <Arle> Pedro: In our case we don't do the segmentation. The CAT tool does, because it has to be consistent with the TM.
14:52:08 <Arle> ..It is an external service to us.
14:52:32 <Arle> Dave: We do it because we want to focus on the MT and still have control. But we aren't working with a CAT tool.
14:53:52 <Arle> David: It's a small loop here, so we can do it this way. But in a bigger process, you have to make sure these things are handled appropriately early on. You will need ways to reverse the process too, at the end.
14:54:48 <Arle> Pedro: Some things are handled at the segment level, but others apply to the document or sections.
14:55:16 <Arle> David: In some cases segment-by-segment is too slow.
14:55:55 <Arle> ..You won't want to rely on the MT system for segmentation if you have to use TM.
14:58:00 <Arle> Declan: We need to know whether the MT service would ever get a full document or whether it would only get pieces. In the past we have usually dealt with sub-paragraph segments.
15:00:02 <Arle> Felix: Domain mapping here used space-separated rather than comma-separated values. We need to make sure there is consistency here.
15:00:45 <Arle> Yves: I wanted to know how to map domains in HTML. The problem was the format of the keywords in META. Currently we point to a node and expect a string to map to it, but we don't have an internal syntax for the contents. We need to specify this.
15:02:48 <Declan> Declan has joined #mlw-lt
15:04:19 <Arle> David: Talking about XLIFF used to provide CMS-TMS roundtrip.
15:04:29 <Arle> ..A proxy problem means we can't show the demo.
15:04:57 <Arle> ..We initiate projects on the CMS. Want to show examples of how the XLIFF half works.
15:05:25 <Arle> ..Note this is nothing like a traditional TMS. It is a service-oriented architecture. Previously it had an unrestricted number of specialized agents.
15:06:02 <Arle> ..It routes XLIFF between the specialized agents.
15:06:16 <Arle> ..It is a closed localization loop, before the CMS enters.
15:06:37 <RRSAgent> I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
15:06:42 <Arle> ..The idea is to use this modularized system based on XLIFF I/O.
15:06:58 <fsasaki> topic: SOLAS CMS-LION ITS
15:07:01 <RRSAgent> I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
15:08:41 <Arle> ..Sometimes the dumb components need to be clever.
15:09:21 <Arle> ..We can start processes from an arbitrary XLIFF file, or from Okapi.
15:12:10 <Arle> ..We work with Moravia and M4Loc (Moses). Moses uses text only, but M4Loc adds XLIFF capabilities for Moses. We then pass on MT-relevant metadata. They will add support to the M4Loc project.
15:12:28 <leroy> leroy has joined #mlw-lt
15:13:50 <Arle> ..We might want to add support for provenance that Yves doesn't need. For example, if we want to integrate multiple MT systems, we would need that capability.
15:15:21 <Arle> Yves: We need a consistent way of mapping the data categories to XLIFF.
15:15:52 <Arle> Dave: The co-chairs need to take the lead in this.
15:16:27 <Arle> David: Does this belong to XLIFF or ITS? Maybe this is a good reason why Moritz and Pedro did not use XLIFF for an exchange mechanism.
15:16:50 <Arle> ..We need a single XLIFF+ITS method.
15:18:00 <Arle> Felix: Once the metadata is stable in November, we need to deal with this. We can publish as many best practice documents as we want, so we can have an ITS to XLIFF mapping.
15:19:06 <Arle> topic: Cocomore demonstration
15:22:08 <Arle> Moritz: I'd like to make a case for readiness. We need to provide a way for the user to be able to trigger processes upon certain conditions. For example, we send things off to Enrycher and Linguaserve.
15:22:24 <Arle> ..Even if readiness isn't a data category, it should be a best practice to help smaller enterprises.
15:24:09 <Arle> ..We let users add local metadata.
15:24:21 <Arle> Dave: Is that an existing HTML editor?
15:24:26 <Arle> Moritz: Yes.
15:25:02 <Arle> ..We have trouble knowing how to make global Translate settings intelligible to the end user.
15:25:42 <Arle> Felix: Is localization note only global for the whole document?
15:25:56 <Arle> Moritz: For the content node, yes.
15:26:04 <Arle> Felix: That doesn't let you mark pieces of nodes.
15:26:14 <Arle> Moritz: We've not implemented that but it's something to consider.
15:26:42 <Arle> ..Implementing all this required "breaking Drupal's back a bit". It's still a bit too complex, but we're working on this.
15:27:19 <Arle> ..Our process in the CMS should be linked to best practice for readiness.
15:28:47 <Arle> Olaf-Michael: Does it compare metadata in source and target?
15:29:00 <Arle> Moritz: It's half automatic at this point. We need to see what we can leave in.
15:29:15 <leroy> leroy has joined #mlw-lt
15:29:43 <Arle> Serge: This targets Drupal, but what about the other 1200+ CMS products?
15:30:18 <Arle> Felix: Because we don't have infinite funding, we are focusing on an open-source CMS, hoping that it can be reused. This is just the start and we want it in open source.
15:30:34 <Arle> Moritz: We will provide these as Drupal modules for others to use.
15:31:05 <Arle> Yves: The interface with translation will be standardized, and not tied to Linguaserve?
15:31:34 <Arle> Moritz: For the showcase, we are focusing on Linguaserve, but we will go wider.
15:32:09 <Arle> Felix: Adjourn for today.
15:33:23 <Arle> rrsagent, draft minutes
15:33:23 <RRSAgent> I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html Arle
16:30:59 <RRSAgent> I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki