07:04:45 RRSAgent has joined #mlw-lt
07:04:45 logging to http://www.w3.org/2012/09/25-mlw-lt-irc
07:04:49 Zakim has joined #mlw-lt
07:04:56 meeting: MLW-LT f2f
07:04:58 chair: various
07:05:06 scribe: variousToo
07:05:13 agenda: http://www.w3.org/International/multilingualweb/lt/wiki/PragueSep2012#25_Sept:_MLW-LT_WG_meeting_agenda
07:06:32 daveL has joined #mlw-lt
07:13:38 chair: felix
07:13:52 Jirka has joined #mlw-lt
07:14:07 scribe: daveL
07:14:10 tadej has joined #mlw-lt
07:14:57 Meeting: MLW-LT face to face, Prague, 25 Sep 2012, 09.00 CET
07:15:31 Arle has joined #mlw-lt
07:16:29 agenda: http://www.w3.org/International/multilingualweb/lt/wiki/PragueSep2012#25_Sept:_MLW-LT_WG_meeting_agenda
07:17:31 felix: one change to demo is that continuation of session 1 will be a breakout between coffee break today and lunch
07:21:42 omstefanov has joined #mlw-lt
07:22:44 topic: introduction
07:22:50 Des has joined #mlw-lt
07:22:54 dF has joined #mlw-lt
07:22:54 declan has joined #mlw-lt
07:23:03 Yves_ has joined #mlw-lt
07:23:11 felix: this morning we will go through some basic parts of the document
07:23:37 micha has joined #mlw-lt
07:24:17 leroy has joined #mlw-lt
07:24:19 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#introduction
07:24:32 ... starting with introduction to specification
07:24:45 shaunm has joined #mlw-lt
07:24:55 mhellwig has joined #mlw-lt
07:25:54 ... to look for changes that are needed
07:26:03 ... read intro to section 1
07:26:18 ... need to add reference for HTML5
07:26:23 mdelolmo has joined #mlw-lt
07:26:26 Pnietoca has joined #mlw-lt
07:26:28 Pedro has joined #mlw-lt
07:27:20 felix: has reference to ITS requirements and localizable DTD which influenced this document
07:27:52 ... and references potentially unwritten best practices document
07:27:54 http://www.w3.org/2011/12/mlw-lt-charter.html
07:28:00 ... but what does this mean?
07:28:44 ... In context of workplan, after feature freeze we have time to add best practice document
07:29:22 ... change the reference to a stable wiki page for best practices.
07:29:46 Milan has joined #mlw-lt
07:30:44 felix: section 1.1, relation to its1.0 and new principles
07:31:20 ... outlines what the principles needs
07:34:47 Arle_ has joined #mlw-lt
07:35:02 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
07:35:06 ... notes that additional horizontal features need not be implemented for ITS1.0 data categories
07:35:22 giuseppe has joined #mlw-lt
07:36:01 Yves: asks if we still therefore need test suite for ITS1.0 data categories
07:37:16 daveL: yes for completeness, for those not referencing the its1.0
07:37:47 philr has joined #mlw-lt
07:38:05 Ankit has joined #mlw-lt
07:38:12 felix: gives a brief outline of what it means to be conformant to ITS, with reference to test suite
07:38:15 Tatiana has joined #mlw-lt
07:38:29 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
07:38:50 http://phaedrus.scss.tcd.ie/its2.0/its-testsuite.html#translate-local-host
07:39:03 http://phaedrus.scss.tcd.ie/its2.0/expected/translate/xml/translate4XmlOutput.txt
07:40:12 felix: tomorrow we need to look at this in more detail
07:41:48 omstefanov: it seems the ITS1.0 requirement may be redundant
07:42:13 shaunm: this indicates that ITS2.0 encompasses ITS1.0
07:42:50 "Where ITS 1.0 data categories are implemented in XML, the implementation must be conformant with the ITS 1.0 approach to XML to claim conformance to ITS 2.0."
07:43:02 pedro: HTML5 adds new features
07:43:21 "ITS 2.0 is backwards compatibly with ITS 1.0 in terms of ITS mechanisms"
07:45:29 suggest rephrasing that to "ITS 2.0 is backwards compatible with ITS 1.0 in terms of ITS mechanisms"
07:45:35 felix: so this last bullet of 1.1.1 will update to this
07:45:48 felix: section 1.1.2, new principles
07:47:07 felix: in first bullet, drop reference to RDFa and NIF, since these are not the format for conformance
07:47:48 ... RDFa and NIF status are correctly referenced in second bullet, they are a possible output option
07:49:00 felix: third bullet clarifies the need for XPATH1.0, with new mechanisms for other queries, i.e. CSS and later xpath version
07:49:37 ... but there seems no interest in CSS as a selector language, so we might drop it
07:49:56 phil: may be using CSS selector in our implementation
07:50:16 felix: so we may keep it, as it is optional
07:50:36 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#datacategories-defaults-etc
07:50:53 felix: list of new data categories needs to be updated, with reference to table
07:54:50 felix: now review text in section 1.2
07:55:40 "The increasing usage of XML as a medium for documentation-related content (e.g. DocBook and DITA as formats for writing structured documentation, well suited to computer hardware and software manuals)": should mention also HTML5
07:56:09 jirka: need to review the last paragraph related to XML
07:56:27 felix: agree, this needs a rewrite
07:57:45 olaf: can we continue refining this after the meeting
07:58:15 jan: would be helpful to reference other documents
07:58:59 jan: a question about directionality, is vertical being discussed
07:59:43 felix: this is being discussed elsewhere, in CSS for Asian layout
08:00:14 olaf: suggest adding vertical layout by referring to Japanese to the list of example languages
08:00:34 felix: agrees and adds reference to best practice document on japanese
08:02:07 felix: discusses examples
08:03:13 ... but it would be good to have some html examples as well as XML in this section
08:03:43 shaun: seems harder to come up with example with both human and machine readable aspects
08:05:28 dave: it would be good to have some real industrial content for examples
08:05:46 des: there is no mention of XLIFF, is that deliberate
08:06:07 dF: XLIFF isn't a source format in the same way that XML and HTML5 are
08:06:30 felix: but for example yves processes many XML as XLIFF
08:07:15 Yves: agrees
08:08:12 df: need to be careful defining XLIFF binding, since this may impinge on the scope of XLIFF TC
08:10:23 daveL: suggest mentioning multilanguage and bitext files
08:10:36 df: this would be better in the usages section
08:11:17 felix: agrees - we can have a section in 1.3 focussed on XLIFF
08:12:13 ... currently we have users identified as schema developers, schema managers, vendors of tools.
08:12:31 ... need to add for localisation workflow managers
08:12:34 "1.3.1.5" workflow process manager
08:12:51 action: dF to add section 1.3.5 on usage wby localisation workflow managers
08:13:03 Created ACTION-222 - Add section 1.3.5 on usage wby localisation workflow managers [on David Filip - due 2012-10-02].
08:13:45 felix: another group on the table but not mentioned, that is people working with terminology and language technology
08:14:51 dF: there might be two, one for terminology and one for language technology
08:15:42 leroy has joined #mlw-lt
08:15:44 ... so there is a bridge to open data and ontologies and also terminologists
08:17:01 jan: are we regarding these text analytics as separate services
08:19:13 action: Tatiana to draft text for terminology user with Tadej
08:19:13 Sorry, couldn't find user - Tatiana
08:19:57 df: we should look at the use of data categories in terminology lifecycle
08:23:58 action: tadej to provide section on text analytics
08:23:58 Created ACTION-223 - Provide section on text analytics [on Tadej Štajner - due 2012-10-02].
08:24:43 action: pedro to provide a section of MT service provider as user
08:24:43 Created ACTION-224 - Provide a section of MT service provider as user [on Pedro Luis Díez Orzas - due 2012-10-02].
08:25:58 Tilde could also contribute to the MT service part as the consumer of ITS
08:25:59 felix: section 1.3.2, explains the use of global and local selectors
08:26:28 I mean, as a support to Pedro's paragraph ;)
08:28:09 pedro: this section should explain a bit more clearly how metadata can be produced and consumed by different actors or processes
08:30:58 felix: perhaps revise example from the use cases being shown today
08:31:33 felix: 1.3.2 ways to use ITS
08:32:52 ... needs to still address how to extend schema, but also how to work with existing formats
08:33:02 ... in particular with HTML5
08:33:20 rrsagent, generate minutes
08:33:20 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html daveL
09:05:00 topic: continuation of session 1
09:07:12 Zakim has left #mlw-lt
09:09:44 felix: now we will review specific data categories
09:10:50 tadej: summarises the changes to disambiguation
09:12:02 ... concerned with superfluous information and also the lack of RDF bindings for several existing lexical repositories
09:12:24 ... but encouraging this behaviour in repositories is a big issue.
09:13:01 ... Also added disambiguation level.
09:14:59 ... Also generalised entity type to more general target type
09:16:14 Tadej: current issues discussed on mailing list.
09:16:44 Milan has joined #mlw-lt
09:16:49 ... one is that the type can be inferred from the link
09:17:59 ... but keep disambig level as optional, but allow it also to be inferred from disambig ident
09:18:38 ... also make 'target' more specific by naming to 'disambiguation target'
09:19:55 ... Also, wording needs some work, to make it both accessible and also accurate.
09:21:32 Declan has joined #mlw-lt
09:22:18 mdelolmo has joined #mlw-lt
09:22:43 jirka: comment on example that disambig level should just be literals, so don't need 'its:' prefix
09:24:00 DomJones has joined #mlw-lt
09:24:36 action: Tadej to update disambiguation to chanrge name of target type and to remove level value prefix
09:24:56 Created ACTION-225 - Update disambiguation to chanrge name of target type and to remove level value prefix [on Tadej Štajner - due 2012-10-02].
09:28:27 arle: suggest use of alternative to target, using 'category' instead, or 'class', i.e. its-disambig-class-ref
09:29:38 daveL: does 'level' make sense
09:30:13 Tadej: yes, well understood in language processing circles
09:30:36 phil: perhaps use category or type
09:30:56 tadej: perhaps use 'granularity'
09:34:08 felix: suggest that these changes and also the descriptive text be handled in breakout session tomorrow with Arle
09:35:34 daveL: suggest to supplement introductory description with an example
09:35:40 tadej: agrees
09:36:29 felix: not time now for breakout, so perhaps introduce some other topics
09:37:32 ... tool identification is one issue, yves to summarise
09:37:51 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
09:38:37 Yves: we have some data categories where there is some data that is at a document level and some that is local, e.g. at every segment
09:39:16 ... so agreed override is always complete, but still want this orthogonal tool id feature
09:40:00 ... felix suggested a separate format based on OLIF for this
09:40:28 http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0160.html
09:40:36 tadej has joined #mlw-lt
09:40:45 http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0160.html
09:40:53 ... but own opinion that this might be a bit complex, and an in-document way of identifying tool would be attractive
09:51:25 SebastianSkl has joined #mlw-lt
09:52:50 felix: this definitely needs a breakout session
09:53:05 dF: indicates he will lead this breakout
09:54:25 felix: examples in the spec - this needs some work and shaun volunteered to look at that
09:55:11 felix: we also need schema fragments to integrate into XML and HTML5 (jirka's action)
09:59:52 felix: we will have a breakout session on provenance tomorrow, led by dave. Later this topic will be handed over to Phil, though he is leaving early
09:59:52 leroy_ has joined #mlw-lt
10:01:06 pedro: presents a quick overview of use of readiness
10:01:49 proposal is attached here: http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Sep/0025.html
10:03:29 ... the advantage of this is that client is more independent from providers
10:05:38 pedro: there is a concrete need for this, but nowhere to put this
10:09:20 jan: invites us to look at Microsoft Translator API that offers some potential for this
10:27:29 DomJones has left #mlw-lt
11:26:23 test
11:29:36 Arle has joined #mlw-lt
11:31:22 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
11:32:22 Scribe: Arle
11:32:40 Felix: This next section is to show the project officer that we are making progress.
11:33:14 ..Arle will fill in templates to show what we are doing.
11:34:16 giuseppe has joined #mlw-lt
11:34:22 mdelolmo has joined #mlw-lt
11:35:07 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
11:35:19 rrsagent, draft minutes
11:35:19 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
11:35:29 rrsagent, make log public
11:35:31 rrsagent, draft minutes
11:35:31 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
11:36:29 topic: implementation ENLASO
11:36:35 presentation from yves
11:39:47 omstefanov has joined #mlw-lt
11:45:09 Yves: Question about what to do with multiple keywords.
11:45:32 Milan has joined #mlw-lt
11:45:40 ..Conducted a demo showing that non-translatable content was in fact not translated.
11:46:25 ..Showed slide on Translation Package Creation
11:46:52 mhellwig has joined #mlw-lt
11:47:32 leroy has joined #mlw-lt
11:47:57 Tatiana has joined #mlw-lt
11:48:57 Des has joined #mlw-lt
11:52:07 ..ist:storageSizeEncoding provides information not otherwise available in XLIFF 1.2 concerning the encoding.
11:53:10 s\ist:\its:\
11:53:44 ..Third use case: Moses Translation (M4Loc). Essentially identical to the case with Microsoft Translator.
11:54:03 ..(Used imitation of M4Loc in the demo)
11:56:34 ..Last use case is a bit different. It uses the categories after extraction, not to make a kit, but to use them directly, to validate things. I hope to add locQuality later.
11:57:08 tadej has joined #mlw-lt
11:57:16 ..This is quality check. It uses the same extraction mechanism and preserve space is important. Need id value.
11:58:48 ..Finds problems in source as well as target.
11:59:24 ..The UI of CheckMate lets you decide whether to use the ITS categories in some cases.
12:00:19 Declan has joined #mlw-lt
12:00:21 Felix: Question: The M4Loc bit was made up, didn't actually use Moses. Is it something we could leverage since this is a workflow that does half the job?
12:00:42 ..I'm just wondering if we can use this with Moses.
12:00:59 Milan: I think we could change the M4Loc process to use ITS and it will be very helpful.
12:01:17 Des: Storage Size was an example. Does it get propagated through to the translator?
12:01:37 Yves: Yes. CheckMate doesn't modify the file. We could allow that.
12:02:17 Yves: For allowed characters, we don't use the schema. We use a subset in Java Regex. I don't intend to support the entire XML regex. It's a dependency we don't want.
12:02:42 ..We do everything else with it, but if you use more of a regex than what we can handle, you will get an error.
12:02:58 Jirka: I think there is a Saxon library that might convert this. You should look into it.
12:03:14 Felix: Is there a concrete action following for M4Loc from this?
12:03:46 Milan: It looks much easier now, so we should analyze the new version of these tools.
12:03:56 Yves: You'll get HTML5 support by going this route.
12:04:16 ..We can also add information about the domain. It might be useful for choosing the process in MT.
12:05:25 David: There is a potential to expand what M4Loc parses. Not just inlines, but the domain would be an obvious thing. Property bugs could be another thing. It depends on the MT consumer.
12:05:55 ..Asia Online could consume property bugs. It would be nice to add terminology and entity markup in M4Loc.
12:06:30 Declan: We might be able to releverage some of the M4Loc stuff in what we are doing to avoid duplication of effort.
12:06:38 David: It would be great if you could consume it.
12:07:06 Felix: You don't need a separate filter for translate from Okapi as long as you can consume it.
12:07:50 David: Yves is working on the XLIFF 2.0 library, which will make switching easy when the time comes for it.
12:08:45 Yves: We do have some XLIFF 2.0 stuff done. But we don't want to fall back on everyone using Okapi because we need several implementations. It helps make the standard better by seeing what problems they run into. It is important to have multiple implementations.
12:09:18 Felix: That's not a W3C process question: We can have "fake" implementations, but we need real ones.
12:10:36 Felix: We didn't address the keyword mapping topic. Let's put that down for later.
12:10:49 action: felix to come back to keyword mapping issue in domain
12:10:49 Created ACTION-226 - Come back to keyword mapping issue in domain [on Felix Sasaki - due 2012-10-02].
12:11:19 topic: HTML5+ITS to XHTML+ITS convertor
12:11:26 rrsagent, draft minutes
12:11:26 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
12:12:02 Milan: Is there a new version of Okapi with this?
12:12:15 Yves: The HTML5 branch in the Git repository has it.
12:12:23 Des: Will it move into the dev branch?
12:12:29 Yves: Later on.
12:16:03 https://github.com/kosek/html5-its-tools
12:17:23 Topic: Jirka's demo
12:17:38 Arle: can it convert back from XHTML to HTML5?
12:17:55 Jirka: Not currently, but it wouldn't be hard.
12:18:05 Felix: It might be useful to Pedro if it did.
12:19:10 Shaun: If there is no ITS target information in the target file, do you have to convert back?
12:19:29 ..It should take only a few lines of XSLT. It's not difficult.
12:20:06 Sebastian has joined #mlw-lt
12:20:09 Pedro: The transition to HTML5 will take some time and this will help.
12:20:37 Yves: This was *extremely* useful to me. If you are working with Java, using validator.nu is the natural way.
12:20:58 Felix: This validator.nu is used by the W3C's own validator.
12:26:55 Jirka: for HTML5+ITS there are web and command-line versions. If there is interest, I can make it accessible through university website when stable.
12:27:12 Felix: This will become part of the W3C validator once stable.
12:27:26 Jirka: Before that, I can find a server and make it available.
12:27:50 ..It will help us catch typos in examples.
12:29:17 dF has joined #mlw-lt
12:30:26 http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#its-schematron-constraints
12:30:28 Felix: For ITS 1.0 you made Schematron rules to check all sorts of things. I'm not sure if people are familiar with that.
12:30:54 ..See the link I posted. These are checks that go well beyond schema checks.
12:31:04 ..E.g., cooccurrence constraints, etc.
12:31:38 ..Could the Schematron be integrated into the W3C validator?
12:32:29 Jirka: I'll need to check on that.
12:32:57 topic: CMS-to-TMS and Online MT System Readiness prototype
12:33:26 Topic: CMS to TMS and Online MT System
12:35:14 rrsagent, draft minutes
12:35:14 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
12:35:22 s/topic: CMS-to-TMS and Online MT System Readiness prototype//
12:35:23 rrsagent, draft minutes
12:35:23 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
12:37:31 Pedro: This features Drupal integration with Cocomore for the showcase.
12:39:49 Milan has joined #mlw-lt
12:48:31 Felix: These are hand-made examples for now, right?
12:48:38 Pablo: Yes.
12:52:09 ..The implementation of translate allows CAT tool users to see the content, but not to change it.
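
The behaviour described at 12:52:09 (show protected content, but do not let it be changed) depends on knowing which text runs are translatable. The following is only a minimal sketch of that step, not the implementation shown in the demo. It assumes the HTML5 `translate` attribute is the only signal, ignores global ITS rules, and expects well-formed markup with explicit end tags. The names `TranslateScanner` and `translatable_runs` are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not be pushed on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TranslateScanner(HTMLParser):
    """Collect text runs together with their inherited translate state."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.stack = [True]   # HTML5 default: content is translatable
        self.runs = []        # (text, translatable) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        state = self.stack[-1]  # inherit from the parent element
        for name, value in attrs:
            if name == "translate":
                keyword = (value or "").strip().lower()
                if keyword in ("", "yes"):
                    state = True
                elif keyword == "no":
                    state = False
                # Any other value is invalid and keeps the inherited state.
        self.stack.append(state)

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.runs.append((text, self.stack[-1]))


def translatable_runs(html):
    """Return (text, translatable) pairs for an HTML fragment."""
    scanner = TranslateScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.runs


sample = '<p>Press <code translate="no">Ctrl+S</code> to save.</p>'
for text, translatable in translatable_runs(sample):
    print("translate" if translatable else "protect", repr(text))
```

A CAT tool front end could render the "protect" runs read-only while still showing them in context, which is the effect described in the message above.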
12:54:19 Ankit has joined #mlw-lt
12:54:51 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
12:56:02 s/Pablo:/Mauricio:/
12:59:50 Felix: When will there be a prototype?
13:00:26 Pedro: Here there are three parts. The first is the Drupal connection. We have checked our web service. Today or tomorrow I hope that we can ramp up but it has been tested.
13:00:41 ..The second is the engine for normalization. That will be done in October, in a couple of weeks.
13:00:57 ..The third are the effects in the localization platform. Everything has to be ready before the end of December.
13:01:36 Felix: If you look at the description of work we have until next year. But see how Yves is implementing while we are defining and providing feedback.
13:02:33 ..You are working in a waterfall mode, waiting for the definition to be complete. For example, the tag has content, so it wouldn't work. The waterfall model wouldn't catch that early on, otherwise you don't see the errors until later on.
13:03:35 ..I hope you can move towards Yves' model to catch errors early on.
13:04:33 ITS 2.0 Specification says that Provenance category will be updated in next version of the spec. Is this still the case? Has Provenance category been dropped?
13:05:45 Felix: It is really useful to use a feature prototype model.
13:06:39 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
13:07:14 https://www.w3.org/International/multilingualweb/lt/wiki/Online_MT_System_Internationalization_Project_Information_Metadata
13:15:50 Felix: We need to start contributing test cases.
13:16:02 Jan: That will help those interested in that to start getting involved.
13:16:38 Des: In the first use case, why did you go first to XML, then to XLIFF, then to HTML5? HTML5 doesn't seem to be an optimized interchange format?
13:17:02 ..When you don't have a CMS, there are valid reasons to use HTML5. But when you do, why not go straight to XML?
13:17:33 ..You obviously have a reason since you considered them.
13:18:13 Moritz: We started with XML, moved to XLIFF, and that was hard. And then Felix asked for more HTML5 implementations, so we thought we'd try that. We found XLIFF was a pain, so we could move back to XML.
13:19:29 Felix: While authors may want to work with HTML5, internally use what works best. I don't think corporations are using HTML5-based workflows right now.
13:19:52 ..I've seen examples of XLIFF, but see what works for you. Make sure it is useful for you internally.
13:20:34 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
13:20:51 Des: It seems to me that this is going Publishing → Localization → Publishing by using HTML5. It may work for you though.
13:21:17 Dave: Jan told us yesterday, however, that more authoring is in HTML5.
13:21:49 Pedro: Normally we have discussion between integrators, the client, and us. Perhaps in that case someone would have asked why we use HTML for a roundtrip like this.
13:23:23 Felix: It wouldn't work without HTML5 support, but we didn't discuss any specific HTML5 application. Yves showed how HTML5 could enter the chain, be converted to XLIFF, etc.
13:23:52 ..But I'm not sure if HTML5 should serve for the whole chain. XLIFF would seem to make more sense.
13:24:07 David: HTML5 lacks the mechanism for bitext translations.
13:24:41 ..I thought Tektronix donated their XLIFF-to-Drupal extractor to an open-source project, so this was taken care of.
13:25:05 Felix: You don't have to use HTML5, so please look at it and do what you need to that makes sense.
13:25:36 Des: I think that we need to distinguish between authoring and publication formats on the one hand and interchange formats on the other. We need to consider what is best practice.
13:26:02 ..There is a lot that isn't possible in HTML5. I think we need to consider what is best practice and what we should promote.
13:26:39 Dave: Smaller clients running their own websites might have only an off-the-shelf Drupal and don't want to set up XLIFF and so forth.
13:26:57 ..So that is one market, different from the enterprise client.
13:27:35 Felix: You can consider using XLIFF in your process, or might continue as you are and make it clear where your workflow applies with a good description.
13:27:51 Dave: We need clear business cases.
13:28:18 Moritz: Mauricio and I should knock this out tonight.
13:28:29 Felix: Include David F. in this discussion.
13:29:16 Pedro: Concerning readiness, there are a few of us who see this as very useful (Dave, Yves, Cocomore, and us). In the case that you can choose where to put that information, it is more political than technical.
13:29:37 ..In the use case of HTML with no API, wrapper, etc., you might put it right in the HTML material.
13:29:50 ..We need to push this hard right now since it needs to be ready by November.
13:30:48 Felix: Let me point to what Yves and Shaun did: they implemented features they liked and discussed them in the ITS discussion forum. Some of their ideas are now being implemented.
13:31:16 ..Implement things, but not privately, even if they don't make it into ITS 2.0, so that others can see them.
13:32:46 Felix: One reason for an implementation-driven approach is that it allows people to see what is being thought of and tried.
13:33:09 David: I see why you want readiness in HTML5, but most clients don't want that information published.
13:33:28 Dave: One thing we haven't discussed much is the need to strip information.
13:36:07 Shaun: For ITS 2.0 we use DocBook and Mallard. Before we had tools, the translators had to work directly in those files.
13:36:16 ..Our translators use PO files.
13:36:29 action: phil to move provenance forward (off-line discussion at prague f2f)
13:36:38 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
13:36:39 Created ACTION-227 - Move provenance forward (off-line discussion at prague f2f) [on Phil Ritchie - due 2012-10-02].
13:38:03 ..Colleague created XML2PO, but it created problems for us in some ways (despite being a step forward). There were issues for us concerning how to map the XML structure to PO.
13:38:27 ..I redid this as ITS Tool when I discovered it.
13:40:22 ..ITS couldn't provide all the information needed by PO. We added a number of extensions, some of which have now gone into the ITS 2.0.
13:41:36 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
13:41:51 topic: ITS Tool
13:41:53 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
13:45:12 Shaun: ITS tool ships with a set of rules and uses them to parse files.
13:46:04 Arle, maybe for the slides: ITS tool ships with a set of default rules for various formats and uses these for PO file generation
14:16:01 Pnietoca has joined #mlw-lt
14:19:11 Declan has joined #mlw-lt
14:19:35 Phil: I'm going to show our work on review.
14:24:44 Phil: Our system is something like CheckMate, doing automated checks. We added a browser client that works both off and online, using AJAX to post back to a server, capturing provenance.
14:25:17 ..Allowed use of audit trails to find quality problems in other documents.
14:26:23 Milan has joined #mlw-lt
14:26:57 ..Tool focuses on sentences where we expect there may be problems.
14:27:45 ..Allow tagging error types in the UI. The process alters the DOM in HTML and puts the errors into stand-off markup.
14:28:12 Milan has joined #mlw-lt
14:29:06 ..By editing the DOM, we can save the file with the markup.
14:29:15 ..It doesn't require copying and pasting.
14:29:59 Des: What are the constraints? Can you use any HTML file?
14:30:45 Phil: It's browser-independent. It doesn't have any dependencies because when we do the transformation from XLIFF everything is wired into the file and all you have to do is reference some standard jQuery/JavaScript libraries.
14:31:15 tadej has joined #mlw-lt
14:31:30 ..Everything is embedded in the HTML5 when it is converted from XLIFF.
14:32:47 Dave: Will discuss simple MT.
14:32:48 topic: Simple Segment Machine Translation Use Case
14:48:58 http://about.validator.nu/htmlparser/
14:49:25 "The jar file contains sample main() entry points:"
14:51:05 Pedro: With MT there should also be CAT tools and human at the segment level. What strategy did you take to addressing metadata that applies to more than one segment/level.
14:51:14 s/level./level?/
14:51:25 above library can be used not only for validation, but also for parsing and e.g. creating various serializations
14:51:31 Dave: Before we call the service, we have to do a full parse down to the segment level.
14:51:58 Pedro: In our case we don't do the segmentation. The CAT tool does, because it has to be consistent with the TM.
14:52:08 ..It is an external service to us.
14:52:32 Dave: We do it because we want to focus on the MT and still have control. But we aren't working with a CAT tool.
14:53:52 David: It's a small loop here, so we can do it this way. But in a bigger process, you have to make sure these things are handled appropriately early on. You will need ways to reverse the process too, at the end.
14:54:48 Pedro: Some things are handled at the segment level, but others apply to the document or sections.
14:55:16 David: in some cases segment-by-segment is too slow.
14:55:55 ..You won't want to rely on the MT system for segmentation if you have to use TM.
14:58:00 Declan: We need to know whether the MT service would ever get a full document or whether it would only get pieces. In the past we have usually dealt with sub-paragraph segments.
15:00:02 Felix: Domain-mapping here used space separated rather than comma-separated. We need to make sure there is consistency here.
15:00:45 Yves: I wanted to know how to map domains in HTML. The problem was the format of the keywords in META. Currently we point to a node and expect a string to map to it, but we don't have an internal syntax for the contents. We need to specify this.
15:02:48 Declan has joined #mlw-lt
15:04:19 David: Talking about XLIFF used to provide CMS-TMS roundtrip.
15:04:29 ..Proxy problem means we can't show the demo.
15:04:57 ..We initiate projects on the CMS. Want to show examples of how the XLIFF half works.
15:05:25 ..Note this is nothing like a traditional TMS. It is a service-oriented architecture. Previously it had an unrestricted number of specialized agents.
15:06:02 ..It routes XLIFF between the specialized agents.
15:06:16 ..It is a closed localization loop, before the CMS enters.
15:06:37 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
15:06:42 ..The idea is to use this modularized system based on XLIFF I/O.
15:06:58 topic: SOLAS CMS-LION ITS
15:07:01 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki
15:08:41 ..Some times the dumb components need to be clever.
15:08:51 s/Some times/Sometimes/
15:09:21 ..We can start processes from an arbitrary XLIFF file, or from Okapi.
15:12:10 ..We work with Moravia and M4Loc (Moses). Moses uses text only, but M4Loc adds XLIFF capabilities for Moses. We then pass on MT-relevant metadata. They will add support to the M4Loc project.
15:12:28 leroy has joined #mlw-lt
15:13:50 .. We might want to add support for provenance that Yves doesn't need. For example, if we want to integrate multiple MT systems, we would need that capability.
15:15:21 Yves: We need a consistent way of mapping the data categories to XLIFF.
15:15:52 Dave: The co-chairs need to take the lead in this.
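
On the keyword-format question raised at 15:00:02 and 15:00:45: the log does not settle on an internal syntax, so the following is only a sketch of why the separator matters. It assumes a comma as the default separator (HTML defines the keywords META value as comma-separated) and a plain lookup table for the mapping. The function names and the mapping table are hypothetical, not part of any ITS draft.

```python
def split_keywords(content, separator=","):
    """Split a META keywords string and drop empty entries."""
    return [part.strip() for part in content.split(separator) if part.strip()]


def map_domains(keywords, mapping):
    """Map source keywords to target domain names, keeping first-seen order.

    Keywords without an entry in the mapping are passed through unchanged.
    """
    domains = []
    for keyword in keywords:
        target = mapping.get(keyword.lower(), keyword)
        if target not in domains:
            domains.append(target)
    return domains


# Hypothetical mapping from page keywords to an MT engine's domain labels.
mapping = {"automotive": "auto", "car industry": "auto", "medical": "medicine"}

keywords = split_keywords("automotive, Car industry, medical")
print(map_domains(keywords, mapping))

# With a space separator, a multi-word keyword is split into two tokens,
# which is why producers and consumers have to agree on the syntax.
print(split_keywords("car industry", separator=" "))
```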
15:16:27 David: Does this belong to XLIFF or ITS? Maybe this is a good reason why Moritz and Pedro did not use XLIFF for an exchange mechanism.
15:16:50 ..We need a single XLIFF+ITS method.
15:18:00 Felix: Once the metadata is stable in November, we need to deal with this. We can publish as many best practice documents as we want, so we can have an ITS to XLIFF mapping.
15:19:06 topic: Cocomore demonstration
15:22:08 Moritz: I'd like to make a case for readiness. We need to provide a way for the user to be able to trigger processes upon certain conditions. For example, we send things off to Enrycher, Linguaserve.
15:22:24 ..Even if readiness isn't a data category, it should be a best practice to help smaller enterprises.
15:24:09 ..We let users add local metadata.
15:24:21 Dave: Is that an existing HTML editor?
15:24:26 Moritz: Yes.
15:25:02 ..We have trouble knowing how to make translate global for the end user in an intelligible fashion.
15:25:42 Felix: Is localization note only global for the whole document?
15:25:56 Moritz: for the content node, yes.
15:26:04 Felix: That doesn't let you mark pieces of nodes.
15:26:14 Moritz: We've not implemented that but it's something to consider.
15:26:42 ..Implementing all this required "breaking Drupal's back a bit". It's still a bit too complex, but we're working on this.
15:27:19 ..Our process in the CMS should be linked to best practice for readiness.
15:28:47 Olaf-Michael: Does it compare metadata in source and target?
15:29:00 Moritz: It's half automatic at this point. We need to see what we can leave in.
15:29:15 leroy has joined #mlw-lt
15:29:43 Serge: This targets Drupal, but what about the other 1200+ CMS products?
15:30:18 Felix: Because we don't have infinite funding, we are focusing on an open-source CMS, hoping that it can be reused. This is just the start and we want it in open source.
15:30:34 Moritz: We will provide these as Drupal modules for others to use.
15:31:05 Yves: The interface with translation will be standardized, and not tied to Linguaserve?
15:31:34 Moritz: For the showcase, we are focusing on Linguaserve, but we will go wider.
15:32:09 Felix: Adjourn for today.
15:33:23 rrsagent, draft minutes
15:33:23 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html Arle
16:30:59 I have made the request to generate http://www.w3.org/2012/09/25-mlw-lt-minutes.html fsasaki