Work Package 5: Deep Web Information and MT Training
Version 0.0 • 20 December 2012
Overview
Task 5.1: Development of CMS-side MT-training Support Components
Deliverables
The deliverables for this Task are in progress.
- D5.1.1 Drupal MT Training Module.
- D5.1.2 XLIFF Deep Web MT Training Exporter.
[Task Leader: Cocomore; Contributors: MORAVIA, UEP, UL]
Deliverable D5.1.2 builds on the groundwork performed within tasks 3.1 and 4.3. The SOLAS modular platform was enhanced with an Extractor/Merger Component that wraps the Okapi Framework libraries. The ITS 2.0 and XLIFF 2 capabilities were contributed largely by ENLASO within task 3.1 of this project. The M4Loc middleware, developed largely by Moravia and enhanced with XLIFF encoding of ITS 2.0 categories within task 4.3 of this project, consumes and produces several ITS 2.0 categories. The Deep Web MT Training Exporter in turn exports bitext corpora driven by domain filters, containing terminology and disambiguation information.
The M4Loc functionality was enhanced in the following ways so far:
- Detection of ITS 2.0 metadata encoded within an XLIFF file and its preparation for machine translation (MT) using the M4Loc process. Currently supported data categories are:
  - Translate
  - Domain
  - Disambiguation
- MT engine selection based on Domain metadata. Implemented for Moses MT engine(s) so far.
- A prototype mechanism for translating sub-segments that carry the Disambiguation data category. Such sub-segments are translated using the resources defined in the Disambiguation metadata instead of MT.
- Translation units or sub-segments arriving in XLIFF with the translate="no" attribute are omitted from the MT process.
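The following Python sketch illustrates how the Translate and Domain handling listed above could fit together. It is not M4Loc code. It assumes XLIFF 1.2 input, assumes that the Domain category is carried in an `itsxlf:domains` attribute on each `trans-unit`, and uses a hypothetical domain-to-engine table (`ENGINES`) with invented engine names.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Assumption: namespace used for the ITS 2.0 mapping in XLIFF.
ITSXLF_NS = "http://www.w3.org/ns/its-xliff/"

# Hypothetical routing table; the engine names are illustrative only.
ENGINES = {"automotive": "moses-automotive", "medical": "moses-medical"}
DEFAULT_ENGINE = "moses-general"


def plan_mt_jobs(xliff_text):
    """Return (unit id, engine, source text) for every unit that should go to MT."""
    root = ET.fromstring(xliff_text)
    jobs = []
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        # Units marked translate="no" are omitted from the MT process.
        if unit.get("translate", "yes") == "no":
            continue
        source = unit.find(f"{{{XLIFF_NS}}}source")
        if source is None:
            continue
        # Use the first listed domain (if any) to pick an engine.
        domain = unit.get(f"{{{ITSXLF_NS}}}domains", "").split(",")[0].strip().lower()
        engine = ENGINES.get(domain, DEFAULT_ENGINE)
        jobs.append((unit.get("id"), engine, "".join(source.itertext())))
    return jobs


SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:itsxlf="http://www.w3.org/ns/its-xliff/">
  <file original="page.html" source-language="en" target-language="de" datatype="html">
    <body>
      <trans-unit id="1" itsxlf:domains="automotive">
        <source>Check the brake fluid level.</source>
      </trans-unit>
      <trans-unit id="2" translate="no">
        <source>ACME-4711</source>
      </trans-unit>
      <trans-unit id="3">
        <source>Contact us for details.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    for job in plan_mt_jobs(SAMPLE):
        print(job)
```

In this sketch, units without Domain metadata fall back to a general engine; whether M4Loc applies such a fallback is not stated in this report.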
SOLAS stores all bitext that has been orchestrated through the platform and allows XLIFF-based training corpora to be created, driven by the ITS 2.0 Domain category and preserving the other supported ITS 2.0 categories. The functionality exists on the platform; the GUI remains to be developed by M15.
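As a rough illustration of domain-driven corpus creation, the Python sketch below filters an XLIFF 1.2 document down to the translated units of one domain while leaving all other attributes and inline markup on the retained units untouched. It is not the SOLAS implementation. The `itsxlf:domains` attribute, its namespace, and the function name `filter_corpus_by_domain` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
# Assumption: namespace used for the ITS 2.0 mapping in XLIFF.
ITSXLF_NS = "http://www.w3.org/ns/its-xliff/"


def filter_corpus_by_domain(xliff_text, wanted_domain):
    """Keep only trans-units that match the domain and have a target."""
    root = ET.fromstring(xliff_text)
    unit_tag = f"{{{XLIFF_NS}}}trans-unit"
    target_tag = f"{{{XLIFF_NS}}}target"
    domain_attr = f"{{{ITSXLF_NS}}}domains"
    wanted = wanted_domain.strip().lower()
    # Snapshot the element list first, since children are removed while walking.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != unit_tag:
                continue
            domains = [d.strip().lower() for d in child.get(domain_attr, "").split(",")]
            has_target = child.find(target_tag) is not None
            if wanted not in domains or not has_target:
                parent.remove(child)
    return root


STORED = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2"
       xmlns:itsxlf="http://www.w3.org/ns/its-xliff/">
  <file original="store" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1" itsxlf:domains="automotive">
        <source>Brake pad</source><target>Bremsbelag</target>
      </trans-unit>
      <trans-unit id="2" itsxlf:domains="medical">
        <source>Blood pressure</source><target>Blutdruck</target>
      </trans-unit>
      <trans-unit id="3" itsxlf:domains="automotive">
        <source>Spark plug</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""

if __name__ == "__main__":
    corpus = filter_corpus_by_domain(STORED, "automotive")
    print(ET.tostring(corpus, encoding="unicode"))
```

Filtering the tree in place, rather than rebuilding units from extracted text, is one simple way to ensure that other ITS 2.0 metadata on the retained units survives the export unchanged.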
The progress of this deliverable is satisfactory and the delivery milestone M15 is likely to be met.
Task 5.2: Metadata-Aware MT Training
Deliverables
The deliverables for this Task are in progress.
- D5.2 Metadata-Aware MT Training Tools.