Work Package 5: Deep Web Information and MT Training

Version 0.0 • 20 December 2012

Overview

Task 5.1: Development of CMS-side MT-training Support Components

Deliverables

The deliverables for this Task are in progress.

Task 5.1: Development of CMS-side MT-training Support Components

[Task Leader: Cocomore; Contributors: MORAVIA, UEP, UL]

Deliverables resulting from this task:

The deliverable D5.1.2 builds on the groundwork performed within tasks 3.1 and 4.3. The SOLAS modular platform was enhanced with an Extractor/Merger Component that wraps the Okapi Framework libraries. The ITS 2.0 and XLIFF 2 capabilities were contributed largely by ENLASO within task 3.1 of this project. The M4Loc middleware, developed largely by Moravia and enhanced with XLIFF encoding of ITS 2.0 categories within task 4.3 of this project, consumes and produces several ITS 2.0 categories. The Deep Web MT Training Exporter, in turn, exports bitext corpora that are selected by domain filters and contain terminology and disambiguation information.

So far, the M4Loc functionality has been enhanced as follows:

SOLAS stores all bitext that has been orchestrated through the platform and supports creating XLIFF-based training corpora selected by the ITS 2.0 Domain data category, while preserving the other supported ITS 2.0 categories. The functionality already exists on the platform; the GUI still needs to be developed by M15.
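As an illustration only, the domain-driven corpus export can be sketched in Python. This is not the SOLAS implementation: the `BitextSegment` record, the `select_by_domain` and `to_xliff` helpers, and the case-insensitive domain matching are hypothetical assumptions made for the example. The sketch emits only standard XLIFF 1.2 elements (`xliff`, `file`, `body`, `trans-unit`, `source`, `target`) and deliberately omits serializing the other ITS 2.0 categories, whose XLIFF encoding is not specified here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


@dataclass
class BitextSegment:
    """Hypothetical stored bitext record; 'domains' holds ITS 2.0 Domain values."""
    seg_id: str
    source: str
    target: str
    domains: set


def select_by_domain(store, wanted):
    """Keep segments whose Domain values match any requested domain.

    Case-insensitive matching is an assumption of this sketch, not ITS 2.0 behaviour.
    """
    wanted = {d.lower() for d in wanted}
    return [s for s in store if wanted & {d.lower() for d in s.domains}]


def to_xliff(segments, src_lang, tgt_lang):
    """Serialize the selected segments as a minimal XLIFF 1.2 training corpus."""
    ET.register_namespace("", XLIFF_NS)  # emit XLIFF as the default namespace
    root = ET.Element(f"{{{XLIFF_NS}}}xliff", version="1.2")
    file_el = ET.SubElement(root, f"{{{XLIFF_NS}}}file", {
        "original": "training-corpus",  # hypothetical file label
        "source-language": src_lang,
        "target-language": tgt_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, f"{{{XLIFF_NS}}}body")
    for seg in segments:
        unit = ET.SubElement(body, f"{{{XLIFF_NS}}}trans-unit", id=seg.seg_id)
        ET.SubElement(unit, f"{{{XLIFF_NS}}}source").text = seg.source
        ET.SubElement(unit, f"{{{XLIFF_NS}}}target").text = seg.target
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    store = [
        BitextSegment("1", "Dosage", "Dosierung", {"Medical"}),
        BitextSegment("2", "Invoice", "Rechnung", {"Finance"}),
    ]
    print(to_xliff(select_by_domain(store, {"medical"}), "en", "de"))
```

Separating selection from serialization keeps the domain filter independent of the output format, so the same selection step could feed an XLIFF 2 writer instead.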

Progress on this deliverable is satisfactory, and the M15 delivery milestone is likely to be met.

Task 5.2: Metadata-Aware MT Training

Deliverables

The deliverables for this Task are in progress.