See also: IRC log
This is the raw scribe log for the sessions on day one of the MultilingualWeb workshop in Pisa. The log has not undergone careful post-editing and may contain errors or omissions; it should be read with that in mind. It constitutes the best efforts of the scribes to capture, in real time, the gist of the talks and the discussions that followed. IRC is used not only to capture notes on the talks: it can also be followed in real time by remote participants or participants with accessibility needs, and people following IRC can add contributions to the flow of text themselves.
See also the log for the second day.
Richard introduces the project and the workshop
<luke> 2nd of 4 MultilingualWeb conferences
<luke> Goal is to facilitate cross-pollination across different areas, so don't tune out if it's not your specialty!
Domenico describes the mechanisms behind IDN, domain names in general, and the usage of the internet
Domenico describes what is possible with IDN, compared to domain names in general
Domenico describes how the punycode translation helps to use IDN, while keeping the underlying domain name system as is
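[A minimal sketch of the punycode mapping just described, using Python's standard library; the papá.it / papa.it pair comes up again in the Q&A below. The encoded form shown is illustrative of how the mapping works.]
    # Python's built-in "idna" codec applies the IDNA rules (nameprep plus
    # punycode) label by label; all-ASCII labels pass through unchanged.
    print("papá.it".encode("idna"))          # b'xn--pap-gla.it'
    print("papa.it".encode("idna"))          # b'papa.it'
    print(b"xn--pap-gla.it".decode("idna"))  # back to 'papá.it'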
<chaals> [webfonts is actually really important for some places ... ]
oreste is showing various areas that need more work to create "a web for all", e.g. in the areas of accessibility, multilingualism etc.
oreste describes WCAG 2.0
oreste: issues of the multilingual web: encoding, colors, navigation, ...
oreste describes the role of W3C offices, translations, W3C I18N Activity etc. as important means to push the multilingual web
Kimmo: I am project officer for the mlw project
... I am very happy about the enthusiasm in this project. It is very small in terms of budget, but it is very successful
... mlw has also been very successful in using social media
... looking forward to the next steps, including the review which is coming up
... mlw has been a wonderful forum for gathering new ideas, to understand how much fragmentation still exists
... now it is time to become operational, to start to put ideas into practice
... I expect that this project will come up with good recommendations: what needs to be done, why, and who could do it?
... we have to create operational working links to other European projects
... by mid 2015 we will have about 50 ongoing projects in the area of multilingual technologies
... we started creating these links, i.e. we have speakers from several European projects
... please look into these other initiatives and see what we can do together
... we started funding language technology 2 years ago - we are reaching a plateau
... we just evaluated 90 proposals asking for 240 million Euros; we only have 50 million Euros
... we can only select one of five projects
... there is still one more call coming up for SMEs: 35 million Euros for sharing data / language resources
... there are still three weeks to put in a proposal
... once the SME call is over, we will have about 50 projects
... we spent 150,000 Euros to fund a survey, interviewing many people in European states
... asking about language use while being online
... results will soon be public on our web site and the Eurobarometer web site
... results are that use of other languages is mostly passive
... when people write and engage in social networking, they prefer to use their own language
... 44% said they are missing important information because they don't understand the language used
... thank you, have a successful conference
ralf: talking about attempts to give access to information across languages
... monitoring news in 50 languages
ralf introduces JRC
ralf describes the news sources used for "media monitoring": 100,000 news articles gathered per day, in 50 languages
ralf: articles are converted into rss for further processing
ralf gives examples of news coverage: news is not always available in English, and sometimes more is available in other languages
ralf: we also find out co-occurrences: who or what is mentioned with whom or what in different languages?
... also analysing quotation networks: who gets mentioned by whom, also different depending on the language
... recognition of entities (mostly persons) in about 20 languages
... multilingual categorization, using about 1000 categories, using boolean search word operations, optional weights of words, co-occurrence and distance of words, regular expressions for inflection forms (not only morphological)
... multilingual categorization in general and specific to medicine in the MediSys system
... classifying by country and category, e.g. there is half an article a day about tuberculosis in Czech, but if suddenly there are 5 articles a day, we can issue an alert
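[The alerting idea in a toy Python sketch: flag a category when today's article count jumps well above its recent daily average. The numbers and threshold factor are made up.]
    # Alert when today's count for a (country, category) pair far exceeds
    # the recent daily average -- e.g. tuberculosis articles in Czech media.
    def should_alert(history, today, factor=4):
        avg = sum(history) / len(history)
        return today > max(1, avg * factor)

    print(should_alert([0, 1, 0, 1, 1], 5))  # True: 5 >> ~0.6 per day average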
ralf introduces NewsExplorer - a multilingual daily news overview
ralf: application about multilingual template filling - NEXUS, extracting structured information about events
... focusing on conflicts, crimes, disasters, ...
... want to know if there is a disaster with the need to send aid etc.
ralf: summarizing: we have demonstrated our EMM system, the technologies being used, and application scenarios
... modest attempts to get access across languages, but users appreciate it and it shows that the Web is not only for English
<Zakim> chaals, you wanted to ask about how users will distinguish papa.it and papá.it
domenico: the punycode translations of papa.it and papá.it are different, so sure, yes
XYZ: question about NEXUS: if a newspaper says "person X is a freedom fighter" and another says "person X is a terrorist", how do you deal with this?
ralf: there is political analysis being done, but categorization like the above is normally not done
... the system is publicly accessible via our home page
Adriane Rinsche opens the Developer session
Steven talks about HTTP content negotiation
Steven shows some examples of content negotiation
Steven talks about the possibility of providing better 404 error pages
scribe: and 406 pages
... some servers like www.google.com ignore content negotiation headers
... and try to guess your location based on your IP address
<Tomas> Most do. The general problem is Multilingual Web Sites (MWS).
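[A sketch of the negotiation Steven describes, as server-side Python: parse the Accept-Language q-values and pick the best available variant. The header value and the set of available languages are invented.]
    # Parse "Accept-Language: it,en;q=0.8,de;q=0.5" and choose a variant.
    def best_language(accept_language, available):
        prefs = []
        for item in accept_language.split(","):
            parts = item.strip().split(";q=")
            q = float(parts[1]) if len(parts) > 1 else 1.0
            prefs.append((q, parts[0].lower()))
        for _, lang in sorted(prefs, reverse=True):
            if lang in available:
                return lang
        return None  # no match: a helpful 406 page could list the choices

    print(best_language("it,en;q=0.8,de;q=0.5", {"en", "de"}))  # 'en'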
scribe: another approach is to have a button for changing language on the web page itself
... some sites even use Javascript to change content inside the page
After summarizing some bad practices in serving multilingual websites, Steven now introduces XForms
XForms separate data and presentation. Steven shows this with an example of a simple form
scribe: XForms can contain calculations
... controls are abstract and can easily get different styling
... it's possible to use different data sources
Steven shows form which can dynamically change labels for form fields based on the selected language for the form
scribe: XForms use a declarative approach which requires much less work to produce
... conclusion - XForms allow using "language stylesheets" to create multilingual forms even though this wasn't an original goal of XForms
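[A rough Python transposition of the idea in Steven's demo: labels live in data keyed by language, so switching language rebinds the labels without touching the form logic. The labels shown are invented, not from the actual demo.]
    # "Language stylesheet" for a form: one form, per-language label data.
    labels = {
        "en": {"name": "Name", "submit": "Send"},
        "it": {"name": "Nome", "submit": "Invia"},
    }

    def render_form(lang):
        l = labels[lang]
        return "[{}: ______]  ({})".format(l["name"], l["submit"])

    print(render_form("it"))  # [Nome: ______]  (Invia)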
<Tomas> It is in my presentation this afternoon. An overview http://dragoman.org/mws-india.html
Chaals introduces Widgets technology
scribe: history of Widgets development and standardization in W3C
... Widgets are now split into 7 specifications
Chaals shows the source of a simple Widget
scribe: describes the l10n features of Widgets
... Widgets use xml:lang, and for larger resources a separate language-specific directory can be used [sketch below]
... Widgets do not use ITS because namespaces are too hard for some web developers; instead a few specific attributes and elements were adopted (span, dir, xml:lang)
... Opera extensions are based on Widgets
... l10n is hard, you should get advice and do proper testing
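[A sketch of the xml:lang mechanism mentioned above: a widget's config.xml carries localized <name> elements and a tool picks one by language. The element and namespace follow the Widgets spec; the lookup logic is a simplification, not the spec's algorithm.]
    import xml.etree.ElementTree as ET

    XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
    config = ET.fromstring("""
    <widget xmlns="http://www.w3.org/ns/widgets">
      <name xml:lang="en">Clock</name>
      <name xml:lang="it">Orologio</name>
    </widget>""")

    def localized_name(root, lang):
        # Map each <name>'s xml:lang to its text, then fall back to English.
        names = {el.get(XML_LANG): el.text
                 for el in root.iter("{http://www.w3.org/ns/widgets}name")}
        return names.get(lang) or names.get("en")

    print(localized_name(config, "it"))  # 'Orologio'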
Richard tries to explain what HTML5 means
scribe: Richard will talk only about the HTML5 specification
... not about related things like CSS3, new Javascript APIs, ...
... HTML5 endorses the utf-8 encoding
... simplified encoding declaration: <meta charset=utf-8>
... polyglot documents are both XML and HTML5 (HTML syntax) documents, use utf-8, and have no XML declaration
<Steven> Actually, XHTML 1.0 had the same thing, but didn't call it "Polyglot"
<Steven> But it was addressing the same problem
scribe: the charset attribute was removed from the link and a elements
... language declaration can use the lang attribute or the Content-Language HTTP header
... Content-Language can contain more than one language
... Content-Language was just recently removed from the HTML5 draft
Richard now explains Ruby
<chaals> [Ruby was very common in western medieval texts, where Greek, Latin, Hebrew etc. would be mixed. E.g. religious texts and scholarly documents]
<Steven> Yes, Chaals, it is very useful for other things than Ruby; a pity they called it Ruby markup, since it is more than that
scribe: HTML5 has support for Ruby, but uses slightly different markup than XHTML 1.1 or ITS (missing the rb element for base text)
... Bidi support
... HTML5 adds the bdi element for bidi isolation
... dir="auto" allows a run-time decision about directionality
<Steven> I sent a last call comment to the ruby WG, saying they should call it something more generic, but they declined "because Microsoft had already implemented it"
scribe: Richard invites all to get involved in spec development
Gunnar talks about some problems in HTML5
scribe: validation of the email input type field is too restrictive in the spec - it doesn't support IDN
... each browser provides a different UI for changing the preferred language
... some browsers have bugs in this
<Steven> Some browsers have bugs, but some do it completely wrong :-)
scribe: language negotiation is missing some features
... how to label original and translation
... how to label human and machine translation
Jochen shows a mind map of the presentation
scribe: presents details about the Thomson Reuters company
... customers require high quality
... a combination of human and automatic methods is in use
... XML and Unicode are heavily used
... the main issue is not a lack of standards but developer education
... i18n and l10n are not part of the curriculum
... new challenges arise with multimedia content
... some content is hidden (Facebook, Twitter, ...)
... proposes a more open twitter-like messaging system with better support for i18n
... it might be useful to have an HTML tag saying that some page is a translation of a different page
Question from Google: defends the current state of affairs regarding language selection. Asks whether an easier UI would help.
Chaals: the interface should be easier to use; most users don't set their language
... content should contain as much metadata as possible to inform about alternative versions of content
Richard: mentions an extension that allows easier changing of the preferred language
Question from Olaf: What is the chance of implementing some notation for marking a document as being in the original language?
Chaals: There are many notations, starting from simple rel= and going to RDF
... you should use them; browsers will support what is used on the pages visited by users
... you should talk to producers of content creation tools
Richard: you should be more involved, create proposals, ...
Felix Sasaki: It's possible to introduce a new language subtag for this
<fsasaki> .. use the ietf-languages list to discuss this with the people reviewing such proposals
Felix: Welcome to afternoon session
Dag: 37 languages, 51 markets
... some countries have more than one language (e.g. Belgium, Canada)
... adding value to Office
... content, templates, also selling Office
... campaigns in different markets at different times
... market-specific engagement
... recent migration of site management and authoring from XMetal to Word
... and using SharePoint instead of a custom publishing system
... we extended Word to support this
... allows federated authoring
... helps with localization
... lessons from this migration
... internationalisation was a key stakeholder
... designed for scale
... it was quite an effort; next time we won't do everything at once
... 100s of thousands of help documents for at least the last three releases
... content heavy
... complexity wasn't where we expected, and was more complex than we expected
... general lessons from the site
... serve all global market needs; English is just another language
... scale up *and* down
... design for growth
[gives example of content originating in Japan, and translated to other languages]
Dag: no character formatting, only character styles
... we have an XML format for translation
... local touch
... deliver the right experience to each market
[examples]
Dag: Customer connection
... feedback, evaluation, SEO
[examples from site]
Dag: Continuous updates
... respond to regional events, A/B testing
... use some machine translation
... Future trends
... moving to the cloud
... multilingual multimedia
... language automation
... interoperability with standards
... Conclusions
... It is possible to design for scale and local relevance
Jirka: tag set designed to help with translations
... usable with any XML vocabulary
[example of use]
Jirka: Allows automatic software, as well as human translators, to see what should not be translated
<chaals> [As Jirka said, you don't have to use the actual ITS namespace to use the ITS pieces - and the decision for widgets was indeed to do that]
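[A minimal sketch of the translate data category Jirka describes: local its:translate="no" markup tells tools what to leave alone. The ITS 1.0 namespace is the real one; the document and the extraction code are illustrative.]
    import xml.etree.ElementTree as ET

    ITS = "{http://www.w3.org/2005/11/its}"
    doc = ET.fromstring("""
    <doc xmlns:its="http://www.w3.org/2005/11/its">
      <p>Click <code its:translate="no">Save As</code> to export.</p>
    </doc>""")

    # Collect everything explicitly marked as not-to-be-translated.
    for el in doc.iter():
        if el.get(ITS + "translate") == "no":
            print("leave untranslated:", el.text)  # leave untranslated: Save As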
Jirka: Now to look at formats that support ITS
... first DocBook
[example]
Jirka: Next format, DITA
... for topic-based documentation
... DITA doesn't natively support ITS
... can be added
... Now OOXML
... Open Office, and even for MS Office 2007+
... no native support, but can be added
<jan> Office Open XML is a MS developed standard, not Open Office... ;-)
Jirka: ODF is similar
... XHTML allows use of ITS
... HTML5 has no extension points to allow ITS
... what is to be done?
... HTML5 needs to be augmented to support ITS
Dag: MS translator does support something similar
Steven: If XHTML5 supports it, why not just say "Use XML serialization if you want this facility"?
Jirka: Not sure if people can produce well-formed XML
<Jirka> Slides from my presentation http://www.kosek.cz/xml/2011mlwpisa/
Chaals: What standards should be developed?
... there are lots of multilingual sites. Substantial problems
<Tomas> I am here ... just in case
Chaals: principles - don't break existing stuff
... expect it to take time
... two sides of the coin: users and webmasters
<Tomas> Slides - http://dragoman.org/pisa/carrasco-mw-pisa.pdf
Chaals: But it is often less clear-cut
... currently - no consistent user interface for a ML website
... this should be fixed
... no standards for multilingual content production
... this should be fixed
<Tomas> No standards for content production - in general - not a problem particular to MWS
Chaals: Most users are monolingual
<Tomas> One needs hard data
Chaals: Webmasters must manage a multilingual system
... users don't want more complexity
... webmasters aren't necessarily experts in this stuff
... interfaces for content from the user side are well-established
... not so for webmasters
... some ideas - a language button in the browser
... maybe use HTTP header fields
... content negotiation
<Tomas> Another good "high level" variant is memento http://www.mementoweb.org
Chaals: reserved URIs
... I am not sure if reserved URIs are a good idea
... It should be possible to request a translation
... there's an Opera extension for that
<Tomas> A reserved URI is very good as one can have all the pages in the MWS with the same URI pointing to the variants
<Tomas> maintaining pages with different URIs for the variants is very hard
Chaals: need a metaresource concept
<Tomas> RDF might do it - needs verification
Chaals: Need server-side standards
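[One shape the server-side idea could take, sketched in Python: advertise a page's language variants in an HTTP Link header (rel="alternate" plus hreflang) so clients can discover them without guessing URLs. The URLs are invented.]
    # Build a Link header listing the language variants of one resource.
    variants = {"en": "/en/page", "it": "/it/page", "de": "/de/page"}
    link_header = ", ".join(
        '<{}>; rel="alternate"; hreflang="{}"'.format(url, lang)
        for lang, url in variants.items())
    print("Link:", link_header)
    # Link: </en/page>; rel="alternate"; hreflang="en", ...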
<scribe> RDFa was the largest growing web format last year: http://rdfa.info/2011/01/26/rdfa-grows/
Chaals: Next step? A working group maybe
... at W3C? Elsewhere?
<Tomas> No WG, no specifications
Chaals: or create a new initiative?
... need guides for best practice on the user and webmaster sides
<Tomas> A tabular view http://dragoman.org/mws-india.html
Sophie: 90% of HP's customers buy based on content rather than touching the product
... 42% of web users are from Asia
... only 13% from the USA
... yet English is still the leading language
... Asia has the highest usage but low penetration
... therefore it's a growth area
... 10% of retail sales in China are done online
<chaals> [My concern with reserved URIs is that it breaks some existing standards and expectations. I think HTTP headers and metadata are better approaches. (I generally hate reserved URIs - they are used in P3P, favicons, robots.txt and a couple of other places, but I don't think they're going to handle the complexity of multilingual websites without creating as many problems as they solve...)]
Sophie: How to represent the brand consistently, locally
... how to make it relevant
<chaals> [I certainly think that being able to get the information about available variants is really important]
Sophie: how to manage translation
... the first step is to use a component-based system
<Jirka> chaals: yes, but it might be sufficient to have a link/http header pointing to another URL where a manifest listing all possible variants would sit, rather than having dozens of alternatives in each page -- too much change when a new translation is added
Sophie: synchronisation between components is then easy to manage
... allows local components, but global style
... e.g. the Emirates site
... use positioning information to personalise information
... example: the Lux brand, which is up-market in India, but not elsewhere
... need local input to ensure local nuances are working
... users come with cultural layers as well
... cultures vary in many dimensions
... finally, managing content
... need a well-managed process
<Tomas> [The browser side is much better, but we have to care for the server side. This is the question: how to implement the server side. Separate function from mechanism: we can explore different mechanisms. One fixed reserved URI for the whole server combined with the Referer header will certainly resolve a big problem (different URIs for each page).]
Sophie: can be automated to a large extent (the management, not the translation)
[shows an example process]
Sophie: In conclusion, translation must be part of a larger picture
... use components, geo-positioning, and translation management
<Tomas> Question of scope: what should be in MWS and what in other specifications for a full translation system.
<Tomas> The picture is larger: Authorship, Translation and Publishing Chain
<Tomas> Translation is only part of the whole production chain
Christian Lieske: For Chaals- I got different messages - we've got to do stuff, but Sophie seems to suggest we can already do it.
Chaals: It's not that we can't do it already, but that there is no agreed way to do it
<Tomas> We need to define the different scopes and how the different fields integrate; a MWS is *not* a translation management system.
Chaals: We have no interoperability
<Tomas> You want another beer !!!
Sophie: Changing solutions is hard, standards could help
<Tomas> We need to identify what is particular to MWS and what is general.
Sophie: We should work towards a position where you need fewer developers
<Tomas> Language is just one of the dimensions in TCN; e.g., mementos should be integrated in the same mechanism http://www.mementoweb.org/
Dag: We have a translation tag, but it is not standard, so there is less customer value, in the long run a standard lowers the cost of entry for us
<Tomas> +1 regarding further development of XLIFF: one should be able to construct a MWS from Apache out of the box
Tomas Abramovitch: Do you use different CSS for different cultures?
scribe: and how accurate is geo-location?
<Tomas> One could (CSS)
Dag: We componentise our pages, the local part is not done by CSS
Sophie: I can't totally answer the geo-loc part.
Chaals: It is a spectrum from identifying one seat in an audience to just someone in a country
<Tomas> One could generate some pages: "5.3. Generating language in parallel" in http://dragoman.org/mws/oamws.pdf
Ian Truscott: identifying people is always a guess until they log in
<Tomas> Or he set his browser preferences
Reinhard: How do we learn from research? No one has mentioned this
... different people like different things
... 16 year olds in China have more in common with 16 year olds in the USA than with their parents
... all I've heard is corporate policy. Why not let the user decide?
<Tomas> A user wants the page in his language
Sophie: Crowd sourcing is an option
<Tomas> Choosing is already a hurdle
<Tomas> We need to look at all the available mechanisms and decide on a recommendation: "4.4. Options" in http://dragoman.org/mws/oamws.pdf
Dag: There are areas where our interest and the users' coincide
... but we can't do translation on demand
... they pay for a premium product
<chaals> [It isn't always a guess identifying the user until they log in. In fact, technically it is often easy to identify users anyway - this is why we have laws to protect privacy and limit the things done to make it easy]
Steven: A good example of Reinhard's point is websites that conflate region with language. I often don't know which question they are asking.
... and I don't believe that most people are monolingual. There are 6000 languages, and 150 countries. Most people are at least bilingual
[scribe's computer is nearly out of battery]
<Tomas> [we need to identify what the user wants, not who he is]
Reinhard: Crowdsourcing translation is often not possible because of copyright issues
Olaf: We need the possibility to offer translations of parts of sites
... it works on wikipedia
<Tomas> Monolingual user: we need hard data; but circumstantial data points to the requirement of most users being monolingual.
Olaf: microsoft needs to open its translation tools
Chaals: I use crowdsourced translation of Norwegian law
... it is easy to do, but by and large it doesn't happen
... too little reward
<Tomas> Translation integration in MWS: a language not available could be defined as a "language potentially available" (after translation). One needs a mechanism covering all the aspects of the different translation techniques: human (professional, crowd), machine (fast as RBMT or slow as SMT).
<Tomas> For the whole enchilada: "Open architecture for multilingual parallel texts" http://arxiv.org/ftp/arxiv/papers/0808/0808.3889.pdf
christian: five areas show that there is a need for change:
... demand for language related services, shortcomings of today's translation-related standards, ...
... why talk about standards: demand & lack of interoperability
... lack of interoperability e.g. for XLIFF
... things break down across tool chains
... standards in the localization area are sometimes not compatible
... example of phrases in TMX vs phrases in XLIFF
... lack of work in localization standardization integrating new web technologies
... e.g. the aspect of RESTful services, use of related protocols (OData, GData) for translation related services
... these problems have led to implementation challenges, problems for standards that are already here
... how to solve the problems: four areas are important: requirements, methodology, compliance, stewardship
... requirements: identify processing areas related to language processing - and keep them separated
... determine the entities that are needed in each area
... chart technology options and needs
... etc. Next: methodology:
... distinguish between models and implementation / serialization
... distinguish between entities without context and entities with business / processing context
... set up rules to transform data models into syntax
... set up flexible registries, e.g. CLDR, IANA
... provide migration paths / mapping mechanisms for legacy data
... third, compliance: e.g. what does "support for standard X" mean?
... finally, stewardship: driving, supporting standardization activity
... anyone who shouts for small standards should be willing to invest
... the EC has a track record, see e.g. the mlw project
... make donations / contributions easy
... discourage fragmentation and unclear roles
... LISA no longer exists; now there is a kind of competition over who could follow in its footsteps
... my fear is that another organization is being created; my thought, and probably Felix' and Yves', is that this should be avoided
David: christian has covered a lot for XLIFF 2.0 - what do I want to cover?
david: my main statements: metadata must survive language transformations, content metadata must be designed upfront with the transformation process in mind, XLIFF is the principal vehicle for critical metadata throughout multilingual transformations
... and finally: the next generation XLIFF standard is an exciting work in progress in the OASIS TC
... about preserving metadata: there are various transformations: g11n, i18n, l10n, t9n ("GILT")
... transformation modes: manual, automated, assisted
... transformation types: MT, human translation, postediting, stylistic review, tagging (semantic, subject matter review, transcribing), subtitling, ...
... growing number of source languages
... what metadata is necessary?
... preview and context are critical
... argues for creating standardized XSLT artefacts for preview
... metadata for legally conscious sharing (ownership, licensing, ...)
... grammatical, syntactic, morphological and lexical metadata
... example of the m4loc project: they developed XLIFF middleware to ensure interop between localization open source tools and the Moses MT tool
... tagging of culturally and legally targeted information
... home for LT standardization? Leverage best practices of existing loc standards (XLIFF, TBX, SRX, ...) - pointing into the past (OASIS, LISA)
... now: leverage OASIS XLIFF, ISO TC37, Unicode, SRX and GMX
... further development of W3C ITS and RDF, create conscious standardization including RDF and XLIFF
... OASIS is the home of XLIFF, but is also home to UBL and XBL
... W3C has ITS and RDF modeling, Unicode - see above
... ISO TC 37, important not for standards creation but for secondary publishing
... why XLIFF? and why 2.0? see also the presentation from christian
... good progress of XLIFF in 2011 is possible, as a SWOT analysis shows
... prediction: 2011 will see the definition of new features, in 2012 the new standard
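[For orientation, a minimal XLIFF 1.2 payload of the kind these talks revolve around, generated with Python's ElementTree; the file name and strings are invented.]
    import xml.etree.ElementTree as ET

    NS = "urn:oasis:names:tc:xliff:document:1.2"
    ET.register_namespace("", NS)
    X = "{%s}" % NS
    # One file, one trans-unit carrying a source/target segment pair.
    xliff = ET.Element(X + "xliff", version="1.2")
    f = ET.SubElement(xliff, X + "file",
                      {"original": "page.html", "source-language": "en",
                       "target-language": "it", "datatype": "html"})
    body = ET.SubElement(f, X + "body")
    tu = ET.SubElement(body, X + "trans-unit", id="1")
    ET.SubElement(tu, X + "source").text = "Save As"
    ET.SubElement(tu, X + "target").text = "Salva con nome"
    print(ET.tostring(xliff, encoding="unicode"))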
sven: Kilgray, Welocalize, Andrä, biobloom are behind the "Interoperability Now!" initiative
... the translation (technology) industry is a niche industry
... very few computer scientists here; not a technical but an experience-driven industry
... the industry is getting more and more important, including technology
... hence interop is getting more important
... there are enough standards here, but they are complex, and not many have reference implementations
... and there is little exchange among tool providers
table of XLIFF features across tools - only two features (of about 50?) are supported by all tools
sven: we want lossless data exchange in a mixed (tool) environment
... standards are important, also develo
... but mindset is most important, i.e. about the lossless data exchange
... the basis of our work: the "interoperability manifesto"
... pushing standards over the edge, giving feedback to the TC
... modules that we are working on: about content, package, transportation
... content is modified XLIFF
... the package is currently just made up
... for transportation we are using regular web services
... basic approach: disclose our concepts
... reference implementations are open source
... early real life usage
... test scenarios to verify compliance
... theoretical aspect: agile vs. standard?
... it would be good to have a framework for organizations like W3C that could help us to bring this into standardization step by step
... benefits of this approach: we are only working on this for a limited time
david: our vision: have a box that creates quality content very quickly and cheaply
... using MT, we want an efficient solution that will make the mlw a reality
... need to develop MT which is good for blog publishing
... MT will never be ready "as is" for human quality translation
... we developed a system for cheap and quick post editing
... currently, an explosion of content; lots of it stays local because of language barriers
... translation costs are very high
... we are targeting open source CMS platforms
... 20% of web sites are published on such platforms
... we could offer a good translation solution to these
... large media publishers who use open source CMS
... WordPress, Movable Type are created for all kinds of web sites, not only blogs
... our solution: based on MT, human post editing, and crowd sourcing
... crowdsourcing startups in many regions
... currently there is no automated open source CMS solution for the small guys
... no automated tools for post editing / MT either
... our solution uses data from blogs that is available on the web
... workflow: user installs WordPress, MT is done, email notification is sent to crowdsourcing translators, integrated after review by a moderator
... interested in opportunities for funding this kind of work
Pål: Opera has been using crowd sourcing for a long time
Pål: a caveat of crowd sourcing: it is not free, and organizing it is difficult
... e.g. employing managers for the crowd
... it should only be used for certain tasks
... not for time critical tasks
... mostly students are participating, picked up from university talks
... a large crowd is not necessarily a good crowd
... better 3, 4, 5 good translators than 50 translators doing nothing
... e.g. press releases and marketing material are not well suited for crowd translations
... good for crowd sourcing: applications (the web sites "My Opera", "opera.com"), with a stable set of text
... and documentation that is easy to maintain
... start small, put your crowd under embargo / NDA
... try building up a hierarchy
... be careful with your branding
... and your terminology
... for Opera we used XLIFF - we used our own, incompatible version of XLIFF
... discovered that open source is not open standard
... tools we used: gettext and po4a, Transifex, Translate Toolkit with Pootle and Virtaal, and homebrew applications to bridge the vast gaps [sketch below]
... XLIFF is a minefield, in the current version
... about html: keep it as simple as possible; semantic markup is key
... write proper CSS - write a separate RTL stylesheet to negate RTL-challenged CSS
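[The gettext side of the toolchain Pål mentions, as a Python sketch: load a compiled catalog and translate through it. The domain, directory layout and Norwegian string are assumptions, not Opera's actual setup.]
    import gettext

    # Expects a compiled catalog at locale/nb/LC_MESSAGES/myapp.mo
    t = gettext.translation("myapp", localedir="locale",
                            languages=["nb"], fallback=True)
    _ = t.gettext
    print(_("Download"))  # 'Last ned' if the catalog provides it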
eliott: everything that was said by David, Christian etc. in this session about interoperability was right, I concur with them
... we need standards because of interdependence
... the demise of LISA: sad that they are gone, but an opportunity to look into this in a new way
... LISA standards are important
... now is a good opportunity for a new model of standardization
... new kids on the block: TAUS and GALA
eliott: currently lots of different technologies
... and many different standards
... OAXAL is a solution that brings these together
... and it can be used for free
description of various aspects of the standards and of applications built on top of them
eliott: how to spread the message: important e.g. in academic curricula
manuel: presentation about the PangeaMT project
... translation is something that you have to go through to achieve what you want
... web people expect immediate translation
manuel: why don't we have immediate translations?
... introducing Pangeanic: an LSP, major clients in Asia and Europe
... we wanted to provide a faster service for translation
... became a founding member of TAUS
... four years ago created a relation with a computer science institute in Valencia
... the challenge at that time: turn an academic development (Moses) into a commercial application
... limitations: plain text, language model building (first), no recording, no update feature, data availability, ...
... objectives: provide high quality MT for post editing
... and to use only open standards: XLIFF, TMX, XML
... built a TMX - XLIFF workflow
... not to be locked into a solution
... the PangeaMT system: output comes as TMX or as XLIFF
... TMX should not die, people are still using it
... future work: on the fly MT training
... pick and match sets of data
... objective stats for post-editors
... confidence scores for users
reinhard: thank you, this was a great session
... about the remarks on crowd sourcing: there was emphasis on crowd sourcing for the enterprise
... this does not go well together
... other people like the Rosetta Foundation, Translators without Borders etc. have had good experiences
Pål: crowd sourcing was good for us
... it just took us a lot of effort and time to get there
jörg: there is some similarity: you have to train translators, otherwise you won't get good results in medical translation
felix: one comment on Interoperability Now!: it is very important to go to a standards body as a next step
sven: thanks, we will definitely try to do that
richard: W3C just created business groups / community groups; that might be a thing for you to look into
david: about what reinhard said
... if your expectation is high you will be disappointed, but the business case is in the future