W3C

- DRAFT -

rax cg

25 Nov 2016

Agenda

See also: IRC log

Attendees

Present
philr, felix, timea, christoph
Regrets
christian, gerard, jose
Chair
phil
Scribe
fsasaki

meeting start

phil: did a review of use cases this morning. not too much change, missed one that christoph added.

https://www.w3.org/community/rax/wiki/Draft_Material#Data_acquisition_from_job_postings_via_GATE

phil: thanks a lot for adding this, christoph - can you give a brief description?

christoph: sure. have not yet managed to share the descriptions, I have more material, and will get it done to share this
... will also add more concrete examples. Application setting is: we collect job postings in the form of plain text from the web
... we do named entity recognition with GATE, and we get XML output
... beginning and end of each token is annotated

<clange> text text text <start/>recognised entity<end/> text text

christoph: see above XML example. this has to be translated to RDF

<clange> <start id="foo"/>

<clange> <start href="#foo"/>

christoph: start and end tags look like the above

<clange> ids or refs (forgot which direction) are in these start/end tags

christoph: we are using an XSLT-based tool I developed (krextor) to create RDF. it is quite hard

<clange> krextor

christoph: with XPath it is hard to select elements between start and end tags
... that is a bit tricky, you need a good knowledge of XPath, the sibling axes etc.
... this is in the context of a European project, in which another partner is doing the extraction
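
A minimal sketch of the "text between start and end tags" problem described above, in Python rather than XPath/XSLT. It assumes the start tag carries an `id` and the end tag refers back to it with `href` (the direction was left open in the discussion), and the `doc` root element is hypothetical. It is not how krextor works; it only illustrates why the milestone markup needs stateful handling.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the snippet pasted in the meeting.
SAMPLE = ('<doc>text text text <start id="foo"/>recognised entity'
          '<end href="#foo"/> text text</doc>')

def entities_between_milestones(xml_text):
    """Return {id: text} for text enclosed by <start id="x"/> ... <end href="#x"/>."""
    root = ET.fromstring(xml_text)
    open_spans = {}  # id -> text fragments collected while the span is open
    result = {}

    def add_text(text):
        # Text belongs to every span that is currently open.
        if text:
            for fragments in open_spans.values():
                fragments.append(text)

    def walk(elem):
        if elem.tag == "start":
            open_spans[elem.get("id")] = []
        elif elem.tag == "end":
            span_id = elem.get("href", "").lstrip("#")
            if span_id in open_spans:
                result[span_id] = "".join(open_spans.pop(span_id))
        else:
            add_text(elem.text)
            for child in elem:
                walk(child)
                add_text(child.tail)  # text following the child, inside elem

    walk(root)
    return result

print(entities_between_milestones(SAMPLE))
```

Because the markers are empty elements, the entity text lives in the `tail` of the start tag, not inside any element, which is what makes a purely path-based selection awkward.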

phil: is this similar to Martynas' case?

christoph: in terms of XPath complexity, yes
... general XML to RDF transformation issue?

https://github.com/fsasaki/its20-extractor/tree/master/wikipedia-extractor

<philr> felix: I've written various converters

<philr> ...it is always special case issues

<philr> ...XML has various ways to include content

<philr> ...special purpose handling is somewhat unavoidable

<philr> ...example documents with guidance would be useful

scribe: may be useful to give guidance on how to handle various cases

christoph: there are patterns, e.g. parent child relations in XML and RDF properties
... for this you can provide high-level translation patterns

<philr> clange: High level translation is possible with simple parent-child relationships
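
A rough sketch of what such a high-level pattern could look like for the simple parent-child case: each child element becomes a property of its parent, leaf elements become literals, and nested elements become blank nodes. The `http://example.org/` namespace, the element names and the blank node naming are all assumptions made for illustration; triples are plain Python tuples rather than output of any RDF library.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace for generated properties

def xml_to_triples(xml_text):
    """Map parent-child nesting to (subject, predicate, object) tuples."""
    root = ET.fromstring(xml_text)
    triples = []
    counter = 0

    def node_for(elem):
        # Mint a fresh blank node label for an element.
        nonlocal counter
        counter += 1
        return f"_:{elem.tag}{counter}"

    def walk(elem, subject):
        for child in elem:
            predicate = BASE + child.tag
            if len(child) == 0:
                # Leaf element: its text becomes a literal object.
                triples.append((subject, predicate, (child.text or "").strip()))
            else:
                # Nested element: becomes a new node, then recurse into it.
                obj = node_for(child)
                triples.append((subject, predicate, obj))
                walk(child, obj)

    walk(root, node_for(root))
    return triples

# Hypothetical job posting record, not taken from the use case data.
SAMPLE = ("<posting><title>Data Engineer</title>"
          "<employer><name>ACME</name></employer></posting>")

for triple in xml_to_triples(SAMPLE):
    print(triple)
```

This only covers element-only content; attributes and mixed content would need their own rules.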

<philr> felix: mixture of text and element nodes is challenging
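
One way to see why mixed content is awkward: the text is scattered across element `text` and `tail` slots, so a converter has to reassemble it and keep track of where each inline element sits. The sketch below flattens a fragment and records character offsets per inline element. The sample markup and the function name are hypothetical, and this is not a description of the stylesheet linked in these minutes.

```python
import xml.etree.ElementTree as ET

def flatten_with_offsets(xml_text):
    """Return (plain_text, [(tag, begin, end), ...]) for inline elements."""
    root = ET.fromstring(xml_text)
    parts = []   # text fragments in document order
    spans = []   # (tag, begin offset, end offset) per non-root element
    pos = 0      # current length of the flattened text

    def emit(text):
        nonlocal pos
        if text:
            parts.append(text)
            pos += len(text)

    def walk(elem, is_root=False):
        begin = pos
        emit(elem.text)
        for child in elem:
            walk(child)
            emit(child.tail)  # text after the child, still inside elem
        if not is_root:
            spans.append((elem.tag, begin, pos))

    walk(root, is_root=True)
    return "".join(parts), spans

# Hypothetical mixed-content fragment.
SAMPLE = "<p>Visit <a>Leipzig</a> in <b>Germany</b>.</p>"

text, spans = flatten_with_offsets(SAMPLE)
print(text)
print(spans)
```

The offsets can then serve as anchors for annotations on the plain string, independent of the original markup.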

https://github.com/fsasaki/its20-extractor/blob/master/wikipedia-extractor/its-ta-2-nif-wikipedia.xsl#L43

<clange> fsasaki: handling of specific links (specific to wiki markup)

phil: in FREME project we are also doing named entity recognition on plain text. our services are capable of returning turtle files, but we can cover many formats

https://api-dev.freme-project.eu/ckeditor-dev/ckeditor/samples/freme.html

various types of output, inline or external using json-ld

<scribe> ACTION: felix to provide examples of round tripping as done in the freme project [recorded in http://www.w3.org/2016/11/25-rax-minutes.html#action01]

bdva summit

<philr> felix: to collect information on what better tooling is needed

<philr> ...best practices and standardization

<philr> ...1.5 hour session on requirements

<philr> clange: is there more I can do if I do not attend the summit?

<philr> felix: it would be good if someone from your organization could attend

<philr> ...questionnaire to bdva members but want input from companies

<philr> Is there a fee to join bdva?

felix: yes, will send info on that

<clange> fsasaki 14:29: EU is not necessarily interested in new standards being developed, but in existing standards to be _applied_ in a better way

thanks, clange

discussion on AutomationML use case

felix will circulate further information on BDVA

AOB

next meeting 9th of December

phil cannot make it, christian to chair

Summary of Action Items

[NEW] ACTION: felix to provide examples of round tripping as done in the freme project [recorded in http://www.w3.org/2016/11/25-rax-minutes.html#action01]
 

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.148 (CVS log)
$Date: 2016/11/25 13:41:09 $

Agenda: https://lists.w3.org/Archives/Public/public-rax/2016Nov/0008.html