NIF Web Services
Introduction
This document describes best practices for implementing RESTful NLP web services based on the NLP Interchange Format (NIF). "NIF is an RDF/OWL-based format that aims to achieve interoperability between NLP tools, language resources and annotations." As a proof of concept, we have implemented NIF wrappers for the Stanford POS tagger and the Stanford parser. Both wrappers are licensed under Creative Commons Attribution 4.0 International.
Natural Language Processing Interchange Format (NIF)
NIF is an RDF-based format. The classes used to represent linguistic data are defined in the NIF Core Ontology, which is built around the main class nif:String, representing strings of Unicode characters. One important subclass of nif:String is nif:Context. It represents a text in its entirety and holds the characters of this text in the nif:isString property. Several classes (e.g. nif:Word, nif:Phrase, nif:Sentence) represent partitions of a text, and which one to use depends on the unit of annotation. Every such subunit has a nif:referenceContext property pointing to its nif:Context instance. Its position inside the context is given by the nif:beginIndex and nif:endIndex properties. The end index is exclusive, so a word "This" at the start of a text spans 0 to 4. The actual substring represented by a unit can be given in the nif:anchorOf property. Annotations such as POS tags or dependency relation types (see below) are added as properties of the respective nif:String individuals.
NIF individuals are identified by URIs that follow a nif:URIScheme, which restricts the URIs' syntax. For example, a URI following RFC 5147 consists of a prefix followed by "#char=x,y", where x and y are the start and end offsets of the string in its context. For nif:Context URIs, y can be omitted or set to the total number of characters in the text.
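To make the offset and URI conventions concrete, here is a minimal Python sketch. It serializes a text as one nif:Context plus one nif:Word per token, using RFC 5147 URIs. The sketch rests on a few assumptions:

- It uses a naive whitespace tokenizer.
- The prefix http://example.org/doc is hypothetical.
- The helper names are illustrative.

It is not how the wrappers described below create their data.

```python
import re

NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def rfc5147_uri(prefix: str, begin: int, end: int) -> str:
    """Build an RFC 5147 style NIF URI of the form <prefix>#char=<begin>,<end>."""
    return f"{prefix}#char={begin},{end}"


def literal(value: str) -> str:
    """Quote a Turtle string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^xsd:string'


def text_to_nif(prefix: str, text: str) -> str:
    """Serialize `text` as one nif:Context plus one nif:Word per
    whitespace-separated token (a naive tokenizer, for illustration only)."""
    context = rfc5147_uri(prefix, 0, len(text))
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix xsd: <{XSD}> .",
        f"<{context}> a nif:Context , nif:RFC5147String ;",
        f"    nif:isString {literal(text)} .",
    ]
    for match in re.finditer(r"\S+", text):
        begin, end = match.span()  # end offset is exclusive, as in RFC 5147
        lines += [
            f"<{rfc5147_uri(prefix, begin, end)}> a nif:RFC5147String , nif:Word ;",
            f"    nif:anchorOf {literal(match.group())} ;",
            f'    nif:beginIndex "{begin}"^^xsd:int ;',
            f'    nif:endIndex "{end}"^^xsd:int ;',
            f"    nif:referenceContext <{context}> .",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_to_nif("http://example.org/doc", "This is a sample sentence"))
```

Python string offsets count Unicode code points, whereas Java string indices count UTF-16 code units. The two can therefore disagree for characters outside the Basic Multilingual Plane.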
Recommended service parameters
NIF services should conform to the NIF 2.0 public API specification. The following parameters are supported by a specification compliant service. Required parameters need to be specified by the user in order for the service to function. Optional parameters can be omitted, in which case default values are used by the service.
Required:
- input (i): The input to be processed by the service.
Optional:
- informat (f): The format in which the input is given. Supported argument values are text, turtle (default) and json-ld.
- intype (t): Specifies how the input is retrieved. Supported argument values are direct (default), file and url.
- outformat (o): The format in which the output will be serialized. Supported argument values are turtle (default) and json-ld.
- urischeme (u): The URI scheme the service must use to create new URIs.
- prefix (p): The prefix the service must use for new URIs. If no prefix is specified, a UUID is generated.
Furthermore, we recommend implementing the parameter info. According to the NIF API specification, this parameter lists all implemented parameters when info=true. We also recommend including the supported values and the default value of each parameter in this output.
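The parameter handling described above can be sketched in Python as follows. The sketch does four things:

- It accepts long or short names.
- It fills in defaults.
- It rejects unsupported values.
- It produces a description suitable for info=true.

The table layout and function names are hypothetical. The generated "uuid:" prefix is an assumption that mirrors the form seen in the example outputs later in this document.

```python
from __future__ import annotations

import uuid

# Parameters of the NIF 2.0 public API as listed above.
# "values" is None where any value is accepted.
PARAMETERS = {
    "input": {"short": "i", "required": True, "default": None, "values": None},
    "informat": {"short": "f", "required": False, "default": "turtle",
                 "values": {"text", "turtle", "json-ld"}},
    "intype": {"short": "t", "required": False, "default": "direct",
               "values": {"direct", "file", "url"}},
    "outformat": {"short": "o", "required": False, "default": "turtle",
                  "values": {"turtle", "json-ld"}},
    "urischeme": {"short": "u", "required": False, "default": None, "values": None},
    "prefix": {"short": "p", "required": False, "default": None, "values": None},
}


def resolve_parameters(query: dict[str, str]) -> dict[str, str | None]:
    """Accept long or short parameter names, fill in defaults and reject
    missing required parameters or unsupported values with ValueError."""
    resolved: dict[str, str | None] = {}
    for name, spec in PARAMETERS.items():
        value = query.get(name, query.get(spec["short"]))
        if value is None:
            if spec["required"]:
                raise ValueError(f"missing required parameter: {name}")
            value = spec["default"]
        elif spec["values"] is not None and value not in spec["values"]:
            raise ValueError(f"unsupported value for {name}: {value}")
        resolved[name] = value
    if resolved["prefix"] is None:
        # No prefix given: generate a UUID-based one (assumed "uuid:" form).
        resolved["prefix"] = f"uuid:{uuid.uuid4()}"
    return resolved


def describe_parameters() -> str:
    """Text a service could return for info=true: every parameter with
    its supported values and default."""
    lines = []
    for name, spec in PARAMETERS.items():
        values = ", ".join(sorted(spec["values"])) if spec["values"] else "any"
        lines.append(f"{name} ({spec['short']}): required={spec['required']}; "
                     f"values={values}; default={spec['default']}")
    return "\n".join(lines)
```

Raising on a missing input or an unsupported value corresponds to the rlog:FATAL cases listed under Log messages. A service could catch these errors and turn them into log entries.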
Further recommended parameters, which are not part of the NIF API specification, are the following:
- verbosity (v): Accepts the values true and false. With true, the service returns the full NIF document including its new annotations. With false, it returns only the triples it added to the data.
- model (m): The path or URL of a trained model to be used by the service. If no model is specified, a default model should be used.
- language (l): The language of the input. The default is English.
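One way to realize verbosity=false is to return the set difference between the annotated graph and the input graph. The minimal Python sketch below assumes that triples are represented as plain (subject, predicate, object) strings. This is a simplification; a real service would use an RDF library.

```python
from typing import Set, Tuple

# A triple as (subject, predicate, object) strings in Turtle notation.
# Simplification for this sketch: no RDF library, no blank-node handling.
Triple = Tuple[str, str, str]


def select_output(input_triples: Set[Triple], output_triples: Set[Triple],
                  verbosity: bool) -> Set[Triple]:
    """v=true: return the full annotated graph; v=false: only the
    triples the service added to the input."""
    if verbosity:
        return set(output_triples)
    return output_triples - input_triples
```

A plain set difference only works for triples with URIs and literals. Blank nodes may receive new labels when a graph is parsed and serialized again, so they would need graph-aware comparison.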
Log messages
NIF services should generate log messages in RDF using the RDF Logging Ontology (RLOG). An RLOG message is of type rlog:Entry and should have the properties rlog:level, rlog:date and rlog:message. We recommend generating a log entry in the following cases:
- If no input is specified. Log level should be rlog:FATAL.
- If the input is given as a file or URL but could not be retrieved by the service. Log level should be rlog:FATAL.
- If a parameter value is not supported by the service. Log level should be rlog:FATAL.
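A minimal Python sketch of such a log entry, serialized as Turtle, is shown below. It relies on these assumptions:

- The RLOG namespace is http://persistence.uni-leipzig.org/nlp2rdf/ontologies/rlog#, as used by the NLP2RDF project.
- RLOG defines log4j-style levels from TRACE to FATAL.
- The entry URI in the test is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Assumed RLOG namespace and levels (see lead-in).
RLOG = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/rlog#"
LEVELS = {"TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"}


def rlog_entry(entry_uri: str, level: str, message: str,
               date: datetime | None = None) -> str:
    """Serialize one rlog:Entry with level, date and message in Turtle."""
    if level not in LEVELS:
        raise ValueError(f"unknown log level: {level}")
    if date is None:
        date = datetime.now(timezone.utc)
    # Escape backslashes and quotes so the message is a valid Turtle literal.
    escaped = message.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        f"@prefix rlog: <{RLOG}> .",
        "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .",
        f"<{entry_uri}> a rlog:Entry ;",
        f"    rlog:level rlog:{level} ;",
        f'    rlog:date "{date.isoformat()}"^^xsd:dateTime ;',
        f'    rlog:message "{escaped}"^^xsd:string .',
    ])
```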
Example Implementations
Wrapping the Stanford POS Tagger
Consider a file named example.ttl with the following content:
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25>
    a nif:Context , nif:RFC5147String , nif:Sentence ;
    nif:isString "This is a sample sentence"^^xsd:string .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "This"^^xsd:string ;
    nif:beginIndex "0"^^xsd:int ;
    nif:endIndex "4"^^xsd:int ;
    nif:nextWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:sentence <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:referenceContext <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "is"^^xsd:string ;
    nif:beginIndex "5"^^xsd:int ;
    nif:endIndex "7"^^xsd:int ;
    nif:nextWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
    nif:previousWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ;
    nif:sentence <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:referenceContext <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "a"^^xsd:string ;
    nif:beginIndex "8"^^xsd:int ;
    nif:endIndex "9"^^xsd:int ;
    nif:nextWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
    nif:previousWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:sentence <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:referenceContext <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "sample"^^xsd:string ;
    nif:beginIndex "10"^^xsd:int ;
    nif:endIndex "16"^^xsd:int ;
    nif:nextWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25> ;
    nif:previousWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
    nif:sentence <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:referenceContext <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "sentence"^^xsd:string ;
    nif:beginIndex "17"^^xsd:int ;
    nif:endIndex "25"^^xsd:int ;
    nif:previousWord <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
    nif:sentence <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:referenceContext <e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .
Our web service wrapping the Stanford POS tagger can be invoked on this file with curl as follows.
Example call 1:
curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordPOSTaggerWebService/NifStanfordPOSTagger -d v=true --data-urlencode i="$(<example.ttl)"
The input is expected to be in NIF format and to contain at least one nif:Context element as well as a set of nif:Word elements. The service reads the nif:anchorOf values of all nif:Word elements belonging to a given nif:Context and passes them to the Stanford POS tagger. It then annotates each word by adding a nif:posTag property to the nif:Word, with the POS tag as a literal value.
For this input, the service returns the following output:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "This"^^xsd:string ;
    nif:beginIndex "0"^^xsd:int ;
    nif:endIndex "4"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:posTag "DT"^^xsd:string ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7>
    a nif:Word , nif:RFC5147String ;
    nif:anchorOf "is"^^xsd:string ;
    nif:beginIndex "5"^^xsd:int ;
    nif:endIndex "7"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
    nif:posTag "VBZ"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25>
    a nif:Context , nif:RFC5147String , nif:Sentence ;
    nif:isString "This is a sample sentence"^^xsd:string .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "sample"^^xsd:string ;
    nif:beginIndex "10"^^xsd:int ;
    nif:endIndex "16"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25> ;
    nif:posTag "NN"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9>
    a nif:Word , nif:RFC5147String ;
    nif:anchorOf "a"^^xsd:string ;
    nif:beginIndex "8"^^xsd:int ;
    nif:endIndex "9"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
    nif:posTag "DT"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "sentence"^^xsd:string ;
    nif:beginIndex "17"^^xsd:int ;
    nif:endIndex "25"^^xsd:int ;
    nif:posTag "NN"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .
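The annotation step of the tagger wrapper can be sketched in Python as follows. The sketch groups the nif:Word elements by context, orders them by nif:beginIndex, and tags the anchors. It then emits one nif:posTag triple per word. It makes these assumptions:

- The word dictionary layout is hypothetical.
- The `tag` callable stands in for the Stanford tagger, which the actual wrapper calls in Java.
- Ordering by beginIndex is our assumption about how the word sequence is reconstructed.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def annotate_pos(words: List[dict],
                 tag: Callable[[List[str]], List[str]]) -> Dict[str, str]:
    """Map each nif:Word URI to a POS tag.

    `words` holds one dict per nif:Word with the (hypothetical) keys
    "uri", "context", "begin" and "anchor"; `tag` stands in for the
    Stanford tagger and must return one tag per token."""
    by_context: Dict[str, List[dict]] = defaultdict(list)
    for word in words:
        by_context[word["context"]].append(word)
    tags: Dict[str, str] = {}
    for context_words in by_context.values():
        context_words.sort(key=lambda w: w["begin"])  # restore text order
        predicted = tag([w["anchor"] for w in context_words])
        if len(predicted) != len(context_words):
            raise ValueError("tagger must return exactly one tag per token")
        for word, pos in zip(context_words, predicted):
            tags[word["uri"]] = pos
    return tags


def pos_triples(tags: Dict[str, str]) -> List[str]:
    """Render the new nif:posTag triples in Turtle."""
    return [f'<{uri}> nif:posTag "{pos}"^^xsd:string .' for uri, pos in tags.items()]
```

Passing the tagger in as a function keeps the NIF handling independent of the tagging library. It also allows the logic to be checked with a simple lookup-based stand-in.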
Wrapping the Stanford Parser
Our web service wrapping the Stanford dependency parser can be invoked with curl as in the following example call. The input is assumed to be given in a Turtle file called input.ttl.
Example call 2:
curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordParserWebService/NifStanfordParser -d v=true --data-urlencode i="$(<input.ttl)"
The service parses input that is already POS-tagged and annotated with sentences. It expects the input to be in NIF format and to contain:
- at least one nif:Context element,
- at least one nif:Sentence element, and
- a set of nif:Word elements that are associated with the former two and carry a POS tag in nif:posTag and the represented string in nif:anchorOf.
The words are ordered by the position encoded in their URIs. The service then passes the annotated input to the Stanford parser. For each dependency relation of the parse, a nif:dependency property is added to the relation's head, with the URI of the dependent word as object. Since a word can have only one head, the type of the relation is stored as a literal in the nif:dependencyRelationType property of the dependent word. With the output of example call 1 as input, the service returns:
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "This"^^xsd:string ;
    nif:beginIndex "0"^^xsd:int ;
    nif:dependencyRelationType "nsubj"^^xsd:string ;
    nif:endIndex "4"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:posTag "DT"^^xsd:string ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7>
    a nif:Word , nif:RFC5147String ;
    nif:anchorOf "is"^^xsd:string ;
    nif:beginIndex "5"^^xsd:int ;
    nif:dependencyRelationType "cop"^^xsd:string ;
    nif:endIndex "7"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
    nif:posTag "VBZ"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25>
    a nif:Context , nif:RFC5147String , nif:Sentence ;
    nif:isString "This is a sample sentence"^^xsd:string .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "sample"^^xsd:string ;
    nif:beginIndex "10"^^xsd:int ;
    nif:dependencyRelationType "nn"^^xsd:string ;
    nif:endIndex "16"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25> ;
    nif:posTag "NN"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9>
    a nif:Word , nif:RFC5147String ;
    nif:anchorOf "a"^^xsd:string ;
    nif:beginIndex "8"^^xsd:int ;
    nif:dependencyRelationType "det"^^xsd:string ;
    nif:endIndex "9"^^xsd:int ;
    nif:nextWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
    nif:posTag "DT"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .

<uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=17,25>
    a nif:RFC5147String , nif:Word ;
    nif:anchorOf "sentence"^^xsd:string ;
    nif:beginIndex "17"^^xsd:int ;
    nif:dependency <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ,
        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=8,9> ,
        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,4> ,
        <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=5,7> ;
    nif:endIndex "25"^^xsd:int ;
    nif:posTag "NN"^^xsd:string ;
    nif:previousWord <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=10,16> ;
    nif:referenceContext <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> ;
    nif:sentence <uuid:e899ea51-fb30-4102-8cdd-9d0ec691a0db#char=0,25> .
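The two steps described for the parser wrapper can be sketched in Python:

1. Order the words by the offsets encoded in their URIs.
2. Turn each dependency into a nif:dependency triple on the head and a nif:dependencyRelationType literal on the dependent.

The (relation, head, dependent) input format is hypothetical and modelled on Stanford typed dependencies such as nsubj(sentence-5, This-1). It uses 1-based token indices and head 0 for the root. As in the output above, the root word receives no relation type.

```python
import re
from typing import List, Sequence, Tuple

CHAR_OFFSETS = re.compile(r"#char=(\d+),(\d+)$")


def char_offsets(uri: str) -> Tuple[int, int]:
    """Extract (begin, end) from an RFC 5147 style URI ending in #char=x,y."""
    match = CHAR_OFFSETS.search(uri)
    if match is None:
        raise ValueError(f"not an RFC 5147 URI: {uri}")
    return int(match.group(1)), int(match.group(2))


def dependency_triples(word_uris: Sequence[str],
                       dependencies: Sequence[Tuple[str, int, int]]
                       ) -> List[Tuple[str, str, str]]:
    """Turn (relation, head, dependent) dependencies of one sentence into
    NIF triples; token indices are 1-based positions in text order."""
    ordered = sorted(word_uris, key=char_offsets)  # order words by URI offsets
    triples = []
    for relation, head, dependent in dependencies:
        if head == 0:
            continue  # root: no head to attach to and, as in the example, no type
        head_uri, dep_uri = ordered[head - 1], ordered[dependent - 1]
        triples.append((f"<{head_uri}>", "nif:dependency", f"<{dep_uri}>"))
        triples.append((f"<{dep_uri}>", "nif:dependencyRelationType",
                        f'"{relation}"^^xsd:string'))
    return triples
```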
Chaining
Because the tagger produces exactly the kind of input the parser relies on, the two services can be used to demonstrate the integration of NIF-compliant NLP services. The following piped call combines example calls 1 and 2. It invokes the tagger, which produces the output of example call 1, and passes this POS-annotated NIF data to the parser. The result is the same as the output of example call 2.
Example call 3:
curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordPOSTaggerWebService/NifStanfordPOSTagger -d v=true --data-urlencode i="$(<example.ttl)" | curl -G http://sc-lider.techfak.uni-bielefeld.de/NifStanfordParserWebService/NifStanfordParser -d v=true --data-urlencode i@-
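The same chain can be sketched in Python with only the standard library. The endpoints are those of the example calls, while the function names are illustrative. The `fetch` argument lets the chaining logic be exercised without network access. The sketch assumes that responses are UTF-8 encoded.

```python
from typing import Callable, Iterable
from urllib.parse import urlencode
from urllib.request import urlopen

TAGGER = ("http://sc-lider.techfak.uni-bielefeld.de/"
          "NifStanfordPOSTaggerWebService/NifStanfordPOSTagger")
PARSER = ("http://sc-lider.techfak.uni-bielefeld.de/"
          "NifStanfordParserWebService/NifStanfordParser")


def request_url(endpoint: str, nif_input: str) -> str:
    """Build the same GET query as curl -G -d v=true --data-urlencode i=...
    (urlencode writes spaces as '+', curl as '%20'; both decode to a space)."""
    return endpoint + "?" + urlencode({"v": "true", "i": nif_input})


def http_get(url: str) -> str:
    """Fetch a URL and decode the body, assuming UTF-8."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def chain(nif_input: str, endpoints: Iterable[str],
          fetch: Callable[[str], str] = http_get) -> str:
    """Send the input to the first service and each response to the next."""
    data = nif_input
    for endpoint in endpoints:
        data = fetch(request_url(endpoint, data))
    return data
```

Reading example.ttl into a string and passing it to chain together with [TAGGER, PARSER] corresponds to example call 3. Like the curl calls, this sends the whole NIF document in the query string, so very large inputs may exceed a server's URL length limit.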