To Last Call/Federated Query Review
This partially completes my ACTION-284 on reviewing fed. query... find below part 1 of my review.
I don't think I have really got to the meat of Carlos' changes yet; so far I mainly have feedback on the examples. In general, I think the examples should make clearer what they illustrate. Apart from that, I have some editorial feedback.
1) Remove: "Please refer to the errata for this document, which may
include some normative corrections.
The previous errata for this document, are also available.
See also translations.
This document is also available in these non-normative formats: XML and XHTML with color-coded revision indicators. "
-> removed
2)
"This specification defines the syntax and semantics of a SPARQL 1.1 Query extension for executing distributed queries."
- better? ->
"This specification defines the syntax and semantics of a SPARQL 1.1 Query extension for executing queries distributed over different endpoints."
-> changed
3) We should have this either in all or none of our documents:
"The documents produced by this Working Group are:
* SPARQL 1.1 Query
* SPARQL 1.1 Federation Extensions (this document)
* SPARQL 1.1 Update
* SPARQL 1.1 Uniform HTTP Protocol for Managing RDF Graphs
* SPARQL 1.1 Protocol for RDF
* SPARQL 1.1 Service Description
* SPARQL 1.1 Entailment Regimes
* SPARQL 1.1 Property Paths
* SPARQL 1.1 Conformance Tests
"
-> removed, keeping it only in the main document
4) "This publication includes the extension SERVICE to the SPARQL 1.1 Query specification. The structure of this document will change to fully integrate the new features."
-->
"This publication describes the SERVICE extension to the SPARQL 1.1 Query specification."
-> changed
5) Remove: "The design of the features presented here is work-in-progress and does not represent
the final decisions of the working group. Implementers and application writers should not assume that the designs in this document will not change.
"
-> removed
6)
"This document will be presented to the SPARQL Working Group, which is part of the W3C Semantic Web Activity." --> "This document was produced by the SPARQL Working Group, which is part of the W3C Semantic Web Activity."
-> changed
7) Add:
"Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress."
-> added
8) Section 1
"The growing suite of SPARQL query services offer consumers an opportunity to merge data distributed across the web. A small number of extensions to SPARQL 1.1 enable expression of the merging queries. In particular, a SERVICE allows one to direct a portion of a query to a particular SPARQL query service, just as a GRAPH directs queries to particular named graphs. This specification defines the syntax and semantics of these extensions. "
-->
"The growing number of SPARQL query services offers consumers an opportunity to merge data distributed across the web. The SERVICE extension allows one to direct a portion of a query to a particular SPARQL query service, similar to a GRAPH graph pattern, which "directs" queries to particular named graphs in the (local) dataset. This specification defines the syntax and semantics of this extension."
-> changed
9) Meta-remark across all documents: we should have consistent capitalization of "Web" vs "web", "Semantic Web" vs "semantic web", etc.
-> changed
10) Remove:
"The SPARQL query language is closely related to the following specifications:
* The SPARQL Query for RDF [SQRY] specification defines a language for matching and reporting on RDF data.
* The SPARQL Protocol for RDF [SPROT] specification defines the remote protocol for issuing SPARQL queries and receiving the results.
* The SPARQL Query Results XML Format [RESULTS] specification defines an XML document format for representing the results of SPARQL SELECT and ASK queries."
-> removed
11) Section 1.1
You refer to fn: and rdfs:, neither of which is used in the document... In general, I suggest you just say:
"This document uses the same conventions as and terminology from the SPARQL 1.1 Query document [Ref]."
-> changed
12) Editorial note in the beginning of the doc:
"Editorial note The BINDINGS section will be moved to the SPARQL query main document: SPARQL 1.1 Query . All references to BINDINGS in this document will be removed."
Not sure, but wouldn't we want to actually leave the BINDINGS *example* in the document? The example in the query doc is not about the combination of SERVICE with BINDINGS. I think the example at least makes sense.
-> example added
13) SECTION 2
Given that BINDINGS is now defined in Query, this should be renamed to
"SPARQL 1.1 Basic Federation Extension" -> done
and I'd change
"Queries over distributed data often entail querying one source and using the acquired information to constrain queries of the next source. This section covers the SERVICE operator giving examples of how to use it and its behavior."
to
"Queries over distributed SPARQL endpoints often involve querying one source and using the acquired information to constrain queries of the next source. This section illustrates by examples how this can be achieved using SPARQL 1.1's SERVICE graph patterns."
-> done
I'd then remove subsection heading 2.1 and make subsubsections
2.1.1 -> 2.1, 2.1.2 -> 2.2, 2.1.3 -> 2.3, 2.1.4 -> 2.4, 2.1.5 -> 2.5
-> done
2.2 BINDINGS -> 2.6 Using SERVICE in combination with BINDINGS -> done
(in the following comments I will still use the old section numbers)
14) 2.1.1 "For instance, an endpoint which contains information about people working:
Data in <http://people.example/sparql> endpoint:"
not a sentence...
Next, I'm not sure about the names. Are these names of real people? I would rather use fictitious ones.
Also, I don't find an example very useful that just queries a remote endpoint without joining the data with any local data (in that case I can query the endpoint directly, so why would I want to use SERVICE here?). So I suggest rewriting the whole example as follows:
For instance, let us assume a SPARQL service endpoint available at <http://people.example/sparql> that contains the following data in its default graph:
<http://example.org/people/people15> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/people/people16> <http://xmlns.com/foaf/0.1/name> "Bob" .
<http://example.org/people/people17> <http://xmlns.com/foaf/0.1/name> "Charles" .
<http://example.org/people/people18> <http://xmlns.com/foaf/0.1/name> "Daisy" .
which I want to combine with my local FOAF file at <http://example.org/myfoaf.rdf> that contains the single triple:
<http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows> <http://example.org/people/people15> .
The following query retrieves the names of the persons I know from the remote SPARQL service.
Query:
SELECT ?name
FROM <http://example.org/myfoaf.rdf>
WHERE {
  <http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows> ?person .
  SERVICE <http://people.example/sparql> {
    ?person <http://xmlns.com/foaf/0.1/name> ?name .
  }
}
This query, on the data above, has one solution.
Query Result:
name "Alice"
-> changed
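The join that the rewritten example performs can be mimicked without a SPARQL engine. The following is a minimal Python sketch, assuming the remote default graph and the local FOAF file are simply held as in-memory lists of triples; the helper name `names_of_people_i_know` is hypothetical and nothing is fetched over the network.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
ME = "http://example.org/myfoaf/I"

# Triples assumed to be in the default graph of <http://people.example/sparql>.
REMOTE = [
    ("http://example.org/people/people15", FOAF + "name", "Alice"),
    ("http://example.org/people/people16", FOAF + "name", "Bob"),
    ("http://example.org/people/people17", FOAF + "name", "Charles"),
    ("http://example.org/people/people18", FOAF + "name", "Daisy"),
]

# The single triple of the local file <http://example.org/myfoaf.rdf>.
LOCAL = [(ME, FOAF + "knows", "http://example.org/people/people15")]


def names_of_people_i_know(local, remote):
    """Join the local foaf:knows triples with the remote foaf:name triples."""
    known = [o for s, p, o in local if s == ME and p == FOAF + "knows"]
    return [
        name
        for person in known
        for s, p, name in remote
        if s == person and p == FOAF + "name"
    ]


print(names_of_people_i_know(LOCAL, REMOTE))  # prints ['Alice']
```

Only "Alice" survives because the local triple binds ?person to people15 before the remote names are matched.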
15) Section 2.1.2
Again, I'd change the name to "Alice" -> changed
Is this example illustrating something that the first example doesn't illustrate? Is it so much different to have two service queries? It would be good to have a sentence at the beginning of each example that explains what it should show.
"For instance, an endpoint which contains information about people working:"
-> example removed, it does not say anything new
--> "Several SERVICE patterns can be combined in the same query to join results from different SPARQL service endpoints. For example, let us now assume two service endpoints which contain information about people and projects as follows."
16) Section 2.1.3
Again, there's no rationale for what this example should illustrate. I assume something like "SERVICE patterns can be nested and used within other complex patterns, e.g. within OPTIONAL patterns. We again assume two SPARQL endpoints containing information about people and projects."
I don't think the example is correct as it stands, BTW... I think as you wrote it, it should only return the first three results.
Isn't what you want to write rather:
PREFIX people: <http://people.example/ns#>
PREFIX project: <http://project.example/ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?projectName
WHERE
{
  SERVICE <http://people.example/sparql> {
    ?people foaf:name ?name .
    OPTIONAL {
      ?people people:worksIn ?project .
      SERVICE <http://project.example/sparql> {
        ?project project:hasTitle ?projectName .
      }
    }
  }
}
That would IMO return the results you put, and also illustrate nested SERVICE patterns.
-> changed to this example
17) Section 2.1.4
The use of dcterms:subject for a numeric id is a bit awkward; dcterms:subject is meant to point at a subject/topic. I suggest changing the example to something like the following:
We assume the following data on SPARQL endpoints about various projects in certain subject categories in the default graph:
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix doap: <http://usefulinc.com/ns/doap#> .
[] dc:subject "Querying RDF" ;
void:sparqlEndpoint <http://projects1.example/SPARQL> .
[] dc:subject "Querying RDF remotely" ;
void:sparqlEndpoint <http://projects2.example/SPARQL> .
[] dc:subject "Updating RDF remotely" ;
void:sparqlEndpoint <http://projects3.example/SPARQL> .
Data in default graph at SPARQL service endpoint http://projects2.example/SPARQL:
_:project1 doap:name "Querying remote RDF Data" .
_:project1 doap:created "2011-02-12"^^xsd:date .
_:project2 doap:name "Querying multiple SPARQL endpoints" .
_:project2 doap:created "2011-02-13"^^xsd:date .
Data in default graph at SPARQL service endpoint http://projects3.example/SPARQL:
_:project3 doap:name "Update remote RDF Data" .
_:project3 doap:created "2011-02-14"^^xsd:date .
We now want to query the names of the projects on the subject "remote":
Query:
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX doap: <http://usefulinc.com/ns/doap#>
SELECT ?service ?projectName
WHERE {
  # Find the service with subject "remote".
  ?p dc:subject ?projectSubject ;
     void:sparqlEndpoint ?service
  FILTER regex(?projectSubject, "remote")
  # Query that service projects.
  SERVICE ?service {
    ?project doap:name ?projectName .
  }
}
The bindings of ?service provide the location of the service to query, yielding:
Query result:
service                            projectName
<http://projects2.example/SPARQL>  "Querying remote RDF Data"
<http://projects2.example/SPARQL>  "Querying multiple SPARQL endpoints"
<http://projects3.example/SPARQL>  "Update remote RDF Data"
-> changed to this example
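To check which rows the suggested query should yield, the evaluation can be walked through by hand. A minimal Python sketch follows, assuming the catalogue and the endpoint contents are plain in-memory structures; `query_projects` is a hypothetical helper, and the contents of projects1 are not given in the review, so that endpoint is assumed to be empty.

```python
import re

# Local default graph: (dc:subject, void:sparqlEndpoint) per catalogue entry.
CATALOGUE = [
    ("Querying RDF", "http://projects1.example/SPARQL"),
    ("Querying RDF remotely", "http://projects2.example/SPARQL"),
    ("Updating RDF remotely", "http://projects3.example/SPARQL"),
]

# doap:name values held by each endpoint (projects1 is an assumption: empty).
ENDPOINTS = {
    "http://projects1.example/SPARQL": [],
    "http://projects2.example/SPARQL": [
        "Querying remote RDF Data",
        "Querying multiple SPARQL endpoints",
    ],
    "http://projects3.example/SPARQL": ["Update remote RDF Data"],
}


def query_projects(pattern):
    """Mimic the FILTER on the subject followed by SERVICE ?service."""
    rows = []
    for subject, service in CATALOGUE:
        if re.search(pattern, subject):      # FILTER regex(?projectSubject, ...)
            for name in ENDPOINTS[service]:  # SERVICE ?service { ... doap:name ... }
                rows.append((service, name))
    return rows


for row in query_projects("remote"):
    print(row)
```

The subject "Querying RDF" does not match the regular expression, so projects1 is never contacted; the other two endpoints contribute three rows in total.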
18) "Editorial note When having variables for specifying the address of a SPARQL endpoint in a SERVICE operation this variable must be bounded. In order to clearly define what "must be bounded" mean we point to a boundedness definition. This is still an issue for the SPARQL Working Group, as it the question of having variables in SERVICE calls at all. Feedback from the community is encouraged."
Is this Ed note still appropriate here?
-> removed the editorial note, but I kept the link to the boundedness definition; I think it makes sense to have such a definition
19) 2.1.5
"SERVICE execution may fail due to several reasons: server down, wrong endpoint IRI, or there may be no results from the query. In order to allow users to continue with the other parts of t he query we propose to use a service silent operation Service(IRI,G,P,SilentOpt) which is false by default."
--> "The execution of a SERVICE pattern may fail for several reasons: the remote service may be down, the service IRI may not be dereferenceable, or the endpoint may return an error to the query. Normally, under such circumstances the invoking query containing a SERVICE pattern fails as a whole. However, SPARQL 1.1 allows failed SERVICE requests to be explicitly tolerated by means of the keyword 'SILENT'."
-> changed
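The difference between a plain and a SILENT SERVICE pattern can be sketched in a few lines of Python. This is only an illustration, assuming (as suggested in Lee's review further down) that a failed SILENT SERVICE clause is treated as a single solution with no bindings; `ServiceError`, `eval_service` and the two endpoint functions are hypothetical names.

```python
class ServiceError(Exception):
    """Raised when the (simulated) remote endpoint cannot be queried."""


def eval_service(invoke, silent=False):
    """Evaluate a SERVICE pattern; `invoke` stands in for the remote call."""
    try:
        return invoke()
    except ServiceError:
        if silent:
            return [{}]  # one solution with no bindings
        raise            # without SILENT the whole query fails


def healthy_endpoint():
    return [{"name": "Alice"}]


def failing_endpoint():
    raise ServiceError("endpoint down")


print(eval_service(healthy_endpoint))               # prints [{'name': 'Alice'}]
print(eval_service(failing_endpoint, silent=True))  # prints [{}]
```

Returning one empty solution rather than no solutions matters: joining the rest of the query with an empty solution leaves it unchanged, whereas joining with an empty result would discard everything.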
Again, I'd prefer "Alice" -> all examples changed to Alice & friends
"Query result if an error happens when querying the remote SPARQL endpoint::" --> "Query result if an error happens when querying the remote SPARQL endpoint:" -> fixed
20) Section 2.1.6 is obscure to me... it talks about two results when there is one, and it talks about a query when there is no query. I suggest simply removing that section.
-> removed section
21) Section 2.2 BINDINGS
"In order to efficiently communicate constraints to sparql endpoints, the queryier may follow the WHERE clause with BINDINGS. In order to efficiently address the constraints, the query on http://people.example/data could be expressed as follows:"
I don't understand entirely, as in case BINDINGS doesn't appear in the SERVICE clause, the "constraints" don't even reach the remote endpoint... shouldn't we reformulate the example to actually have the BINDINGS *within* the SERVICE pattern?
That would make more sense to me.
Accordingly, I would suggest rephrasing:
"In order to efficiently communicate constraints to SPARQL endpoints, the requester may use SERVICE in combination with a BINDINGS clause (see [SPARQL 1.1 Query], Section 18.2.5.6 BINDINGS). In order to efficiently address the constraints, the query on http://people.example/data could be expressed as follows:"
Also, note that the advantage of BINDINGS only comes across if you use several bindings, since a single binding can be written directly into the query. So, I would suggest thinking of a better example or dropping the BINDINGS section altogether.
-> I removed the whole BINDINGS section, I will put it back
22) Section 3 on syntax can be dropped. The syntax is clear from the grammar and illustrated with the examples already, I don't think the schematic syntax adds anything.
-> section removed
to be continued... at section 4.
Here comes the rest of my review... starting at section 4.
1) What do you mean by
"We introduce the following symbols:
* Join(Pattern, Pattern)
* LeftJoin(Pattern, Pattern, expression)
* Filter(expression, Pattern)
* UNION(Pattern, Pattern)
"
these are defined in the query doc, they don't need to be re-introduced, right? -> right, I removed them
I understand that you want to extend the transformation rules for GroupGraphPattern from Section 18.2.2.4 Translate Graph Patterns of [SPARQL 1.1 Query Language], since you want to reuse information about variables already bound. Fine, but that should be said/explained.
So instead of the "We introduce" part, say:
"In order to define the transformation of SERVICE patterns we extend the transformation of GroupGraphPattern from Section 18.2.2.4 Translate Graph Patterns of [SPARQL 1.1 Query Language], since we assume the Service invocation
-> changed
2)
Why do you have two different definitions for
Definition: Evaluation of a Service Pattern
and
Definition: Service Silent Function
Can't they be merged into one, where SilentOpt is just a boolean flag that's true for SILENT (in which case execution doesn't fail) and
false otherwise (where overall execution fails)?
-> yes, I added Silent Function to Evaluation of a Service Pattern
3) I think this looks weird to me:
if IRI is a SPARQL service
  Service(IRI,G,P)) = Invocation( IRI, vars ∩ bound, P, Bindings(G, vars) )
eval(D(G), Service(var,G,P)) =
  Let R be the empty multiset
  foreach i in O(?var->i)
    if i is an IRI
      R := Union(R, Join( Invocation( i, vars ∩ bound, P, Bindings(G, vars) ) , O(?var->i) ) )
    else
      execution fails.
  the result is R
shouldn't this rather be:
if IRI is a SPARQL service
  Service(IRI,G,P)) = Invocation( IRI, vars ∩ bound, P, Bindings(G, vars) )
else:
  eval(D(G), Service(var,G,P)) =
    Let R be the empty multiset
    foreach i in O(?var->i)
      if i is an IRI
        R := Union(R, Join( Invocation( i, vars ∩ bound, P, Bindings(G, vars) ) , O(?var->i) ) )
      else
        execution fails.
    the result is R
Also, by only projecting vars intersect bound, you can have strange effects, since the evaluation becomes order dependent. I am not sure whether that order dependence is implied by the algorithm referred to in Section 4.1. There you have:
"For each element E in the GroupGraphPattern"
note that this - per se - doesn't imply any order of the elements in GroupGraphPattern
However, I assume that you assume/imply that
{ P1 SERVICE i {... } P2 }
behaves differently from
{ P2 SERVICE i {... } P1 }
do you? -> yes, if P1 and P2 are group graph patterns; if they are just triple patterns they behave the same: P1 = ?s1 ?p1 ?o1 and P2 = ?s2 ?p2 ?o2 would behave the same
-> that's a part I am not clear about; it was written by Eric, and I did not work much on it, since my idea was to incorporate the semantics we discussed by email a couple of weeks ago.
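The foreach loop quoted above can be read as the following minimal Python sketch. It assumes solutions are plain dicts and that `invoke` is a hypothetical callable standing in for Invocation(...); the projection on vars ∩ bound and the Bindings(G, vars) argument are not modelled, and `is_iri` is only a crude stand-in for a real IRI check.

```python
def is_iri(value):
    # Crude stand-in for a real IRI check (assumption for this sketch).
    return isinstance(value, str) and "://" in value


def join(left, right):
    """Join two solution sequences, keeping only compatible pairs."""
    joined = []
    for a in left:
        for b in right:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                joined.append({**a, **b})
    return joined


def eval_service_var(var, values, invoke):
    """Evaluate SERVICE ?var over every value that ?var is bound to."""
    result = []  # R, the empty multiset
    for i in values:
        if not is_iri(i):
            raise ValueError("execution fails: %r is not an IRI" % (i,))
        # R := Union(R, Join(Invocation(i, ...), {?var -> i}))
        result.extend(join(invoke(i), [{var: i}]))
    return result


# Hypothetical endpoints and their canned answers.
ANSWERS = {
    "http://a.example/sparql": [{"x": 1}],
    "http://b.example/sparql": [{"x": 2}, {"x": 3}],
}
SOLUTIONS = eval_service_var("service", list(ANSWERS), ANSWERS.__getitem__)
print(SOLUTIONS)
```

Each remote solution is extended with the binding of the service variable that produced it, which is why the result can tell the endpoints apart.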
4) What about
" @@All binary operators that have open LHS: new UNION, MINUS, (NOT)EXISTS
@@SubSELECT??
"
?
-> I fixed it using Lee's comments
5) I skipped section 4.2 and 4.4, assuming it will be removed
-> I removed them
Lee's review
This review discharges my ACTION-385.
Overall: I think this specification still needs a fair amount of work before it is ready for Last Call. Please see my detailed comments below.
- Title: Since we've moved BINDINGS to the query document, should we
change the name of this document, since there are not multiple "Extensions"? Perhaps "SPARQL 1.1 Federated Query"? -> changed to Fed Query as you suggest
- Status of this Document -- this should be updated to reflect the fact
that this is now an active WG editor's draft. That said, this isn't that important since the SotD gets replaced when we publish as a WD. -> there is a reference to the editors wd, is that enough?
- 1. Introduction. Suggest rewording the first few sentences like:
""" This specification defines the syntax and semantics of the SERVICE keyword for SPARQL 1.1. The SERVICE keyword extends SPARQL 1.1 to support queries that merge data distributed across the Web. """ -> rephrased according to Axel's comments
- 1. Introduction. Replace the listing of other documents with the full
set of documents or a pointer to the overview document. -> I removed it following Axel's comments
- 1.1.1 The only prefix listed here that's used elsewhere in the
document is "xsd:". -> I removed it following Axel's comments
- 1.1.2 I don't think the formal definition of "binding" helps much. If
you do want to keep it, it should be in section 1.1.3 Terminology. -> moved to terminology
- 1.1.3 No need to repeat the information about IRIs. I'd remove the
first paragraph.
- 1.1.3 "and used in SPARQL" => "and reused in this document" ? -> changed
- 2.2 BINDINGS needs to be removed. It can be referenced informatively
from this document, but should not have its own section. Once this is done, Section 2 needs to be restructured so that it is all about the SERVICE keyword
-> Almost removed, I kept it following Axel's comments
- I think there needs to be at least an informative, informal
explanation of the SERVICE keyword before diving in with examples. Perhaps the examples should all come after the syntax and semantics sections.
-> added explanation for each example
- I think the example should be more consistent both in presentation and
content. Specifically, the example in 2.1.3 specifies that the data is part of the default graph for the endpoint, but the previous examples don't specify that. More troubling is the fact that the data in 2.1.3 uses blank node subjects whereas the previous examples use URIs. It seems to me that all the examples could use a common set of endpoints and data which could be presented upfront before the examples. This might be easier to work with and less distracting as you read from one example to another.
-> I changed all the examples in the document, I hope it is better now
- In 2.1.4, is this an appropriate use of dcterms:subject? -> no, changed following Axel's comments
- I haven't tested this, but I think there's a syntax error in the 2.1.4
example - there should be a "." after the "void:sparqlEndpoint" triple pattern, right? -> it is not necessary, I tested it in other sparql endpoints such as bio2rdf
- What is the status of variables in SERVICE? There is still an
editorial note by the example in 2.1.4 that says this is unresolved. We need to clarify this before Last Call. -> I'd like to keep the explanation of what "must be bounded" means; I deleted the editorial note
- 2.1.5 Why is returning no results considered a failure condition? I
would omit that. -> it returns no results instead of failing because of the SILENT token
- 2.1.5 This text needs to be cleaned up to be more prescriptive. In
particular, it should not say that "we propose" something or other. It should say something like "The SILENT keyword indicates that errors encountered while accessing a remote SPARQL endpoint should be ignored while processing the query. The failed SERVICE clause is treated as if it had a result of a single solution with no bindings."
-> fixed using Axel's comments
- 2.1.5 This section is presenting examples. It should not talk about
algebra constructs such as the Service(...) construct.
-> removed
- 2.1.5 The example should be improved. It should include valid data at
the endpoint and indicate the comparative results when there is an error at the remote endpoint and when there is not an error. To make this as clear as possible, there should be another part to the query that just accesses the local default graph.
- 2.1.6 This section needs to be rewritten. It is very hard to
understand. Here are some of the issues with it:
* There is no example.
* I don't understand the comparison with GRAPH.
* The terminology needs to be tightened to align with terminology used in SPARQL query. (E.g., what is a "querying system"?)
* As with 2.1.5, the text here should be more prescriptive, and less speculative sounding.
-> subsection removed
- 3 Syntax - I don't think all of these examples are necessary. I think
that one sentence about the SERVICE clause would suffice, along with the grammar rules. That would be more consistent with how SPARQL 1.1 Query presents syntax.
- This section should include the SILENT keyword.
- Remove 3.2.
-> I removed the whole section 3
- 4.1 I don't think this should restate the text from SPARQL 1.1 Query.
It should only include the new additions to the algorithm, along with a clear reference to where the new bit is inserted.
- 4.1 This doesn't seem to take SILENT into consideration.
- 4.1 The algebra expression given for an example seems to be completely
incorrect. Can this be checked?
- 4.1. I think the definition is unclear as written. Specific questions
I have are:
* What is "B"? -> BINDINGS
* What does "if IRI is a SPARQL service" mean? -> if it is a valid URL pointing to a SPARQL endpoint
* What is omega? -> the solution variables
- Remove 4.2. -> done
- I'm unclear as to how 4.1 relates to 4.3? -> removed section 4.3 following the discussion we had by email
- Remove 4.4 -> removed
- 4.5 needs to be explained in the context of 4.1 and 4.3.
- The conformance section needs to be tuned specifically to federated query.
Lee