This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The XQuery Update specification allows the expressions () or fn:error() to appear in certain contexts where updating expressions may appear. However, it does not allow the expression () or fn:error() to be enclosed in parentheses in such contexts. Thus it is OK to write:

  if (b) then delete node $x else fn:error()

But it is wrong to write:

  if (b) then (delete node $x) else (fn:error())

This violates the common expectation that any expression can be enclosed in parentheses without changing its meaning or validity. It makes life unnecessarily difficult for software that is generating XQuery code (for example, a stylesheet that generates XQuery from XQueryX), and it also makes life unnecessarily difficult for people implementing XQuery parsers, because they have to retain some representation of redundant parentheses in the expression tree until after semantic checks have been performed.

The problem can be solved by partitioning expressions into three categories instead of two: updating, non-updating, and neutral. The expressions () and fn:error() should fall into the neutral category, and a parenthesized expression should have (as now) the same category as its content. Contexts that currently allow an updating expression or () or fn:error() should allow any updating or neutral expression. (And of course, places that require a non-updating expression should now require a non-updating or neutral expression.)

I would suggest that fn:trace() should also be added to the neutral category.
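The three-category proposal above can be sketched as a small combination rule. This is only an illustration, not text from any specification: the names Category, combine_branches and parenthesized are hypothetical, and treating a conditional whose branches are all neutral as itself neutral is an assumption that the proposal does not spell out.

```python
from enum import Enum


class Category(Enum):
    UPDATING = "updating"
    NON_UPDATING = "non-updating"
    NEUTRAL = "neutral"  # proposed home for () and fn:error()


def combine_branches(categories):
    """Category of a conditional or typeswitch, given its branch categories.

    Neutral branches combine with anything. Mixing updating and
    non-updating branches is a static error, modelled here as ValueError.
    """
    kinds = {c for c in categories if c is not Category.NEUTRAL}
    if not kinds:
        # Assumption: all-neutral branches give a neutral expression.
        return Category.NEUTRAL
    if len(kinds) > 1:
        raise ValueError("branches mix updating and non-updating expressions")
    return kinds.pop()


def parenthesized(category):
    # A parenthesized expression has the same category as its content.
    return category
```

With such a rule, `if (b) then (delete node $x) else (fn:error())` is no different from the unparenthesized form, because the parentheses never change a category.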
Please ignore the comment about fn:trace() - allowing update instructions to be traced is a much more complex subject than I thought.

I've been trying to implement the rules that () and fn:error() (and no other non-updating expressions) are allowed in updating contexts, and the rules are surprisingly disruptive. The problem is that you can't tell whether one branch of a conditional, say, is an updating expression until you are well into the semantic analysis (because you can't do it until function calls have been bound to the relevant function declarations), and by the time you get to this stage of semantic analysis, a lot of the original syntactic detail will have been lost. Not just redundant parentheses, as mentioned in comment #0, but probably quite a lot else too. Many parsers work by constructing an expression tree in some kind of "core grammar" that discards syntactic distinctions - in my case, for example, this reduces typeswitch to an if/then/else construct.

Ideally the rule that allows () and fn:error() in updating contexts should be rephrased to work in terms of the type system, so that the rule relates to the type of the result of the expression, and not to its syntactic form. That's probably a significant challenge. Short of that, I would suggest two new rules:

(a) If the processor is able to infer that the result of an expression will always be an empty sequence or an error then it may allow that expression to appear in a position where an updating expression would be allowed.

(b) If the processor is able to infer that a branch of a conditional/typeswitch will never be evaluated, then it is allowed to eliminate that branch before determining whether all branches are consistently updating or non-updating. (This would be the case if it were a dynamic error or type error rather than a static error, and one way of achieving this rule would be to reclassify the error.)
Another observation: the following is allowed by the current rules:

  if ($a=1) then () else if ($a=2) then error() else if ($a=3) then delete node $x else ()

but the following simplification is disallowed:

  if ($a=3) then delete node $x else if ($a=2) then error() else ()

The reason is that in the second case the "else" branch of the outer "if" is not an updating expression and is not one of the two permitted exceptions (() and error()).
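The asymmetry between those two expressions can be reproduced with a minimal sketch of the purely syntactic rule as this comment describes it. The tuple-based tree and the function names are hypothetical, conditions are omitted because they do not affect the check, and this is a reading of the rule, not the specification's wording.

```python
# Tiny expression tree: ("empty",), ("error",), ("delete",), ("if", then, else)
EMPTY = ("empty",)
ERROR = ("error",)
DELETE = ("delete",)


def is_updating(expr):
    """A conditional is updating if either of its branches is updating."""
    if expr[0] == "delete":
        return True
    if expr[0] == "if":
        return is_updating(expr[1]) or is_updating(expr[2])
    return False  # () and error() are non-updating


def is_permitted_exception(expr):
    # Purely syntactic: only a literal () or a direct error() call qualifies.
    return expr[0] in ("empty", "error")


def check(expr):
    """True if every conditional in expr satisfies the branch rule."""
    if expr[0] != "if":
        return True
    then_b, else_b = expr[1], expr[2]
    if not (check(then_b) and check(else_b)):
        return False
    if is_updating(then_b) or is_updating(else_b):
        for branch in (then_b, else_b):
            if not (is_updating(branch) or is_permitted_exception(branch)):
                return False
    return True


# if ($a=1) then () else if ($a=2) then error() else if ($a=3) then delete node $x else ()
first = ("if", EMPTY, ("if", ERROR, ("if", DELETE, EMPTY)))
# if ($a=3) then delete node $x else if ($a=2) then error() else ()
second = ("if", DELETE, ("if", ERROR, EMPTY))
```

In this sketch, check(first) succeeds because each nested conditional contains the delete and is therefore updating, while check(second) fails because the inner conditional is neither updating nor a literal () or error().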
"My eyes! The goggles, they do nothing!" etc, etc. I'm sure this is the tip of the iceberg, and many more horrible examples could be derived. I agree with Michael that this is a horrible rule to understand as a user, and nigh-on impossible to implement. I know for a fact that XQilla can't tell the difference between () and (()) when checking this rule. I think there are only two viable solutions to this problem: 1) Michael's updating / non-updating / neutral classification, along with a firming up of the rules for how these properties are derived. 2) Adopt the Scripting Extensions approach to updating expressions, and allow any non-updating expression to exist where an updating one is expected. This requires a definition of how certain expressions handle mixed updating / non-updating results, which we already understand reasonably well from SE. I'm pretty sure that the rules for (1) will be hard to describe, and I consider (2) to be the only philosophically sound solution, although it's obviously more disruptive.
I think a possible (and relatively simple) fix would be to change the places where we refer to "the expression () or fn:error()" by a term such as "an ineffective expression", and define "ineffective" to mean "the expression (), or fn:error(), or any non-updating expression that the processor is statically able to determine will always either return an empty sequence or fail with a dynamic error". We could attempt to define some additional kinds of ineffective expressions that processors are obliged to recognize as such, for the sake of interoperability.
The trouble is that the phrase "expression that the processor is statically able to determine will always either return an empty sequence or fail with a dynamic error" needs defining unambiguously, otherwise this will be a source of incompatibility between implementations.
My proposal was to use the phrase knowing that it described behaviour that would not be fully interoperable. This is analogous with the pessimistic static typing rules - implementations that can make better inferences than those defined in the spec are allowed to do so. Of course, if someone can come up with a better proposal, I'd jump at it. This is the best solution I can come up with.
Michael, The working group considered this issue on 22 Jan 2008 and decided to resolve it by defining a category of expression ("vacuous expression") that can be combined with either updating or non-updating expressions. Vacuous expressions will be detected statically, and will include ( ) and error() as well as other expressions whose effective return values are computed by vacuous expressions. If you are satisfied with this resolution, please change the status of this bug to Closed. Don Chamberlin (for the Query Working Group)
This bug does not seem to be resolved yet, based on the latest XQUF spec.

> From: w3c-xml-query-wg-request@w3.org [mailto:w3c-xml-query-wg-request@w3.org] On Behalf Of Zhen Liu
> Sent: 05 February 2009 20:13
> To: w3c-xml-query-wg@w3.org
> Subject: issues of vacuous expression in XQUF
>
> In the latest XQUF spec, vacuous expression is defined as
> [Definition: A vacuous expression is a simple expression that can only return an empty sequence or raise an error.]
> I recall it is introduced because March 2008 version of XQuery scripting extension has
> added vacuous expression concept and then XQUF started to adopt this concept.
>
> My first question:
> =====================
> My understanding from the rest of XQUF spec that defines how vacuous expression is
> determined is that the vacuous expression analysis is done statically. This means one can
> figure this out without dynamic evaluation. If so, we shall fix the definition to reflect this. It is
> not clear from the definition that vacuous expression is determined statically.
>
> Furthermore, what type of static analysis is allowed here?
>
> For example, given the following expression:
>   if (fn:true()) then () else 3
>
> If the static analysis is sophisticated, it can figure out this expression only returns empty sequence,
> so it is a vacuous expression. But if one follows the current XQUF spec AS IS, this is not
> vacuous expression. This appears to be quite inconsistent as the current XQUF can deduce that
>   if (cond) then () else ()
> as vacuous expression, but why not
>   if (fn:true()) then () else 3
>
> My second question regarding to function call:
> ========================================
> According to 2.5.6 Function call in XQUF, it states that a call to the built-in function fn:error()
> is a vacuous expression. What about calling a function which always return empty sequence?
> That function call shall be considered as vacuous expression, right?
>
> If so, then just as 'updating' keyword,
> 'vacuous' shall be used as a keyword to describe function as well.
> But this is not the case in the XQUF spec.
>
> The other situation is that in many situations, a caller may invoke functions defined in other modules
> whose query text may not be available to do static analysis, in such case,
> how could one determine if a
> function call is vacuous unless vacuous is a keyword to describe a function.
>
> So it seems to me that introducing vacuous expression in XQUF appears to be unnecessary.
> The latest XQuery scripting extension has dropped the vacuous expression category completely.
> Shall we revert the XQUF back to the version which does not have vacuous expression
> definition?
>
> The trigger that prompts me on this is that the latest XQUF conformance tests start to add
> test cases that require static analysis of vacuous expressions crossing typeswitch, conditional
> expr, sequence expr etc. While adding static analysis to support vacuous expression determination
> is a simple exercise, there appears to be some fundamental issues
> regarding to vacuous expressions (described above) that needs to be resolved.
>
> Thanks
>
> zhen
1. I agree that the definition could be tightened. I would suggest: [Definition: Various expressions are defined in this specification to be vacuous: examples are <code>()</code>, <code>fn:error()</code>, and <code>if (x) then () else ()</code>. When evaluated, a vacuous expression will either return an empty sequence or raise an error. The analysis to determine whether an expression is vacuous is done statically.] 2. My understanding is that the WG rejected my original proposal to allow processors flexibility to decide that expressions such as "if (true()) then () else 3" were vacuous, preferring instead to define a simple set of interoperable rules. I don't see any reason for reopening that question. 3. No-one is ever going to deliberately write a function that always returns () or throws an error, so there seems no point at all in defining a keyword to allow such functions to be labelled. One could extend the rules so that a function call is vacuous if the body of the function being called is vacuous; but that seems an unlikely scenario, and it's hard to define the rule without risking circularity if the function is recursive.
Regarding your comment #9: consider an XQuery development environment where modules can be compiled and linked separately. Without 'vacuous' as a keyword to describe a function, how could static analysis determine whether the following expression is legal?

  import module namespace md = "http://foo.com";
  if ($a) then delete node $x/a/b else md:raiseErr()

From the function signature of md:raiseErr(), I only know it is a simple expression, but I don't know whether it is really vacuous, and the source code for module "http://foo.com" is not available. I agree that determination of vacuous expressions has to be done statically. However, depending on how sophisticated your static analysis is, interoperability is not guaranteed.
As currently defined, a function call on a function other than fn:error() is never vacuous and cannot therefore be mixed with an updating expression. I can see why you might want to allow this, but I don't think that's an enhancement we should be considering at this stage of the game.
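A rough sketch of why that rule sidesteps the separate-compilation problem: the classification needs only the function's name and its declared updating property, never its body. The Signature record and both function names are hypothetical, and the sketch assumes names have already been resolved so that the built-in error function appears as "fn:error".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    name: str       # resolved function name, e.g. "fn:error" or "md:raiseErr"
    updating: bool  # True if declared with the "updating" keyword


def classify_call(sig):
    """Classify a function call using only its signature, never its body."""
    if sig.updating:
        return "updating"
    if sig.name == "fn:error":
        return "vacuous"
    return "simple"


def may_pair_with_updating(sig):
    # A branch can sit beside an updating branch only if it is itself
    # updating or vacuous.
    return classify_call(sig) in ("updating", "vacuous")
```

Under this reading, a call to md:raiseErr() from an imported module is classified as simple even if its body always raises an error, so it cannot be paired with `delete node`.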
Regarding your comment #11: I understand that disallowing vacuous functions simplifies the matter. However, isn't that inconsistent with the definition of a vacuous expression, which states that it is a simple expression that returns only an empty sequence or raises an error? For the sake of code modularity, people may wish to develop one generic function that raises errors and have all their code call that common function.
On 25 February, the WG decided to change the definition to address concerns raised by Zhen:

RESOLVED: Change the definition of Vacuous Expression to:

[Definition: Certain expressions are defined in this specification to be "vacuous expressions". These all have the characteristic that they can be determined statically to either return an empty sequence or raise an error.] Some expressions are always vacuous; for instance, an empty parenthesized expression ( ) is a vacuous expression. Other expressions may be vacuous if one of their operands is vacuous; for instance, if both branches of a conditional expression are vacuous, the conditional expression is a vacuous expression.
(In reply to comment #13)
> On 25 February, the WG decided to change the definition to address concerns
> raised by Zhen:
>
> RESOLVED: Change the definition of Vacuous Expression to:
>
> [Definition: Certain expressions are defined in this specification
> to be "vacuous expressions". These all have the characteristic that
> they can be determined statically to either return an empty
> sequence or raise an error.] Some expressions are always vacuous;
> for instance, an empty parenthesized expression ( ) is a vacuous
> expression. Other expressions may be vacuous if one of their
> operands is vacuous; for instance, if both branches of a
> conditional expression are vacuous, the conditional expression is a
> vacuous expression.
>

In the editor's draft, I changed this definition to the following, which I believe to be equivalent:

[Definition: A "vacuous expression" is an expression that can be determined statically to always return an empty sequence or raise an error.] Some expressions are always vacuous; for instance, an empty parenthesized expression ( ) is a vacuous expression. Other expressions may be vacuous if one of their operands is vacuous; for instance, if both branches of a conditional expression are vacuous, the conditional expression is a vacuous expression.
See Zhen's "first question" in comment #8, specifically the example if (fn:true()) then () else 3 This expression can be determined statically to always return an empty sequence, and so qualifies as a vacuous expression using the definition in comment #14. However, the intent is that it not be a vacuous expression, which is allowed by the definition in comment #13.
Jonathan, I don't believe your wording is equivalent. The wording that we agreed on might seem contorted, but there was good reason for it: we wanted to make clear that "vacuous expressions" were defined extensionally (we provide a list of constructs considered vacuous), not intensionally (anything that can be statically inferred to return () or error() is by definition vacuous). Your revised wording fails to capture this distinction.

For example, someone could argue that under your definition, xx[0] is a vacuous expression: but it isn't.
(In reply to comment #16)
> Jonathan, I don't believe your wording is equivalent. The wording that we
> agreed on might seem contorted, but there was good reason for it: we wanted to
> make clear that "vacuous expressions" were defined extensionally (we provide a
> list of constructs considered vacuous), not intensionally (anything that can be
> statically inferred to return () or error() is by definition vacuous). Your
> revised wording fails to capture this distinction.
>
> For example, someone could argue that under your definition, xx[0] is a vacuous
> expression: but it isn't.

OK, I think I understand the intent now. But the definition the WG agreed on is neither a clear extensional nor a clear intensional definition. An extensional definition has to state what constructs are vacuous, and not just some examples (outside the definition) listed as "for instance".

I suggest we hash this out on today's call.
>An extensional definition has to state what constructs are vacuous What the agreed definition does is to say that the detail is to be found elsewhere. >I suggest we hash this out on today's call. We spent some time on this before, I personally don't see why it needs to be reopened. Michael Kay
How about this definition, which is more explicit:

<snip>
The following expressions are defined by this specification to be "vacuous expressions":

* An empty parenthesized expression ( ) is a vacuous expression.
* A call to the built-in function fn:error is a vacuous expression.
* If all branches are vacuous expressions, the typeswitch expression is a vacuous expression.
* If both branches are vacuous expressions, the conditional expression is a vacuous expression.
* If all operands are vacuous expressions, the comma expression is a vacuous expression.

These expressions can be determined statically to always return an empty sequence or raise an error.
</snip>
It might be explicit but it's not complete; for example, it leaves out

  ((()))

I can't see why you are trying to improve the text which we arrived at so painfully.
> It might be explicit but it's not complete, for example it leaves out > > ((())) > > I can't see why you are trying to improve the text which we arrived at so > painfully. Because the reader can not easily determine what is and what is not a vacuous expression without reading the entire specification and thinking about it. Because someone designing a test suite would have a hard time determining the complete list. Because searching for terms like "is a vacuous expression" in the text also apparently leaves out some instances, like the one you just cited.
I can see that I need to at least add this:

A non-empty parenthesized expression is a vacuous expression if the expression it contains is a vacuous expression.

Are there other cases that I am missing? If not, how about this:

<snip>
The following expressions are defined by this specification to be "vacuous expressions":

* A call to the built-in function fn:error is a vacuous expression.
* An empty parenthesized expression ( ) is a vacuous expression.
* A non-empty parenthesized expression is a vacuous expression if the expression it contains is a vacuous expression.
* If all branches are vacuous expressions, the typeswitch expression is a vacuous expression.
* If both branches are vacuous expressions, the conditional expression is a vacuous expression.
* If all operands are vacuous expressions, the comma expression is a vacuous expression.

These expressions can be determined statically to always return an empty sequence or raise an error.
</snip>
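The six rules in that list translate almost directly into a recursive check. The following is a minimal sketch, not an implementation from any processor: the tuple-based expression tree and the name is_vacuous are hypothetical, and any construct not named in the list is simply treated as non-vacuous.

```python
# Hypothetical tree shapes:
#   ("empty",)                    the expression ()
#   ("call", name)                a function call
#   ("paren", expr)               a non-empty parenthesized expression
#   ("typeswitch", [branches])    branch results only
#   ("if", cond, then, else)
#   ("comma", [operands])
#   ("literal", value)            anything else
def is_vacuous(expr):
    kind = expr[0]
    if kind == "empty":
        return True
    if kind == "call":
        return expr[1] == "fn:error"
    if kind == "paren":
        return is_vacuous(expr[1])
    if kind in ("typeswitch", "comma"):
        return all(is_vacuous(e) for e in expr[1])
    if kind == "if":
        # The condition is deliberately ignored: the rule is extensional,
        # so no attempt is made to evaluate fn:true() statically.
        return is_vacuous(expr[2]) and is_vacuous(expr[3])
    return False


EMPTY = ("empty",)
nested_parens = ("paren", ("paren", EMPTY))                        # ((()))
constant_if = ("if", ("call", "fn:true"), EMPTY, ("literal", 3))   # if (fn:true()) then () else 3
both_empty = ("if", ("call", "cond"), EMPTY, EMPTY)                # if (cond) then () else ()
```

Here ((())) and `if (cond) then () else ()` come out vacuous, while `if (fn:true()) then () else 3` does not, which matches the intent described earlier in the thread that vacuity follows a fixed list of constructs and not whatever a clever optimizer could infer.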
There's at least one other I noticed, namely a FLWOR can be vacuous if its return clause is vacuous. But on principle, I think it's a bad idea to include this list. It means we are saying things twice, and that creates the risk of inconsistency. Michael Kay
For the record, the final outcome in the text of the PR is that the list of vacuous expressions is included as a non-normative note.