Martin Probst's weblog

Tail Recursion in XQuery

Tuesday, February 7, 2006, 12:58 — 5 comments Edit

Today I finished a feature for the X-Hive/DB XQuery processor that has been sitting on my wishlist for quite a while: Tail Recursion. The problem is that in functional languages (like XQuery) you often write recursive functions to achieve some goal. While this is very elegant, it has the problem that you can run out of stack space quite fast if you process deeply nested structures or long lists. Tail Recursion can solve this in certain cases by evaluating the function in an iterative way. E.g.

declare function local:sum($start as xs:integer) as xs:integer
  if ($start eq 0) then 0
  else $start + local:sum($start - 1)
If you rewrite this function to this:
declare function local:sum($start as xs:integer, $acc as xs:integer) as xs:integer
  if ($start eq 0) then $acc
  else local:sum($start - 1, $start + $acc)
the interpreter can evaluate each tail call after the method body has been evaluated, because the values returned by the call are not used in the calculation of the body, but rather returned directly. To enable this, the path to the tail call within the function must only contain if/else or sequence (“,”) operators, as opposed to the “+” operator above.

To enable this in X-Hive, we had to turn of lazy evaluation of the function call parameters for tail calls (otherwise you’d run out of heap space), which should not be a problem, and we had to accept a minor non conformance. The problem is that you can’t really check function return values as mandated by the standard if your function doesn’t really return the whole evaluated sequence. The end result of the whole function call is still checked, but subresults of the single tail calls aren’t. We consider this to be ok, as it doesn’t create false results for correct queries and the examples of offending queries are pretty contrived. I would be very suprised to encounter something like this in the wild:

declare function local:foo($x) as xs:integer
  if ($x eq 0) then ()
  else (1, local:foo(0))
Calling this function with any value for $x will return a single “1”, which matches the declared return value, but the result returned by the intermediate local:foo(0) call does not - it’s the empty sequence which doesn’t match the declared “exactly one xs:integer”. While we’re aware that this is interpreting the Rules for optimization and error cases in a very liberal way, we feel that it’s worth it, as tail recursion allows a wide range of problems to be solved within XQuery that would be impossible otherwise.

Hi Martin,

minor nit-picking: the f(0) in local:foo should be local:foo(0).

While I agree with your analysis of conformance concerns, I stumbled across your statement “To enable this, the path to the tail call within the function must only contain if/else or sequence (”,“) operators …”.

In which way is ‘+’ different to ‘,’ when it comes to tail calls? Would the recursive call in

declare function local:bar($x as xs:integer) as xs:integer* { if ($x eq 0) then () else ($x, local:bar($x - 1)) }

be considered a tail call by X-Hive/DB?

Cheers, –Torsten

The ‘+’ operator needs the righthand value to calculate the result, whereas the ‘,’ operator just appends. This way you can evaluate the method body if you just remember that you have to insert something at the end, so your local:bar() would be considered tail callable (and it is, indeed).

Aha, thanks for the explanation. What about this variant of loca:bar() then?

declare function local:bar($x as xs:integer) as xs:integer* { if ($x eq 0) then () else (local:bar($x - 1), $x) }

In Haskell terms, are the left and right sections of ‘,’ treated alike when it comes to tail callability?

Kind regards, –Torsten

No. It’s probably possible to get that working (with the amount of work needed varying depending on your internal processing model), however we don’t do that. Tail Calls are expected to happen at the tail, so we can just append the values, otherwise you’d have to remember positions in the stream, which (at least for us) greatly complicates the thing and might defeat the purpose …

OK, so in a sense X-Hive/DB implements a variant of ‘tail call modulo cons’ as introduced by David Warren (of WAM fame): here, the role of the cons head is played by the RHS argument of ‘,’.

Thanks for the discussion, Martin. –Torsten