Martin Probst's weblog

Extensible modules in XSLT and XQuery

Saturday, January 20, 2007, 13:14 - 0 comments

Michael Kay notices something on the extensibility of XSLT vs. XQuery code:

(2) It's interesting to note how hard it would be to do this in XQuery. The main XQueryX stylesheet certainly benefits immensely from XSLT's top-down apply-templates processing model, but in theory it could have been written in XQuery. The modification layer, however, that changes the behaviour of the transformation to do something slightly different, would be quite impossible to add without modifying the source code of the original query. This is an observation I've made in a number of larger applications: once you want to write code that can be reused for more than one task, XSLT has quite a few features that make it a stronger candidate for the job than XQuery.

Where “this” is importing an existing stylesheet and overriding part of its implementation by redefining a specific template. This is indeed something that is currently very difficult in XQuery. Templates in XSLT are somewhat like functions, and it’s possible to selectively override or replace individual templates.

Quite remarkably, this kind of extensibility wouldn’t even be possible if XQuery had higher-order functions: in that case the original module would have had to anticipate that someone might want to replace that particular function.
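To make the override mechanism concrete, here is a minimal sketch in Java using the standard javax.xml.transform API. The two stylesheets, the class name ImportOverrideDemo and the href base.xsl are all made up for illustration; this is not Michael Kay's XQueryX stylesheet. The importing stylesheet changes the behaviour for one kind of element without touching the imported source.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ImportOverrideDemo {
    // Hypothetical "library" stylesheet: renders every <item> as [text].
    static final String BASE =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='item'>[<xsl:value-of select='.'/>]</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical modification layer: imports BASE and adds one template
    // that takes over for hidden items. Imported templates have lower
    // import precedence, so this one wins wherever both match.
    static final String CUSTOM =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:import href='base.xsl'/>"
      + "<xsl:template match='item[@hidden]'/>"
      + "</xsl:stylesheet>";

    static String run(String xml) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();
        // Serve the imported stylesheet from memory instead of the file system.
        factory.setURIResolver((href, base) ->
            "base.xsl".equals(href) ? new StreamSource(new StringReader(BASE)) : null);
        Transformer t = factory.newTransformer(new StreamSource(new StringReader(CUSTOM)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```

Running it on `<list><item>a</item><item hidden='yes'>b</item><item>c</item></list>` should give `[a][c]`: the base template still handles the ordinary items, and only the hidden one is affected by the override.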

Embedding XSLT transformations in JARs, or how I learned to (not) love the resolver

Saturday, January 20, 2007, 12:26 - 2 comments

Procrastination

Once again I’m trying to switch our X-Hive build to Maven (2). I tried that about two years ago, but at that time Maven 2 was not quite ready, and Maven 1 was already deprecated. This is a clear act of procrastination; I should be preparing for my tests. I think I could seriously increase my productivity if I always had some tests to procrastinate around…

XSLT-based Mojo

Anyways, I needed to write a small build plugin as in X-Hive some sources are generated from XML specifications using XSLT. Java build plugins are called Mojos in Maven. I wanted to create one single jar containing both the Mojo itself and all required resources, namely the XSLT file and the DTD files that are used by both the XSLT document and the XML files that are transformed. Please don’t ask me who came up with using DTDs in an XSLT file. I thought I’d simply set up a resolver, but the whole process was surprisingly painful.

In Java you create a Transformer using a TransformerFactory. Both have a setURIResolver() method that takes a URIResolver.

Resolvers

But the surprise is that you need a URIResolver to resolve XSLT document() calls, but an EntityResolver to resolve DTD load operations. No idea why they separated that (even if someone wanted to redirect XSLT document calls but not regular document loads, they could simply have extended the existing API…).

The tricky part is getting the Transformer to actually use your EntityResolver - it doesn’t provide any API for that. You need to create an XML reader, set the entity resolver on it, and use that to create a SAXSource with the input source you created using your resolver. The reader is then used by the XSLT transformer to load the XSLT document. Still there?

Creating the Transformer

I ended up with this code:

    /* create the transformer */
    TransformerFactory factory = TransformerFactory.newInstance();
    final Resolver resolver = new Resolver();
    factory.setURIResolver(resolver);
    Transformer transf;
    try {
      XMLReader reader = XMLReaderFactory.createXMLReader();
      reader.setEntityResolver(resolver);
      InputSource is = resolver.resolveEntity("", "mystyle.xsl");
      SAXSource source = new SAXSource(reader, is);
      transf = factory.newTransformer(source);
    } catch (... lots of possible exceptions ...) { ... }

Resolver simply loads resources from the current ClassLoader using ClassLoader.getResource(filename).openStream() (beware of null returns). When creating an InputSource you’ll have to set its system ID and public ID to something meaningful, or you’ll get MalformedURIExceptions from the XML parser.
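The Resolver class itself is not shown here. A minimal sketch of what such a class might look like follows, assuming that all resources sit at the root of the jar and that only the last path segment of a system ID matters. The class name ClasspathResolver and the fileName helper are made up for this sketch.

```java
import java.io.IOException;
import java.net.URL;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.URIResolver;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver serving stylesheets, documents and DTDs from the classpath. */
class ClasspathResolver implements URIResolver, EntityResolver {
    private final ClassLoader loader;

    ClasspathResolver(ClassLoader loader) {
        this.loader = loader;
    }

    /** Assumption: "file:///some/dir/foo.dtd" and "dir/foo.dtd" both mean "foo.dtd". */
    static String fileName(String systemId) {
        int slash = systemId.lastIndexOf('/');
        return slash < 0 ? systemId : systemId.substring(slash + 1);
    }

    /** getResource() returns null for unknown names, so callers must check. */
    private URL find(String systemId) {
        return systemId == null ? null : loader.getResource(fileName(systemId));
    }

    /** Called by the XSLT processor for xsl:import, xsl:include and document(). */
    @Override
    public Source resolve(String href, String base) throws TransformerException {
        URL url = find(href);
        if (url == null) {
            return null; // let the processor fall back to its default lookup
        }
        try {
            return new StreamSource(url.openStream(), url.toExternalForm());
        } catch (IOException e) {
            throw new TransformerException(e);
        }
    }

    /** Called by the XML parser for DTDs and other external entities. */
    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        URL url = find(systemId);
        if (url == null) {
            return null; // let the parser open the system ID itself
        }
        InputSource in = new InputSource(url.openStream());
        in.setPublicId(publicId);
        in.setSystemId(url.toExternalForm()); // a real URL keeps the parser happy
        return in;
    }
}
```

Returning null from either method hands the lookup back to the parser or processor, which is what the input files on disk need.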

The actual transformation

The actual transformation is quite similar, however in this case we do not resolve the input source with our resolver, as it’s a real file lying around on the disk, not in our jar. We still have to provide the resolver though because of the DTD files in the jar.

      try {
        XMLReader reader = XMLReaderFactory.createXMLReader();
        reader.setEntityResolver(resolver);
        InputSource is = new InputSource();
        is.setSystemId(inputfile);
        SAXSource source = new SAXSource(reader, is);
        StreamResult target = new StreamResult(targetfile);
        transf.transform(source, target);
      } catch (...) { ... }

Let’s remember: all we wanted to say is:

Dear Java, here is an XSLT file. Now process this file. Now process that file. And let me do the resource loading.

I could also express that in one not too ugly sentence. In Java, you need about 80 lines of code, including error handling, not including comments, and not including the actual resolver implementation. Add to that the code I wrote to “get all files below path X with an extension of Y” (= 40 LOC, and why the hell does Java’s File do new File(parent, childname) instead of parent.getChild(name)?) and you end up with 220 lines of code. Actual domain code, i.e. what I really wanted to do, is about 100 lines of code, and those could be expressed with radically less code. With a proper API and a proper programming language this ought to be less than 20 lines.
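The file-collecting part does get shorter on newer JDKs. A sketch, assuming Java 8 or later (java.nio.file.Files.walk did not exist when this was written); FileFinder and findByExtension are made-up names:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileFinder {
    /** All regular files below root whose name ends with the given extension, sorted by path. */
    static List<Path> findByExtension(Path root, String extension) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(extension))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }
}
```

Path also answers the complaint about File: root.resolve("child") is the parent.getChild(name) style of API.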

Mojo API

Maven’s Mojo (= “Maven Plain Old Java Object”) API is quite nice. The actual code needed to interact with Maven is not much: some annotations on member variables, the execute method, that’s basically it. They read Javadoc @tags and generate a Mojo description from that, and then use reflection at run-time to execute the task. Again a nice use of reflection for declarative programming.

Update: Oleg Tkachenko shows a .NET version of this. Looks significantly easier, though I’m not sure it does the same thing (entities+XSLT doc() calls).

Vista - beta for the masses

Monday, January 15, 2007, 14:58 - 0 comments

I recently installed Windows Vista on my girlfriend’s computer. The previous Windows XP already had some graphics issues and then died completely after failing to install some update.

Initially I was quite impressed with Vista. I had installed it before on my MacBook Pro, just for the fun of it, but deleted it again later as it’s simply too big to keep around just for fun. On the 2.4 GHz Dell machine it looked quite nice, and many long-awaited features are there.

However, I soon found out that you’ll need new drivers for any existing hardware you have. Not like Windows NT -> 2000 -> XP, where some or most drivers continued to work. Also, current driver support by manufacturers is quite bad: only beta drivers are available, if any.

This is somewhat to be expected, as Vista is really new. However, what strikes me is the number of bugs. I installed nearly all Windows versions when they were quite new, and I never noticed real showstopper bugs. With Vista, the first thing you see from the new Explorer is a hang. Then USB devices randomly fail. Hotplug of USB stops working, and all new applications have major stability issues. Mix in some good old friends from Windows XP (random Explorer hangs). This is by far the worst Windows release I’ve ever seen. At the same time it’s the most promising, as it provides a lot of really nice features and applications. However, Microsoft seems to have a major problem with controlling that complexity and with productivity. Apple ships operating systems with a comparable feature set without nearly as many issues, and with far fewer people.

The new user interface in Vista is quite shiny. The task bar and alt-tabbing with window previews are nice. But nothing totally revolutionary, they mostly copied the features from OS X (just as they did with the applications and even the background image on the login screen…).

So I’m still quite happy I got my Mac. I had my share of hardware troubles, but Mac OS X makes up for that. Windows still has a lot of weird user interfaces, and now also a lot of stability issues.

Reflection based Java extensions for X-Hive/DB

Friday, January 12, 2007, 17:34 - 1 comment

I’ve just finished a nice feature for X-Hive/DB 8.0: Java extensions based on reflection. The idea is to use reflection to introspect a Java class and expose its functionality to an XQuery statement. This is (intentionally) quite similar to functionality in other XQuery or XSLT implementations. Example:

    import module namespace math = 'java:java.lang.Math';
    $math:PI, math:sqrt(4)

Type marshalling

Most of the work went into type marshalling from XQuery to Java and back. The code inspects Java method and constructor parameters and then creates appropriate parameters for it from whatever is passed in XQuery, casting types as necessary (using XQuery type promotion). The code also maps Java collections as necessary, using Java 5 generics to find the actual type. So this basically means you can write:

    String getFoo(Iterator it, String[] s, int x);

And in XQuery:

    import module namespace x = 'java:package.MyClass';
    let $nodes := //foo
    let $s := ("hello", "world")
    return x:get-foo($x, $nodes, $s, 5)

… and X-Hive/DB will automatically create the required arrays/iterators etc. Same goes for return values. It currently understands anything that is Iterable, Iterator, Lists and Collections, Arrays, and Sets. You might have noticed the extra $x parameter - this is an instance of MyClass as non-static methods naturally need an instance of the declaring class. Such instances can be constructed using constructors via x:new(…) or passed in via external variables. Also note that the Java-ish “getFoo” is translated to a more XQueryish “get-foo” :-)
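The name translation can be sketched in a few lines of Java. This is only an illustration of the idea and not the actual X-Hive/DB code: the class XQueryNames, both method names and the exact rule (a hyphen before every upper-case letter) are assumptions.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

class XQueryNames {
    /** "getFoo" becomes "get-foo": hyphen before each upper-case letter, then lower-cased. */
    static String toXQueryName(String javaName) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < javaName.length(); i++) {
            char c = javaName.charAt(i);
            if (Character.isUpperCase(c)) {
                if (i > 0) {
                    sb.append('-');
                }
                sb.append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Index the public methods of a class by XQuery-style name (first overload wins). */
    static Map<String, Method> publicMethods(Class<?> type) {
        Map<String, Method> byName = new TreeMap<>();
        for (Method m : type.getMethods()) {
            byName.putIfAbsent(toXQueryName(m.getName()), m);
        }
        return byName;
    }
}
```

A real implementation would have to keep all overloads and pick one by argument count and type, and this naive rule turns acronyms like getURL into get-u-r-l.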

I think this is really handy. It should also not be that expensive as all reflection stuff is done at compile time, so in large queries this should not make a difference. Additionally you can of course always create your XQuery once and re-use it.
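The generics lookup mentioned under type marshalling relies on the java.lang.reflect.Type API that came with Java 5. A rough sketch of how the element type of a parameter might be found; the ElementTypes class is made up, and the getFoo signature with Iterator<Integer> is just a stand-in to demonstrate the lookup:

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Iterator;

class ElementTypes {
    /** Element type of an array or of a parameterized collection/iterator, else null. */
    static Class<?> elementType(Type parameterType) {
        if (parameterType instanceof Class<?> && ((Class<?>) parameterType).isArray()) {
            return ((Class<?>) parameterType).getComponentType();
        }
        if (parameterType instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) parameterType).getActualTypeArguments()[0];
            if (arg instanceof Class<?>) {
                return (Class<?>) arg;
            }
        }
        return null; // raw type, wildcard or type variable: element type unknown
    }

    // Hypothetical target method, only here so there is something to inspect.
    static String getFoo(Iterator<Integer> it, String[] s, int x) {
        return "";
    }
}
```

The types to feed in come from Method.getGenericParameterTypes(). For a raw Iterator parameter the lookup yields nothing, so a marshaller would have to pass the values through without casting them.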

I initially started to implement this to make an integration with the Java Servlet API less painful. I.e. you would import a module from ‘java:javax.servlet.http.HttpServletRequest’ and access its methods to handle sessions and such. More on that in a later post.

Academic re-use

Coincidentally, this project was also a nice match with my University schedule. I held a presentation about Java reflection in the Metaprogramming and reflection course this semester, and was able to use examples and experience from the code there. The drawback of the topic was that research on Java reflection is actually quite boring. The other topics included somewhat esoteric Smalltalk stuff, dynamic languages (Ruby, Python) etc. To compensate for this I started looking into Erlang ;-)

Bad books

While doing research for that paper I found a sample chapter from the book “Java Reflection in Action” (the book and the sample chapter) about “Call stack introspection”. The book describes how you might use Throwable.getStackTrace() to inspect the current call stack. Quite surprisingly they recommend this to prevent infinite recursion, do logging, and for security checks to see if the code calling you came from the right package. However they do not mention that it’s totally o.k. for JVMs to randomly leave out any or even all the information from that, even though that’s clearly documented in the API.
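A sketch of what defensive use of that API looks like; CallerInspector is a made-up name. The Javadoc for Throwable.getStackTrace() allows a JVM to omit frames or even return a zero-length array, so any code reading the trace has to cope with missing entries, which is exactly why it is a poor basis for security checks.

```java
class CallerInspector {
    /** Class name of whoever called this method, or null if the JVM omitted the frames. */
    static String callerClassName() {
        StackTraceElement[] trace = new Throwable().getStackTrace();
        // When frames are present, trace[0] is this method and trace[1] its caller.
        return trace.length > 1 ? trace[1].getClassName() : null;
    }
}
```

A security check built on this has no good answer for the null case: it either lets the call through without knowing who made it or rejects a legitimate caller.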

Where is the glue in loose coupling with REST?

Friday, December 1, 2006, 09:19 - 1 comment

Just a thought: all the REST people argue that this particular architecture made it possible to scale systems to the size of the web. This is - according to them - possible in particular because REST services as they are on the web provide a uniform interface to all consumers, which leads to loose coupling etc.

Now, the thing with loose coupling is that you need something to plug the loosely coupled things together, and probably provide some glue between them. Strongly typed systems like SOAP ensure compatibility between components by requiring consumers to explicitly adhere to certain types. This is sometimes painful, but it somehow makes sure the client knows what he’s doing.

The web, as it is today, does not have this problem at all. Not because the architecture was any better, but because the things that glue interfaces together are magical systems capable of adjusting to unbelievable complexity, dynamic learning of interfaces and complex reasoning over input types. Yeah, humans.

Has someone thought about this? What’s the RESTafarian answer?

That boot thing again

Thursday, November 30, 2006, 13:46 - 0 comments

Every once in a while I mess up my old Thinkpad X30 (currently in use by my girlfriend) so that it stops booting and I have to reinstall the operating system or something similar.

Now the problem is, the Thinkpad X30 does not have any removable drives. That’s right, any. So no CD-ROM, no Floppy, no nothing. Which leaves three methods to boot: an external CD-ROM drive, PXE network booting, or a USB key drive.

The external CD-ROM trick is by far easiest and - as far as I know - the only way to (re-)install Windows on the machine. Problem is: I don’t own an external CD-ROM, and I don’t feel like buying one just for this old notebook.

PXE network booting is quite elegant. You set up a server that serves boot images to clients, and at some point in time I had a working setup that distributed a kernel to clients, which could then install a full Debian system via the network. Problem is: this requires quite a complicated setup, custom compilation of a fitting kernel etc.

The USB KeyDrive should be the easiest way. Should. It’s totally beyond me why every single web page lists a whole set of instructions to make a key drive bootable. Why not just: download this image-usb.bin, dd if=image-usb.bin of=/dev/sda, or even better some sort of executable that does that for you. The only exception is HP’s tool, but I didn’t get that to work either.

So next steps will be to try and find someone who owns an external CD-ROM drive, then try setting up that PXE server.

Misunderstanding Eclipse messages

Friday, November 10, 2006, 13:35 - 0 comments

Please wait while the online information is indexed. This will happen only once.

Of course, they meant “only once for each time you open the help”.

PHP Soap "code generator"

Thursday, November 9, 2006, 12:01 - 3 comments

I somehow tragically got into a project where we need to access fairly complicated SAP WebServices from PHP using SOAP. PHP provides the SoapClient class, but there is apparently no way to generate code from WSDL files to make your life easier. I created two small tools to help me with that, one to make a WSDL definition browsable, another one to somehow generate PHP code from the WSDL.

dumpws.php:

    <?php
    include('../main/auth.php');

    // take the WSDL location from the command line or the query string
    if (isset($argv[1])) {
      $wsdl = $argv[1];
    } else if (isset($_GET["wsdl"])) {
      $wsdl = $_GET["wsdl"];
    } else {
      echo '<form method="GET" action="' . $_SERVER["PHP_SELF"] . '">' . "\n";
      echo 'Specify a WSDL to dump: <input name="wsdl" type="text"/></form>';
      exit(0);
    }

    // replace word character groups delimited by non-words with a hrefs to #word
    function hrefify($str) {
      return preg_replace('/(\\W)?(\\w+)(\\W)?/', '\\1<a href=\'#\\2\'>$2</a>\\3', $str);
    }

    $options = $SAP_AUTH;
    try {
      $soap = new SoapClient($wsdl, $options);
      echo "<h1>$wsdl</h1>";
      echo "<h2>Types:</h2>\n";
      echo "<pre class=\"prettyprint\">\n";
      foreach ($soap->__getTypes() as $type) {
        // a type looks like "struct Name { ... }", the second word is its name
        preg_match('/\w+ (\w+)/', $type, $matches);
        $name = $matches[1];
        echo "<h4><a id='$name'>$name</a></h4>\n";
        echo hrefify($type);
      }
      echo "</pre>\n";
      echo "<h2>Functions:</h2>\n";
      echo "<pre class=\"prettyprint\">\n";
      foreach ($soap->__getFunctions() as $func) {
        // a function looks like "ReturnType name(args)", the second word is its name
        preg_match('/\w+ (\w+)/', $func, $matches);
        $name = $matches[1];
        echo "<h4><a id='$name'>$name</a></h4>\n";
        echo hrefify($func);
      }
      echo "</pre>\n";
    } catch (SoapFault $f) {
      echo $f;
    }
    ?>

create_types.php:

    <?php
    /*
     * Use to transform the types in a given WSDL file to PHP classes.
     * <p>
     * Think of this as a really poor code generator…
     */

    include('../main/auth.php');

    if (isset($argv[1]) && $argv[1] != "") {
      $wsdl = $argv[1];
    } else if (isset($_GET["wsdl"]) && $_GET["wsdl"] != "") {
      $wsdl = $_GET["wsdl"];
    } else {
      echo '<form method="GET" action="' . $_SERVER["PHP_SELF"] . '">' . "\n";
      echo 'Specify a WSDL to transform to PHP: <input name="wsdl" type="text"/></form>';
      exit(0);
    }

    $options = $SAP_AUTH;
    $actual_types = array();
    try {
      $soap = new SoapClient($wsdl, $options);
      echo "<pre class=\"prettyprint\">\n";
      echo "&lt;?php\n";
      echo "// This file was generated from $wsdl using create_types.php\n\n";
      $types = $soap->__getTypes();
      foreach ($types as $type) {
        if (preg_match("/^struct./", $type)) {
          // this is a complex type
          preg_match('/struct (\w+)/', $type, $matches);
          $name = $matches[1];
          $actual_types[$name] = $name;
          echo "class $name {\n";
          // drop the "struct Name {" and "}" lines, keep the member declarations
          $lines = array_slice(explode("\n", $type), 1, -1);
          foreach ($lines as $line) {
            // each member line looks like " type name;"
            if (!preg_match('/ (\w+) (\w+);/', $line, $matches)) {
              continue;
            }
            $subtype = array_values(preg_grep("/^\w+ $matches[1]/", $types));
            if (isset($subtype[0]) && !preg_match("/^struct./", $subtype[0])) {
              // primitive type
              echo " // " . $subtype[0] . "\n";
            } else {
              echo " // " . $matches[1] . "\n";
            }
            echo " public \$$matches[2];\n";
          }
          echo "}\n\n";
        }
      }
      // emit the classmap option for SoapClient, one entry per generated class
      echo "\$classmap = array(\n";
      $entries = array();
      foreach ($actual_types as $name => $class) {
        $entries[] = " \"$name\" => \"$class\"";
      }
      echo implode(",\n", $entries) . "\n";
      echo ");\n";
      echo "?>";
      echo "</pre>\n";
    } catch (SoapFault $f) {
      echo $f;
    }
    ?>

Mighty right clicks

Sunday, October 22, 2006, 17:41 - 0 comments

I tried the Mighty Mouse, and now I tried the wireless Mighty Mouse. It’s totally impossible for me to reproducibly right click with them. I always manage to actually get a right click from the mouse, but quite often only on the third attempt. This really sucks, it’s both unnerving and unsafe (in situations where a left click means something dramatically different from right click). I heard from others they are fine with the mouse … strange.

Anyways, I’d rather go with Logitech as in the past. Initially I wanted to get a Bluetooth mouse, but I now figured that wireless gives me close to nothing - if the cable annoys you it’s simply time to tidy your desktop and get the blocking things out of the way :-) - and I’d need to worry about batteries. Plus the higher price of course.

