Martin Probst's weblog

XQuery too complex?

Tuesday, December 6, 2005, 09:22 — 1 comment

I’m repeatedly reading that XQuery is ‘too complex’, e.g. from Uche Ogbuji on xml-dev or in this article. That may be true if you’re implementing an XQuery engine, but the surprising thing is that this is mostly claimed by XQuery users (not implementors!).

I kind of wonder why. Sure, the spec took much too long, it’s quite complicated, and with all its references to static typing, XML Schema etc. it can be quite scary. But which user reads the spec? And why should they?

What you have to know about XQuery are three things: XPath to actually get at the data, FLWOR expressions (for, let, where, order by) to combine and join it, and XML literals to group the results together.

What you don’t need to know (but may find interesting) is: there is a lot more to XQuery (functions, typing, …), but it won’t get in your way if you don’t explicitly ask for it. The whole syntax a newbie who knows XPath and XML, e.g. the typical XSLT 1.0 user, has to understand is this:

  let $foo := "bar"
  for $x in doc('a.xml')/my/xpath[@attr = $foo]
  for $y in doc('b.xml')/more/xpath
  where $x/@id = $y/@id
  order by $x/name
  return <newtag>{ $x/name, $y/address }</newtag>

There’s a lot more out there, but this is everything you really have to know: for and let clauses to aggregate XML, where for easy-to-use joins, XPath to actually get the data, XML literals to group it together. If that’s too complex for you, I don’t know what to say. Anyone who was able to twist his mind around XSLT should be able to use that. Plus, in the areas where XSLT 1.0 fails horribly (strings, grouping, …), XQuery provides the functionality without forcing you to write 20 lines of stylesheet code for “replace($haystack, $needle, $replacement)”. This is just a subset of what XQuery has to offer, but it’s the most important part, and it’s the part that gives the user direct, large benefits over other XML technologies (XSLT, XPath) in querying XML.

The working group took a long time, actually too long, but this is something they really did right. There is a subset, a ‘core’ as Uche Ogbuji calls it, that gives all the important benefits and is really easy to learn.

Gentoo to Ubuntu

Sunday, November 20, 2005, 23:53 — 1 comment


Ubuntu is an ancient African word, which means: "I'm sick of compiling Gentoo all the time" -- Jeff Waugh

Maven 2

Saturday, October 22, 2005, 11:38 — 3 comments

I wanted to try out Apache Maven 2 today, so I started off with the tutorial and created a default project. After some playing with targets and plugins, a connection to one of the remote repositories timed out, and Maven reported that the repository had been blacklisted.

And that’s it. I grepped through all the files in ~/.m2; lots of POM files mention the given URL, but nothing looks like blacklisting. So basically the tool errored out and I don’t have any clue how to fix it. Plus, there is of course no documentation about this “feature” at all. Every attempt to download a plugin or a dependency now fails.

Java 6 Preview

Friday, October 21, 2005, 23:44 — 0 comments

Lars Trieloff is happy that Java 6 Swing picks up GTK themes. While that is certainly nice, what I like a lot more is that Java 6 runs a certain native XML database about 15% faster in our benchmarks.

Of course it’s quite hard to tell why, but it seems that Java 6 brings several new optimisations to the HotSpot virtual machine, most notably the ability to have small, local objects created on the stack instead of the heap. That saves general allocation overhead plus the time needed for GC, so this may or may not be the reason. Anyway, it’s cool, and quite surprising that they can still get such big improvements.
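The kind of code that benefits from this optimisation looks roughly like the following sketch (class and method names are made up for illustration): a loop that allocates many short-lived objects which never escape their method, so a JIT that can prove this could place them on the stack and spare the garbage collector entirely.

```java
// Illustration only: each Point lives for a single loop iteration and
// never escapes sumPoints(), making it a candidate for stack allocation
// by an escape-analysis-capable VM.
public class EscapeDemo {
    static final class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
        int sum() { return x + y; }
    }

    static long sumPoints(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1); // short-lived, never escapes
            total += p.sum();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumPoints(1000000)); // prints 1000000000000
    }
}
```

Whether a given VM actually performs the stack allocation is invisible to the program; only allocation and GC time change.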

Atom XML Schema

Saturday, October 8, 2005, 20:53 — 2 comments

Does anyone know of an up-to-date XML Schema definition for Atom 1.0? I only found this one, which is nice, but it doesn’t fit the current spec very well, and I’m too lazy to fix it ;-). There must be a (non-RELAX NG) schema out there, or not?
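For reference, a minimal document that such a schema would have to accept is a hand-written Atom 1.0 entry along the lines of RFC 4287 (the id, dates and names here are made-up examples, not from any official test suite):

```xml
<?xml version="1.0" encoding="utf-8"?>
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>A minimal Atom entry</title>
  <id>urn:uuid:00000000-0000-0000-0000-000000000000</id>
  <updated>2005-10-08T20:53:00Z</updated>
  <author>
    <name>Example Author</name>
  </author>
  <content type="text">Just enough content to be valid.</content>
</entry>
```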

I’m currently playing around with the Atom Publishing Protocol, Atom itself and the idea of an Atom-based web storage facility. I’m not completely convinced that it’s useful; I mainly wanted to see how hard it would be to implement something like that using X-Hive/DB. Or maybe it’s just that XQuery is being finalized, and I need a new quickly moving target to complain about changes in the spec …

Speaking of that, we just released X-Hive/DB 7.0, which is really cool. I will probably write some stuff about it later, when the website is properly updated.

Collaborative Editing with Gobby

Wednesday, September 28, 2005, 12:47 — 1 comment

There is a text editor for Macs called SubEthaEdit that allows multiple users to edit files collaboratively. Quite cool, but while the editor is free, you have to get yourself at least the smallest hardware dongle at $500 (a Mac mini).

Now there is Gobby, an editor doing roughly the same for Linux, Windows and Mac. I just tried it out and it does work on a local machine. Unfortunately I didn’t have a second box to try the advertised Zeroconf support etc., but it looks very promising!

Now all we need is a generic protocol for all real-time collaborative editors …

Media-less Linux installation

Wednesday, September 7, 2005, 14:11 — 0 comments

Install Linux without any media. If I had known of this slightly earlier (I don’t know how long it has existed, though), it would have saved me a lot of trouble. Installing Linux on a ThinkPad X30 without any external drive can get quite difficult.

When installing Gentoo on it I managed to get there by booting a kernel that had its root filesystem on an NFS share on a second box. That works, but setting up the server is quite a lot of hassle. Plus, you learn a lot of things about TFTP, NFS etc. that you really never wanted to know.

When installing Ubuntu I found out that you just need the kernel + initrd. I formatted my USB key, marked the primary partition as bootable using fdisk, installed GRUB and the Ubuntu kernel on it, and it actually worked, pulling the whole installer from the net. Except that my wireless LAN card was sometimes recognized in the installer, sometimes not. This probably works better by now.
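The GRUB side of those steps can be sketched roughly as follows. To stay non-destructive, this sketch writes the boot layout into a scratch directory instead of a real key, and the kernel and initrd file names (the Ubuntu netboot installer’s vmlinuz and initrd.gz) are assumptions, not from the post; on a real key you would additionally mark the partition bootable with fdisk and run grub-install as described above.

```shell
# Sketch only: build the /boot layout for the USB key in /tmp instead of
# on a real device.
mkdir -p /tmp/usbboot/boot/grub

# menu.lst for legacy GRUB; it boots the (assumed) netboot installer
# kernel and initrd copied into /boot on the key
cat > /tmp/usbboot/boot/grub/menu.lst <<'EOF'
default 0
timeout 5

title Ubuntu network installer
  root (hd0,0)
  kernel /boot/vmlinuz
  initrd /boot/initrd.gz
EOF
```

From there the installer fetches everything else over the network.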

The method described by Marc Herbert seems a little more difficult than the USB key approach, but if you don’t have a Linux system to set up the key drive, or your notebook doesn’t support booting from USB keys, it’s definitely the way to go.

[via Ben Maurer]

Java Unit Test Coverage

Sunday, September 4, 2005, 11:16 — 1 comment

I spent one and a half days last week setting up a Java unit test code coverage system. This was somewhat surprising to me; I don’t think something like that should take that long. The major problem was the state of the available tools. I wanted to find out first whether any usable open source tools exist, so I avoided Clover, JCover & Co. Instead I tried:

So now we have something that is somewhat working. Somewhat, because I ran into (presumably) a bug in Ant (1.6.3) where custom JUnit task result formatters don’t get their extension passed along if the <junit/> task is set to forkmode='once'. This currently makes it impossible to view the results of the unit tests when they are run with code coverage enabled, and thereby makes it quite difficult to hunt down errors. I still have to check whether that bug is fixed in a later version of Ant.
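For the record, the kind of build configuration that triggers this is roughly the following (the formatter class name is a hypothetical placeholder, and the paths are made up):

```xml
<!-- Sketch of a <junit/> setup like the one described above -->
<junit fork="yes" forkmode="once" printsummary="on">
  <classpath refid="test.classpath"/>
  <!-- a custom result formatter; in Ant 1.6.3 the extension attribute
       reportedly gets lost when forkmode="once" is set -->
  <formatter classname="com.example.CoverageResultFormatter"
             extension=".xml"/>
  <batchtest todir="build/reports">
    <fileset dir="test" includes="**/*Test.java"/>
  </batchtest>
</junit>
```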

The forkmode='once' setting also led to quite a number of errors on our side, as our test machinery relies on static class fields in several places, and those might be set to something wrong after a test. That’s probably an error on our side, but annoying nonetheless. forkmode='once' is necessary though, as anything else slows down the testing horribly.

In the end, coverage testing is quite nice, and the results are not as horrible as I expected. In most packages we have a coverage of over 90%. Most of the untested code is in generated classes; I presume most of it is untestable and not used at all. Code coverage in terms of lines or blocks is of course a very bad criterion for test completeness, and path coverage wouldn’t be that much better either, but it can at least give you good pointers to areas that are under- or untested. Another step towards better software development ;-)

PS: Another plus for EMMA is that it’s self-contained: only two jars, as opposed to other projects which require 6-8 libraries on your classpath. This is generally just a little more work when setting up, but wait until tool A requires a different version of a library than tool B. DLL hell for Java, but that’s another story …

BOM of death

Thursday, August 4, 2005, 16:40 — 0 comments

Note to self: next time you get really strange XML parsing and comparison errors, try running this before looking at XML and Java files and cursing at XSLT, JUnit, Eclipse & the world in general for an hour:

find . -type f -not -path '*/.svn/*' -exec sed -i.from-bom 's/^\xEF\xBB\xBF//' {} +

(A shell one-liner to remove UTF-8 byte order marks from all files below the current directory, leaving a .from-bom backup of each changed file. GNU sed understands the \xHH byte escapes.)

Afterwards start cursing about Notepad, Windows, and Microsoft’s use of the BOM in general.

Writing strategy

Friday, July 15, 2005, 22:41 — 1 comment

Bennaco tells us How to Ruin a Writing Project in 10 Easy Steps. After that, he writes how to really do it, step 1:

1. Decide that you're not going to really "do it". Which is to say, decide that you are not going to approach the whole big, terrifying, thing in one go. Instead, you're going to do some noodling around, some very small, easy, graspable, low-intensity, and non-threatening things, one at a time, until the project gets done.

While he is talking about writing as in literature, I feel this applies a lot to programming, too. If you’re starting to write a big piece of complex software, do not try to approach the whole thing at once. Just start writing all the little parts that glue together into the whole.

Start off by dissecting the problem into smaller building blocks. This is the most important step and requires quite some time. Keep dissecting the blocks until you can describe what each block really does in two sentences, without “and then something magic happens”. Really figure out how the single parts work together, otherwise you’ll be screwed afterwards. Discuss every detail with your team members, if any, to really make sure it works this way.

Then start to write all the smaller parts. Don’t start with your “public static void main(String[] args)“, but rather with the smaller helper routines, the data model you’re working on, conversions etc. If you have a proper development environment, you can test those parts using your favorite unit testing framework. The important thing is not to implement something that does the full job partially, but rather to do a small part completely. Otherwise you will end up with a codebase that is completely cluttered by adding feature after feature without a bigger plan. That results in rewrites upon rewrites and lots of bugs, not to mention the maintenance nightmare.
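A toy illustration of “do a small part completely” (the class and its behaviour are made up for this sketch): write a tiny, self-contained helper first and pin its behaviour down, long before any main program exists.

```java
// Hypothetical small building block: fully implemented and testable on
// its own, before any "public static void main" of the real application.
public class PathUtil {
    /** Joins two path segments with exactly one '/' between them. */
    static String join(String base, String child) {
        if (base.endsWith("/")) base = base.substring(0, base.length() - 1);
        if (child.startsWith("/")) child = child.substring(1);
        return base + "/" + child;
    }

    public static void main(String[] args) {
        // the kind of check a unit test framework would host
        System.out.println(PathUtil.join("a/", "/b")); // prints a/b
    }
}
```

Once a handful of such blocks exist and are trusted, gluing them together is the easy part.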

If you just continue like this, at some point you will start using these components and gluing them together, more or less automatically. At this point you should be finished with all the low-level stuff and can just put the system together in a bigger sense.

Finally putting the blocks together and seeing how it takes off can be quite rewarding. On the XMLDB project I did at the end of my bachelor studies, the senior developers supervising us recommended just getting something working as fast as possible. We did not go that way, but rather took 2 months of planning out of 7. Then we started writing components, testing them and slowly putting them together. The system didn’t run a single query until after 4½ months. But at that point most of the hard work was done, and we managed to deliver a working XQuery and XUpdate implementation, including a persistent storage backend, on time. And that with seven students, of whom one was busy writing a GUI for the server and one was doing documentation, infrastructure and other related work.

I just quoted Bennaco’s first point, but the rest is quite similar to what I wrote: lots of refinement of an abstract plan until it’s really trivial to write the single steps and glue them together.
