Martin Probst's weblog

Learned Ruby today

Tuesday, May 30, 2006, 21:41 - 0 comments

Well, not quite, but at least I wrote my first Ruby script, something that uses the IMAP API to walk the tree in my mail account. I also debugged a tool called RExchange, which retrieves calendar items from Exchange servers. I still haven’t been successful at that.

This language is ridiculously easy and straightforward. I’m not sure if one should be allowed to call this programming ;-)

Tomorrow: Ruby on Rails.

Mail and server woes

Tuesday, May 30, 2006, 16:14 - 0 comments

Quite some time ago I lost the login password to my hoster’s configuration area. It didn’t really bother me as I didn’t have anything to set up, but yesterday I actually called them and received a new password (positive: they will only ever give you a password if they call you back on the previously set up telephone number).

Strange business practice

After that, I noticed they have a new offer which is significantly cheaper, so I upgraded my “Hosteurope WebPack L (r1)” to a “Hosteurope WebPack L (r2)”. What I failed to note was the warning text telling me that I would lose all my data. So when I called support today asking why my email and webpage didn’t work, I was quite surprised. Hosteurope is a good hoster, I’ve never had any problems with them. But this is pretty annoying: I can’t think of a technical reason not to just copy the data over when someone changes the contract (if they even have to migrate to a different server).

So my guess is that they only do this to keep customers from upgrading to cheaper contracts. Which I really don’t like, and also the way they do this is quite dangerous: there is a single dialogue comparing prices in the two systems and then a button that says “upgrade” below that. As far as I can remember, there was no 40pt red letter warning confirmation or anything, the warning must have been in the small text above that. Which is totally ridiculous, an operation that can cost you your whole email archive not guarded by anything but some small text? And that only for the small commercial benefit of increasing the opportunity cost for upgrading?

Luckily I noticed soon enough so they still had the original email server running, so I could just copy my mail using IMAP. I would also have had a local backup of my ~/Library/Mail dir, but still.

Spam

I had quite some trouble creating a backup of the MySQL database on the server; surprisingly, phpMyAdmin refused to “send” a backup. After some time I found out that an old webpage I keep around had a public comment function which was heavily abused by spammers. The corresponding SQL tables had grown to 75 MB. I wonder how beginners create webpages nowadays: you have to get an expert in anti-spam technology before you can put something online.

If you’ve tried emailing me yesterday or today and didn’t receive a response, try contacting me again, some stuff might have been lost.

XQuery as a web scripting language

Tuesday, May 16, 2006, 08:17 - 1 comment

In my last post, I threw some XQuery together to provide a browsable outline of an XML document in a browser. I used XQuery as a web language by embedding it into JSP pages and using our X-Hive/DB tag library. This somewhat sucks, because I don’t like JSP generally and our tag library is somewhat undermaintained. It would probably be quite easy to get it up to speed again as it’s not really much code, and of course customers can modify it as we deliver the source code to them.

I’d really like to use XQuery exclusively. At first glance it’s a perfect fit for a web language, being functional and such, but on second thought you run into quite a few problems. Just returning the query result as a web page is nice, but that doesn’t get you very far. To provide a useful web interface you need to set arbitrary HTTP headers (most importantly response codes).

MarkLogic uses custom functions to do that, e.g.

xdmp:set-response-content-type("text/html"),
<html>
  …
</html>

eXist goes with the same approach, though I can’t find anything about headers.

We used to do that with our debugging capabilities, e.g.

xhive:queryplan-debug('stdout'), …

But the problem with that method is that it’s highly un-functional. It doesn’t fit the language to have side-effecting functions that always return the empty sequence. And it’s not only ugly, it might get you into real trouble, e.g.

  let $doc := doc($uri)
  return
   if ($doc/type = "text") then
     xdmp:set-response-content-type("text/html")
   else
     xdmp:set-response-content-type("application/pdf")

This is a bit contrived and you could work around it in this case (construct the content type first, then call the function), but in the general case the query processor is allowed to evaluate both function calls, in whatever order it finds suitable. So you might end up first setting the content type to HTML, then to PDF, and then delivering an HTML document. We now use a syntax like “declare option xhive:queryplan-debug …;” in the query header, but that of course doesn’t work for HTTP headers. (I’m not writing this to bash MarkLogic; I can understand their decision to do it like this very well, and it’s simply an ugly problem. I just took them as an example because their documentation is readily available.)

The only XQuery-ish solution would probably be to provide a custom document format to encapsulate those web specific results. Have a document type called HTTP response (and one called HTTP request) and have the query return this document, e.g.

<response xmlns="http://http/what/ever">
  <header>
    <code>HTTP/1.1 404 Not found</code>
    <entry name="Location" value="…" />
  </header>
  <body xmlns="">{
    for $x in …
  }</body>
</response>

This would probably work, though it puts a bit more work on the user. Should be possible with some library functions though … one would need to pay attention to some escaping issues etc. And there is probably quite a lot I didn’t think of …
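To make the idea a bit more concrete, here is a rough sketch of what the hosting side could do with such a response document: pull out the status line and the header entries, and hand back the body. It is written in Python purely as a stand-in for whatever the server is implemented in. The function name unpack_response is made up, the namespace URI is just the placeholder from the example above, and real escaping and serialization rules would need more thought.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace taken from the example document; not a real standard.
NS = "http://http/what/ever"


def unpack_response(xml_text):
    """Split a <response> document into (status_line, headers, body_xml).

    Hypothetical helper: assumes <code> holds the full status line and each
    <entry> carries a header as name/value attributes.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}response" % NS:
        raise ValueError("not a response document")

    header = root.find("{%s}header" % NS)
    if header is None:
        raise ValueError("response document has no header element")

    status_line = (header.findtext("{%s}code" % NS) or "").strip()
    headers = [(entry.get("name"), entry.get("value"))
               for entry in header.findall("{%s}entry" % NS)]

    # The body element resets the default namespace (xmlns=""), so it is
    # looked up without a namespace prefix.
    body = root.find("body")
    body_xml = ""
    if body is not None:
        body_xml = (body.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in body)
    return status_line, headers, body_xml
```

The servlet (or whatever sits in front of the query processor) would then write the status line and headers to the HTTP response and stream the body, so the query itself stays free of side effects.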

Now, does anyone know such a format? I didn’t really research this, and “xml http format” doesn’t make a good Google query. Maybe having a consistent format between different XQuery implementations would be nice, too, as this stuff goes into the query, and the whole point of XQuery is to have portable queries. Or other people have completely different ideas on how to do this; I’d like feedback either way.

Update: Lars Trieloff comments that he would rather use processing instructions on the document root level, e.g.

<?http.header Location: /other/path ?>

where the name is open to discussion. This raises two minor escaping issues: the user might already use <?http.header?> (or whatever) for something else, since PIs don’t support namespaces, and the user might include line breaks in the processing instruction. The latter is probably just an error. The benefit of this technique is that it would be a lot less invasive to user code.
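A small sketch of how the receiving side could collect such processing instructions, again in Python as a stand-in. The target name http.header is the one from the example and the function name headers_from_pis is made up. Line breaks inside the instruction are treated as an error, as suggested above.

```python
from xml.dom import minidom

# Target name from the example above; the actual name is open to discussion.
PI_TARGET = "http.header"


def headers_from_pis(xml_text):
    """Collect (name, value) pairs from top-level <?http.header ...?> PIs."""
    doc = minidom.parseString(xml_text)
    headers = []
    for node in doc.childNodes:
        if node.nodeType != node.PROCESSING_INSTRUCTION_NODE:
            continue
        if node.target != PI_TARGET:
            continue
        if "\n" in node.data or "\r" in node.data:
            raise ValueError("line break in header processing instruction")
        name, sep, value = node.data.partition(":")
        if not sep:
            raise ValueError("expected 'Name: value' in processing instruction")
        headers.append((name.strip(), value.strip()))
    return headers
```

Only instructions at the document root level are looked at, so anything nested inside the user’s own markup is left alone.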

Create a document browser with XQuery in 50 SLOC

Saturday, April 22, 2006, 15:49 - 0 comments

Some time ago I was asked to create a quick demo of how to create a documentation browser with X-Hive/DB. The idea was to present one huge document with a tree navigation on the left and the content of a selected tree node on the right. An XSL stylesheet to transform the document into HTML already existed.

AMM Browser

For the curious: AMM stands for “aircraft maintenance manual”, this documents the procedures necessary for different defects or routine maintenance on air planes. And no, the engineers don’t do Lorem ipsum all the time ;-)

This is something XQuery was designed for. I took an existing JavaScript tree library that eats a trivial XML format and the X-Hive/DB JSP tag library. The whole application consists of three JSP files:

index.jsp

Some quirky HTML and my misguided attempts to create CSS, plus this script code:

  tree=new dhtmlXTreeObject("treeboxbox_tree","100%","100%",0);
  tree.setImagePath("imgs/");
  //set function object to call on node select
  tree.setOnClickHandler(onNodeSelect);
  tree.setXMLAutoLoading("tree.jsp");
  tree.loadXML("tree.jsp");
  function onNodeSelect(nodeId) {
    ajaxpage("content.jsp?id=" + nodeId, "content_box");
  }

… which just tells the JavaScript library to use tree.jsp for the tree and content.jsp for the content. Can’t get much easier.

content.jsp

Probably the most trivial query possible:

<?xml version="1.0" encoding="utf-8" ?>
<%@ taglib uri="http://www.xhive.com/taglibs/xhivetags-1.0" prefix="xhtags" %>
<xhtags:session>
  <xhtags:transaction>
    <xhtags:contextnode contextNodePath="/amm"/>
    <xhtags:xquery>
      //*[@KEY = "<jsp:expression>request.getParameter("id")</jsp:expression>"]
    </xhtags:xquery>
    <xhtags:foreach>
      <xhtags:transform styleUrl="amm-common.xsl" />
    </xhtags:foreach>
  </xhtags:transaction>
</xhtags:session>

This JSP simply imports the X-Hive taglib, opens a session with the database, within that a transaction, takes a context node in the library, runs an XQuery for nodes with a specific ID (the parameter passed as “id”) and runs a stylesheet on each of the returned nodes. The necessary configuration (X-Hive database path, DB user and password, cache size) is taken from web.xml.

tree.jsp

A simple XQuery to produce the XML format the JavaScript library requires from the document in the database. The library expects something like this:

<tree id="0">
  <item id="1" text="foo" child="false"/>
  <item id="2" text="bar" child="true"/>
</tree>

Which simply means: the tree with the ID “0” (which is an opaque string, it is only expected to be unique) has two children, one with the description “foo” and the other called “bar” and IDs 1 and 2, respectively. The latter has children, too. It also supports fancy stuff like special icons etc., but I left this out for simplicity.

The document which is queried roughly looks like this:

<?xml version="1.0" encoding="utf-8" ?>
<AMM>
  <CHAPTER KEY="…" CHAPNBR="3">
    <TITLE>…</TITLE>
    <SECTION KEY="…" CHAPNBR="3" SECTNBR="1">…</SECTION>
    …
  </CHAPTER>
</AMM>

Levels also include SUBJECT and PGBLK. The KEY attribute is always unique, each level element has a TITLE child, and other structural elements are always direct children of their parent. This is all we need to know about our source.

tree.jsp looks like this:

      1 <?xml version="1.0" encoding="utf-8"?>
      2 <% response.setContentType("text/xml"); %>
      3 <%@ taglib uri="http://www.xhive.com/taglibs/xhivetags-1.0" prefix="xhtags" %>
      4 <xhtags:session>
      5   <xhtags:transaction>
      6     <xhtags:contextnode contextNodePath="/amm"/>
      7     <xhtags:xquery>
      8       declare function local:tree($root as element()) as element()
      9       {
     10         let $rootId := if ($root/@KEY) then $root/@KEY/string() else "0"
     11         return
     12         <tree id="{ $rootId }">
     13           {
     14             for $child in $root/(CHAPTER | SECTION | SUBJECT | PGBLK)
     15             let $numStr := string-join( ($child/@CHAPNBR,
     16                                          $child/@SECTNBR,
     17                                          $child/@SUBJNBR,
     18                                          $child/@PGBLKNBR),
     19                                         '-')
     20             let $hasChildren := exists($child/(CHAPTER | SECTION | SUBJECT | PGBLK))
     21             return
     22               <item id="{ $child/@KEY/string() }" text="{ $numStr , ' ', $child/TITLE/string() }" child="{ if ($hasChildren) then 1 else 0 }"/>
     23           }
     24         </tree>
     25       };
     26
     27       let $root := //*[@KEY = "<jsp:expression>request.getParameter("id")</jsp:expression>"]
     28       return
     29         if (empty($root)) then
     30           (: empty root, return main doc :)
     31           local:tree(/AMM)
     32         else
     33           (: children of the node with the given ID :)
     34           local:tree($root)
     35     </xhtags:xquery>
     36     <xhtags:foreach>
     37       <xhtags:tostring/>
     38     </xhtags:foreach>
     39   </xhtags:transaction>
     40 </xhtags:session>

The query retrieves the element with the given ID if it exists, the root node (/AMM) otherwise. It then calls the function local:tree to create an XML document that matches the format. local:tree first checks whether we have a KEY attribute on the root (in AMM documents the root node doesn’t have an ID) and makes sure we always have something. It then creates the root of the tree in line 12 and, for each child of the root node, an <item/> with the proper text, ID and child flag. It only iterates over CHAPTER, SECTION, SUBJECT and PGBLK nodes (line 14); everything else is considered non-structural and shouldn’t show up in the tree. In lines 15 to 19 it creates the chapter number that is prepended to the title of each element. This could also be calculated dynamically from the XML, but if you’re handling fragments of the document that wouldn’t work. The “child” attribute (which should have been named “hasChildren”), computed in line 20, is just an optimization so that the client doesn’t have to ask for the children of each node just to decide whether to display a “+” in front of them.
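For readers who don’t speak XQuery, roughly the same logic can be sketched in Python on an in-memory document. This is only an illustration of what local:tree does, not something the application uses. The function name build_tree is made up, and the text attribute is joined with a single space here, which is a simplification.

```python
import xml.etree.ElementTree as ET

# Element names that count as structural levels, as in the query.
STRUCTURAL = ("CHAPTER", "SECTION", "SUBJECT", "PGBLK")
# Attributes that make up the dotted chapter number, in order.
NUMBER_ATTRS = ("CHAPNBR", "SECTNBR", "SUBJNBR", "PGBLKNBR")


def build_tree(root):
    """Build the <tree> element for the structural children of root."""
    # Fall back to "0" when the node has no KEY (the AMM root has none).
    tree = ET.Element("tree", id=root.get("KEY", "0"))
    for child in root:
        if child.tag not in STRUCTURAL:
            continue  # non-structural content never shows up in the tree
        number = "-".join(child.get(name) for name in NUMBER_ATTRS
                          if child.get(name) is not None)
        has_children = any(grandchild.tag in STRUCTURAL for grandchild in child)
        title = child.findtext("TITLE", default="")
        ET.SubElement(tree, "item",
                      id=child.get("KEY", ""),
                      text="%s %s" % (number, title),
                      child="1" if has_children else "0")
    return tree
```

Calling build_tree on the document root gives the top-level chapters, and calling it on any structural element gives that element’s direct structural children, which is exactly the lazy loading pattern the JavaScript tree relies on.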

While this certainly has deficits (e.g. no proper escaping used on parameters to the queries, should be query parameters, XSL stylesheet runs on the server) it shows how easy it is to query XML with XQuery. I wrote the whole thing in less than a day, most of the time was spent debugging the JavaScript library and setting up Tomcat and the build environment. As a bonus, the interface between the server and client components couldn’t be more trivial. Using the ID attribute we can easily drop our browser based editing component into the application to make the whole thing read/write.
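On the escaping deficit: real query parameters (external variables) are the proper fix, but as a stopgap the request parameter could at least be turned into a safe XQuery string literal before it is pasted into the query. In an XQuery string literal a double quote is escaped by doubling it and an ampersand has to be written as &amp;. The helper names below are made up, and Python is only used to keep the sketch short.

```python
def xquery_string_literal(value):
    """Return value as a double-quoted XQuery string literal."""
    # Escape "&" first so the ampersand of "&amp;" is not escaped again.
    escaped = value.replace("&", "&amp;").replace('"', '""')
    return '"' + escaped + '"'


def key_query(key):
    """Build the lookup query for a KEY value (hypothetical helper)."""
    return "//*[@KEY = " + xquery_string_literal(key) + "]"
```

With this, a parameter containing a quote stays inside the string literal and can no longer terminate it and inject additional query code.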

To have the whole thing run fast I added an index on the attribute KEY of any element. This can also easily benefit from HTTP caching. If the whole thing needs to scale up, you can simply change the path to the database file (e.g. /var/xhivedb/data/XhiveDatabase.bootstrap) in your web.xml to a remote machine running X-Hive/DB (e.g. xhive://server:1235/). This way, all requests to the application servers will use a local data cache and the server is only ever queried if pages have been modified. Using that technique you can serve a lot of users.

AMM Browser

Saturday, April 22, 2006, 14:47 - 0 comments

Screenshot of the AMM browser

First MacOS X impressions

Thursday, April 13, 2006, 06:22 - 0 comments

I’m just in the process of getting used to MacOS and my new MacBook. The MacBook itself simply rocks - it’s fast, looks good, runs acceptably long (3.5 hours or something), etc.

For MacOS, some things are different in a pretty strange way; I definitely need to get used to that. Plus I don’t know some of the very basic things: I’m just learning how to install applications, and especially Java applications, at the moment. Something pretty difficult for me is, for example, text editing. On Windows and Linux there are text editing shortcuts which are supported virtually everywhere. I guess I just don’t know these on MacOS yet, though I also already noticed some differences between e.g. text editing in SubEthaEdit vs. Eclipse.

Regarding “looks good”: Many people say MacOS is extremely beautiful and that it’s because of the window theme or the animated effects etc. I don’t think they are right - the major difference is font rendering. On MacOS all text looks really good. Proper anti-aliasing where you want it, no anti-aliasing if the font is too small etc. I don’t know exactly how they do it, but the effect is that everything looks very professional and is very usable. Nice.

MacBook shipping time

Wednesday, March 22, 2006, 17:39 - 0 comments

I ordered one of those nice MacBook Pros, the 2 GHz variant, with an additional GB of RAM (making it 2 GB), the 7200rpm hard drive and in full English (keyboard/OS). The webpage initially said it was going to take 3-5 days to ship. Now the shipping information says it’s going to be sent on April 18th, making it roughly 4 weeks.

I feel like a child who got its candy stolen …

UPDATE: 4 weeks in Apple time are significantly shorter than expected:

Verzonden op Mar 27, 2006 via TNT International Express

… ‘Verzonden’ is the Dutch equivalent of ‘sent’.

Functional programming

Tuesday, February 14, 2006, 17:24 - 0 comments

Some articles/books about functional programming I came across:

Probably nothing new to the reader, I just note them so I won’t forget them.

File identity in version control

Friday, February 10, 2006, 14:02 - 0 comments

Something I find pretty annoying about version control systems is the instability of file identities. The problem is that changesets and operations in e.g. Subversion consider files to be identified by their relative file system path. If you merge a changeset from your main development branch back to an older release branch, changes are applied to the files which have the same path. If you now rename or move a file on the development branch, e.g. because of refactoring, patching fails because the target of the change operation cannot be found.

The solution would be to give out identities to files once they are initially added to the version control system. These IDs would need to be stable within the repository and over moves/renames. The file system path would then only be a property of the file. If a changeset is applied to the tree, files are identified by their ID and patched the normal way. If multiple files within the tree have the same ID (e.g. a file has been copied within version control) the change should be applied to both of the copies. The semantics of the merge operation would then be “take the changes that happened in this subtree between these revisions, find the files with the IDs in this target subtree, and apply the changes to them”. I’m not sure if it should be an error when a changeset contains changes to files which are not present in the target tree and if the IDs of the roots of the two trees would need to be identical (e.g. you have to apply changes to files rooting at the same directory hierarchy), but this would surely make changes easier to track and backports a lot less painful.
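A toy sketch of what applying a changeset by file ID could look like, in Python. Everything here is hypothetical (the data layout, the function name apply_changeset, representing a change as a function on the file’s lines); it is only meant to show that the path plays no role in finding the patch target, and that copies sharing an ID all receive the change.

```python
def apply_changeset(tree, changes):
    """Apply changes to every file in tree whose ID appears in changes.

    tree:    dict mapping path -> {"id": file_id, "lines": [...]}
    changes: dict mapping file_id -> function taking and returning lines
    Returns the set of IDs from changes that were not found in tree, so the
    caller can decide whether that is an error.
    """
    missing = set(changes)
    for entry in tree.values():
        patch = changes.get(entry["id"])
        if patch is None:
            continue
        # The path is just a property; the ID alone selects the target.
        entry["lines"] = patch(entry["lines"])
        missing.discard(entry["id"])
    return missing
```

A release branch that still has the file under its old path would be patched the same way as the development branch where it has been renamed, because both entries carry the same ID.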

Tail Recursion in XQuery

Tuesday, February 7, 2006, 12:58 - 5 comments

Today I finished a feature for the X-Hive/DB XQuery processor that has been sitting on my wishlist for quite a while: Tail Recursion. The problem is that in functional languages (like XQuery) you often write recursive functions to achieve some goal. While this is very elegant, it has the problem that you can run out of stack space quite fast if you process deeply nested structures or long lists. Tail Recursion can solve this in certain cases by evaluating the function in an iterative way. E.g.

declare function local:sum($start as xs:integer) as xs:integer
{
  if ($start eq 0) then 0
  else $start + local:sum($start - 1)
};

If you rewrite this function like this:

declare function local:sum($start as xs:integer, $acc as xs:integer) as xs:integer
{
  if ($start eq 0) then $acc
  else local:sum($start - 1, $start + $acc)
};

the interpreter can evaluate each tail call after the function body has been evaluated, because the values returned by the call are not used in the calculation of the body, but rather returned directly. To enable this, the path to the tail call within the function must only contain if/else or sequence (“,”) operators, as opposed to the “+” operator above.
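To illustrate what “evaluating the function in an iterative way” amounts to, here are both variants in Python. Python itself does not eliminate tail calls, so the second function spells out by hand the loop that a tail-call-aware interpreter would effectively run for the accumulator version. This is a sketch of the general technique, not of how X-Hive implements it internally.

```python
def sum_recursive(start):
    """Mirror of the first local:sum: the '+' happens after the call returns,
    so every level of recursion needs its own stack frame."""
    if start == 0:
        return 0
    return start + sum_recursive(start - 1)


def sum_tail_as_loop(start, acc=0):
    """Mirror of the accumulator version, with the tail call replaced by
    rebinding the parameters and jumping back to the top of the body."""
    while True:
        if start == 0:
            return acc
        # Equivalent of: local:sum($start - 1, $start + $acc)
        start, acc = start - 1, start + acc
```

The loop version uses constant stack space, so it handles inputs such as 100000 that would exhaust the stack in the plain recursive version.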

To enable this in X-Hive, we had to turn off lazy evaluation of the function call parameters for tail calls (otherwise you’d run out of heap space), which should not be a problem, and we had to accept a minor non-conformance. The problem is that you can’t really check function return values as mandated by the standard if your function doesn’t really return the whole evaluated sequence. The end result of the whole function call is still checked, but subresults of the single tail calls aren’t. We consider this to be ok, as it doesn’t create false results for correct queries and the examples of offending queries are pretty contrived. I would be very surprised to encounter something like this in the wild:

declare function local:foo($x) as xs:integer
{
  if ($x eq 0) then ()
  else (1, local:foo(0))
};

Calling this function with any value for $x other than 0 will return a single “1”, which matches the declared return type, but the result returned by the intermediate local:foo(0) call does not - it’s the empty sequence, which doesn’t match the declared “exactly one xs:integer”. While we’re aware that this is interpreting the rules for optimization and error cases in a very liberal way, we feel that it’s worth it, as tail recursion allows a wide range of problems to be solved within XQuery that would be impossible otherwise.

