Martin Probst's weblog

EMC += Martin Probst

Tuesday, November 4, 2008, 10:20 - 1 comment

Some of you already know it: I’ve joined EMC Corporation, starting on Oct 15th. I’ll be teleworking from Potsdam but commuting every month to Rotterdam for one week, which is an ideal arrangement for me.

EMC acquired X-Hive, my former employer in the Netherlands, in the summer of last year, so I’m basically returning to the company I left some time ago, mostly to finish my studies and try some stuff out (like taking a look into SAP’s corporate innards and doing some freelance work).

I’m very happy that this worked out. X-Hive used to be a great employer, with really nice people and very interesting work. And now they suddenly pay much better ;-). Seriously: from my first days, it seems as if the influence of EMC is very good. Of course there is some corporate bureaucracy (which so far seems very acceptable), but on the other hand there are some seriously smart people giving input into my favorite native XML database. There is a huge set of really cool requirements to match, and X-Hive/EMC now certainly has the resources to fulfill them.

I’m very happy to work on this really cool product once again, and I’m particularly happy that it is probably going to have a much larger impact very soon. Nice times.

Time Machine works

Tuesday, November 4, 2008, 06:26 - 0 comments

I’m happy to report that I tested Time Machine’s backup and restore capability yesterday evening, and it works, sigh.

I brought my MacBook Pro (1st gen.) in for repairs, because the left fan failed again, and I also had them build in a bigger hard drive. Restore from Time Machine took about 2.5 hours, maybe a bit more, but afterwards you’re directly booting into your complete system. Very nice!

On the fan: Apple decided to fix it under warranty, as the very same fan failed a bit less than two years ago. Also, not long before that, the right fan failed. So Apple, you’re building a computer that sells for more than 2000 €, and you cannot build/buy/ship fans that last more than a year?

Update: after booting into Mac OS, you’ll have to re-import your emails, and then the next Time Machine run took ages, at least for me. My machine crashed twice while importing the mails before I increased fan speed - apparently there is still a heat problem. I also get some weird graphics errors (violet areas in windows, horizontal lines). I think I will get to know some more people in the Apple hotline shortly …

Trying to control the heat issue, I switched from smcFanControl to Fan Control. smcFanControl only allows a user to set specific fan speeds (via presets), whereas Fan Control dynamically adjusts fan speed depending on the current temperature (which in turn depends on the workload). So if you have a large job running, it will dynamically increase fan speed a bit more than Mac OS would, to keep your computer a bit cooler. Nice.

.NET VM sizes and Java

Wednesday, October 22, 2008, 15:39 - 0 comments

I just installed all the new Windows XP patches in my virtual machine running under Parallels, including a bunch of hotfixes and service packs for Microsoft .NET.

As the download took quite a while, I got curious and checked the size of the various .NET frameworks after installation. According to the Add/Remove Programs dialog (no idea where these files really end up, so I can’t check directly), .NET 2.0 and 3.0 consume ~280 MB and ~335 MB respectively, including the German language packs, each at ~100 MB. For .NET 1.1, which is also installed, there is no size given, but I’d guess it’s not that much smaller than .NET 2.0, so about 150 MB should be a conservative guess.

So in summary, to run Microsoft .NET programs, I spend probably well above 750 MB of hard drive space. Compare that to Java, where one version suffices due to backwards compatibility, and where the JRE is 114 MB (the JDK is larger, but not by much if I remember correctly). This is actually a pretty good argument for at least some investment in backwards compatibility.

I wonder what they include in those .NET packs? And what if they continue “innovation” (or whatever) at this pace? In the 6 years since 2002 there have been 3 incompatible versions; does that mean we’ll have to install 2 GB in 6 frameworks in another 6 years?

Online Dictionary Bookmarklet

Thursday, October 9, 2008, 09:48 - 0 comments

There is a nice online dictionary service that translates, amongst others, from Dutch to German. I created a small bookmarklet that lets you quickly translate text on web pages. It’s a direct rip-off of the LEO Dict bookmarklet.

First, drag this link: NL->DE to your address bar. Then select a word on a web page, and click the link - this should open a window with the translation search. In case you didn’t select anything, you will be prompted for a word to look up.

I’m trying to brush up my Dutch a bit, so apart from the great Woord van de dag service by the Niederlandistik (is that Dutchery in English? ;-)) at FU Berlin, I’ve started reading a bit in the Dutch Wikipedia, and this tool really helps.

GNOME Online Desktop

Thursday, October 9, 2008, 06:41 - 0 comments

GNOME Online Desktop, via Silvan, with screenshots and a tour available at RedHat Magazine.

This basically looks like a nice idea, going forward to really integrate desktop functionality with web based apps. However it feels somewhat backwards, with the desktop developers implementing lots of connectors to various web applications. Shouldn’t it be the other way around?

What I’d envision is a desktop that tightly integrates with the web browser and provides a set of hooks for web applications to integrate with, something like a one-click installation of a small plugin that augments the desktop with functionality related to the web app.

This could be small JavaScript pieces or maybe even only XML configuration that tells the desktop where to search for documents/calendar events/IM conversations/…, how to integrate with IM, pull notifications and so on. That would make the system much more open - any website developer could nicely integrate his application, without relying on the GNOME developers to add his webapp to the desktop. Of course there are security issues with that, but they should be fixable.

I generally think there is much value in extended JavaScript access to the desktop. It is certainly dangerous and needs to be done right ™, but the possibilities are really cool - like access to calendars and address books. Mac OS nicely shows how this can work - they provide basic, central services like the address book and the calendar, and allow other applications to re-use the functionality, which is of huge value to users. This would also make it viable to write real applications for mobile devices (iPhone, Android) just through web pages and JavaScript. Users would need to be asked for permission, just as they are for HTML5’s openDatabase offline storage, and a good user interface for that is crucial.

My guess is that even if the desktop fails to deliver such integration, the web applications will, sooner or later. Google already has all the right APIs in place, it’s just lacking a proper model to share some information with some applications without exposing your whole online life to some foreign app (giving away your GMail username/password is not an option). So the question isn’t whether this tight integration over services is happening, but rather whether the desktop will be part of it.

Taleo E-Recruitement

Thursday, October 9, 2008, 05:54 - 2 comments

Some time ago I applied at a company that uses the “Taleo E-Recruitement” software. Taleo provides an ASP solution where - basically - HR people can post jobs and applicants can submit resumes. Gartner puts Taleo “in the leaders quadrant”, as their website boasts. Once again, I have no idea what Gartner bases its judgement on (though there are some hints via Lars), but it’s probably not related to the quality of the product.

I’ve rarely seen such a sucky web app; I thought they had died out sometime around 2001. Search is pretty much broken, the back button is broken, you can’t open pages in tabs, you can’t post links to jobs, it’s integrated into the company’s site in an iframe so that it takes up a maximum of 1/3 of your screen, you need to create an account and will then be spammed with irrelevant job postings (you need to log in to turn them off, and the password recovery appears to be broken, too), etc. After you have made it through the broken search and registration, you can submit your resume. First you can upload the CV, then you have to fix the automatically extracted name and address (why not just type it?!), then the system expects you to manually type your CV again in plain text, but please with formatting fixed. Then you’ll have to re-enter all the information (work experience, education) from the CV in awkward HTML forms.

What could be simply writing a proper cover letter and sending it in an email including your CV is magically turned into a 1hr+ task, full of broken, annoying software and the constant fear that all your work will be eaten by another browser incompatibility on the last “wizard” page.

Taleo’s website states that there is “Heightened competition for skilled workers.” Yes, very much indeed. But why are you writing software that actively tries to keep people from applying to jobs then? It’s like a specially designed filter that will drive off all the good people that don’t need to put up with this stuff.

How to look for applicants

Saturday, September 20, 2008, 08:54 - 3 comments

I’m currently finishing my Master’s thesis (finally!), so I’m looking for a job. It’s a bit weird though: there are apparently a gazillion books on the market telling you how to apply properly for job offerings, but from the job offerings I see, we desperately need literature on how to properly write a job offering.

You’ll find hundreds of boring “J2EE/Hibernate/Spring/JUnit/$DB” ads, all listing a long list of technologies you ought to be familiar with (hint: good people will have no problems learning any technology!), and telling you effectively nothing about the company, the domain, or anything. Sucks.

And then you have all the ads that don’t even meet the minimum standards - typos, duplicate copy/paste content, bad formatting, completely broken HTML. What are they thinking? Is that the impression you want to give, “we can’t even produce proper job ads, come work for us”?

Memory accounting in Linux

Friday, September 19, 2008, 09:07 - 0 comments

Once again random processes on my virtual server running this blog were killed, due to out of memory errors. This time, it actually led to some outage, because Apache was killed.

Normally, Apache serves the whole blog from a static cache, and only comments and new posts are handled by Rails, which makes the whole thing pretty fast - otherwise it would be unusable. Rendering a single request takes something close to seconds on my MacBook Pro, and thus much longer on the vserver. I have no idea why, and it’s probably some bug in my own implementation, but introducing efficient caching was both easier and more interesting than performance tracing the Rails code.

This is getting quite annoying. I have 128 MB guaranteed memory on the vserver, and this is simply not enough for Apache, SVN, Rails, and Tomcat. Interestingly, Tomcat is much less of a culprit than Rails, which consumes a lot more memory through the several process instances, as I found out earlier.

The bad thing is that there really is currently no way under Linux to find out which of your applications is actually using your memory, and how much of it. RSS, VSZ, “size” and others simply don’t give any relevant insight into memory usage - the only thing you can probably do is look at memory consumption, start the app, and compare. Which is a pretty bad state, IMHO.

Lately I found that the /proc filesystem provides an ‘smaps’ file for each process, which contains its memory mappings, each listed with its segment, RSS, and private and shared pages. This might actually lead to a useful memory analysis. One could probably write a simple tool that reports the private memory usage of each process, and attributes the memory used by shared libraries to the apps using them. This should probably separate fixed costs (i.e., the shared memory consumed by running anything Ruby based) from dynamic costs (i.e., the added private memory for each Ruby/Rails instance).

I’ve started playing around a bit with a small Ruby script, and maybe if I have some spare time, I can turn it into something useful, though this will probably take a lot of learning about virtual memory in Linux.
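To make the idea a bit more concrete, here is a minimal sketch in Python (not the Ruby script mentioned above) that sums the private and shared counters per mapping from the text of an smaps file. It assumes the Private_Clean, Private_Dirty, Shared_Clean and Shared_Dirty lines that smaps reports per mapping; the function names are made up for this example, and it says nothing yet about how shared pages should be split between the processes using them.

```python
import re
from collections import defaultdict

# A mapping header looks like:
#   08048000-080bc000 r-xp 00000000 03:02 13130      /usr/bin/ruby
HEADER_RE = re.compile(r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+\S+\s+\S+\s+\d+\s*(.*)$")
# A counter line looks like:
#   Private_Dirty:       100 kB
FIELD_RE = re.compile(r"^(\w+):\s+(\d+) kB\s*$")

PRIVATE_FIELDS = {"Private_Clean", "Private_Dirty"}
SHARED_FIELDS = {"Shared_Clean", "Shared_Dirty"}


def summarize_smaps(text):
    """Return {mapping name: {"private": kB, "shared": kB}} for one smaps file."""
    totals = defaultdict(lambda: {"private": 0, "shared": 0})
    current = None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # Mappings without a backing file have an empty name.
            current = header.group(1).strip() or "[anonymous]"
            continue
        field = FIELD_RE.match(line)
        if field is None or current is None:
            continue
        name, kb = field.group(1), int(field.group(2))
        if name in PRIVATE_FIELDS:
            totals[current]["private"] += kb
        elif name in SHARED_FIELDS:
            totals[current]["shared"] += kb
    return dict(totals)


def private_kb(text):
    """Total private memory in kB: the per-instance ("dynamic") cost."""
    return sum(entry["private"] for entry in summarize_smaps(text).values())


def read_smaps(pid):
    """Read /proc/<pid>/smaps; needs Linux and permission to inspect the process."""
    with open(f"/proc/{pid}/smaps") as handle:
        return handle.read()
```

Calling private_kb(read_smaps(pid)) for each Rails process would give a rough per-instance cost, while the "shared" totals hint at the fixed cost that is paid once for all of them.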

Maybe I’m just using the wrong tools or operating systems, but I find it somehow depressing that we don’t have a proper way of accounting memory to applications. It’s really annoying that even today a badly written application that does something like while (true) malloc(…); can effectively bring down your whole system…

Higher order functions in XQuery

Friday, August 1, 2008, 07:39 - 0 comments

The requirements for XQuery 1.1 contain a MAY item called “higher order functions”. I’m really fond of the idea of higher order functions in XQuery, and as there are currently no use cases for that, I’ll contribute some here:

Simple predicates

A simple use case is to pass a predicate to another function, as shown below:

(: Selectively copy only those elements for which $pred returns true.
   TODO: works recursively, but copies all attributes regardless of $pred :)
declare function local:selective-copy($nodes as element()*, $pred as function($node as element()) -> xs:boolean)
{
  for $n in $nodes
  return
    (: call $pred :)
    if ($pred($n)) then element { node-name($n) } { $n/@*, local:selective-copy($n/*, $pred) }
    else ()
};

declare function local:mypred($node as element()) as xs:boolean
{
  let $name := node-name($node)
  return namespace-uri-from-QName($name) = ('', '')
};

let $xml :=
  <user xmlns="">
    <name><first>Fritz</first> <last>Müller</last></name>
    <password xmlns="">vj/b5ZaUYQ6kU</password>
    <preferences> … </preferences>
  </user>
(: use the curry operator & to get a function "handle" :)
let $predicate := &local:mypred
return local:selective-copy($xml, $predicate)


The next use case would be the natural extension - real currying of functions. local:selective-copy is as above, but we’ll generalize our predicate a bit:

(: this function compares the namespace of a given node against a set of legal namespaces :)
declare function local:namespace-matches($namespaces as xs:anyURI*, $node as element()) as xs:boolean
{
  let $name := node-name($node)
  return namespace-uri-from-QName($name) = $namespaces
};

let $xml :=
  <user xmlns="">
    <name><first>Fritz</first> <last>Müller</last></name>
    <password xmlns="">vj/b5ZaUYQ6kU</password>
    <preferences> ... </preferences>
  </user>
(: use the curry operator & to get a function "handle";
   we pass in a namespace to check against, and get a unary function in return :)
let $predicate := &local:namespace-matches("")
return local:selective-copy($xml, $predicate)

The & operator

The & operator (“curry operator”) returns a handle to the function given by name, possibly setting values for parameters by specifying them in parentheses. This provides the reification for functions, i.e., the way from a function to a value.

Currying is only allowed from left to right, i.e., one cannot curry the second argument, but leave the first argument to a function unbound. I don’t think this is a large restriction, and it makes the syntax much easier.
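For comparison, the same left-to-right binding can be sketched in Python with functools.partial, which also fixes positional arguments starting from the left. This is only an analogy under simplifying assumptions: nodes are reduced to (namespace, name) pairs, and namespace_matches, selective_copy and the urn:example:secret namespace are hypothetical stand-ins, not part of the proposal.

```python
from functools import partial


def namespace_matches(namespaces, node_namespace):
    """Return True if the node's namespace is one of the legal namespaces."""
    return node_namespace in namespaces


def selective_copy(nodes, pred):
    """Keep only the (namespace, name) pairs whose namespace satisfies pred."""
    return [(namespace, name) for namespace, name in nodes if pred(namespace)]


# Bind the first (leftmost) argument only; the result is a unary function,
# just like the handle returned by &local:namespace-matches("").
predicate = partial(namespace_matches, [""])

nodes = [
    ("", "name"),
    ("urn:example:secret", "password"),  # hypothetical namespace
    ("", "preferences"),
]
print(selective_copy(nodes, predicate))
```

The second argument stays unbound until selective_copy calls the predicate, which is exactly the restriction described above: the first parameter can be fixed, but not the second one on its own.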

Calling function handles

The $someval(…) syntax allows straightforward calling of a function handle, so it is something of an inverse of the & operator.

Types for higher order functions

I think the static typing feature of XQuery never really gained much traction. However it would of course be possible to introduce a new type, called “function()”, and specify parameters and return values as shown above. I think it should be possible to type-check that, but I’m not an expert.


I haven’t implemented this yet (I currently don’t have access to an XQuery implementation), but it should not clash with any existing syntax, so from a grammar point of view it should be ok. It does require some changes to the runtime system, but that shouldn’t be too difficult, IMHO.

The nice thing about higher order functions is that they allow a form of dynamic dispatch. That is, they make it possible to write programs that decide at runtime which code is going to be executed, in an elegant way.

This is of course not complete. It doesn’t support a terse syntax for lambdas, which would also be nice (not having to declare all those pesky one-line functions). XQuery should also have something like a “fn:resolve-function($name, $argc)” that provides dynamic access to the curry operator by specifying the QName and argument count of the function.

I think the example shows that this little extension can get you a lot of nice functionality and a lot less typing. Please leave a comment with your opinion!

XQuery Scripting Extensions and Use Cases

Wednesday, July 9, 2008, 07:48 - 0 comments

In March, the W3C published a first draft of the XQuery Scripting Extensions use cases and a working draft.

The XQSE proposes to extend XQuery with a defined expression evaluation order, variable assignments, a while loop, and some other control flow statements. This worries me a lot - I’ve spent considerable time implementing and using XQuery, and I really feel that this extension will break XQuery as a language.

The problem is that XQuery was intended to be a functional, declarative language. This allows implementations to reorder statements, executing them in a (hopefully) optimal order, benefiting from index lookups and the like. Now that they add side effects and state to the language, this is no longer possible in the general case. It will also greatly complicate XQuery implementations.

Of course, the question is what benefits this extension of the language might bring. The use cases document provides some insight.

Use cases section R Q1-3 define queries that perform some modification of a persistent document and at the same time return a result (the new bid, number of deleted accounts, …).

Use case Q4 describes the use of a while loop to constantly poll the current highest bidder on some item and perform an action if it changes. I’m not sure what this seemingly strange scenario is supposed to solve, and the editors appear to be wary of this, too.

Q1-3 can easily be solved by allowing queries to perform modifications and return values at the same time. I’ve implemented this in X-Hive/DB, and it was actually simpler than what the W3C prescribed. I’ve since argued against this arbitrary limitation of the XQuery Updates specification, and maybe now would be a good time to fix that issue? Simply drop the separation between modifying and non-modifying queries and be done with it. About Q4, I’m not really sure. It looks like something that could be easily solved with some event based or message sending system. The fact that the editors are unsure how to implement this should probably be taken as a warning sign. Are we sure someone really needs this?

The use cases XHTML / AJAX both describe a scenario where a script first has to show a “busy” notice to the user, and then look up some data. I’m not sure if they envision XQuery running on the client, but otherwise this is perfectly solvable today. Client side JavaScript execution allows this trivially, and I really see no reason to mix this client side GUI stuff with the server side operation. It breaks the MVC pattern for no really good reason.

Use case WS is again a variation of the “I want to perform changes and report results” theme. Drop the limitation in the XQuery Updates spec and be done with it.

In summary, it seems like they want to solve an actual problem, but approach it from a really complex angle. 5 use cases can trivially be solved by modifying the updates spec, 2 use cases (XHTML) should probably be considered harmful anyways, and one (R-Q4) I fail to understand. The collateral damage in complexity caused by the XQSE spec is not worth the net effect, if the same can also be achieved by simplifying another spec!

I think we should rather concentrate on XQuery 1.1 with the grouping and windowing functionality that many people really need. Which, by the way, could also really use some simplification.
