Martin Probst's weblog

File identity in version control

Friday, February 10, 2006, 14:02 - 0 comments

Something I find pretty annoying about version control systems is the instability of file identities. The problem is that changesets and operations in e.g. Subversion consider files to be identified by their relative file system path. If you merge a changeset from your main development branch back to an older release branch, changes are applied to the files which have the same path. If you now rename or move a file on the development branch, e.g. because of refactoring, patching fails because the target of the change operation cannot be found.

The solution would be to assign an identity to each file when it is first added to the version control system. These IDs would need to be stable within the repository and across moves and renames. The file system path would then only be a property of the file. If a changeset is applied to the tree, files are identified by their ID and patched the normal way. If multiple files within the tree have the same ID (e.g. because a file has been copied within version control), the change should be applied to all of the copies. The semantics of the merge operation would then be “take the changes that happened in this subtree between these revisions, find the files with the same IDs in this target subtree, and apply the changes to them”. I’m not sure whether it should be an error when a changeset contains changes to files which are not present in the target tree, or whether the IDs of the roots of the two trees would need to be identical (i.e. changes could only be applied between trees rooted at the same directory), but this would surely make changes easier to track and backports a lot less painful.
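A minimal sketch of the idea in Python, under clearly stated assumptions: the `Tree` class, the `apply_changeset` function and the integer file IDs are hypothetical names made up for illustration and do not correspond to Subversion or any other existing tool. The changeset is keyed by file ID rather than by path, a copy shares the ID of its original, and IDs missing from the target tree are merely reported, since it is an open question whether that should be an error.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    """Hypothetical working tree: maps path -> (file_id, content)."""
    files: dict = field(default_factory=dict)

    def add(self, path, file_id, content):
        self.files[path] = (file_id, content)

    def rename(self, old_path, new_path):
        # The path is only a property of the file; the ID stays the same.
        self.files[new_path] = self.files.pop(old_path)

    def paths_for(self, file_id):
        return sorted(p for p, (fid, _) in self.files.items() if fid == file_id)


def apply_changeset(tree, changeset):
    """Apply {file_id: edit_function} to every file in `tree` with that ID.

    Returns the IDs that were not found in the target tree, leaving it to
    the caller to decide whether that counts as an error.
    """
    missing = []
    for file_id, edit in changeset.items():
        paths = tree.paths_for(file_id)
        if not paths:
            missing.append(file_id)
        for path in paths:
            fid, content = tree.files[path]
            tree.files[path] = (fid, edit(content))
    return missing


# Development branch: file 1 is renamed during refactoring, ID unchanged.
dev = Tree()
dev.add("src/util.py", 1, "v1")
dev.rename("src/util.py", "lib/helpers.py")

# Release branch: old layout, plus a copy that shares the original's ID.
release = Tree()
release.add("src/util.py", 1, "v1")
release.add("src/util_copy.py", 1, "v1")

# The changeset records IDs only; file 2 does not exist on the release branch.
changeset = {1: lambda text: text + "+fix", 2: lambda text: text + "+other"}

missing = apply_changeset(release, changeset)
print(release.files)
print(missing)
```

Although file 1 lives at a different path on the development branch, the change still reaches both files carrying ID 1 on the release branch, and ID 2 is reported as missing.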

Tail Recursion in XQuery

Tuesday, February 7, 2006, 12:58 - 5 comments

Today I finished a feature for the X-Hive/DB XQuery processor that has been sitting on my wishlist for quite a while: tail recursion. In functional languages (like XQuery) you often write recursive functions to achieve some goal. While this is very elegant, you can run out of stack space quite fast if you process deeply nested structures or long lists. Tail recursion can solve this in certain cases by evaluating the function in an iterative way. Take, for example, this function:

declare function local:sum($start as xs:integer) as xs:integer
{
  if ($start eq 0) then 0
  else $start + local:sum($start - 1)
};
If you rewrite this function to this:
declare function local:sum($start as xs:integer, $acc as xs:integer) as xs:integer
{
  if ($start eq 0) then $acc
  else local:sum($start - 1, $start + $acc)
};
the interpreter can evaluate each tail call after the function body has been evaluated, because the value returned by the call is not used in any further calculation in the body, but is returned directly. For this to work, the path to the tail call within the function may only contain if/else or sequence (“,”) operators, as opposed to the “+” operator in the first version.
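How an interpreter might evaluate the accumulator version iteratively can be sketched in Python. This is an illustration only, not X-Hive's actual implementation: `TailCall`, `run` and `sum_acc` are hypothetical names. The function body returns a description of its tail call instead of performing it, and a loop evaluates that call after the body has finished, so the stack does not grow.

```python
class TailCall:
    """Describes a pending tail call: which function, with which arguments."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def run(func, *args):
    """Evaluate `func`, turning each tail call into one loop iteration."""
    result = func(*args)
    while isinstance(result, TailCall):
        # The arguments were evaluated eagerly when the TailCall was built.
        result = result.func(*result.args)
    return result


def sum_acc(start, acc):
    # Mirrors local:sum($start, $acc): the recursive call is in tail position.
    if start == 0:
        return acc
    return TailCall(sum_acc, start - 1, start + acc)


print(run(sum_acc, 100000, 0))  # prints 5000050000
```

A plain recursive Python function would exceed the default recursion limit long before reaching 100000 levels, which is the same kind of stack exhaustion described above.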

To enable this in X-Hive, we had to turn off lazy evaluation of the function call parameters for tail calls (otherwise you’d run out of heap space), which should not be a problem, and we had to accept a minor non-conformance. The problem is that you can’t really check function return values as mandated by the standard if your function doesn’t actually return the whole evaluated sequence. The end result of the whole function call is still checked, but the intermediate results of the individual tail calls aren’t. We consider this to be OK, as it doesn’t create false results for correct queries and the examples of offending queries are pretty contrived. I would be very surprised to encounter something like this in the wild:

declare function local:foo($x) as xs:integer
{
  if ($x eq 0) then ()
  else (1, local:foo(0))
};
Calling this function with any non-zero value for $x will return a single “1”, which matches the declared return type, but the result returned by the intermediate local:foo(0) call does not: it’s the empty sequence, which doesn’t match the declared “exactly one xs:integer”. While we’re aware that this is interpreting the rules for optimization and error cases in a very liberal way, we feel that it’s worth it, as tail recursion allows a wide range of problems to be solved within XQuery that would be impossible otherwise.
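The difference between the two behaviours can be modelled in Python, with XQuery sequences represented as lists. This is only a rough model under assumptions, not the X-Hive code: `is_one_integer`, `foo_strict` and `foo_tail_optimized` are hypothetical names, and the loop is a hand-written stand-in for what a tail call optimization might do with local:foo.

```python
def is_one_integer(seq):
    """Model of the declared return type: exactly one xs:integer."""
    return len(seq) == 1 and isinstance(seq[0], int)


def foo_strict(x):
    """Every call, including the inner one, has its own result checked."""
    result = [] if x == 0 else [1] + foo_strict(0)
    if not is_one_integer(result):
        raise TypeError(f"foo({x}) returned {result}, expected one integer")
    return result


def foo_tail_optimized(x):
    """Only the end result of the whole call is checked."""
    result = []
    while True:
        if x == 0:
            break          # the body yields the empty sequence
        result.append(1)   # items before the tail call are emitted first
        x = 0              # the tail call local:foo(0) becomes a loop step
    if not is_one_integer(result):
        raise TypeError(f"foo returned {result}, expected one integer")
    return result


print(foo_tail_optimized(5))  # prints [1]

try:
    foo_strict(5)
except TypeError as err:
    print("strict evaluation fails:", err)
```

In this model the optimized version still rejects a top-level call with 0, since the end result is then the empty sequence; only the check on the intermediate call is lost.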

Europeans, Austrians, Germans, whatever ...

Monday, February 6, 2006, 16:46 - 1 comment

So demonstrators in Tehran attacked the Austrian embassy and burned a German flag in front of it (article in German). I guess the Austrians are really offended now …

Usability of Command Line Interfaces

Sunday, January 29, 2006, 13:30 - 1 comment

Something really astonishing about Ubuntu (and all Debian-based distributions) is the state of the command line interfaces to the package management system.

First, there is “dpkg”, coming in several incarnations (20 different programs). This is not supposed to be used by actual humans, and for good reason: it’s just unusable. Then there is apt (9 different programs), providing something somewhat usable. With package management, there are 4 main things I usually want to do: install packages, remove them, upgrade them, and find out information about installed packages.

Installing and removing is pretty straightforward, but upgrading and finding out information about installed packages is surprisingly complicated. If an upgrade just works, you’re fine. But if it e.g. wants to remove some package you’d rather not see removed, ‘apt’ doesn’t tell you why. Plus, finding out which version of a package is installed usually means first searching for the name of the package using apt-cache (and reading through screens full of names to find what you were actually looking for, ending up with three candidates of which two are ‘transitional’, ‘obsolete’ or whatever), as everything has rather obscure names, and then querying information about it using dpkg-query. Why is this so difficult? And why is it so easy with Gentoo’s “emerge” command?

Speaking of Gentoo, the whole init/rc system of Debian/Ubuntu is also a huge step backwards from Gentoo’s rc system. The command line tools are obscure, and the user has to specify the start order manually, although this could be specified by package maintainers. In Gentoo there are runlevels which have clear, understandable names and a tool that manages all the dependencies for you. Instead of pondering when to start which service in what order, and especially when to stop them again, you simply state that you’d like to have service x running at runlevel y, or not. While there is somewhat of a GUI for that in Ubuntu, it doesn’t seem to work all too often and it doesn’t include many of the services.

This was pretty surprising to me, as I figured that Debian was the system used by many administrators and they’d probably care about such things. Strange.

Performancing

Monday, January 23, 2006, 21:43 - 0 comments

Lars Trieloff’s Software Documentation Weblog

Performancing for Roller: Dave Johnson announces that there will soon be compatible releases of Roller and Performancing for Firefox. Cool.
I’m blogging this using Performancing. Cute, though it does not seem to have options for trackbacks …

Eclipse VE & PermGen

Monday, January 16, 2006, 10:57 - 0 comments

I currently use the Eclipse Visual Editor (VE) to create an SWT layout. The tool itself is really great, as it produces high-quality code that actually looks as if it had been written by a developer - for appropriate values of ‘developer’. It’s still a bit messy, but you can take it as a good start. I’m currently not sure how to use the code: I’d like to copy and paste it out of the generated class to break it up a little, as my dialog is quite complex, though this probably makes it impossible to do further work on the code with the VE. (That’s actually the great thing about VE: it operates on real Java code, with no compilation step in between, and you can even modify the code without it biting you.)

The only problem I ran into was frequent OutOfMemoryErrors while using it. I finally figured out it’s probably because of PermGen space, the memory area (separate from the regular object heap) where class metadata and interned strings are stored. You can increase the PermGen space with this VM flag (on Sun’s VM):

java -XX:MaxPermSize=128m
The default is 64M; I’ll see if this helps me.

MacBook Pro

Thursday, January 12, 2006, 17:19 - 1 comment

I’m thinking about buying one of those MacBook Pros. Linux on a notebook has never really worked for me (suspend, WiFi, modem …), at least not without excessive fiddling. Windows is just not an option - I need a Unix(oid) system.

What kind of annoys me is the pricing. Those things are always a lot cheaper in the US than in Europe, for no proper reason. The US Apple store lists the bigger configuration at $2,499, the German one at 2,599€, and the Dutch one at 2,689€. With the Euro currently at 1.20 something, $2,499 translates to about 2,080€, which is a difference of more than 500€.

Now to be fair, the German and Dutch prices include VAT of 16% and 19% respectively, but even without that it’s still a difference of over 160€. Why?
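The comparison can be redone with a few lines of Python. The exchange rate is an assumption: the text only says “1.20 something”, so the sketch uses exactly 1.20 dollars per euro, and `USD_PER_EUR` and `net` are names made up for this illustration.

```python
USD_PER_EUR = 1.20  # assumption: the post only says "1.20 something"

us_price_usd = 2499
stores = [
    ("Germany", 2599, 0.16),      # gross price in EUR, VAT rate
    ("Netherlands", 2689, 0.19),
]

us_price_eur = us_price_usd / USD_PER_EUR


def net(price, vat):
    """Strip VAT from a gross price."""
    return price / (1 + vat)


print(f"US price: {us_price_eur:.2f} EUR")
for name, price, vat in stores:
    gross_gap = price - us_price_eur
    net_gap = net(price, vat) - us_price_eur
    print(f"{name}: gross gap {gross_gap:.2f} EUR, net gap {net_gap:.2f} EUR")
```

At exactly 1.20 the gross gaps come out at a little over 500€ and 600€, and the net gaps at roughly 158€ for Germany and 177€ for the Netherlands. A rate slightly above 1.20 makes the US price cheaper in euros and widens all of these gaps.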

I could get a MacBook from the United States, but I guess the calculation doesn’t work out. Apart from possible import charges (VAT of 16% or 19%; computers themselves are free of customs duty) you’d have to pay about 5% sales tax in the US. Add a new power connector and the difference is only about 300€. Plus you won’t get support in Europe and will probably have to mail the whole thing to the US if there is a problem … If you get caught at customs, it’s a plus/minus zero thing; if not, it might save you quite some money. I’m unsure …

SWT is like HTML 3.2

Thursday, January 12, 2006, 16:44 - 0 comments

SWT is like HTML 3.2 – the only way to create something like a layout is tables (GridLayout) and they are a pain to use.

Compiled XQuery

Tuesday, December 20, 2005, 23:53 - 1 comment

Lars, just because XQuery can be statically typed doesn’t mean you can remove all the runtime checks and thereby be much faster. There are still a lot of (possibly non-type-related) runtime errors. It might get faster if you don’t have to parse a lot of integers because they are already stored as such, but that won’t gain you that much.

An XSLT processor might benefit from the schema information where it knows that “//foo/bar” can only occur at certain points, so it doesn’t have to match it against the whole tree. But for XQuery that’s more or less pointless, as the language is for querying, not for transformation. Meaning, in most use cases you will not want all bars below foos because you want to make a nice table out of them, but rather all foos with a @bar value of 5, because you need the endpoint of that link, or all nodes containing the text “baz”. In that case, you need indexes, and schema information doesn’t help much.

Registration process from hell

Thursday, December 8, 2005, 23:02 - 0 comments

I just had to sign up for a new ICQ account, as my old account somehow got lost. I suspect someone took it over as the password was weak, but who knows. I have not used any ICQ product beyond the network service for over 8 years, using various other clients instead, as ICQ itself is just too much of a torture. Their website just reminded me how awful ICQ can be.

First of all, all of their web pages contain at least 3 blinking, jumping and sometimes noisy Flash ads. Apparently they don’t care if potential customers die of an epileptic seizure before finishing registration. Then the website contains incredible amounts of garbage, but nowhere can you find the things people actually might want to do: get an ICQ number, or log in to an account on the web page.

After finding the registration form, you’ll first be surprised that “only” a nick, an email address and security related stuff are needed. Of course they have a captcha. After filling out the page, it returns, stating that the password must be 6-8 characters long and may include some special characters. I filled out the page again with password and captcha three times until I realised that they actually limit passwords to 8 characters. Would anything longer be too secure, or what?

The answers to the security questions are also a pain: again, at least 6 characters. What if my answers are shorter? Plus, why do I actually need this if I have an email address to fall back on? It’s not as if I’d forget my email address but remember a 10-digit number … And every time you type something wrong on that page, you get another go at the captcha, plus re-entering your password twice.

If a company has been doing online stuff for well over 10 years, how did they manage to learn nothing about how to do it right? Or at least a tiny bit better than the average PHP-coding “my homepage” guy?

