Martin Probst's weblog

Subversion and Eclipse again

Wednesday, April 6, 2005, 18:16 — 0 comments

Some time ago Lars Trieloff helped me out with my Eclipse/Subclipse/Subversion problems by pointing me to a pure-Java implementation of an SVN client, which can be used to work around Subclipse’s deficiencies.

This really made my day and solved everything - for about two days. Then I was back to normal, Subclipse didn’t work and everything just annoyed me.

Today I took another try at it. The problem with JavaSVN was that it couldn’t find a jar (SVNClient.jar) in the (…).subclipse.core_0.9.28.1 plugin directory. It turns out there are two directories, one with version 0.9.28 and one with - but only the first one contains the jar and is actually the one used according to plugin.xml and feature.xml. After some fiddling around with text editors, the xml files and a few attempts at copying files around, I gave up.

Subclipse itself is intended to work with the JNI javahl bindings, which it couldn’t find on my system. The Subclipse page states that Linux distributions should provide these bindings themselves; the Subclipse project only ships them for Windows. Following that hint, I found out that Gentoo includes the bindings in the subversion ebuild (when compiled with the USE flags java and berkdb). The only problem is that by default $LD_LIBRARY_PATH is apparently not set on Gentoo Linux systems.

Long story, short fix:

martin@perseus ~ $ export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
martin@perseus ~ $ eclipse-3.1

And everything (?) works again. On Gentoo you can set this environment variable permanently by editing e.g. /etc/env.d/00basic: add the line “LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib” and run /sbin/env-update && source /etc/profile (as root). On the next login your environment should be set up.

Applied XML in RSS, SOA(P) and REST

Tuesday, March 29, 2005, 15:46 — 3 comments

Lately there have been lots of debates about SOAP vs REST. To summarize quickly: both SOAP and REST are about making resources accessible via the web using XML, but they take two different approaches. SOAP provides a toolkit that (in the best case) seamlessly integrates with your programming language of choice and provides access to resources via e.g. method calls on an object. This is done by providing a machine-readable definition of the interface (a WSDL file).

REST is about not providing a toolkit but rather defining simple APIs using HTTP GET, POST, PUT and DELETE as the commands, passing around chunks of XML. People argue that this is simpler than SOAP with all its toolkits (which are partially incompatible with each other, etc.) and much more similar to the current architecture of the web, which is a great success after all.

Now all this talk about Service Oriented Architectures seems to be marketing speak. But will REST really be better than SOAP? RSS feeds are an XML application that is already very widely deployed. They are supported by many different tools in read-only mode, and by some with a posting method as well.

The argument for REST is that RSS somehow works. People started providing content via HTTP GET in a rather loosely defined format, other people wrote aggregators for these resources, and today we can easily read news from many different sites on our desktops.

The argument against is the effort needed to provide RSS support. Some of this comes from the “loosely defined format”. But the biggest part comes from malformed XML, at least as far as I can see (blog posts by prominent RSS tool authors seem to support this). There is hardly a feed out there that really adheres to the format. Additionally, there are lots of feeds which aren’t even XML. There is a huge number of issues, from simple bad nesting to unescaped content, double-escaped content, bad namespaces, bad content encodings, and so on. And this is not even about adhering to a particular schema. XML looks as if it were very easy to write and read, but in reality it’s a lot more complicated than you might think.
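The strictness is exactly the point: a conforming XML parser has to reject the kinds of errors listed above. A minimal sketch in Java using the standard JAXP DocumentBuilder (the sample snippets are made up for illustration):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;

public class StrictParse {
    // Returns true if the input is well-formed XML, false otherwise.
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Well-formed: the ampersand is properly escaped.
        System.out.println(isWellFormed("<title>A &amp; B</title>"));
        // Malformed: unescaped '&', a classic feed mistake.
        System.out.println(isWellFormed("<title>A & B</title>"));
        // Malformed: bad nesting.
        System.out.println(isWellFormed("<item><title>A</item></title>"));
    }
}
```

A liberal feed reader has to accept all three inputs; a standard XML toolchain accepts only the first.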

Now what does this mean for REST? I think it is very likely that all custom XML applications where people don’t use toolkits, but rather write the angle brackets themselves, will suffer from the problems mentioned above. You might argue that this is not such a big problem, as RSS reader authors seem to get along with it. But then everyone who wanted to use a specific REST application would need to write some magic, ultra-liberal parser. She wouldn’t be able to use XML technologies like XPath, DOM or XSLT out of the box; she wouldn’t even be able to use SAX.

Apart from that, assume people were at least able to provide valid XML. What about the API? With RSS it’s relatively easy, there is only one big GET. But what if you wanted to provide more, like “GET the last 5 posts”? You would start inventing an API, e.g. via query strings. Nothing bad about that, but how do you document that API? As far as I have seen, most documentation in companies is done using Word documents. Most of this documentation is either too old, barely understandable, much too short, simply wrong, or a combination of these.
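To make the “inventing an API via query strings” point concrete, here is a hypothetical server-side sketch in Java. The parameter name “count” and the class are made up for illustration; nothing in the protocol documents or enforces them, which is exactly the problem:

```java
public class QueryApi {
    // Extracts a made-up "count" parameter from a raw query string
    // like "count=5&format=rss", falling back to a default when the
    // parameter is missing or unparseable.
    public static int parseCount(String queryString, int defaultCount) {
        if (queryString != null) {
            for (String pair : queryString.split("&")) {
                String[] kv = pair.split("=", 2);
                if (kv.length == 2 && kv[0].equals("count")) {
                    try {
                        return Integer.parseInt(kv[1]);
                    } catch (NumberFormatException e) {
                        // Invalid value: silently use the default.
                    }
                }
            }
        }
        return defaultCount;
    }

    public static void main(String[] args) {
        System.out.println(parseCount("count=5&format=rss", 10)); // 5
        System.out.println(parseCount("format=rss", 10));         // 10
        System.out.println(parseCount("count=abc", 10));          // 10
    }
}
```

Every client has to rediscover these conventions from prose documentation, if any exists.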

A decent SOAP toolkit would provide the XML serialization of objects. It would also provide a basic minimum of documentation of the API (people would at least know which functions exist). No one really has to care about the exotic WS-* stuff, but it’s there if you need it.

REST might be good for really small, really simple applications. But if you want to start something bigger, something that might involve lots of developers, might change over time, has an API that provides more than two methods, you should really use a toolkit or it will be a mess.

DITA - Darwin Information Typing Architecture

Thursday, March 17, 2005, 16:25 — 2 comments

I just stumbled across DITA, the Darwin Information Typing Architecture. It’s an XML application for software documentation. While I have only given it a very quick read (I’m mainly blogging this so I don’t forget to check on it later), it seems to differ from DocBook in two points:

The second point sounds interesting. DITA mainly consists of a central XML DTD which describes a “topic”. Every piece of information collected in the system has to be hierarchically below a topic. And while DITA provides a basic topic DTD, it’s intended to be extended or restricted by its users. E.g. a user could create a new DTD containing only a subset of the DITA syntax, to ensure that information using this topic DTD contains only specific elements. She could also write a new DTD which adds elements for describing information specific to her application domain.

Now I just wonder: why do they use DTDs?

Moving to Rotterdam

Thursday, March 17, 2005, 15:56 — 2 comments

This is probably the last post I write from Potsdam, Germany, at least for quite a while. I finished my studies at the Hasso-Plattner-Institut for Software-Systems-Engineering on February 28th with a Bachelor’s degree. Now I’m off to Rotterdam to start working at X-Hive, a company building an XQuery-enabled XML database called X-Hive/DB and, on top of that, a Content Management System called X-Hive/Docato.

Both products seem really interesting, the team appears to be really cool, and Rotterdam is said to be a nice city too. So I’m really looking forward to this. I implemented XQuery as part of a student project last semester and I’m really eager to do more of that.

On the other hand, leaving Potsdam is quite sad. I made a lot of really good friends in the 3.5 years I spent here, and I learned to love the city. From an aesthetic point of view Potsdam probably wins against Rotterdam, with all its castles, lakes, rivers, parks etc. Rotterdam is quite a modern city, reminding me of Hannover, with lots of modern architecture and an impressive skyline. Let’s see how it works out.

That was the good news. The bad news is that I have to move all my stuff to Rotterdam. It’s not such a big distance (about 900 km from Berlin, doable in a day), but I just hate moving. This is my fourth move in the last four years, which is really starting to annoy me. Also, renting a van to drive from Potsdam to Rotterdam is unreasonably expensive.

Finally a PDF reader for GNOME-Linux?

Tuesday, March 15, 2005, 16:43 — 0 comments

I just tried out Evince, a PDF reader. Displaying a PDF doesn’t sound like a big thing, but it has actually been one of the minor annoyances on my GNOME desktop for quite some time. Xpdf, ggv, gpdf, etc. are either badly integrated with GNOME, have a really strange user interface, tend to display PDFs incorrectly, or don’t support basic features like searching.

Adobe Acrobat Reader does not seem to be a real alternative. I’ll try out version 7.0 soon, but 5.0 just plain sucks. Version 7 promises GNOME integration, but according to Luis Villa’s review it doesn’t really succeed at that.

Evince seems to fix that. I can’t really say much about its ability to display PDFs correctly (I only had a few samples, which all worked), but integration and the user interface seem to be OK. It also supports searching. Evince still lacks some things, though: multi-page scrolling, the grab-cursor mode for dragging the document view, possibly a zoom tool that lets you specify areas to display. But this is (as far as I can see) on their TODO list and might be integrated soon.


Monday, March 14, 2005, 13:08 — 1 comment

Every now and then someone thinks about namespaces, and nearly everyone seems to be at least slightly confused afterwards. David Megginson tries to clarify things, although he admits the decision the Namespace group took is not really perfect.

Now what really bugs me is that according to his post (and Dare Obasanjo has written about this before too) there are three cases for the namespace of an attribute:

The annoying thing about that is, in my opinion, the “locally scoped” case. I might be missing something, but I don’t see any other XML standard that really requires or uses that “locally scoped” feature. At least XPath, XQuery etc. don’t use it, and as far as I know XML Schema doesn’t either, does it?

The namespace should either have been inherited from the element the attribute is on or (in my opinion clearer) have been the default namespace set with <myelem xmlns="foo"/>. There should really not be any difference between elements and attributes regarding XML namespaces, as both use the same syntax (QNames). Using the inheritance mechanism not only for elements but also for attributes might, however, bring strange effects when including XML snippets within other documents.
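The current behaviour is easy to demonstrate with a namespace-aware DOM parser. A minimal Java sketch using JAXP: the element picks up the default namespace, while the unprefixed attribute on it does not.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.ByteArrayInputStream;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AttrNs {
    // Parses a one-element document with a default namespace
    // and an unprefixed attribute.
    static Element parse() throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document d = f.newDocumentBuilder().parse(new ByteArrayInputStream(
                "<myelem xmlns=\"foo\" bar=\"baz\"/>".getBytes("UTF-8")));
        return d.getDocumentElement();
    }

    public static String elementNs() throws Exception {
        return parse().getNamespaceURI();
    }

    public static String attributeNs() throws Exception {
        return parse().getAttributeNode("bar").getNamespaceURI();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("element:   " + elementNs());   // "foo"
        System.out.println("attribute: " + attributeNs()); // null
    }
}
```

The default namespace declaration applies only to elements; an unprefixed attribute ends up in no namespace at all.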

Another solution would be to forbid elements without explicit namespace prefixes altogether. This would bring some annoyance to users who don’t use XML namespaces, or use just one per document, but it would also be absolutely unambiguous. I would prefer a clear syntax over surprising effects when using namespaces …

XSLT and XQuery application domains

Friday, March 11, 2005, 16:15 — 0 comments

Once again someone has published an article comparing XQuery and XSLT. As others have mentioned (here or here), this article isn’t really that helpful. In fact, it’s actually misleading in several places. The author compares XSL 1.0 with XQuery 1.0, where XSL 2.0 would really be the one to pick. The author also describes how to extend XSL or XQuery processors, giving code samples tailored to two specific implementations. I’m not really sure how that is supposed to be helpful, as the extension mechanism is bound to be vendor-specific and can differ greatly between implementations.

My main criticism of the article is that it once again mixes up the application domains of the two languages. You cannot directly compare XQuery and XSL; they were created for two very different purposes. The only similarity is that they both work on XML. Think about it: the W3C wouldn’t invent two languages for exactly the same purpose, would it?

XSL is the eXtensible Stylesheet Language. XSLT is the Stylesheet Transformation language, i.e. it’s supposed to take an input document and apply some kind of style to it by converting its contents into something different.

XQuery is the XML Query Language. It’s meant for querying XML data sources. This means: take several input sources, fetch information from them (e.g. by matching certain criteria against the sources), and return that data. The XML element constructors allowed in the language are not intended for re-styling document contents, but rather to give the user a means to structure the returned content. Do not use XQuery to re-style documents; you will probably end up with lengthy, complicated queries requiring “manual recursion” (as opposed to XSLT’s automatic recursion with “apply-templates”), endless typeswitches, and an ugly mix of presentational and application logic. Look at the XQuery use cases: none of the queries tries to convert documents.
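To illustrate the contrast, here is the kind of re-styling job that XSLT’s apply-templates recursion handles naturally - a minimal Java sketch using the standard TrAX API. The element names doc and para and the tiny inline stylesheet are made up for illustration:

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

public class Restyle {
    // A tiny stylesheet: each template handles one element type and
    // apply-templates recurses into the children automatically.
    static final String XSL =
        "<xsl:stylesheet version='1.0' "
        + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='doc'><div><xsl:apply-templates/></div>"
        + "</xsl:template>"
        + "<xsl:template match='para'><p><xsl:apply-templates/></p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    public static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<doc><para>hello</para></doc>"));
    }
}
```

Writing the same conversion in XQuery would mean a recursive user-defined function with a typeswitch per element type - exactly the “manual recursion” described above.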

A typical application might use XQuery to fetch XML from an XML database or other sources (like the filesystem or web sources - whatever). The XML would simply be taken from its source, perhaps structured by some tags, and then passed on to the presentational part of the application, where it might be styled using XSLT.

The XSLT standard arguably expects a document (and usually exactly one document) to be fully available in memory - it doesn’t strictly require that, but all the scripts and implementations I’ve seen actually work that way. XQuery doesn’t need that: it has been designed with large data stores in mind, from which you might only want to extract small parts. In short, XQuery was designed to query large data stores, as opposed to XSLT, which was designed to format/re-style XML documents of a size that actually fits into main memory.

In a 3-tier model (introduced by SAP back in the ’80s?) you would typically find XQuery statements in the data-server tier (as stored queries) or in the application-server tier. XSLT scripts would be found either in the application-server tier or, since most browsers support XSLT nowadays, in the client tier.

MarkLogic’s use of XQuery as a CGI language is quite an interesting example of using XQuery in the application tier, though in the screencast presentation we once again see people trying to transform XML documents to XHTML using XQuery. A better example might have been aggregating information from the book database (e.g. all authors and how many books each has written) and transforming that information into something displayable by the client using XSLT. Apart from that it’s quite nice, btw.

XML file types

Sunday, March 6, 2005, 12:16 — 1 comment

Just a quick note: why does every tool that uses XML files store them in files ending in “.xml”? This is really getting annoying. If you use XML files in various different applications, neither Windows nor Linux can pick the correct application when you double-click them (I don’t know about MacOS - they have MIME types associated with files, don’t they?). You might have “mydocbook.xml”, “build.xml”, “project.xml” etc.

This is especially striking when working with Eclipse. Every XML-related plugin seems to consider it valid to conquer the “.xml” ending. So double-clicking an XML file in Eclipse most likely opens the wrong editor/view/whatever. Application programmers should really consider doing it like StarOffice: provide a default filename extension for your XML application and use it!

On the other hand, most current filesystems provide the ability to store metadata like MIME types. NTFS does, ReiserFS does, Ext3 does, XFS, JFS, etc. This has been around for quite some time, so someone (GNOME?) should take the first step and use it.

Trackback Spam & Spam Karma

Monday, February 28, 2005, 15:53 — 0 comments

While Lars is a little bit annoyed about my recent WordPress upgrade to version 1.5, I made a great step forward in spam fighting.

Spam Karma now actually works. Which is really great, as I used to get about 20-30 spam comments/trackbacks every week, sometimes even more. Before I tried Spam Karma I just disabled the comments form, but soon thereafter the spammers started sending fake trackbacks. Spam Karma handles both comments and trackbacks, and since it started working (with WordPress 1.5) I haven’t received a single spam, and I have barely had to moderate.

The catch is that Spam Karma only works this well because the spammers are really unsophisticated at the moment. They don’t even try to masquerade in any way. The good thing is: fake characters (e.g. “1” instead of a lowercase “l”) won’t help them. They only do this to get good Google ratings, but what good is a perfect Google rating on “texa5 ho1dem”? :-D

Java Performance and Garbage Collection

Saturday, February 26, 2005, 13:23 — 0 comments

I’ve never really read a lot about Java performance tweaking, which is something I’m going to change in the future. Partially because of a new job that will presumably require these skills, partially because it’s just quite interesting to see what the people at Sun and other companies did under the hood of the JVM.

There are two general areas: optimizing your code and tweaking the JVM parameters. I found really interesting information about GC tweaking at Sun (they have a truly ugly stylesheet for that article, but Opera users can use a custom stylesheet …).

More about memory management can be found in this article about Soft-, Weak- and PhantomReferences. I hadn’t heard of these, or of the java.lang.ref package, at all before. It looks very interesting, especially for server applications that need caches.
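The cache use case is the easiest way to see what these references are for. A minimal sketch (class name SoftCache is made up): entries held only through a SoftReference stay cached until the VM runs low on memory, at which point the garbage collector is allowed to reclaim them.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// A memory-sensitive cache: values can be reclaimed by the GC
// under memory pressure instead of causing an OutOfMemoryError.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map =
            new HashMap<K, SoftReference<V>>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    // Returns the cached value, or null if it was never cached
    // or has since been reclaimed by the garbage collector.
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        return ref == null ? null : ref.get();
    }

    public static void main(String[] args) {
        SoftCache<String, String> cache = new SoftCache<String, String>();
        cache.put("answer", "42");
        System.out.println(cache.get("answer"));  // "42" (still reachable)
        System.out.println(cache.get("missing")); // null
    }
}
```

A caller must always be prepared for get() to return null and recompute the value; that is the contract that makes the cache safe under memory pressure.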

So much for memory management. I’m still lacking information about performance-programming dos and don’ts. Apart from general things programmers should know (virtual functions, allocation/deallocation, …) I haven’t read much about this in Java. Can anyone recommend a good book on Java performance programming?
