Martin Probst's weblog

Why do XML APIs suck?

Wednesday, February 9, 2005, 00:32 — 3 comments

The complexity of XML parsing APIs seems to be a common complaint. So why do these APIs suck?

I've worked with the three major API styles myself (DOM, SAX, XML pull thingies), and yes, they do suck. But if you think about why they suck and how to make them better, you'll find that it might be about your programming language. I've used these APIs with Java and C++ (or C), and it was unbelievably complex and hackish to navigate and recognize the XML structure. Even navigating to an element on the child axis takes a lot more code than it should. Creating XML is just a nightmare: writing out angle brackets to a StringBuffer or the equivalent is way easier than using any API. But after all, how would you formulate that better in Java (without just using an XPath library)?
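To make the child-axis and creation complaints concrete, here is a minimal sketch using the W3C DOM bindings that ship with Java (javax.xml.parsers, org.w3c.dom, javax.xml.transform). The class and method names (DomChildAxis, firstChildElement, bookTitle, buildGreeting) and the sample documents are made up for illustration. Reaching /library/book/title takes a hand-written loop over child nodes, and producing a one-element document takes a builder, a node tree and a separate Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomChildAxis {
    // DOM offers no "first child element named X" call, so we walk all child
    // nodes and skip text, comments and processing instructions by hand.
    static Element firstChildElement(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && n.getNodeName().equals(name)) {
                return (Element) n;
            }
        }
        return null;
    }

    // Hand-coded equivalent of the path /library/book/title, one step at a time.
    static String bookTitle(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            if (!root.getNodeName().equals("library")) return null;
            Element book = firstChildElement(root, "book");
            Element title = book == null ? null : firstChildElement(book, "title");
            return title == null ? null : title.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds <greeting lang="en">hello</greeting> node by node, then needs a
    // Transformer just to turn the tree back into text.
    static String buildGreeting() {
        try {
            Document out = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element greeting = out.createElement("greeting");
            greeting.setAttribute("lang", "en");
            greeting.appendChild(out.createTextNode("hello"));
            out.appendChild(greeting);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter sw = new StringWriter();
            t.transform(new DOMSource(out), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(bookTitle("<library><book><title>XQuery</title></book></library>"));
        System.out.println(buildGreeting());
    }
}
```

Compare the roughly sixty lines above with the single path expression /library/book/title, or with appending the literal markup to a StringBuilder.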

I think the problem is using an imperative language, with good support for flat structured data of statically known types, to query/modify/whatever a data type that is strongly oriented towards hierarchical, dynamic, ordered structures. To really manage this you would need a language that provides built-in support for lists, hierarchical navigation and a good approach to dynamic typing. It would also need to be extensible, to really marry the XML support with the language. So you could either create/use something like E4X or CΩ (C-omega), or start with XQuery. XQuery sounds like the better candidate, as stable engines with good typing support seem a lot less like science fiction than for the other languages.
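The closest thing plain Java offers is embedding a path language as strings. The sketch below uses the standard javax.xml.xpath API (XPathFactory, XPath.evaluate) to select the titles of books published after 2004; the class name XPathFromJava, the select helper and the sample document are illustrative assumptions. The query itself is compact, but it is only a string the compiler cannot check, and the results come back as an untyped NodeList that still has to be unpacked by hand.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathFromJava {
    // Evaluates an XPath 1.0 expression against an XML string and returns the
    // string value of every matching node. The path is just a String: the Java
    // compiler cannot check it, and results come back as an untyped NodeList.
    static List<String> select(String xml, String path) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    path, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent());
            }
            return result;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("invalid XPath or document: " + path, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library>"
                + "<book year=\"2003\"><title>Old</title></book>"
                + "<book year=\"2005\"><title>New</title></book>"
                + "</library>";
        // Titles of books published after 2004.
        System.out.println(select(xml, "/library/book[@year > 2004]/title")); // [New]
    }
}
```

This is the "just use an XPath library" escape hatch: better than walking the DOM by hand, but the query and the host language still do not know about each other.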

Implementing something XML-ish by starting off with a SAX consumer is IMHO just the wrong approach. It is like implementing a GUI application by starting off with raw drawing primitives and a user event queue. Those things have to be done, but ideally they should be done only once. In a slightly less ideal world they will be done several times, but at least not by every application programmer.
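For a sense of what starting off with a SAX consumer looks like, here is a minimal sketch of a handler that collects the text of every title element, using the standard org.xml.sax.helpers.DefaultHandler and javax.xml.parsers.SAXParserFactory. The class names and sample input are hypothetical, and the handler assumes title elements are never nested. Even this trivial extraction needs hand-written state tracking across three callbacks, because SAX only reports events and leaves remembering the current position in the tree to you.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTitles {
    // Collects the text of every <title> element. SAX only reports events,
    // so the handler must track by itself whether it is inside a <title>.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder current; // non-null only while inside <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equals("title")) current = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Text may arrive in several chunks, so append rather than assign.
            if (current != null) current.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(current.toString());
                current = null;
            }
        }
    }

    static List<String> titles(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library><book><title>A</title></book>"
                + "<book><title>B</title></book></library>";
        System.out.println(titles(xml)); // [A, B]
    }
}
```

In a language with built-in path navigation, the same extraction would be a single expression along the lines of //title.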

Oh, and yes, there will be a performance drawback to using an interpreted high-level language like XQuery. But if you really need that performance where your application deals with XML, you are either one of the guys who really have to use DOM and SAX, or you're doing really strange stuff ;-)

Someone in the wild (not an XQuery stakeholder) who likes the idea of XQuery as a programming language! Interesting thoughts, and I agree with the basic thrust that XML support should evolve into languages with first-class support for tree-structured data and the XPath-like operations on the trees. The static vs dynamic issue seems to be very dependent on use case – with XML that is very much like serialized objects, static typing is a Good Thing; for loosely structured documents, it tends to get in the way.

My bias is that XQuery is a Good Thing … as an XML Query Language. I’m very skeptical about it as a programming language. First, it is currently read-only; what interesting things can you do besides extract information? Second, it is burdened by the XSD type system, which has few friends and many people who detest it. Finally, it is no less science fiction-y than the alternatives: NONE of this has been proven with serious applications in practice.

Basically, Microsoft shared your opinion of XQuery a couple of years ago, invested heavily in making it real, and became disillusioned as the problems multiplied. Maybe others will have better success, but until then ….

Sorry, Mike, but I am an XQuery stakeholder. I’ve been implementing it for the last six months, so you’re not really talking to someone from the wild.

I don’t really share your skepticism, though. I think XQuery has a real chance to take off as a good interface to XML data. Working implementations are already available (e.g. Saxon, as well as many proprietary products), and they will start to attract a user base. XQuery is a lot easier to handle than XSLT, and I really think users are waiting for something like this.

There are still things to be done, like update queries and, possibly in the future, a standardized way of connecting XQuery with imperative languages that is closer and more native than JDBC.

AFAIK Tamino already provides proprietary extensions to XQuery for updates, so a standard for doing that really seems to be needed.

XSD does suck in some ways, but after all, it’s what everyone is using.

XPath is a nice DOM wrapper, but what to do about SAX?

Martin Probst complains about overly complex XML APIs and recommends XQuery as a way to work with XML data programmatically that does not suck.

In most cases XQuery is more than people need in their applications. I