Martin Probst's weblog

Joel on the HPI

Sunday, January 9, 2005, 11:41 - 1 comment

Joel on Software writes this:

The moral of the story is that computer science is not the same as software development. If you’re really really lucky, your school might have a decent software development curriculum, although, they might not, because elite schools think that teaching practical skills is better left to the technical-vocational institutes and the prison rehabilitation programs. You can learn mere programming anywhere. We are Yale University, and we Mold Future World Leaders. You think your $160,000 tuition entititles you to learn about while loops? What do you think this is, some fly-by-night Java seminar at the Airport Marriott? Pshaw.

The trouble is, we don’t really have professional schools in software development, so if you want to be a programmer, you probably majored in Computer Science. Which is a fine subject to major in, but it’s a different subject than software development.

Which basically sounds like the very idea behind my university, the Hasso-Plattner-Institut for Software Systems Engineering. I wonder how long it will take until this insight leads to a large-scale change in our CS education systems. It’s not really new, and everyone who has finished a CS degree and started to work knows it - when will the universities start to do something about it?

Project management under GNOME

Saturday, December 18, 2004, 00:09 - 0 comments

I just gave Imendio Planner a try. It’s a simple (compared to MS Project) yet very useful project planning application. With Planner you can easily define resources and tasks and compile them into a Gantt chart. The most important features are available, such as four types of end-start conditions within the Gantt chart, assigning resources to tasks, sub-tasks and more.

The tool has a nice, simple GNOME GUI and seems to be a lot easier to use than MS Project. Unfortunately it lacks the (IMHO) important feature of displaying your resource usage. While it exports to HTML and prints nicely, an option to export the Gantt chart to some graphics format would be helpful too.

Anyway, I would use this in upcoming projects as it’s very simple to use, free, and fits into my usual development environment.

C++ builds the easy way with scons

Monday, December 6, 2004, 19:50 - 0 comments

One of the minor but still annoying pitfalls of developing with C++ is Makefiles. The syntax is rather cryptic, large Makefiles have to be maintained as the dependencies grow, and more complex tasks require really dirty hacks.

There are a lot of make replacements out there. Today I took a look at SCons which looks really nice. It’s written in Python and does not invent a new syntax but rather uses Python as the language to write build files in. Build files are declarative using calls to functions SCons provides to tell the system which targets have to be built. After executing the build script the targets are made using a set of implicit rules.

The major pros of SCons are the smart helper functions. You don’t have to define dependencies between source files - SCons takes care of that by scanning the files itself (supporting quite a nice set of languages already). Implicit rules are available for compiling executables, libraries (shared and static) and some other files. The developers claim it should be easily extensible (maybe I’ll try with antlr when I get some spare time). SCons doesn’t just look at file modification times but uses md5 hashes by default, which avoids the whole mess applications create when touching files accidentally. Also SCons keeps track of the state of intermediate files - a change in a source file that doesn’t lead to a change in the object file won’t lead to re-linking libraries or executables. Because SCons does not recurse into nested directories (it rather “includes” sub-build files) it should also be quite good with multiple build jobs and/or distributed compiling - recursive makefiles are a major obstacle for this as each make execution only sees a few source files at a time.
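To make the md5 point a bit more concrete, here is a minimal sketch of a content-based "is it out of date?" check in plain Python. This is not how SCons is implemented internally; the function names and the in-memory dictionary of stored signatures are my own assumptions, just to illustrate the idea.

```python
import hashlib
from pathlib import Path


def content_signature(path):
    """Return the md5 hex digest of the file's bytes (its content signature)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def needs_rebuild(source, stored_signatures):
    """Return True if the file content differs from the last recorded signature.

    stored_signatures is a hypothetical dict mapping path -> signature; a real
    build tool would persist this between runs.
    """
    key = str(source)
    signature = content_signature(source)
    changed = stored_signatures.get(key) != signature
    stored_signatures[key] = signature  # remember the state we just saw
    return changed
```

A file that was merely touched keeps its signature, so a check like this would not trigger a rebuild, while a timestamp-based make would.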

The biggest pro is probably also a con - using Python as the Makefile language. This enables users to easily manage complex build problems using a real programming language. On the other hand it enables people to create really cryptic build files as the syntax does not have any concept of order, grouping etc. It should be possible to overcome this by employing templates, coding standards etc. but it adds another thing to control and manage.

Another con is that a POSIX compliant make should be available nearly anywhere while SCons would be another dependency. However if you distribute binary packages anyway this shouldn’t be that important.

The pros seem to outweigh the cons, at least for me. I think I’ll use it in future smaller C/C++ projects; if it’s evil despite the good impression, I’ll find out all too soon I guess …

More XML sizes

Monday, November 29, 2004, 08:14 - 0 comments

Yes Lars, you don’t really have to have size statistics if you’ve got growing containers. But what should the growth rate be? How big should the smallest container be?

I can imagine that most XML text nodes will be below 20 characters/bytes (only whitespace separating other tags), but what will be the next size step? Some size statistics about real-world documents would be nice to have. This will also vary a lot across different document types and uses of XML. A smart XML DBMS would try to adjust its storage settings to the document in question.
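Collecting such statistics for a given document is not much work. A rough sketch using Python’s standard xml.etree.ElementTree follows; the function names and the bucket limits (20, 100, 200) are just my guesses, and lengths are counted in characters, not bytes.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def text_node_lengths(xml_text):
    """Return the length in characters of every text chunk in the document.

    ElementTree exposes text as elem.text (before the first child) and
    elem.tail (after the element), including whitespace-only chunks.
    """
    root = ET.fromstring(xml_text)
    lengths = []
    for elem in root.iter():
        for chunk in (elem.text, elem.tail):
            if chunk:
                lengths.append(len(chunk))
    return lengths


def bucket_counts(lengths, limits=(20, 100, 200)):
    """Count how many text chunks fall into each (assumed) size bucket."""
    counts = Counter()
    for n in lengths:
        label = next((f"<={lim}" for lim in limits if n <= lim), f">{limits[-1]}")
        counts[label] += 1
    return counts
```

Running this over a representative set of documents would give a first impression of where the size steps should be.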

Techwriter Wiki

Saturday, November 27, 2004, 18:54 - 1 comment

Post for Lars:

[Translation aid] The Techwriters Wiki wants to become a knowledge base for technical writers and translators. It is still young and can use all the support it can get. via Der Schockwellenreiter

XML Size Statistics

Saturday, November 27, 2004, 18:41 - 1 comment

When storing XML in a database the individual nodes are put into containers and stored on pages. Because it’s generally easier to have fixed-size containers (representing objects) it’s quite nice to do this with a default size and overflow containers.

But what default size should be used for what kind of nodes? We have to get some statistics on that point, but I get the impression that most text nodes are really small, i.e. not more than about 100-200 characters. Other nodes like elements are not that important as their name is usually only stored exactly once.

It seems as if the best solution is incremental growth for the containers. Usually text nodes will be rather small (<50 chars), but if they are bigger than that they will probably be bigger than 100 chars or even 200 chars too. Most text nodes will be something below 10 chars though, at least for data-oriented XML as opposed to document-oriented XML (think of formatting XML with breaks and tabs between the elements). So the first text node container should be something like 20 chars, the next size maybe 100, and thereafter really big ones. But these are only guesses - I need statistics on that.
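As a sketch of what such incremental container sizes could look like in code: the two small size classes (20 and 100) are the guesses from above, while the 4096-char overflow unit and all names are purely hypothetical placeholders for the "really big ones".

```python
import bisect

# Guessed size classes from the text above, in characters.
SIZE_CLASSES = (20, 100)
# Hypothetical unit for the "really big" overflow containers.
OVERFLOW_UNIT = 4096


def container_size(text_length):
    """Return the capacity of the smallest container that fits the text."""
    index = bisect.bisect_left(SIZE_CLASSES, text_length)
    if index < len(SIZE_CLASSES):
        return SIZE_CLASSES[index]
    # Too big for the small classes: round up to whole overflow units.
    units = -(-text_length // OVERFLOW_UNIT)
    return units * OVERFLOW_UNIT
```

With real statistics the values in SIZE_CLASSES could be tuned per document type instead of being hard-coded.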

DHTML Lemmings

Saturday, November 13, 2004, 22:56 - 0 comments

DHTML Lemmings. Plain unbelievable.

Namespace prefixes in XML

Sunday, November 7, 2004, 18:34 - 0 comments

In the latest W3C XQuery Working Draft the type xs:QName was altered. In former specifications it represented a qualified name as the namespace URI in combination with the local name; now the XQuery processor has to keep track of the user-defined namespace prefix too. This seems to be a minor change which is useful for converting xs:QNames into strings, but in my opinion it’s a major change of the data model.

The question is whether to see an XML document as a text document or whether to interpret it as a tree of nodes. The former way has the advantage that users editing XML documents with notepad will usually be less surprised by the actual results of queries. While this would be nice, I think it’s a horrible idea for a structure-oriented query language, especially in a database context.

While designing an XQuery database we stumbled over such questions very often. What about whitespace and indentation, what about character references, what about XML namespace prefixes etc.? I’m sure there are still things to come. Others have run into this kind of problem too, as you can read in this post from Dare Obasanjo.

I think the only clean solution is to draw a line clearly separating the text representation and the tree representation of XML documents. In the tree representation, namespaces are just unique IDs and the prefixes are completely ignorable. Each qualified name has a namespace ID, but once it has been transformed from text to tree representation the namespace prefix is gone. Same goes for ignorable whitespace, character references and CDATA sections. Otherwise it becomes really tedious to store such things as where namespaces were declared with which prefix or you would even need to store texts twice, once in a normalized format usable for full text search and once in the representation the user expects. But what happens if these contents are updated?

Assigning prefixes, escaping non-representable characters to (character- or entity-) references, inserting CDATA sections and many other things are presentational logic. This might be handled by the XML editor or by an output filter when converting XML documents into their text representation; everything else clearly borks things outside the text representation. And even worse, it keeps people thinking of XML as text with a bunch of angle brackets, as opposed to tree-structured data.
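To illustrate the tree view: Python’s standard xml.etree.ElementTree already behaves roughly this way, storing element names as “{namespace URI}local name” and dropping the prefix when parsing. A small sketch follows; the two example documents, the example.com namespace URI and the helper name are made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# Two made-up documents that differ only in the namespace prefix they use.
DOC_A = '<p:item xmlns:p="http://example.com/ns"><p:name>x</p:name></p:item>'
DOC_B = '<q:item xmlns:q="http://example.com/ns"><q:name>x</q:name></q:item>'


def qualified_names(xml_text):
    """Return the element names as ElementTree stores them in the tree."""
    return [elem.tag for elem in ET.fromstring(xml_text).iter()]


# In the tree representation both documents carry identical names:
# the namespace URI plus the local name, with no trace of the prefix.
names_a = qualified_names(DOC_A)
names_b = qualified_names(DOC_B)
```

Both lists hold the same two names, which is exactly the “namespaces are just unique IDs” view; a prefix only shows up again when the tree is serialized back to text.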

The Backside of GUIs

Thursday, September 16, 2004, 19:28 - 0 comments

Don Park blogs about the Backside of GUIs. It sounds a little bit funny, but to me it seems to be a rather intuitive place for settings within a GUI.

Why not use the powerful features of upcoming windowing systems like Avalon or Looking Glass to provide settings on the backside of GUIs? No more searching through obscure settings dialogs, just turn around the very window whose behaviour you want to alter. This could also be done for single dialogs of an application, like advanced search settings or similar things.

Choosing the right technology

Wednesday, September 1, 2004, 19:50 - 0 comments

Mark Pilgrim blogs about how to choose the right technology, no matter what for. Finally we can make great fact-based decisions automatically.
