Martin Probst's weblog

Java 3

Tuesday, May 20, 2008, 07:23 - 0 comments

Ola Bini has posted an interesting list of features he’d like to have in a hypothetical Java 3.

There are some points I definitely wouldn’t like, such as “No primitive arrays.” - I know they are a pain to have in the type system, but getting rid of them and thus making it nearly impossible to implement custom data structures is really not the way forward. One should try to come up with a primitive collection-like data structure that better fits into the language and type system, but dropping them altogether is not a good idea.
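To make the point about custom data structures concrete, here is a minimal sketch of a growable list of ints backed directly by an int[]. The class name IntList and its methods are made up for illustration; they aren’t taken from Ola’s list or from any library:

```java
import java.util.Arrays;

/** Hypothetical growable list of ints, backed directly by a primitive int[]. */
final class IntList {
  private int[] items = new int[4];
  private int size = 0;

  /** Appends a value, doubling the backing array when it is full. */
  void add(int value) {
    if (size == items.length) {
      items = Arrays.copyOf(items, size * 2);
    }
    items[size++] = value;
  }

  /** Returns the element at the given index, with the usual bounds check. */
  int get(int index) {
    if (index < 0 || index >= size) {
      throw new IndexOutOfBoundsException("index " + index + ", size " + size);
    }
    return items[index];
  }

  /** Number of values added so far. */
  int size() {
    return size;
  }
}
```

Without primitive arrays, a class like this would presumably have to store boxed Integer objects, or delegate to some built-in collection that does the same job.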

What I’d really like to see in Java 3 are a handful of things: more syntactic sugar for common things, some important platform features, identifiers as first class citizens, a structured way out of the type system, and named parameters. That is roughly the order of difficulty in implementation, and of how unlikely these features are to actually appear :-)

Maybe one should start a project to actually try this stuff out. I know that many people have stated that Ola’s list is just a description of a subset of Scala (or some other language), and they are right, in some sense. But this is an important point: I don’t want all the complexity of Scala. And I don’t like their implicits, the “object” feature (which is syntactic sugar for something that shouldn’t be there at all; static must die), the sometimes cryptic syntax, the weird rules about operator/method precedence, and so forth. This deserves further qualification (a lot), but I’m just not happy with Scala.

I guess one should also look at the work done by Gilad Bracha (Newspeak) and possibly Ian Piumarta/Alan Kay in their COLA stuff. The former, because Bracha introduces some really nice and useful features, the latter just because it’s totally awesome :-)

Google Groups being spammed

Monday, May 19, 2008, 06:17 - 1 comment

I’m getting lots of comment spam attempts for some Google Groups pages, e.g. “http://groups.google.us/group/dyt-cheaptickets”. The mentioned groups actually exist, and the spam content is being hosted by Google.

There is no “report abuse” link, so I have no idea how to tell the Google guys about this.

Also, I wonder how the spammers make it past the CAPTCHA?

Implementing a blog

Tuesday, May 13, 2008, 13:10 - 0 comments

It’s quite funny. Implementing a basic blog is basically the “hello world” of web application frameworks. You simply shove titles, body text and some timestamps into a database and retrieve them from there again. The next lesson is 1-to-many mapping with comments.

The funny thing is that implementing a blog that way is not really implementing a blog. Implementing a blog is writing a tiny handler that shoves stuff into the database and out of it, and then spending ridiculous amounts of time filtering comments for spam. Which is one of the really difficult things to do on the internet - validating users as real users, without being inaccessible.

XQuery pretty printer and grammar open source

Wednesday, May 7, 2008, 10:11 - 0 comments

I just released my XQuery pretty printer, and with that the XQuery grammar, as open source, Apache 2.0 license. I’ve created a Google Code project for that: xqpretty.

I hope somebody finds this useful. I have a hunch that at least one company will take a look at it :-)

More XQuery pretty printing

Wednesday, April 23, 2008, 17:57 - 0 comments

I’ve upgraded the XQuery pretty printer once again. It’s actually surprisingly difficult to get whitespace handling and indentation even close to ‘right’, at least in a language that is syntactically as complex as XQuery.

This version should insert whitespace at the correct places, properly handle long lines in many more cases (before, you’d get long runs of empty lines in some cases), and remove whitespace in other cases.

I’ve also added some options for the curious. You can now display regular HTML, pure HTML with no other tags around it, plain text, indented, the parse tree before formatting, and the HTML before it is run through the indenter. The options are available through the XQuery formatter form, too.

Of course you can get all the other formatting modes without HTML decoration around them, too. And as you can see, you can easily construct URLs for the formatter. So the only thing left is to wire this up to my JavaScript syntax highlighting in this blog, and I’ll have nicely formatted XQuery!

JSPs suck, episode 217

Tuesday, April 22, 2008, 13:41 - 2 comments

Turns out that the JSP parser doesn’t even try to be an HTML parser (I figured out a long time ago that it’s not an XML parser…). If you comment out JSP code like this:

<!--c:if> ... bla ... </c:if-->

You’ll get this:

org.apache.jasper.JasperException: /WEB-INF/jsp/task/showTask.jsp(109,4) The end tag "</c:if-" is unbalanced

I truly have to find myself another Java templating engine/language. JSPs come out of the box, but they are simply too annoying in the long run. I guess the effort to integrate something - anything! - else would already have paid off.

Wiring up Jersey JAX-RS & Hibernate

Tuesday, April 22, 2008, 11:42 - 0 comments

Recently I started playing with JAX-RS, and in particular Jersey’s implementation thereof. JAX-RS is specified in JSR 311. The goal is to create a common annotation-based library to ease the development of RESTful Java web applications.

So far, this is really nice. The basic idea is that you annotate your Java classes that represent resources with URI templates through the @Path annotation, and the framework takes care of mapping requests, parameters and so on. At the moment, this approach seems a lot more natural to me than other stuff I tried (Spring MVC for example).

Of course, some assembly is required ;-) My goal was to wire up Jersey’s resource classes with Hibernate (JPA flavor), so that resources can directly map to persistent classes. This should allow for a pretty natural development process - decide on your resources, annotate them for persistence, and publish them on the web through Jersey annotations.

I had to jump through three hoops to get this working.

Opening a transaction for web requests

This was fairly simple. If your resources are persistent classes, you’ll need a database session to load and manipulate them. I created a class (“Persistence”) that stores Hibernate Sessions in a ThreadLocal variable, so that resources can access it.

The only trouble was that Jersey’s default ServletContainer class has its service() method marked final, so I couldn’t override it to open a transaction before the request and close it afterwards. After some playing with AspectJ, I figured that it’s easier to simply create my own HttpServlet that only wraps Jersey’s servlet. All methods forward to Jersey’s ServletContainer, but the service() method looks like this:

  @Override
  public void service(HttpServletRequest req, HttpServletResponse resp) throws ServletException,
      IOException {
    System.out.println("Start transaction");
    String method = req.getMethod();
    // GET and HEAD are safe methods, so their transactions are never committed.
    boolean readonly = "GET".equals(method) || "HEAD".equals(method);
    boolean committed = false;
    try {
      persistence.open(readonly);
      container.service(req, resp);
      if (!readonly) {
        System.out.println("Transaction commit");
        persistence.commit();
        committed = true;
      }
    } finally {
      // Roll back read-only requests and any failure, including the checked
      // ServletException and IOException, so no transaction is left open.
      if (!committed) {
        System.out.println(readonly ? "Read only - rollback" : "Transaction rollback");
        persistence.rollback();
      }
    }
  }

Paul Sandoz of Jersey changed service() to be non-final, so in the future you should be able to simply inherit from ServletContainer for this.
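The Persistence class itself isn’t shown in this post. As a rough idea of the ThreadLocal approach described above, here is a minimal sketch. It is not the real class: FakeSession is a stand-in for Hibernate’s Session so that the example compiles on its own, and the names ThreadLocalPersistence and close() are assumptions made for illustration.

```java
/** Stand-in for org.hibernate.Session, so the sketch compiles without Hibernate. */
final class FakeSession {
  final boolean readOnly;
  boolean open = true;

  FakeSession(boolean readOnly) {
    this.readOnly = readOnly;
  }
}

/** Hypothetical per-thread session holder, loosely modeled on the Persistence class. */
final class ThreadLocalPersistence {
  private final ThreadLocal<FakeSession> current = new ThreadLocal<>();

  /** Binds a fresh session to the calling (request) thread. */
  void open(boolean readOnly) {
    if (current.get() != null) {
      throw new IllegalStateException("session already open on this thread");
    }
    current.set(new FakeSession(readOnly));
  }

  /** Returns the session bound to the calling thread. */
  FakeSession getSession() {
    FakeSession session = current.get();
    if (session == null) {
      throw new IllegalStateException("no session open on this thread");
    }
    return session;
  }

  /** Ends the unit of work and unbinds the session, so pooled threads don't keep it. */
  void close() {
    FakeSession session = current.get();
    if (session != null) {
      session.open = false;
      current.remove();
    }
  }
}
```

Since servlet containers usually reuse their worker threads, removing the ThreadLocal entry at the end of each request matters; otherwise the next request on that thread would see a stale session.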

Loading persistent resources from Hibernate

By default, Jersey simply instantiates resource classes (those with a @Path annotation) using the default constructor. Of course this doesn’t work when your classes are supposed to be loaded from a persistent store.

There are two possibilities here.

Persistent sub-resources

The first is to always access persistent classes through a container class that represents the whole collection, and that isn’t persistent itself:

@Path("books")
class Books {
  @Path("{id}")
  Book getBook(@PathParam("id") long id) { ... }
}

Pro: fairly simple. We’ll see how to properly inject the Hibernate session into resource classes in a later section.

Con: your paths are slightly less cool. You’ll have /books/12 instead of /book/12, and also you’ll be in trouble with implicit views where /books/new is supposed to map to Books, but then use some new.jsp to render an HTML input form. Also, if you have many sub resources it might be a bit ugly to force all requests through some initial container class, instead of only manipulating the nested resource.

Hooking into resource creation

The second method is to hook into Jersey’s way of instantiating resources. Instead of using the plain ServletContainer from Jersey, you can override it and tell it to use your own ComponentProvider, which is basically your own IoC container.

I created a HibernateComponentProvider like this. Persistence is our persistence class from above:

public class HibernateComponentProvider implements ComponentProvider {

  private final Persistence persistence;

  public HibernateComponentProvider(Persistence persistence) {
    this.persistence = persistence;
  }

  public <T> T getInjectableInstance(T instance) {
    return instance;
  }

  public <T> T getInstance(Scope scope, Class<T> c) throws InstantiationException,
      IllegalAccessException {
    T o = c.newInstance();
    inject(o);
    return o;
  }

  @SuppressWarnings("unchecked")
  public <T> T getInstance(Scope scope, Constructor<T> constructor, Object[] parameters)
      throws InstantiationException, IllegalArgumentException, IllegalAccessException,
      InvocationTargetException {
    Class<T> clazz = constructor.getDeclaringClass();
    Entity entity = clazz.getAnnotation(Entity.class);
    if (entity != null && parameters.length == 1) {
      // assume find by id
      T instance = (T) persistence.getSession().get(clazz, (Serializable) parameters[0]);
      if (instance == null) {
        throw new WebApplicationException(Response.Status.NOT_FOUND.getStatusCode());
      }
      inject(instance);
      return instance;
    } else {
      T instance = constructor.newInstance(parameters);
      inject(instance);
      return instance;
    }
  }

  public void inject(Object instance) {
    // no-op for us
  }

}

This is a bit hackish. If a resource has a visible default constructor, getInstance(Scope, Class) will be called. If we instead have a public constructor that takes a single long parameter, getInstance(Scope, Constructor, parameters) will be called. At that point, we can check if the class is a persistent class (marked by the Entity annotation), and if so, load it from Hibernate instead of instantiating it the normal way.

The trouble is that I don’t quite understand how Jersey picks the proper constructor yet, so this might break. Also, it kind of sucks that we have to provide a constructor that is never going to be used, just for Jersey to give us access to the parameters. But maybe this will get better in the future?

Anyways, you can now write your persistent class like this:

@Path("book/{id}")
@Entity
class Book {
  public Book(@PathParam("id") long id) { /* no-op, or whatever */ }
}

And you’ll get the properly loaded persistent instance automagically.

Pro: nice URLs, more natural development, somehow seems right to me.

Con: some more work, resource classes might get injected twice, no idea if this will break due to the constructor picking stuff. Also, you may not provide another public constructor, which is kind of against the spirit of non-invasive annotation based frameworks. Hopefully we’ll hear more about this.

Injecting persistent resource classes

One more thing we have to do is inject our persistent classes after load with the fields managed by Jersey. The problem is that if we simply load a class from Hibernate, e.g. as the result of a query, it will bypass Jersey’s injection, and injected properties such as UriInfo or HttpContext will not be set.

Hibernate provides hooks into the object life cycle via the org.hibernate.Interceptor interface. I’ve changed my Persistence class to extend org.hibernate.EmptyInterceptor, and override onLoad() like this:

@Override
public boolean onLoad(Object entity, Serializable id, Object[] state, String[] propertyNames,
    Type[] types) {
  provider.inject(entity);
  return super.onLoad(entity, id, state, propertyNames, types);
}

Here, provider is a ComponentProvider we obtained from the WebApplication (see next section), and through the inject() call the loaded entity will get its UriInfo and friends.

Combining this with the above HibernateComponentProvider will mean that in some cases (directly loaded root resources) classes will get injected twice, but that shouldn’t be a problem. Well. I hope :-)

Wiring it all up

The only thing left is to wire all this stuff together. As I said, we have our own servlet that simply wraps a ServletContainer from Jersey. But we’ll have to extend this ServletContainer and override its configuration methods to supply our own component providers etc. Enter HibernatingServletContainer, a nested class of our custom Servlet:

  private final class HibernatingServletContainer extends ServletContainer {
    @Override
    protected void configure(ServletConfig sc, ResourceConfig rc, final WebApplication wa) {
      super.configure(sc, rc, wa);

      wa.addInjectable(Session.class, new Injectable<Context, Session>() {
        @Override
        public Class<Context> getAnnotationClass() {
          return Context.class;
        }

        @Override
        public Session getInjectableValue(Context a) {
          return persistence.getSession();
        }
      });

      wa.addInjectable(ComponentProvider.class, new Injectable<Context, ComponentProvider>() {
        @Override
        public Class<Context> getAnnotationClass() {
          return Context.class;
        }

        @Override
        public ComponentProvider getInjectableValue(Context a) {
          return provider;
        }
      });
    }

    @Override
    protected void initiate(ResourceConfig rc, WebApplication wa) {
      persistence = new Persistence();
      wa.initiate(rc, new HibernateComponentProvider(persistence));
      provider = wa.getComponentProvider();
      persistence.setComponentProvider(provider);
    }
  }

This basically creates our persistence instance (which loads the Hibernate configuration and provides transactions), registers our own HibernateComponentProvider with the web application and retrieves the ComponentProvider from Jersey’s WebApplication (which wraps our own provider) so we can use it later.

First, configure() is called. This method delegates to its parent, but then adds two additional injectable classes: the Hibernate Session, retrieved from persistence, and the ComponentProvider itself, which we’ll need later.

Then initiate() is called, which sets up our Persistence layer and our ComponentProvider. All happy and fine, now we can start to write our actual application code :-)

Using it

Using this code you will have to remember two things: classes that need Hibernate sessions, e.g. for queries or to persist instances, will need to get them injected like this:

@Path("books")
class Books {
  @Context Session session;
}

And when you need to instantiate sub-resources, for example when a resource is created by a POST request, you’ll need to create them through the ComponentProvider. This way, their own @Context annotated fields will be set up properly:

@Path("books")
class Books {
  @Context Session session;
  @Context ComponentProvider provider;
  
  @POST
  @ConsumeMime("...")
  public Book createBook(...) {
    Book book = provider.getInstance(Scope.ApplicationDefined, Book.class);
    book.setFields(...);
    session.persist(book);
    return book;
  }
}

It might be easier to create resources from their representations through a BookProvider or similar, but I’ll have to look into that another time.

I hope this will help people to get started with Jersey and Hibernate. I’m still researching this stuff, so maybe I’ll come up with a better solution later. I’m also not yet sure if the whole thing - directly accessing persistent classes as resources - turns out to be a good idea, but I’m quite confident.

XQuery Pretty Printer, ctd.

Friday, April 18, 2008, 09:53 - 0 comments

I’ve continued to tweak the XQuery pretty printer a bit.

It now properly breaks long lines into overflows that are lined up with the preceding open parenthesis (for now, suggestions for additional “tab markers” are welcome). Overly long lines that cannot be broken properly lead to a simple double-indent, as before.
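To illustrate the “line up with the preceding open parenthesis” rule, here is a toy sketch in Java. It is not the pretty printer’s actual code: it only handles a single flat call, and the class name ParenWrap, the method format() and its parameters are invented for this example.

```java
import java.util.List;

/** Toy illustration of wrapping arguments so they line up with the open parenthesis. */
final class ParenWrap {
  /**
   * Formats name(arg1, arg2, ...) and breaks before any argument that would cross maxWidth.
   * Continuation lines are indented to the column just after the open parenthesis.
   * A single argument that is too long on its own is left as it is.
   */
  static String format(String name, List<String> args, int maxWidth) {
    StringBuilder out = new StringBuilder(name).append('(');
    int indent = out.length(); // column right after the "("
    int column = indent;       // current position within the current line
    for (int i = 0; i < args.size(); i++) {
      boolean last = i == args.size() - 1;
      String piece = args.get(i) + (last ? ")" : ",");
      boolean firstOnLine = column == indent;
      int needed = piece.length() + (firstOnLine ? 0 : 1); // +1 for the separating space
      if (!firstOnLine && column + needed > maxWidth) {
        // Does not fit: start a new line under the first argument.
        out.append('\n');
        for (int s = 0; s < indent; s++) {
          out.append(' ');
        }
        column = indent;
        firstOnLine = true;
      }
      if (!firstOnLine) {
        out.append(' ');
        column++;
      }
      out.append(piece);
      column += piece.length();
    }
    if (args.isEmpty()) {
      out.append(')');
    }
    return out.toString();
  }
}
```

Formatting concat with the arguments $first, $second and $third at a width of 80 keeps everything on one line. At a width of 20, $second and $third each move to their own line, indented by seven spaces so that they sit directly under $first.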

I’ve also decided to brutally ignore any leading and trailing whitespace (“boundary space”) in direct element constructors (XML literals), and simply format the contents. While this is against the standard, or at least should only be done if the query doesn’t explicitly declare boundary-space preserve, I think it’s beneficial in most cases. Also, I don’t really have an idea on how to nicely format XML literals when whitespace is significant. The result probably always looks somewhat ugly and doesn’t fit with the indented rest of the query.

Next step would be the formatting of comments and xqDoc tags, but I think that will have to wait some time. I’d need to implement another sub-lexer for that, as the comments currently simply come as one large string into the formatter. Not that much effort, but I think I should concentrate on my Master’s thesis a bit.

By the way, it’s possible to simply link to a formatted query by copying the URL, like this. The Formatter Servlet has no side effects, so it uses a stateless GET request.

Threads vs. Processes in Java & Co.

Friday, April 11, 2008, 08:13 - 5 comments

Erik Engbrecht posted a very interesting article on Multiprocess versus Multithreaded System design. It includes a lot of insight, but just one quote:

On one hand on most operating systems threads involve less overhead than processes, so it is more efficient to use multiple threads than multiple processes. On the other hand multiple processes ultimately will give you better reliability because they can be spawned and killed independently from one another.

[…]

My rule of thumb is to look at the amount of shared data or messaging required between concurrent execution paths and balance against how long the “process” (not OS process) is expected to live.

I think one important point is missing here. It’s the question of what your underlying framework is, and how you access it.

Early on, CGI applications written in C were an acceptable solution, as the applications mostly used the underlying UNIX system as their framework and API. Spawning a simple process that more or less only accesses the basic C library in a short lived transaction is fast and safe.

But eventually applications get more and more complex, and you will have more and more shared functionality that is accessed in a library fashion. You will want to write this in the same language and environment as your application, as this makes debugging, lifecycle management etc. much easier.

As these libraries start to get more complex, their initialization will start to take significant time and resources. And at some point, you will no longer be able to load them every time a request hits your CGI program.

So now you can either put those into some kind of daemon that is accessed by IPC, or you will have to create a long running server process that handles multiple requests, either sequentially (FCGI) or even truly multithreaded like today’s Java servers.
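As a small illustration of the long running, multithreaded variant, here is a sketch in which an expensive library is initialized once and then shared by all request threads. Everything in it is hypothetical: HeavyLibrary stands in for whatever costly shared functionality the application has, and LongRunningServer and serve() are invented names.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for a library whose start-up is expensive (hypothetical). */
final class HeavyLibrary {
  static final AtomicInteger initCount = new AtomicInteger();

  HeavyLibrary() {
    initCount.incrementAndGet(); // imagine parsing configuration, warming caches, ...
  }

  String handle(String request) {
    return "handled " + request;
  }
}

/** Long running server: the library is set up once and shared by all worker threads. */
final class LongRunningServer {
  private final HeavyLibrary library = new HeavyLibrary();
  private final ExecutorService workers = Executors.newFixedThreadPool(4);

  /** Handles one request on a worker thread and waits for the result. */
  String serve(final String request) {
    Callable<String> task = () -> library.handle(request);
    try {
      return workers.submit(task).get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }

  /** Stops the worker threads so the JVM can exit. */
  void shutdown() {
    workers.shutdown();
  }
}
```

A CGI-style program would pay the HeavyLibrary constructor cost on every request. Here it is paid once per server, at the price that all requests now live and die with the same process.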

It’s also not just the cost of initializing these libraries, it’s also their footprint. As I’ve written before, running a herd of mongrels can get really expensive, memory wise.

MVM a solution?

That being said, losing the simplicity and reliability of processes is indeed highly problematic. I’m not sure about the state of Sun’s Multi VM project, which was designed to solve some of these issues.

Maybe this could be overcome if our programming languages allowed us to define application parts that are effectively stateless, ambient libraries, executing in a more or less purely functional way (well, except for logging and such…) once they are initialized.

If you could define such constraints on program parts, you could have your cake and eat it, too. Multiple instances of your application could share a possibly very large part of their code base, but you could still kill single instances at will, as they don’t share state with other instances.

The problem is with the “more or less” part of “purely functional”. I’m not sure if it’s actually possible to find that niche where you can still have some state. And of course you would have to be able to statically prove those constraints, or else everything is moot.

Also highly interesting in this context is Microsoft’s Singularity, which achieves a runtime that doesn’t need processes but still has all the nice properties of processes (isolation, ‘kill -9’), due to some carefully chosen, statically verifiable constraints.

Memory usage, Java vs. Rails

Wednesday, April 9, 2008, 09:52 - 3 comments

This blog is running on a virtual server at Hosteurope. The server is running a plain Apache 2, my Subversion repository, this blog, and since yesterday a Tomcat instance for the XQuery pretty printer.

The interesting part in getting Tomcat to run was (apart from some mod_jk problems) that I exhausted the 128 MB of memory my virtual server has. Which surprised me a bit - the JVM instance should not consume more than 64 MB, so who was taking all that memory?

This is the current list, according to RSS, which is not precise, but still:

The interesting point is that due to Ruby not supporting kernel threads and Rails generally being single threaded (boo!), I end up with a lot more memory consumed for e.g. a simple blog running 3 mongrel instances plus ferret. RSS includes shared libraries and non-writable memory, but pmap -d reports something like 41592K non shared, private writable memory for the mongrel process.

I had to reduce the number of mongrels to one, so that I can run a Java VM on this host. Thus, my server can only process one concurrent request from one user at this time. When the Ruby process is waiting for filesystem or database operations, nothing else gets processed, even though CPU time would be available.

This is indeed pretty annoying. 50 MB or so doesn’t seem like much memory on a regular desktop machine these days, but for (shared) hosting, it still is a large issue. In the case of my puny blog this isn’t really a problem, plus I’ve set up decent caching so most requests will be handled directly by Apache, but still…

Maybe switching to Merb and Ruby 1.9 (or JRuby) would be useful, though I have no idea if that really runs concurrently in one process, or if it’s just generally possible.

