How invokedynamic makes lambdas fast

2024-01-10

Posted by Ingo Kegel

Recently, we have been at work rewriting our website in Kotlin. Instead of a view technology that uses string templates with embedded logic, we now use the Kotlin HTML builder to develop views as pure Kotlin code. This has a number of advantages, like being able to easily refactor common code. Also, the performance of such views is much better than that of string templates, which contain interpreted code snippets.

When measuring the performance, we noticed that a lot of anonymous classes were created for our views and their loading time was significant. Code that uses the Kotlin HTML builder is very lambda-heavy and as of Kotlin 1.9, lambdas are implemented as anonymous classes. The JVM has a sophisticated mechanism to avoid creating classes at compile time that was introduced in Java 8 - the LambdaMetafactory and invokedynamic. The JVM developers also claimed that the performance would be better than anonymous classes. So why does Kotlin not use that?

As it turns out, Kotlin can optionally compile lambdas with invokedynamic in the same way that Java does, by passing -Xlambdas=indy to the Kotlin compiler. This has been supported since Kotlin 1.5 and will become the default in the upcoming Kotlin 2.0 release. The great thing about having both compilation strategies available, is that we can compare how anonymous classes and invokedynamic compare in a real-world example.

First of all, the number of classes for our entire website project was reduced by 60% (!) when compiling with -Xlambdas=indy. Here you can see the list of classes for our store view with both compilation modes:

For that particular view, the cold rendering time was improved by 20%. This was simply measured by wrapping the rendering with a measurement function. How about the warmed-up rendering time? The times there are much shorter, and we need to introduce some statistics by making many invocations. This is easily done with JProfiler and has the added benefit that we can also compare the internal call structure.

With the default compilation to anonymous classes, we recorded 50 invocations after warm-up and got 12.0 ms per rendering:

With invokedynamic compilation, the time per rendering was 10.6 ms:

This is an improvement of 12% which is surprisingly large. This test does not measure the difference between the invocation mechanisms in isolation, but it includes the actual work that is done to render the store view. Against that baseline duration, a speed-up of 12%, - just by changing the compilation mode for lambdas - is quite impressive. Many DSL-based libraries in Kotlin are lambda-heave, so other use cases may also produce similar numbers.

By looking at the call tree, we can see that the version with anonymous classes makes 3 calls, instead of one: First, it instantiates the anonymous class and passes all the captured parameters:

Then it calls a bridge method without the captured parameters, which in turn calls the actual implementation:

Looking at the bytecode, we can see that a number of instructions are required to store the captures parameters into fields, and the bridge method also contains instructions that add to the overhead.

With invokedynamic compilation, the generated lambda methods are in the same class:

This works because the lambda instances are created by invokedynamic calls to so-called bootstrap methods.

Bootstrap methods are structures in the class file that contain signature information for the lambda and a method handle reference to a static method. The LambdaMetafactory then efficiently creates an instance for the lambda.