Thursday, July 25, 2013

The Future of the JVM

Here is a panel discussion on the Future of the JVM:

They cover a lot of stuff: Atomicity and barriers, CPU architectures, garbage collection, compiler optimization, column compressed stores, packed and structs in Java, inlining, embedded systems, Scala, locks, transactional memory, atomic operations, M-CAS, IPv6, ARMv7 vs ARMv8, thread affinity, playstation's cell processor, cache-coherence, parallelism, distributed memory models, memory barriers, relaxed atomics, JVM profiles, native calls, phantom references, concurrent GC, code profiling, streams, etc

It's worth watching for the fact that it has one of the few recorded public appearances of Doug Lea, even though it seems he was on an off day, he still had plenty of insights to share. But IMO, the star of this show is Cliff Click. He's the guy that talks really fast and looks like the main character from "V for Vendetta".

One interesting thing Cliff says around minute 19, is that longs are actually atomic (as a far as he knows) because the HotSopt took care of that some time ago when they had to make references atomic in 64 bit JVMs. Keep in mind that the Java Memory Model enforces references to be atomic even if they're not volatile, and so the result is that the HotSpot guys had to make longs atomic as a byproduct, because on 64 bit machines the references are 64 bit wide (like a long).

Another interesting bit is at minute 10, when Cliff Click says this:
(...) my experience with keywords in language programming that do things involving performance is that they have a technology lifetime of about 5 years and then they become stale, so, there was a register keyword in C, and for a good 5 year window it was crucial for performance, and then all the good C compilers did register allocation well and you didn't care about it any more.
And then there was an inline keyword that had made a longer run of 7 or 8 years were people weren't inlining intelligently enough, and now every JIT for Java does a really good job of inlining and so no one cares about inlining keyword and now we're going to have a packed keyword, and there's probably a 5 to 10 years span where people make useful work out of packed, but at some point people will figure out how to do this automatically and you won't care any more. (...)

This is a powerful insight that stuck with me, and it applies to many things in the programming world.
There is one thing to keep in mind about this idea, some people believe that this is a universal truth, a kind of "just wait a few more years and someone will solve it" mentality, and the problem with that is that concurrent algorithms do not fall into that category. Let me try to explain my reasoning:

The register keyword was created to solve an NP-Hard problem which is the discovery of the optimal allocation of registers to perform a certain computation block, and similarly, the inline keyword was created to solve the optimization problem of when to inline blocks of code (not sure this one is NP-Hard but it should to be at least NP-Complete). For both these problems there are now known algorithms that can give an approximate solution in polynomial time, see for example the wikipedia page:

For concurrent algorithms there will never be anything like this, the reason being that, proving that a certain concurrent algorithm is correct, is an NP-Hard problem, for which an approximate solution will never be satisfactory. You can't just approximately prove that an algorithm is correct, it either is, or it is not.
No matter how smart the future generations of computer scientists and compiler experts will become, solving these kind of problems will always require the usage of an algorithm that grows exponentially in time and memory with the number of atomic operations, (or for the Java developers, it grows with the number of load/stores on volatile variables).
Even finding a modification of a concurrent algorithm that gets the same output seems to be something that is NP-Complete by itself, but maybe for that optimization problem it will be possible to find polynomial approximate algorithms, so I'm not sure. The one I'm certain of is, that for a compiler to optimize a piece of concurrent code such that it replaces atomic variables or changes fences/barriers, it will require the compiler to prove that the newly generated code is still a correct algorithm, which is an NP-Hard problem, and therefore, takes experiential time to solve... and no one wants to wait exponential time for a program to compile.

This doesn't mean that we can't have automatic proofs for concurrent algorithms (they seem to already exist), it's just that running them even on a small algorithm may take a large amount of time and memory, which makes it practical only for proving short algorithms, or some usage to researchers of concurrent algorithms.

Concurrency is hard... in fact, it is NP-Hard!

On a different note, the quote of the day, by Doug Lea (at minute 27): "Pinning threads is for cowards and weaklings!"   :)