2009-06-19

Language-level Immutablity and Shared Memory Concurrency

I've been a fan of Bartosz Milewski’s Programming Café ( or possibly Bartosz Milewski’s Programming Caff; I'm not sure if lack of an accent implies greasy chips and tea outside of the UK ). He's exploring the options of applying the leverage of type checking to concurrency problems within shared-memory systems. In a different camp, there's the Erlang crowd, typified by Steve Vinoski which says that language-level immutability and message passing is the way to go. A third perspective, probably closer to what I would like to see, are CSP based approaches such as Limbo and stackless Python, which are shared nothing and use channels to pass messages by value. The channels and pass-by-value approach I'm experimenting with in kin is close to the Limbo model, except that channels are objects with support different messages. Apart from the D extensions by Bartosz Milewski, these techniques require the language integrates with a scheduler, usually a green threads system rather than the OS scheduler.

One thing which bothers me about Erlang and pure-functional languages in general is the assumption that language level immutability has some bearing on concurrency.

It's fairly obvious that any message shared by machines on a network will be unlikely to include memory addresses in one of the processes, other than possibly as opaque values. In other words, if it's on the net, it's pass by value. Similarly most inter-process communication is pass by value, and the sharing of messages containing values and names of channels is well understood using CSP and pi-calculus. The net is unconcerned about what paradigm the languages which implement the processes communicating by message passing; it only cares that the messages are values and names. It doesn't follow at all that the implementation language for a process which operates on messages needs to present immutable variables to the programmer.

The other area functional languages have been touted has been for problems where there are shared memory concurrency within a process.

A trivial counterexample is given by lazy evaluation - in trivial implementations, expressions are passed around as thunks, consisting of a place to store the value of the expression, and a function to evaluate the value. The expression is evaluated lazily by calling the function the 'first' time the value is needed. Being a pure function, it doesn't matter as far as the value is concerned whether the function is called multiple times in different cores, but if the value returned is a non-primitive there are still write order issues with sharing the value. So you end up with similar barrier ordering problems that you have in non-pure languages, and barriers cost performance.

For strict functional languages, you still need the barriers, and other synchronisation points, if you are going to communicate values between cores. If you are passing messages by value, then it is necessary that the sender or receiver of a message cannot change the value of the message. This isn't necessarily the same as language level immutability - in a shared memory space, values sent on a channel should not change, but that can be achieved by copy-on-send or copy-on-write approaches as well as by general immutability. Having a shared memory space between threads requires synchronisation with the garbage collector - otherwise memory can be reclaimed and rewritten, breaking the immutability. None of this is particularly surprising, and the implementation of any SMP shared-memory language with automatic memory allocation will have to deal with it.

The presence of immutability at the language level allows the optimisation that a message can be passed as an immutable reference rather than by value. This introduces the pessimisation that any garbage collection of the arena that the messages' storage is allocated from has to be coordinated between all threads. If the optimisation of mostly-concurrent memory management is made, where each thread has its own arena, memory for messages is either allocated from a special shared arena when constructed, requiring some synchronisation, or messages are copied when sent.

It's fairly obvious that if you are allocating from a shared arena, then you're having to serialise some of the requests, so are losing concurrency ( as well as having to do escape analysis so you know what objects constitute the message and so require to be allocated from the shared arena ), and that if you're copying messages into a shared arena when sent then you have lost the pass-by-immutable-reference optimisation. Either way, I've yet to see a truly convincing case that immutability in itself actually gives any performance gain for concurrent code and typical message sizes ( if you have very large messages which are not referenced by the sender, then any copying scheme will be inefficient, but that seems to be a case for escape analysis rather than immutability; copy-on-send would be at a disadvantage there, but Erlang style encourages small messages anyway ).


So the problems in terms of performance are acquiring lock on the memory bus, which takes around 40 cycles for a lock cmpxchg, barriers to order access, and copying. Languages with channels as first-class constructs hide the details of locking and barriers, and can trade various copying or shared memory techniques.

I've not seen anything in Erlang that would suggest that its shared nothing messaging is less prone to programmer error than Limbo's channels. There are other reasons to use pure-functional languages in terms of proving theorems about the program, and below the language level some optimisations are facilitated, and some are harder to perform. But there seems to be a confusion between which bits of a particular combination of facilities in a language are actually supplying the gains, and my opinion is that message passing combining values and channel names ( however implemented or enforced ) is both safe, and open to analysis at the systems level by pi-calculus.

Labels: , , ,