A few more thoughts on concurrency and high-level languages
I little while ago, I started playing with a pattern-matching and relational language for meta-programming called 'kin' (it's got relations, and does pattern, so kin/kinship).
An old housemate and good friend of mine did his PhD in an embedded Erlang system; that's as close as I came to it before a year or so ago, when I started using XMPP at work and tried installing ejabberd. It didn't work, and I gave up after a few tries, so I tested our server's federation code against Wildfire instead. That's nothing to do with Erlang; ejabberd appears to scale well in independent tests, and I suspect the Windows installer wasn't a priority (our customers require Windows). I've read comments that there are smart people who have written good software in Erlang. This is true. Smart people also write good software in BASIC or COBOL or ADA; that's because smart people write good software. Writing a language that smart people can write good software in is easy; writing a language that average people write non-crap software in is far harder. For my opinion, for this sort of problem, ruby seems such a language (well, it seems much better than Perl or PHP, and a bit better than python), but the implementation is pants. Conversely, ejabberd scales to many more users than Twitter can, and I suspect is a cleaner expression of the problem, because IM's a concurrency problem and erlang far exceeds ruby's capabilities at concurrency.
Here's some good advice on how a high level language should express a problem in something I started reading earlier:
"The structure of the program should exactly follow the structure of the problem. Each real world concurrent activity should be mapped onto exactly one concurrent process in our programming language. If there is a 1:1 mapping of the problem onto the program we say that the program is isomorphic to the problem. It is extremely important that the mapping is exactly 1:1." -- Joe Armstrong (PhD thesis, Dec 2003).
You have a log file. You scan each line (in any order) for a hit and accumulate the number of times each hit occurs. Then you list the hits in decreasing order of the accumulated hit counts.
I don't see any concurrency intrinsic in the problem. The only reason we're even thinking about concurrency is that we want the implementation of the solution to the problem to be parallelised over many cores for performance reasons, and (as SIMD isn't sufficiently general to scale much beyond 4 times wider paths except in GPUs), many concurrent cores is the only thing we can think that will make computers have more throughput. For this problem, concurrency is as much an accident as memory management. Languages which handle memory management automatically have already eclipsed those which require manual memory management in most areas where it isn't an essential part of the problem. As it's not an essential part of the problem, no language where the concurrency is manifest in the implementation of the solution will be following the structure of the problem. Admittedly, I grew up on BASIC and 6502 assembler in my teens, and if you teach kids pi-calculus in high-school the next generation might see the problem differently.
Neo: What are you trying to tell me? That I can dodge bullets?
Morpheus: No, Neo. I'm trying to tell you that when you're ready, you won't have to.
With Erlang you can express a problem in terms of concurrency.
It may be a long time before ruby's ready.