I've been thinking a bit about garbage collection and stack allocation, and both Robert O'Callahan and phaeron at virtualdub seem to have been.
Chicken scheme seems to have something interesting - a single stack which is collected on exhaustion (rather than C# or Java VMs which have a traditional program stack and a stack based allocator), so I'm thinking about implementing a tail-call-elimination variant of the single stack. That requires escape analysis, which may be costly - in its simplest form, it requires a flag to indicate whether any reference was made to anything on the stack above the point the tail call returns to. Either you can be conservative and set the flag on all function calls where a reference to a stack object is passed as an argument, or you can use a single assignment model and test whether anything was allocated. Making the core single-assignment would possibly complicate supporting other languages, and means you trade space in one condition - consing to hold the assigned value - for space in another - being able to recover some of the stack early in a tail call.
Having just the one stack means that, on x64 at least, you can use the one stack register.
Hmm. These things are all rather hard to do right, and don't have simple 40 line benchmarks.
See also a simple worked example.