[llvm-dev] Replace call stack with an equivalent on the heap?

Fri Dec 15 19:25:45 PST 2017

> On 15 Dec 2017, at 20:51, (IIIT) Siddharth Bhat via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> One major difference is that GHC uses a "stack-in-the-heap", in the sense that there is a chunk of heap memory that functions effectively as a call stack. It is managed explicitly by GHC. GHC does not use the native call stack _at all_. This is to implement continuation passing, in a sense.
> 
> I want to check that the difference in performance is truly from this "stack in the heap" behavior, so I want to teach a backend to generate code that looks like this.

If the indirect jumps via this stack are all made from the same location in the GHC runtime, then this might kill branch prediction. The return addresses pushed by call instructions are duplicated in the branch predictor’s own hardware return stack, so the matching ret can be predicted from it. If the runtime doesn’t use call/ret, not even indirect calls, then this functionality is lost and the general indirect branch prediction logic has to kick in. The predictor may then either mispredict for a long time before it recognizes a pattern at the jump site in the runtime, or it may never recognize one and always mispredict. Perceptron-based predictors may behave differently from other techniques in this case, so if your chip doesn’t use perceptrons, see if it’s equally bad on one that does (some AMD?).
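
To make the scenario concrete, here is a minimal sketch in C of the shape I have in mind, not GHC's actual runtime or code generation. All names (Machine, Frame, push, sum_entry, add_cont, sum_heap, sum_native) are made up for illustration. The "stack" is a malloc'd array of frames, and every transfer of control goes through one indirect call in a driver loop, so the predictor sees a single branch site with many targets. It assumes n >= 0.

```c
#include <assert.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not GHC's real runtime layout. */
typedef struct Machine Machine;
typedef void (*Code)(Machine *m, long long arg);

typedef struct {
    Code code;        /* continuation to run when this frame is popped */
    long long arg;    /* saved value for that continuation */
} Frame;

struct Machine {
    Frame *frames;    /* the "call stack", living in malloc'd memory */
    size_t sp;        /* number of live frames */
    size_t cap;       /* allocated capacity, in frames */
    long long ret;    /* return-value register */
};

static void push(Machine *m, Code code, long long arg) {
    if (m->sp == m->cap) {                 /* grow the heap stack on demand */
        size_t cap = m->cap ? m->cap * 2 : 16;
        Frame *grown = realloc(m->frames, cap * sizeof *grown);
        if (grown == NULL) abort();
        m->frames = grown;
        m->cap = cap;
    }
    m->frames[m->sp].code = code;
    m->frames[m->sp].arg = arg;
    m->sp++;
}

/* Continuation: finish "n + sum(n - 1)" once sum(n - 1) is in m->ret. */
static void add_cont(Machine *m, long long n) {
    m->ret += n;
}

/* Entry code for sum(n) = 0 + 1 + ... + n. */
static void sum_entry(Machine *m, long long n) {
    if (n == 0) {
        m->ret = 0;
        return;
    }
    push(m, add_cont, n);        /* the "return address" goes on the heap stack */
    push(m, sum_entry, n - 1);   /* the "call" */
}

/* Driver: all control transfers go through the one indirect call below. */
long long sum_heap(long long n) {
    Machine m = {0};
    assert(n >= 0);              /* the sketch only handles non-negative n */
    push(&m, sum_entry, n);
    while (m.sp > 0) {
        Frame f = m.frames[--m.sp];   /* copy out: push() may realloc */
        f.code(&m, f.arg);            /* single indirect call site */
    }
    free(m.frames);
    return m.ret;
}

/* Same computation on the native stack, using ordinary call/ret pairs. */
long long sum_native(long long n) {
    return n == 0 ? 0 : n + sum_native(n - 1);
}
```

Both sum_heap(10) and sum_native(10) compute the same sum. This particular target sequence is very regular, so a predictor would probably learn it quickly; a real program would interleave many different continuations at that one site, which is the case I'd worry about. Note also that an optimizing compiler may turn sum_native into a loop, so for a real comparison you'd want to check the generated code.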

There should be a smoking gun somewhere in the performance monitoring registers then, I’d hope. It’d be very obvious - persistent branch misprediction at the level of functions costs dearly.

I have rather cursory knowledge in this area so perhaps the state of the art is way ahead of my imagination, or perhaps I’m dead wrong.

Cheers, Kuba