[llvm-dev] Replace call stack with an equivalent on the heap?

Sat Dec 16 04:18:58 PST 2017

Hey,

Thanks for that idea, I had not considered that. I ran perf on the same
examples as before. However, I do not see a large difference in branch
mispredictions. The numbers are in this gist:
<https://gist.github.com/bollu/7a9989a727ed4bf1c118dbcf386d4fc1>. Is there
something else that I am missing? The instruction mix, perhaps? The C version
has roughly 1 more instruction per cycle. I am not sure how significant
that is, however.

Is there some other way to pin down what is going on in terms of slowdown?
(read the asm? profile?)

Thanks,
~Siddharth

On Sat, 16 Dec 2017 at 04:25 Kuba Ober via llvm-dev <llvm-dev at lists.llvm.org>
wrote:

>
>
> > 15 dec. 2017 kl. 20:51 skrev (IIIT) Siddharth Bhat via llvm-dev <
> llvm-dev at lists.llvm.org>:
> >
> > One major difference is that GHC uses a "stack-in-the-heap", in the
> sense that there is a chunk of heap memory that functions effectively as
> the call stack. It is managed explicitly by GHC. GHC does not use the native
> call stack _at all_. This is to implement continuation passing, in a sense.
> >
> > I want to check that the difference in the performance is truly from
> this "stack in the heap" behavior, so I want to teach a backend to
> generate code that looks like this.
>
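
As a rough illustration of the "stack in the heap" idea quoted above, here
is a minimal sketch in C. It is not what GHC actually emits: the names
`Frame`, `HeapStack`, `push`, `pop`, and `fib_heap` are made up for this
example, and the only point is that pending work lives in a malloc'd array
instead of on the native call stack.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical frame: one pending "call" of fib, identified by its argument. */
typedef struct {
    unsigned n;
} Frame;

/* Hypothetical explicit stack, kept in heap memory and grown on demand. */
typedef struct {
    Frame *frames;
    size_t len, cap;
} HeapStack;

static void push(HeapStack *s, Frame f) {
    if (s->len == s->cap) {
        s->cap = s->cap ? s->cap * 2 : 16;
        s->frames = realloc(s->frames, s->cap * sizeof *s->frames);
        if (!s->frames) abort();
    }
    s->frames[s->len++] = f;
}

static Frame pop(HeapStack *s) {
    assert(s->len > 0);
    return s->frames[--s->len];
}

/* Fibonacci with no native recursion: each pending call is a heap frame. */
unsigned long fib_heap(unsigned n) {
    HeapStack s = {0};
    unsigned long acc = 0;

    push(&s, (Frame){n});
    while (s.len > 0) {
        Frame f = pop(&s);
        if (f.n < 2) {
            acc += f.n;                  /* base cases: fib(0)=0, fib(1)=1 */
        } else {
            push(&s, (Frame){f.n - 1});  /* the two "recursive calls" */
            push(&s, (Frame){f.n - 2});
        }
    }
    free(s.frames);
    return acc;
}
```

The native stack depth stays constant however large `n` is; only the heap
array grows. A backend doing this for real would also have to store return
points in the frames, which this sketch sidesteps.
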
> If the indirect jumps via this stack are all made from the same location
> in the GHC runtime, then perhaps this might kill branch prediction.
> call/ret addresses are duplicated in a branch predictor’s own hardware
> stack. If the runtime doesn’t use call/ret, not even indirect calls, then
> this functionality is lost and general branch prediction logic has to kick
> in. The predictors may then either fail for a long time before a pattern is
> recognized via the jump site in the runtime, or may even not recognize it
> and always mispredict. There may be differences between perceptron and
> other techniques in this case, so see if it’s equally bad on chips that use
> perceptrons (some AMD?) if yours doesn’t.
>
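
The "all indirect jumps from the same location" scenario described above
can be sketched in C as follows. This is an assumption-laden toy, not the
GHC runtime: `sum_to` and its labels are hypothetical, and it relies on the
GCC/Clang "labels as values" extension (`&&label`, `goto *ptr`), which is
not standard C. Return addresses are kept in a heap array, and every
"return" goes through one indirect jump at `dispatch`, so no call/ret pair
is executed for the recursion itself.

```c
#include <assert.h>
#include <stdlib.h>

/* Sums 1..n "recursively" without native call/ret (GNU C extension). */
unsigned long sum_to(unsigned long n) {
    /* Heap-allocated stack: a return label and a saved argument per frame. */
    void **ret = malloc((n + 2) * sizeof *ret);
    unsigned long *saved = malloc((n + 2) * sizeof *saved);
    size_t sp = 0;
    unsigned long result = 0;

    if (!ret || !saved) abort();

    /* Outermost continuation: where to go once the whole sum is done. */
    ret[sp] = &&done;
    saved[sp] = 0;
    sp++;

entry:                          /* "call" sum(n) */
    if (n == 0) {
        result = 0;
        goto dispatch;          /* "return" 0 */
    }
    ret[sp] = &&after_call;     /* push return label and argument */
    saved[sp] = n;
    sp++;
    n = n - 1;
    goto entry;

after_call:                     /* continuation: result = n + sum(n - 1) */
    result += n;                /* n was restored by dispatch */
    /* falls through to dispatch, i.e. "returns" to the next frame */

dispatch:                       /* the single indirect jump site */
    sp--;
    n = saved[sp];
    goto *ret[sp];

done:
    free(ret);
    free(saved);
    return result;
}
```

Because every return funnels through the one `goto *ret[sp]`, the hardware
return-address stack never comes into play and the indirect-branch
predictor has to handle that site on its own. Whether that matches what the
real code does is exactly the open question.
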
> There should be a smoking gun somewhere in the performance monitoring
> registers then, I’d hope. It’d be very obvious - persistent branch
> misprediction at the level of functions costs dearly.
>
> I have rather cursory knowledge in this area so perhaps the state of the
> art is way ahead of my imagination, or perhaps I’m dead wrong.
>
> Cheers, Kuba
-- 
Sending this from my phone, please excuse any typos!