[PATCH] D15600: AArch64: Add option to use shared epilogues in compiler-rt

Mon Dec 21 11:00:00 PST 2015

MatzeB added a comment.

In http://reviews.llvm.org/D15600#314648, @kristof.beyls wrote:

> I think the general idea of sharing epilogues is a good idea - at the very least when optimizing for size.
> Did you also happen to measure the impact on performance?
>
> Overall, I'm wondering if it wouldn't be better to let the compiler put the epilogue functions in comdat sections (or the equivalent for non-ELF object formats), rather than having them in compiler-rt. I think doing so would have the following advantages:
>
> - It's possible to catch all epilogues, not just the N (16 in the attached patch) most often used ones as seen in a benchmark corpus.
> - The epilogues can more easily be tuned for specific cores when the epilogues are produced by the compiler rather than being stored in compiler-rt. E.g. I've been told that this technique also has been used effectively in other compilers when targeting AArch32. On some AArch32 cores using LDRD tends to be more efficient than using LDM.
> - My gut feel is that if over time we want to modify epilogues; a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.

Yes, I agree, and I have been thinking about this as well. I discarded the idea when I realized that we have no infrastructure for placing basic blocks into different sections.
However, thinking about it again now, it may be possible to create pseudo functions on the fly, just like the pseudo functions I put into compiler-rt. I'll look into this.

> Obviously, a well-defined naming scheme will be needed to define the epilogue functions (e.g. should they contain a version number?), but I think that's true no matter whether the epilogue functions are produced by the compiler or inserted into compiler-rt.

We can simply encode the contents of the block in the name in a unique way (in this implementation, the name contains all the restored registers, in order of restoration).
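
To illustrate the naming idea, here is a minimal sketch of deriving a symbol name from the ordered list of restored registers. The `__epilogue` prefix, the `_` separator, and the function name `epilogue_symbol` are assumptions made up for this example; the exact spelling used in the patch may differ.

```python
def epilogue_symbol(restored_regs, prefix="__epilogue"):
    """Build a symbol name from the registers in restoration order.

    The prefix and separator are hypothetical, chosen only for illustration.
    """
    if not restored_regs:
        raise ValueError("an epilogue must restore at least one register")
    # Order matters: the same registers restored in a different order
    # form a different block and therefore get a different name.
    return prefix + "_" + "_".join(restored_regs)


name_a = epilogue_symbol(["x19", "x20", "fp", "lr"])
name_b = epilogue_symbol(["x20", "x19", "fp", "lr"])
print(name_a)
print(name_b)
```

Because the name is a pure function of the block's contents, two compilers emitting the same restore sequence would agree on the symbol without needing a separate version number.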

> This also made me wonder if something similar could be done for function prologues? I couldn't immediately think of why it would be impossible - but the overheads involved probably will be higher than with epilogues, e.g. having to do a call to a prologue function, rather than doing a branch to an epilogue function?

It may be possible to do something similar for prologues, but since these require a function call or a similar mechanism, the performance impact seemed larger.

Repository:
  rL LLVM

http://reviews.llvm.org/D15600