[PATCH] D15600: AArch64: Add option to use shared epilogues in compiler-rt
Matthias Braun via llvm-commits
llvm-commits at lists.llvm.org
Mon Dec 21 11:00:00 PST 2015
MatzeB added a comment.
In http://reviews.llvm.org/D15600#314648, @kristof.beyls wrote:
> I think the general idea of sharing epilogues is a good idea - at the very least when optimizing for size.
> Did you also happen to measure the impact on performance?
> Overall, I'm wondering if it wouldn't be better to let the compiler put the epilogue functions in comdat sections (or the equivalent for non-ELF object formats), rather than having them in compiler-rt. I think doing so would have the following advantages:
> - It's possible to catch all epilogues, not just the N (16 in the attached patch) most often used ones as seen in a benchmark corpus.
> - The epilogues can more easily be tuned for specific cores when the epilogues are produced by the compiler rather than being stored in compiler-rt. E.g. I've been told that this technique also has been used effectively in other compilers when targeting AArch32. On some AArch32 cores using LDRD tends to be more efficient than using LDM.
> - My gut feel is that if over time we want to modify epilogues, a scheme where the compiler still emits the epilogues is the most flexible. Retaining all versions of epilogues in compiler-rt potentially required by all LLVM revisions ever used may end up being a bookkeeping nightmare.
Yes, I agree, and I have been thinking about this as well. I discarded the idea when I realized that we have no infrastructure to place basic blocks into different sections.
However, thinking about it again, it may be possible to create pseudo functions on the fly, just like the pseudo functions I put into compiler-rt. I'll look into this.
> Obviously, a well-defined naming scheme will be needed to define the epilogue functions (e.g. should they contain a version number?), but I think that's true no matter whether the epilogue functions are produced by the compiler or inserted into compiler-rt.
We can just describe the contents of the block in a unique way (in this implementation the name contains all the restored registers in order of restoration).
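To illustrate the naming idea, here is a minimal sketch of deriving a symbol name from the restored registers in order of restoration. The helper name `makeEpilogueName`, the `__epilogue` prefix and the `_` separator are assumptions made up for this example; they are not the names used in the patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of the patch): builds a symbol name for a
// shared epilogue from the registers it restores, in order of restoration.
// The "__epilogue" prefix and the '_' separator are illustrative assumptions.
std::string makeEpilogueName(const std::vector<std::string> &RestoredRegs) {
  assert(!RestoredRegs.empty() &&
         "an epilogue must restore at least one register");
  std::string Name = "__epilogue";
  for (const std::string &Reg : RestoredRegs) {
    Name += '_';  // Register names contain no '_', so the encoding is unambiguous.
    Name += Reg;
  }
  return Name;
}
```

Because the order of restoration is part of the name, two epilogues that restore the same registers in a different order get different symbols, so a block is only shared with an exact match.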
> This also made me wonder if something similar could be done for function prologues? I couldn't immediately think of why it would be impossible - but the overheads involved probably will be higher than with epilogues, e.g. having to do a call to a prologue function, rather than doing a branch to an epilogue function?
It may be possible to do something with the prologues as well, but since they require a function call or a similar mechanism, the performance impact seemed larger.