[PATCH][AArch64] Use 8-byte load&store for inlined memcpy() on Cortex A53
james at jamesmolloy.co.uk
Tue Jul 15 12:54:52 PDT 2014
Thanks for working on this! The answer is slightly more involved, though.
As shown in your testcase, your change emits the sequence "ldr; str; ldr;
str". The ideal expansion is "ldp; stp; ldp; stp". That way each access
still moves 128 bits (a pair of 64-bit registers).
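For illustration only, here is a C++ sketch of the data movement that the "ldp; stp; ldp; stp" expansion performs for a 32-byte copy. Each 16-byte chunk passes through two 64-bit values, and both are loaded before either is stored. The function name copy32_pairwise is hypothetical. A C++ compiler is not obliged to emit LDP/STP for this code; the comments only mark which instruction each group of lines stands for.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: models the "ldp; stp; ldp; stp" shape of a 32-byte
// inline copy. Each round moves 16 bytes through two 64-bit "registers"
// (x0, x1), loading both before storing either. std::memcpy is used for the
// 8-byte accesses so unaligned pointers are handled safely.
void copy32_pairwise(unsigned char *dst, const unsigned char *src) {
  for (int chunk = 0; chunk < 2; ++chunk) {
    std::uint64_t x0, x1;
    // Corresponds to: ldp x0, x1, [src, #16*chunk]
    std::memcpy(&x0, src + 16 * chunk, 8);
    std::memcpy(&x1, src + 16 * chunk + 8, 8);
    // Corresponds to: stp x0, x1, [dst, #16*chunk]
    std::memcpy(dst + 16 * chunk, &x0, 8);
    std::memcpy(dst + 16 * chunk + 8, &x1, 8);
  }
}
```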
In fact, our microarchitects have recommended (through internal channels)
that the "ldp; stp" sequence be used for memcpy-like operations, because it
gives performance that is portable across cores. Therefore, the change should
also be made for at least Cortex-A57. I'll let Tim or Jim comment on Cyclone.
So to generate "ldp; stp", the inline memcpy expander needs to generate
"ldr; ldr; str; str;". The ldp/stp pass will then squish these together.
A similar thing is done in the ARM target (where the loads and stores get
combined into LDRD or LDM), but it's ARM-only. I think some logic needs to be
moved into the target-independent part of codegen.
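A toy C++ model of this flow, assuming a hypothetical instruction representation (Inst, expandMemcpy, pairLoadsStores and mnemonics are illustrative names, not LLVM APIs). The expander emits 8-byte accesses either interleaved ("ldr; str; ldr; str") or grouped ("ldr; ldr; str; str"). A simplified pairing pass then fuses two adjacent same-kind accesses to consecutive offsets from the same base. The real AArch64 load/store optimizer has its own, more involved legality rules; this sketch only shows why, in such a model, the grouped order is the one that can be combined.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified instruction model (not real LLVM classes).
enum class Op { Ldr, Str, Ldp, Stp };

struct Inst {
  Op op;
  int reg0, reg1;  // reg1 is only meaningful for Ldp/Stp
  int base;        // 0 = source pointer, 1 = destination pointer
  int offset;      // byte offset from base
};

// Expand a copy of `bytes` (a multiple of 16) into 8-byte accesses.
// grouped == true:  "ldr; ldr; str; str" per 16 bytes
// grouped == false: "ldr; str; ldr; str" per 16 bytes (one register reused)
std::vector<Inst> expandMemcpy(int bytes, bool grouped) {
  std::vector<Inst> out;
  for (int off = 0; off < bytes; off += 16) {
    if (grouped) {
      out.push_back({Op::Ldr, 0, -1, 0, off});
      out.push_back({Op::Ldr, 1, -1, 0, off + 8});
      out.push_back({Op::Str, 0, -1, 1, off});
      out.push_back({Op::Str, 1, -1, 1, off + 8});
    } else {
      out.push_back({Op::Ldr, 0, -1, 0, off});
      out.push_back({Op::Str, 0, -1, 1, off});
      out.push_back({Op::Ldr, 0, -1, 0, off + 8});
      out.push_back({Op::Str, 0, -1, 1, off + 8});
    }
  }
  return out;
}

// Toy pairing pass: fuse two *adjacent* same-kind 8-byte accesses to
// consecutive offsets from the same base into one paired access.
std::vector<Inst> pairLoadsStores(const std::vector<Inst> &in) {
  std::vector<Inst> out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const Inst &a = in[i];
    if (i + 1 < in.size()) {
      const Inst &b = in[i + 1];
      bool sameKind = (a.op == Op::Ldr || a.op == Op::Str) && a.op == b.op;
      // A paired load needs two distinct destination registers.
      bool regsOk = a.op == Op::Str || a.reg0 != b.reg0;
      if (sameKind && regsOk && a.base == b.base && b.offset == a.offset + 8) {
        out.push_back({a.op == Op::Ldr ? Op::Ldp : Op::Stp, a.reg0, b.reg0,
                       a.base, a.offset});
        ++i;  // skip the instruction that was just fused
        continue;
      }
    }
    out.push_back(a);
  }
  return out;
}

// Render the opcode sequence, e.g. "ldp; stp".
std::string mnemonics(const std::vector<Inst> &v) {
  static const char *names[] = {"ldr", "str", "ldp", "stp"};
  std::string s;
  for (const Inst &x : v) {
    if (!s.empty()) s += "; ";
    s += names[static_cast<int>(x.op)];
  }
  return s;
}
```

For a 32-byte copy, pairing the grouped expansion yields "ldp; stp; ldp; stp", while the interleaved expansion is left as eight unpaired ldr/str instructions.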
On 15 July 2014 09:15, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> Based on the following information from [this post] by James Molloy:
> * Our inline memcpy expansion pass is emitting "LDR q0, [..]; STR q0,
> [..]" pairs, which is less than ideal on A53. If we switched to
> emitting "LDP x0, x1, [..]; STP x0, x1, [..]", we'd get around 30%
> better inline memcpy performance on A53. A57 seems to deal well with
> the LDR q sequence.
> I've made a patch (attached) that does this for Cortex-A53. Please
> take a look at it.
> Best regards,
> 0: http://article.gmane.org/gmane.comp.compilers.llvm.devel/74269
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu