[LLVMdev] Pseudo load and store instructions for AArch64
Renato Golin
renato.golin at linaro.org
Tue Aug 26 15:45:36 PDT 2014
On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> It's needed to make code of inlined memcpy() more efficient.
Hi Sergey,
I was thinking about this and I remember seeing a similar problem to
yours in ARM. Something like:
ldr r1, [sp, #20]
ldr r2, [sp, #24]
ldr r3, [sp, #28]
being reordered to:
ldr r2, [sp, #24]
ldr r1, [sp, #20]
ldr r3, [sp, #28]
and having a big hit on performance.
The ARM back-end has the ARMLoadStoreOptimizer class, which deals with
similar problems and generally runs at the right point in the pipeline
for fixing loads and stores. Maybe you could add a similar pass to
AArch64?
That'd have the benefit of not polluting the table-gen files, and it
could be turned on via a flag, on demand, and only enabled by default
once it has been heavily tested.
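To make the idea a bit more concrete, here is a rough toy model of the
kind of cleanup such a pass could do. It is not LLVM code and not how
ARMLoadStoreOptimizer is actually written: the Load struct and
sortLoadRuns() are made-up names for illustration, and the sketch
assumes adjacent loads with distinct destination registers and no
stores or other instructions in between.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for "ldr <dest>, [<base>, #<offset>]".
// Hypothetical type for illustration, not an LLVM MachineInstr.
struct Load {
  std::string dest;
  std::string base;
  int offset;
};

// Sort each run of adjacent loads that share a base register into
// ascending offset order. A load that overwrites its own base register
// is never moved and never shares a run with other loads, because
// loads after it would compute a different address.
// Assumption: destination registers within a run are distinct.
std::vector<Load> sortLoadRuns(std::vector<Load> loads) {
  std::size_t begin = 0;
  while (begin < loads.size()) {
    const std::string base = loads[begin].base;  // copy: sorting moves elements
    std::size_t end = begin + 1;
    if (loads[begin].dest != base) {
      // Extend the run while the base matches and is not clobbered.
      while (end < loads.size() && loads[end].base == base &&
             loads[end].dest != base) {
        ++end;
      }
    }
    assert(end > begin && end <= loads.size());  // every run holds >= 1 load
    std::stable_sort(loads.begin() + begin, loads.begin() + end,
                     [](const Load &a, const Load &b) {
                       return a.offset < b.offset;
                     });
    begin = end;
  }
  return loads;
}
```

Fed the reordered sequence above (#24, #20, #28 off sp), this puts the
loads back in ascending order (#20, #24, #28). A real pass would also
have to check for aliasing stores, register liveness and so on.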
James (cc'd) implemented the optimizer, so maybe he could give some
hints on the issues for your particular case.
cheers,
--renato