[LLVMdev] Pseudo load and store instructions for AArch64

Renato Golin renato.golin at linaro.org
Tue Aug 26 15:45:36 PDT 2014


On 22 August 2014 13:44, Sergey Dmitrouk <sdmitrouk at accesssoftek.com> wrote:
> It's needed to make code of inlined memcpy() more efficient.

Hi Sergey,

I was thinking about this and I remember seeing a similar problem to
yours in ARM. Something like:

  ldr r1, [sp, #20]
  ldr r2, [sp, #24]
  ldr r3, [sp, #28]

being reordered to:

  ldr r2, [sp, #24]
  ldr r1, [sp, #20]
  ldr r3, [sp, #28]

and taking a big hit on performance.

The ARM back-end has the ARMLoadStoreOptimizer class, which deals with
similar problems and generally runs at the right time for fixing
loads and stores. Maybe you could add a similar thing to AArch64?
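
As a rough illustration of the kind of clean-up such a pass could do,
here is a toy sketch in standalone C++. It is not the real
ARMLoadStoreOptimizer and uses no LLVM API: the Load struct and
sortLoadRuns() are hypothetical names made up for this example, and it
assumes ascending offsets are the preferred order. It also ignores
stores, aliasing, volatile accesses and scheduling constraints, all of
which a real pass would have to handle.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of "ldr Dst, [Base, #Offset]"; not an LLVM type.
struct Load {
  std::string Dst;
  std::string Base;
  int Offset;
};

// Sorts each run of adjacent loads off the same base register into
// ascending offset order. A run ends when the base changes, when a load
// overwrites the base register, or when a destination register repeats,
// because reordering across those points could change the result.
void sortLoadRuns(std::vector<Load> &Loads) {
  using Diff = std::vector<Load>::difference_type;
  std::size_t Begin = 0;
  while (Begin < Loads.size()) {
    std::size_t End = Begin;
    while (End < Loads.size() && Loads[End].Base == Loads[Begin].Base &&
           Loads[End].Dst != Loads[End].Base) {
      // Stop if this destination was already written earlier in the run.
      bool Repeated = false;
      for (std::size_t I = Begin; I < End; ++I)
        Repeated = Repeated || Loads[I].Dst == Loads[End].Dst;
      if (Repeated)
        break;
      ++End;
    }
    if (End == Begin)
      ++End; // A load that clobbers its own base forms a run of one.
    std::stable_sort(Loads.begin() + static_cast<Diff>(Begin),
                     Loads.begin() + static_cast<Diff>(End),
                     [](const Load &A, const Load &B) {
                       return A.Offset < B.Offset;
                     });
    Begin = End;
  }
}
```

Fed the reordered sequence from above (r2, r1, r3 off sp), this puts
the loads back in the order r1, r2, r3.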

That'd have the benefit of not polluting the TableGen files, and it
could be enabled on demand via a flag, which would only be turned on
by default after heavy testing.

James (cc'd) implemented the optimizer, so maybe he could hint at some
of the issues for your particular case.

cheers,
--renato


