[PATCH] [AArch64] Improve and enable the SeparateConstOffsetFromGEP for AArch64 backend.

Sun Oct 19 23:39:45 PDT 2014

Hi t.p.northover, jingyue,

Hi Tim, Jingyue and other reviewers,

This patch is based on http://reviews.llvm.org/D5863, which fixes some problems in the SeparateConstOffsetFromGEP pass, so please apply that patch first if you want to try this one.

We find that LLVM does not handle CSE well for GEPs (getelementptrs). A GEP can be very complex: it can have many mixed constant and variable indices, and the variable indices may themselves contain constants. Such complex GEPs are usually kept intact until CodeGen. But CodeGen can only see one basic block at a time, so CSE opportunities for GEPs that span basic blocks (e.g., two GEPs used in two different basic blocks, or one GEP used in two basic blocks) are missed.
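As a rough source-level illustration of the cross-block case (the Cell type and the pick function are made up for this example and are not taken from the patch or its tests), both branches below compute almost the same address, but each access ends up in its own basic block:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical record type, used only for this illustration.
struct Cell {
  int weight;
  int cost[4];
};

// Both branches address grid[i + 1][j + C].cost[K]. The part of the address
// that depends on i and j is identical on both paths, but the two accesses
// live in different basic blocks, so a CSE that only looks inside one block
// cannot share that computation.
int pick(Cell (*grid)[8], std::size_t i, std::size_t j, bool flag) {
  assert(grid != nullptr);
  if (flag)
    return grid[i + 1][j + 2].cost[1];  // access in the "then" block
  return grid[i + 1][j + 3].cost[2];    // access in the "else" block
}
```

Each access would typically become one GEP with mixed variable and constant indices, which is the shape of GEP described above.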

Currently there is a pass called SeparateConstOffsetFromGEP, which can separate the constant part out of variable indices and split a complex GEP into two GEPs. But this is not enough for the cross-basic-block problem described above.
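A minimal sketch of that splitting idea, written as plain pointer arithmetic rather than IR (the function names are hypothetical, and this is only an analogy for what the pass does, not its actual output):

```cpp
#include <cassert>
#include <cstddef>

// Original form: the constant 5 is buried inside the variable index.
float *addr_original(float *a, std::size_t i) {
  assert(a != nullptr);
  return &a[i + 5];
}

// Split form: one address computation for the variable part, followed by a
// second one that adds only the constant offset.
float *addr_split(float *a, std::size_t i) {
  assert(a != nullptr);
  float *variable_part = &a[i];  // first "GEP": variable index only
  return variable_part + 5;      // second "GEP": constant offset only
}
```

Both functions return the same address; the split form simply exposes the constant offset as a separate step.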

So I improve this pass. It now separates constants within indices for both sequential and struct types. Most importantly, it also transforms complex GEPs into a "ptrtoint + arithmetic + inttoptr" form, so that CSE opportunities across basic blocks can be found. This also benefits the address sinking logic in CodeGenPrepare for complex GEPs that cannot be matched by an addressing mode, because part of the address calculation can still be sunk, and address sinking can result in better addressing modes. The EarlyCSE pass is run after this pass to do the CSE, and the LICM pass is also run to hoist any address calculations that are loop invariant.
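To make the "ptrtoint + arithmetic + inttoptr" form concrete, here is a hedged C++ analogy (Node, kCols and the helper names are assumptions for this sketch, not code from the patch). It assumes a flat address space where a pointer survives a round trip through std::uintptr_t, which holds on mainstream targets:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical record type and row length, used only for this illustration.
struct Node {
  int weight;
  int cost[4];
};
constexpr std::size_t kCols = 8;

// Variable part of the address of grid[i + 1][j + dj].cost[k]. It depends
// only on grid, i and j, so it is the value CSE can share across blocks.
std::uintptr_t variable_part(Node (*grid)[kCols], std::size_t i, std::size_t j) {
  std::uintptr_t base = reinterpret_cast<std::uintptr_t>(grid);  // "ptrtoint"
  return base + i * sizeof(Node[kCols]) + j * sizeof(Node);      // arithmetic
}

// Constant part: every constant index (sequential and struct) folded into a
// single byte offset.
constexpr std::size_t const_part(std::size_t dj, std::size_t k) {
  return sizeof(Node[kCols]) + dj * sizeof(Node) + offsetof(Node, cost) +
         k * sizeof(int);
}

int *lowered_address(Node (*grid)[kCols], std::size_t i, std::size_t j,
                     std::size_t dj, std::size_t k) {
  assert(k < 4);
  std::uintptr_t addr = variable_part(grid, i, j) + const_part(dj, k);
  return reinterpret_cast<int *>(addr);                          // "inttoptr"
}
```

With this shape, two accesses that differ only in their constant indices share one variable_part value, and the remaining constant is small enough to be a candidate for an immediate offset in the addressing mode.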

Even if no CSE opportunities are found after this arithmetic transformation, there is no obvious performance regression, because CodeGen always performs the same transformation anyway. We just move it several passes ahead of CodeGen.

I tested the performance on A57 (AArch64 target) with SPEC CPU2000 and SPEC CPU2006. There are no obvious regressions, and the following benchmarks improve:
spec2006.473.astar    4.7%
spec2006.444.namd     3.0%
spec2006.445.gobmk    2.5%

For the benchmarks that don't show an obvious improvement, the assembly code still shows better address calculation and addressing modes.

I cannot test this patch on other targets such as NVPTX. I think it can also benefit performance there, or at least it should cause no regression.

Please review.

Thanks,
-Hao

http://reviews.llvm.org/D5864

Files:
  include/llvm/Transforms/Scalar.h
  lib/Target/AArch64/AArch64TargetMachine.cpp
  lib/Transforms/Scalar/SeparateConstOffsetFromGEP.cpp
  test/CodeGen/AArch64/aarch64-gep-opt.ll
  test/CodeGen/AArch64/arm64-addr-mode-folding.ll
  test/CodeGen/AArch64/arm64-cse.ll
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D5864.15133.patch
Type: text/x-patch
Size: 19250 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20141020/18b11c35/attachment.bin>