[PATCH] D44355: [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str

Sebastian Pop via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 12 08:51:30 PDT 2018


sebpop accepted this revision.
sebpop added a comment.
This revision is now accepted and ready to land.

I think this change is good.

Here is what happens on arm64-tls-execs.ll.
Before this patch, the DAG after instruction selection looks like this:

  t0: ch = EntryToken
        t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0>
      t13: i64 = ADDXri t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], TargetConstant:i32<0>
    t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t13, TargetConstant:i64<0>, t0
  t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4
  t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1

With the patch, there is one fewer ADDXri, because it got folded into the load:

  t0: ch = EntryToken
      t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0>
    t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0
  t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4
  t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1

That is because AArch64 has a pattern

  defm : ExtLoadTo32ROPat<ro8,  extloadi8,   LDRBBroW, LDRBBroX>;

to match

    t13: i64 = AArch64ISD::ADDlow t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98]
  t4: i32,ch = load<LD4[@local_exec_var](dereferenceable)> t0, t13, undef:i64

and transform that into:

  Morphed node: t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0

> I guess I could look at adding other code for matching an ADDXri machine node with LDR/STR, but I don't know if that has got other implications.

I think it is impossible to write a pattern that matches a load whose address is the machine node AArch64::ADDXri.
The LHS (matching) part of a def-pat should be made of generic DAG nodes, not of machine nodes that have already been selected.

The current patch avoids lowering the add into a machine node too early: it keeps the add as a generic ADDlow node, so the load+ADDlow ISel pattern can match.
If the ADDlow node is not folded into a load, it gets caught by the pseudo after regalloc and lowered into the machine node ADDXri.
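
To illustrate the two paragraphs above, here is a toy C++ model of the folding decision. It is not LLVM code: Op, Node and selectToy are made-up names, the hi(...)/lo(...) strings are just labels for the two symbol operands, and the single-use check is a simplification of this sketch rather than something taken from the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy DAG node kinds (hypothetical, loosely mirroring the dumps above).
enum class Op { TPBase, AddHi, AddLow, Load };

struct Node {
    Op op;            // kind of node
    int base;         // index of the base/address operand, -1 if none
    std::string sym;  // symbol operand carried by the add nodes
};

// Select a toy DAG into a list of instruction strings.
// keepAddLowGeneric == false models "the low add is already a machine node
// when the load is selected", so the load cannot absorb it.
// keepAddLowGeneric == true models the patch: the low add is still generic,
// so a load that is its only user folds it into the immediate operand.
std::vector<std::string> selectToy(const std::vector<Node>& dag,
                                   bool keepAddLowGeneric) {
    // Count how many nodes use each node as their base operand.
    std::vector<int> uses(dag.size(), 0);
    for (const Node& n : dag) {
        assert(n.base < static_cast<int>(dag.size()));
        if (n.base >= 0) ++uses[n.base];
    }

    // Mark the AddLow nodes that a load will absorb.
    std::vector<bool> folded(dag.size(), false);
    if (keepAddLowGeneric) {
        for (const Node& n : dag) {
            if (n.op == Op::Load && n.base >= 0 &&
                dag[n.base].op == Op::AddLow && uses[n.base] == 1) {
                folded[n.base] = true;
            }
        }
    }

    // Emit one string per surviving node, in DAG order.
    std::vector<std::string> out;
    for (std::size_t i = 0; i < dag.size(); ++i) {
        const Node& n = dag[i];
        switch (n.op) {
        case Op::TPBase:
            out.push_back("MOVbaseTLS");
            break;
        case Op::AddHi:
            out.push_back("ADDXri hi(" + n.sym + ")");
            break;
        case Op::AddLow:
            // An unfolded AddLow still becomes a plain add.
            if (!folded[i]) out.push_back("ADDXri lo(" + n.sym + ")");
            break;
        case Op::Load:
            if (n.base >= 0 && folded[n.base])
                out.push_back("LDRWui lo(" + dag[n.base].sym + ")");
            else
                out.push_back("LDRWui #0");
            break;
        }
    }
    return out;
}
```

Feeding it the four-node chain from the test (thread pointer, high add, low add, load) with keepAddLowGeneric set to false keeps both adds and a load with a zero offset, which corresponds to the first dump. With it set to true, the low add disappears and its symbol operand moves onto the load, which corresponds to the second dump.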


Repository:
  rL LLVM

https://reviews.llvm.org/D44355




