[PATCH] D44355: [AArch64] Fold adds with tprel_lo12_nc and secrel_lo12 into a following ldr/str
Sebastian Pop via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 12 08:51:30 PDT 2018
sebpop accepted this revision.
sebpop added a comment.
This revision is now accepted and ready to land.
I think this change is good.
Here is what happens on arm64-tls-execs.ll.
Before this patch, the DAG after instruction selection looks like this:
t0: ch = EntryToken
t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0>
t13: i64 = ADDXri t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], TargetConstant:i32<0>
t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t13, TargetConstant:i64<0>, t0
t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4
t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1
With the patch there is one fewer ADDXri, because it got folded into the load:
t0: ch = EntryToken
t12: i64 = ADDXri MOVbaseTLS:i64, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=71], TargetConstant:i32<0>
t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0
t6: ch,glue = CopyToReg t0, Register:i32 $w0, t4
t7: ch = RET_ReallyLR Register:i32 $w0, t6, t6:1
That is because AArch64 has a pattern
defm : ExtLoadTo32ROPat<ro8, extloadi8, LDRBBroW, LDRBBroX>;
to match
t13: i64 = AArch64ISD::ADDlow t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98]
t4: i32,ch = load<LD4[@local_exec_var](dereferenceable)> t0, t13, undef:i64
and transform that into:
Morphed node: t4: i32,ch = LDRWui<Mem:LD4[@local_exec_var](dereferenceable)> t12, TargetGlobalTLSAddress:i64<i32* @local_exec_var> 0 [TF=98], t0
> I guess I could look at adding other code for matching an ADDXri machine node with LDR/STR, but I don't know if that has got other implications.
I think it is impossible to specify a pattern that matches a load together with a machine node such as AArch64::ADDXri.
The LHS (the matching part) of a def-pat should be a generic DAG node.
The current patch avoids lowering the add into a machine node too early and keeps it as a generic addlow node, so that the load+addlow ISEL pattern matches.
If the addlow node is not folded into a load, it gets caught by the pseudo after regalloc and lowered into an ADDXri machine node.
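To make the ordering argument concrete, here is a toy model in Python. It is not LLVM code: the Node class and the select_load and count_ops helpers are made up for illustration, and only the opcode names and TF values are taken from the dumps above. It only shows why the fold can fire when the address is still a generic ADDlow node and cannot fire once the add is already an ADDXri machine node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a DAG node: an opcode plus its operands."""
    op: str
    operands: tuple = ()


def select_load(addr):
    """Toy selection of a 32-bit load from the address `addr`."""
    # The pattern is written against a generic node:
    #   load(ADDlow(base, sym)) -> LDRWui(base, sym)
    if isinstance(addr, Node) and addr.op == "ADDlow":
        base, sym = addr.operands
        return Node("LDRWui", (base, sym))
    # Anything else, including an already selected ADDXri, is treated as an
    # opaque base register and loaded with immediate offset 0.
    return Node("LDRWui", (addr, 0))


def count_ops(node, op):
    """Count the nodes with opcode `op` reachable from `node`."""
    if not isinstance(node, Node):
        return 0
    return (node.op == op) + sum(count_ops(o, op) for o in node.operands)


# t12 in the dumps: the first add, which is an ADDXri in both cases.
t12 = Node("ADDXri", ("MOVbaseTLS", "@local_exec_var [TF=71]", 0))

# Before the patch: the second add is already a machine node when the load is selected.
early = Node("ADDXri", (t12, "@local_exec_var [TF=98]", 0))
before = select_load(early)

# With the patch: the second add is still a generic ADDlow node.
late = Node("ADDlow", (t12, "@local_exec_var [TF=98]"))
after = select_load(late)
```

In this model, before keeps both ADDXri nodes and loads with offset 0, while after has a single ADDXri and carries the TF=98 symbol operand directly on the load, which mirrors the two dumps.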
Repository:
rL LLVM
https://reviews.llvm.org/D44355