[AArch64] A question about rematerialization of simple add
Jiangning Liu
liujiangning1 at gmail.com
Wed Jun 25 23:42:35 PDT 2014
Hi Tim,
For the following small case,
{code}
%X = type { i64, i64, i64 }
declare void @f(%X*)
define void @t() {
entry:
%tmp = alloca %X
call void @f(%X* %tmp)
call void @f(%X* %tmp)
ret void
}
{code}
We are generating AArch64 code like below
{code}
stp x20, x19, [sp, #-32]!
stp x29, x30, [sp, #16]
add x29, sp, #16 // =16
sub sp, sp, #32 // =32
.cfi_offset w20, -32
add x19, sp, #8 // =8
mov x0, x19
bl f
mov x0, x19
bl f
sub sp, x29, #16 // =16
ldp x29, x30, [sp, #16]
ldp x20, x19, [sp], #32
ret
{code}
x19 is used as a temporary register to hold the address of the alloca across the calls, and since it is a callee-saved register we also need a stack slot to save and restore it. Since this case is very small, it could be optimized like below,
{code}
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #32 // =32
add x0, sp, #8 // =8
bl f
add x0, sp, #8 // =8
bl f
mov sp, x29
ldp x29, x30, [sp], #16
ret
{code}
We can do this because "add x19, sp, #8" is as cheap as a "mov" and can be rematerialized. The following patch makes it work.
{code}
diff --git a/lib/Target/AArch64/AArch64InstrFormats.td
b/lib/Target/AArch64/AArch64InstrFormats.td
index 446149b..c8122a6 100644
--- a/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/lib/Target/AArch64/AArch64InstrFormats.td
@@ -1613,6 +1613,7 @@ class AddSubRegAlias<string asm, Instruction inst,
RegisterClass dstRegtype,
multiclass AddSub<bit isSub, string mnemonic,
SDPatternOperator OpNode = null_frag> {
let hasSideEffects = 0 in {
+ let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
// Add/Subtract immediate
def Wri : BaseAddSubImm<isSub, 0, GPR32sp, GPR32sp,
addsub_shifted_imm32,
mnemonic, OpNode> {
@@ -1622,6 +1623,7 @@ multiclass AddSub<bit isSub, string mnemonic,
mnemonic, OpNode> {
let Inst{31} = 1;
}
+ }
// Add/Subtract register - Only used for CodeGen
def Wrr : BaseAddSubRegPseudo<GPR32, OpNode>;
@@ -1680,6 +1682,7 @@ multiclass AddSub<bit isSub, string mnemonic,
multiclass AddSubS<bit isSub, string mnemonic, SDNode OpNode, string cmp> {
let isCompare = 1, Defs = [NZCV] in {
// Add/Subtract immediate
+ let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
def Wri : BaseAddSubImm<isSub, 1, GPR32, GPR32sp, addsub_shifted_imm32,
mnemonic, OpNode> {
let Inst{31} = 0;
@@ -1688,6 +1691,7 @@ multiclass AddSubS<bit isSub, string mnemonic, SDNode
OpNode, string cmp> {
mnemonic, OpNode> {
let Inst{31} = 1;
}
+ }
// Add/Subtract register
def Wrr : BaseAddSubRegPseudo<GPR32, OpNode>;
{code}
But ADDWri/ADDXri represent "Add (immediate): Rd = Rn + shift(imm)", where the shift can be either #0 or #12. An add with "lsl #12" may not have the same cost as a mov, so we would have to split Wri/Xri into two separate defs. Since ADDWri/ADDXri are used throughout the back end, I'm not sure this change meets your expectations, and it doesn't seem like a very clean solution to me either. Do you know of a smarter method in the .td file to set isReMaterializable dynamically based on the shift value?
Any ideas?
Thanks,
-Jiangning