[AArch64] A question about rematerialization of simple add
Jiangning Liu
liujiangning1 at gmail.com
Wed Jun 25 23:42:35 PDT 2014
Hi Tim,
For the following small case,
{code}
%X = type { i64, i64, i64 }
declare void @f(%X*)
define void @t() {
entry:
%tmp = alloca %X
call void @f(%X* %tmp)
call void @f(%X* %tmp)
ret void
}
{code}
We are generating AArch64 code like below
{code}
stp x20, x19, [sp, #-32]!
stp x29, x30, [sp, #16]
add x29, sp, #16 // =16
sub sp, sp, #32 // =32
.cfi_offset w20, -32
add x19, sp, #8 // =8
mov x0, x19
bl f
mov x0, x19
bl f
sub sp, x29, #16 // =16
ldp x29, x30, [sp, #16]
ldp x20, x19, [sp], #32
ret
{code}
x19 is used as a temporary register to hold the address of the alloca across the calls, and since it is a callee-saved register we also need a stack slot to save and restore it. Since this case is very small, it could be optimized like below,
{code}
stp x29, x30, [sp, #-16]!
mov x29, sp
sub sp, sp, #32 // =32
add x0, sp, #8 // =8
bl f
add x0, sp, #8 // =8
bl f
mov sp, x29
ldp x29, x30, [sp], #16
ret
{code}
We can do this because "add x19, sp, #8" is as cheap as a "mov" and can be rematerialized. The following patch makes it work.
{code}
diff --git a/lib/Target/AArch64/AArch64InstrFormats.td
b/lib/Target/AArch64/AArch64InstrFormats.td
index 446149b..c8122a6 100644
--- a/lib/Target/AArch64/AArch64InstrFormats.td
+++ b/lib/Target/AArch64/AArch64InstrFormats.td
@@ -1613,6 +1613,7 @@ class AddSubRegAlias<string asm, Instruction inst,
RegisterClass dstRegtype,
multiclass AddSub<bit isSub, string mnemonic,
SDPatternOperator OpNode = null_frag> {
let hasSideEffects = 0 in {
+ let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
// Add/Subtract immediate
def Wri : BaseAddSubImm<isSub, 0, GPR32sp, GPR32sp,
addsub_shifted_imm32,
mnemonic, OpNode> {
@@ -1622,6 +1623,7 @@ multiclass AddSub<bit isSub, string mnemonic,
mnemonic, OpNode> {
let Inst{31} = 1;
}
+ }
// Add/Subtract register - Only used for CodeGen
def Wrr : BaseAddSubRegPseudo<GPR32, OpNode>;
@@ -1680,6 +1682,7 @@ multiclass AddSub<bit isSub, string mnemonic,
multiclass AddSubS<bit isSub, string mnemonic, SDNode OpNode, string cmp> {
let isCompare = 1, Defs = [NZCV] in {
// Add/Subtract immediate
+ let isReMaterializable = 1, isAsCheapAsAMove = 1 in {
def Wri : BaseAddSubImm<isSub, 1, GPR32, GPR32sp, addsub_shifted_imm32,
mnemonic, OpNode> {
let Inst{31} = 0;
@@ -1688,6 +1691,7 @@ multiclass AddSubS<bit isSub, string mnemonic, SDNode
OpNode, string cmp> {
mnemonic, OpNode> {
let Inst{31} = 1;
}
+ }
// Add/Subtract register
def Wrr : BaseAddSubRegPseudo<GPR32, OpNode>;
{code}
But ADDWri/ADDXri represent "Add (immediate): Rd = Rn + shift(imm)", where the shift can be either #0 or #12. An add with "lsl #12" may not have the same cost as a mov, so we would have to split Wri/Xri into two separate defs. Since ADDWri/ADDXri are used throughout the back end, I'm not sure this change meets your expectations, and it doesn't seem like a very clean solution to me either. Do you know of a smarter method in the .td file to set isReMaterializable dynamically based on the shift value?
Any ideas?
Thanks,
-Jiangning