[PATCH] D104855: [PowerPC] Change VSRpRC allocation order

Thu Jun 24 07:03:52 PDT 2021

nemanjai accepted this revision.
nemanjai added a comment.
This revision is now accepted and ready to land.

Thanks for fixing this. I think we should go ahead with this fix as it is certainly an improvement. However, there is still some work to do here. I think it would be useful to test a change where we allocate all volatile registers first for `VSFRC, VSSRC, VSRC, VSRpRC` (namely, we allocate `vs34-37, vs32-vs33, vs38-vs51, vs0-vs13, vs63-vs51, vs31-vs14`). Of course, the relative order of the low 32 vs. high 32 might need to be different for the different register classes, but in any case, perhaps go through all the volatiles before we dig into the non-volatiles.
Of course, it would be good if the register coalescer was smart enough that the order doesn't matter, but this is certainly an easier fix.

In any case, this particular change can go in after the nits are addressed.

================
Comment at: llvm/lib/Target/PowerPC/PPCRegisterInfo.td:471

-// Allocate in the same order as the underlying VSX registers.
 def VSRpRC :
   RegisterClass<"PPC", [v256i1], 128,
----------------
Please explain the strange allocation order with something like:
```
// Allocate in the same order as the underlying VSX/Altivec registers
// placing the Altivec ones first in order to reduce the interference
// with accumulator registers (that are overlaid over the 32 low
// VSX registers). This will reduce the number of copies when loading
// accumulator registers - which is a very common use for paired
// VSX registers.
```

================
Comment at: llvm/test/CodeGen/PowerPC/mma-outer-product.ll:81
+; CHECK-NEXT:    xxlor vs0, v2, v2
+; CHECK-NEXT:    vmr v1, v2
+; CHECK-NEXT:    xxlor vs1, v3, v3
----------------
Any idea why this sticks around? This and line 85 seem to be dead copies. Similar artifacts appear in other tests.

================
Comment at: llvm/test/CodeGen/PowerPC/more-dq-form-prepare.ll:13
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    stdu 1, -480(1)
-; CHECK-NEXT:    .cfi_def_cfa_offset 480
-; CHECK-NEXT:    .cfi_offset r14, -256
-; CHECK-NEXT:    .cfi_offset r15, -248
-; CHECK-NEXT:    .cfi_offset r16, -240
-; CHECK-NEXT:    .cfi_offset r17, -232
-; CHECK-NEXT:    .cfi_offset r18, -224
-; CHECK-NEXT:    .cfi_offset r19, -216
-; CHECK-NEXT:    .cfi_offset r20, -208
-; CHECK-NEXT:    .cfi_offset r21, -200
-; CHECK-NEXT:    .cfi_offset r22, -192
-; CHECK-NEXT:    .cfi_offset r23, -184
-; CHECK-NEXT:    .cfi_offset r24, -176
-; CHECK-NEXT:    .cfi_offset r25, -168
-; CHECK-NEXT:    .cfi_offset r26, -160
-; CHECK-NEXT:    .cfi_offset r27, -152
-; CHECK-NEXT:    .cfi_offset r28, -144
-; CHECK-NEXT:    .cfi_offset r29, -136
-; CHECK-NEXT:    .cfi_offset r30, -128
-; CHECK-NEXT:    .cfi_offset r31, -120
-; CHECK-NEXT:    .cfi_offset f18, -112
-; CHECK-NEXT:    .cfi_offset f19, -104
-; CHECK-NEXT:    .cfi_offset f20, -96
-; CHECK-NEXT:    .cfi_offset f21, -88
-; CHECK-NEXT:    .cfi_offset f22, -80
-; CHECK-NEXT:    .cfi_offset f23, -72
-; CHECK-NEXT:    .cfi_offset f24, -64
-; CHECK-NEXT:    .cfi_offset f25, -56
-; CHECK-NEXT:    .cfi_offset f26, -48
-; CHECK-NEXT:    .cfi_offset f27, -40
-; CHECK-NEXT:    .cfi_offset f28, -32
-; CHECK-NEXT:    .cfi_offset f29, -24
+; CHECK-NEXT:    stdu 1, -576(1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 576
----------------
This shows a side-effect of this change that should be mentioned in the commit message and maybe the code. Namely, if we end up allocating non-volatile registers, the first 12 underlying registers we allocate are 16 bytes rather than 8 bytes (since they're non-volatile Altivec registers rather than FP registers). This increases the size of stack frames for such functions.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104855/new/

https://reviews.llvm.org/D104855