[PATCH] D104855: [PowerPC] Change VSRpRC allocation order

Sun Jun 27 22:54:03 PDT 2021

qiucf marked an inline comment as done.
qiucf added inline comments.

================
Comment at: llvm/test/CodeGen/PowerPC/mma-intrinsics.ll:27
 ; CHECK-BE:       # %bb.0: # %entry
-; CHECK-BE-NEXT:    xxlor vs1, v2, v2
-; CHECK-BE-NEXT:    xxlor vs0, vs1, vs1
-; CHECK-BE-NEXT:    xxlor vs4, vs0, vs0
-; CHECK-BE-NEXT:    xxlor vs5, vs1, vs1
-; CHECK-BE-NEXT:    xxlor vs6, vs0, vs0
-; CHECK-BE-NEXT:    xxlor vs7, vs1, vs1
-; CHECK-BE-NEXT:    xxlor vs0, vs4, vs4
-; CHECK-BE-NEXT:    xxlor vs1, vs5, vs5
-; CHECK-BE-NEXT:    xxlor vs2, vs6, vs6
-; CHECK-BE-NEXT:    xxlor vs3, vs7, vs7
+; CHECK-BE-NEXT:    vmr v3, v2
+; CHECK-BE-NEXT:    xxlor vs0, v2, v2
----------------
shchenz wrote:
> Why need to copy to `v3`? I think we can also use `v2` for follow 4 `xxlor`?
Yes. This extra copy is from below:

```
undef %5.sub_vsx1:vsrprc = COPY $v2 <--
%0:g8rc_and_g8rc_nox0 = COPY $x3
%5.sub_vsx0:vsrprc = COPY %5.sub_vsx1:vsrprc
%5:vsrprc = KILL_PAIR %5:vsrprc(tied-def 0)
undef %8.sub_pair0:uaccrc = COPY %5:vsrprc
%8.sub_pair1:uaccrc = COPY %5:vsrprc
%11:accrc = BUILD_UACC %8:uaccrc
%11:accrc = XXMTACC %11:accrc(tied-def 0)
%11:accrc = XXMFACC %11:accrc(tied-def 0)
```

Here the copy cannot be eliminated since we specified that the assigned register is `sub_vsx1` (VSX register with odd number). We need to fix how MMA intrinsics are expanded.

================
Comment at: llvm/test/CodeGen/PowerPC/mma-outer-product.ll:81
+; CHECK-NEXT:    xxlor vs0, v2, v2
+; CHECK-NEXT:    vmr v1, v2
+; CHECK-NEXT:    xxlor vs1, v3, v3
----------------
shchenz wrote:
> nemanjai wrote:
> > Any idea why this sticks around? This and line 85 seem to be dead copies. Similar artifacts appear in other tests.
> +1, there are some unnecessary `vmr` after this change. Maybe we need to have a look at why. Before the change, there are no such vector copies.
This copy is not due to this patch. It's just scheduled up. `pmxvf64gernn acc0, vsp32, v2, 0, 0` used it. But yes, it can be eliminated. I'll follow it up

================
Comment at: llvm/test/CodeGen/PowerPC/more-dq-form-prepare.ll:13
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    stdu 1, -480(1)
-; CHECK-NEXT:    .cfi_def_cfa_offset 480
-; CHECK-NEXT:    .cfi_offset r14, -256
-; CHECK-NEXT:    .cfi_offset r15, -248
-; CHECK-NEXT:    .cfi_offset r16, -240
-; CHECK-NEXT:    .cfi_offset r17, -232
-; CHECK-NEXT:    .cfi_offset r18, -224
-; CHECK-NEXT:    .cfi_offset r19, -216
-; CHECK-NEXT:    .cfi_offset r20, -208
-; CHECK-NEXT:    .cfi_offset r21, -200
-; CHECK-NEXT:    .cfi_offset r22, -192
-; CHECK-NEXT:    .cfi_offset r23, -184
-; CHECK-NEXT:    .cfi_offset r24, -176
-; CHECK-NEXT:    .cfi_offset r25, -168
-; CHECK-NEXT:    .cfi_offset r26, -160
-; CHECK-NEXT:    .cfi_offset r27, -152
-; CHECK-NEXT:    .cfi_offset r28, -144
-; CHECK-NEXT:    .cfi_offset r29, -136
-; CHECK-NEXT:    .cfi_offset r30, -128
-; CHECK-NEXT:    .cfi_offset r31, -120
-; CHECK-NEXT:    .cfi_offset f18, -112
-; CHECK-NEXT:    .cfi_offset f19, -104
-; CHECK-NEXT:    .cfi_offset f20, -96
-; CHECK-NEXT:    .cfi_offset f21, -88
-; CHECK-NEXT:    .cfi_offset f22, -80
-; CHECK-NEXT:    .cfi_offset f23, -72
-; CHECK-NEXT:    .cfi_offset f24, -64
-; CHECK-NEXT:    .cfi_offset f25, -56
-; CHECK-NEXT:    .cfi_offset f26, -48
-; CHECK-NEXT:    .cfi_offset f27, -40
-; CHECK-NEXT:    .cfi_offset f28, -32
-; CHECK-NEXT:    .cfi_offset f29, -24
+; CHECK-NEXT:    stdu 1, -576(1)
+; CHECK-NEXT:    .cfi_def_cfa_offset 576
----------------
jsji wrote:
> nemanjai wrote:
> > This shows a side-effect of this change that should be mentioned in the commit message and maybe the code. Namely, if we end up allocating non-volatile registers, the first 12 underlying registers we allocate are 16 bytes rather than 8 bytes (since they're non-volatile Altivec registers rather than FP registers). This increases the size of stack frames for such functions.
> Maybe we should provide alternative allocation order if user care about stack frame size ? Or even better if we can detect the usage of ACC or other potential conflicts before choosing this alternative allocation order? But yeah, that can be a FIXME and follow up.
Yes. Maybe we can provide an option to control it first and then work on to resolve it properly.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104855/new/

https://reviews.llvm.org/D104855