[PATCH] D81076: [PowerPC] Custom lower rotl v1i128 to vector_shuffle.

Mon Jun 8 00:30:06 PDT 2020

Esme marked 7 inline comments as done.
Esme added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9622
+  if (Op.getValueType() != MVT::v1i128)
+    return SDValue();
+  SDLoc dl(Op);
----------------
shawnl wrote:
> steven.zhang wrote:
> > Returning SDValue() when lowering the ROTL will cause problems. I would change it to an assertion.
> `assert(Op.getValueType() == MVT::v1i128 && "unexpected MVT type")`
Thanks, the assertions have been added.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9624
+  SDLoc dl(Op);
+  SDValue N0 = Op.getOperand(0);
+  SDValue N1 = peekThroughBitcasts(Op.getOperand(1));
----------------
lkail wrote:
> I suppose we still need to peekThroughBitcasts Op0.
Thanks for the reminder.

================
Comment at: llvm/test/CodeGen/PowerPC/pr45628.ll:142
 ; CHECK-NOVSX:       # %bb.0: # %entry
-; CHECK-NOVSX-NEXT:    addis r3, r2, .LCPI7_0@toc@ha
-; CHECK-NOVSX-NEXT:    addi r3, r3, .LCPI7_0@toc@l
-; CHECK-NOVSX-NEXT:    lvx v3, 0, r3
-; CHECK-NOVSX-NEXT:    addis r3, r2, .LCPI7_1@toc@ha
-; CHECK-NOVSX-NEXT:    addi r3, r3, .LCPI7_1@toc@l
-; CHECK-NOVSX-NEXT:    vslo v4, v2, v3
-; CHECK-NOVSX-NEXT:    vspltb v3, v3, 15
-; CHECK-NOVSX-NEXT:    vsl v3, v4, v3
-; CHECK-NOVSX-NEXT:    lvx v4, 0, r3
-; CHECK-NOVSX-NEXT:    vsro v2, v2, v4
-; CHECK-NOVSX-NEXT:    vspltb v4, v4, 15
-; CHECK-NOVSX-NEXT:    vsr v2, v2, v4
-; CHECK-NOVSX-NEXT:    vor v2, v3, v2
+; CHECK-NOVSX-NEXT:    addi r3, r1, -32
+; CHECK-NOVSX-NEXT:    stvx v2, 0, r3
----------------
shawnl wrote:
> steven.zhang wrote:
> > lkail wrote:
> > > The new sequence appears to use more memory ops than the original one.
> > I believe they are used to move the VSR to two GPRs. And yes, we need to take a closer look to see whether there is a better way to handle it.
> Is there any way to only lower to scalar if the value does not need to be in a vector register like it does here? Also, why does the vector code here look better than the version in my PR?:
> ```
> 
> swap_with_shift:                        # @swap_with_shift
>         xxspltd 35, 34, 1
>         xxswapd 34, 34
>         xxlxor 0, 0, 0 <=== the version on left does not need this xor
>         xxpermdi 35, 35, 0, 1
>         xxpermdi 34, 0, 34, 1
>         xxlor 34, 35, 34
>         blr
> ```
> Is there any way to only lower to scalar if the value does not need to be in a vector register like it does here?
I will look into it, thanks.
> why does the vector code here look better than the version in my PR?
Because the llc option -mcpu=pwr9 was added.
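
For readers following the thread, here is a minimal standalone sketch of why a rotl of a 128-bit value can be expressed as a byte shuffle at all. It is not code from the patch: the names `Bytes16`, `rotlShuffleMask`, `applyShuffle` and `rotlBitwise` are hypothetical, and byte 0 is assumed to be the most significant byte. It only illustrates that rotating left by a multiple of 8 bits is a pure permutation of the 16 bytes.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical model of a 128-bit value: byte 0 is the most significant byte.
using Bytes16 = std::array<std::uint8_t, 16>;

// Build a shuffle mask for a left rotate by `bits`.
// Assumption for this sketch: only whole-byte rotate amounts are supported.
std::array<int, 16> rotlShuffleMask(unsigned bits) {
  assert(bits % 8 == 0 && "only whole-byte rotate amounts map to a shuffle");
  std::array<int, 16> mask{};
  std::iota(mask.begin(), mask.end(), 0); // identity mask 0..15
  // After rotating the identity mask, mask[i] == (i + bits / 8) % 16.
  std::rotate(mask.begin(), mask.begin() + (bits / 8) % 16, mask.end());
  return mask;
}

// Apply a mask: result byte i is taken from source byte mask[i].
Bytes16 applyShuffle(const Bytes16 &src, const std::array<int, 16> &mask) {
  Bytes16 out{};
  for (int i = 0; i < 16; ++i)
    out[i] = src[mask[i]];
  return out;
}

// Reference implementation: rotate left one bit at a time.
Bytes16 rotlBitwise(Bytes16 v, unsigned bits) {
  for (unsigned n = 0; n < bits % 128; ++n) {
    std::uint8_t carry = v[0] >> 7; // top bit wraps around to the bottom
    for (int i = 0; i < 15; ++i)
      v[i] = static_cast<std::uint8_t>((v[i] << 1) | (v[i + 1] >> 7));
    v[15] = static_cast<std::uint8_t>((v[15] << 1) | carry);
  }
  return v;
}
```

With a rotate amount of 64 the mask becomes {8, ..., 15, 0, ..., 7}, i.e. a swap of the two 64-bit halves, which is the shape of the swap_with_shift case quoted above.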

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D81076/new/

https://reviews.llvm.org/D81076