[PATCH] D63624: [PowerPC] Exploit single instruction load-and-splat for word and doubleword

Nemanja Ivanovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 10 17:19:27 PDT 2019


nemanjai marked 9 inline comments as done.
nemanjai added inline comments.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:1771
 bool PPC::isSplatShuffleMask(ShuffleVectorSDNode *N, unsigned EltSize) {
-  assert(N->getValueType(0) == MVT::v16i8 &&
-         (EltSize == 1 || EltSize == 2 || EltSize == 4));
+  assert(N->getValueType(0) == MVT::v16i8 && isPowerOf2_32(EltSize) &&
+         EltSize <= 8 && "Can only handle 1,2,4,8 byte element sizes");
----------------
jsji wrote:
> Looks like you are repurposing this function: instead of using it just for 'VSPLT/VSPLTH/VSPLTW' , use it for `lxvdsx` as well.
> I don't think it is a great idea to just update the assert here.
> 
> We should either rename the function, or re-structure the code to two different functions, in another NFC patch?
> 
Updating the comment, I can see. But why do you not feel this function is adequate for 64-bit splats? The name is adequeate - we are looking for a splat shuffle mask. And we are in a legalized DAG which will make all shuffles `v16i8`.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:8138
 
+bool sourceIsPlainLoad(SDValue Op, SDValue &InputLoad) {
+  if (Op.getOpcode() == ISD::BITCAST)
----------------
jsji wrote:
> We can't support indexed load as well, and we are also trying to return the `InputLoad`, 
> so maybe something like `getNormalLoadInput` and add comments about returning true for success?
This is a good point. I can actually make it return a `const SDValue *` so success is simply signaled by the returned value not being `nullptr`.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:8142
+  if (Op.getOpcode() == ISD::SCALAR_TO_VECTOR)
+    Op = Op.getOperand(0);
+  if (Op.getOpcode() == ISD::LOAD) {
----------------
jsji wrote:
> Any other ISDs we can/should peek through? How about `ANY_EXTEND_VECTOR_INREG`/`EXTRACT_SUBVECTOR`? 
> 
I don't think we want to peek through any vector operations since we are looking for a scalar non-extending, non-indexed load.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:8146
+    InputLoad = Op;
+    return !LD->isIndexed() && ISD::isNON_EXTLoad(LD);
+  }
----------------
jsji wrote:
> Why not just `LD->isNormalLoad()`?
I think you mean `ISD::isNormalLoad(LD)` and my answer is, there is no good reason. I first added the indexed check and realized later that I also have to ensure it is non-extending :)
I will definitely change it, thank you.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:8278
+    if (DAG.isSplatValue(Op, true) &&
+        sourceIsPlainLoad(Op.getOperand(0), InputLoad)) {
+      LoadSDNode *LD = cast<LoadSDNode>(InputLoad);
----------------
jsji wrote:
> Do we need to check `hasOneUse` here as well?
I can certainly add it since this wouldn't be profitable if there are other uses of the load.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.cpp:8793
+    bool IsFourByte = PPC::isSplatShuffleMask(SVOp, 4);
+    int SplatIdx = PPC::getVSPLTImmediate(SVOp, IsFourByte ? 4 : 8, DAG);
+
----------------
jsji wrote:
> Similar to `isSplatShuffleMask`, we are repurposing `getVSPLTImmediate` as well, VSPLT* has only 3 forms (1/2/4), we should either rename this function, or use another wrapper.
Yeah, you're absolutely right. I should rename the function.
I initially left it alone as I thought that `VSPLT` adequately abbreviates `Vector Splat`, but since it is part of a mnemonic and capitalized, I agree with you that it is very misleading.


================
Comment at: lib/Target/PowerPC/PPCISelLowering.h:463
 
+      /// LD_SPLAT - a splatting load memory instruction (LXVDSX, LXVWSX).
+      LD_SPLAT,
----------------
jsji wrote:
> This ISD has chain, please update the comments to describe it.
Sure, I omitted it since all of these nodes have a chain and all these comments seem superfluous. But you're right, consistency is more important.


================
Comment at: lib/Target/PowerPC/PPCInstrVSX.td:62
+def SDT_PPCldsplat : SDTypeProfile<1, 1, [
+  SDTCisVec<0>, SDTCisSameAs<0, 1>
+]>;
----------------
jsji wrote:
> Why `SDTCisSameAS<0,1>`? Shouldn't it be `SDTCisPtrTy<1>`?
Wow, that's right. I neither know why I wrote it this way nor why table gen doesn't complain!


================
Comment at: test/CodeGen/PowerPC/load-and-splat.ll:3
+; RUN: llc -mcpu=pwr9 -ppc-asm-full-reg-names -ppc-vsr-nums-as-vr \
+; RUN:   -mtriple=powerpc64-unknown-unknown < %s | FileCheck %s \
+; RUN:   -check-prefix=P9
----------------
jsji wrote:
> Any specific reason that we want to use `powerpc64` instead of `powerpc64le` for `pwr9`?
I wanted at least one of them to be big endian and I figured why not the one that has more hits in the test case. I don't think we really need four RUN lines for P8/P9 and LE/BE.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D63624/new/

https://reviews.llvm.org/D63624





More information about the llvm-commits mailing list