[PATCH] D106555: [PowerPC] handle more splat loads

Thu Nov 4 05:08:34 PDT 2021

nemanjai added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:5837
+      EVT Type = N->getValueType(0);
+      if (Type == MVT::v16i8 || Type == MVT::v8i16) {
+        // v16i8 LD_SPLAT addr
----------------
Seems like all the code is also inside this block. Please flip the condition and early exit. You should really look for opportunities to do this and reduce the level of nesting.

I think it would also be useful to provide a much simpler code path for loads that are aligned at 16 bytes. In that case, all you need is `LVX + SPLAT` (where the splat index depends on endianness.

Example:
```
typedef signed short __attribute__((aligned(16))) AlignedShort;
vector signed short test(AlignedShort *Ptr) {
  return (vector signed short)*(Ptr + 3);
}
```

================
Comment at: llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp:5816
+    // this without using stack.
+    if (Subtarget->hasAltivec() && !Subtarget->hasDirectMove()) {
+      unsigned ZeroReg = Subtarget->isPPC64() ? PPC::ZERO8 : PPC::ZERO;
----------------
Sorry, I had this comment but didn't submit because I hand't finished the review. Please flip the condition and turn it into an early exit.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9080
+  if (Ty == MVT::v2i64) {
+    // check the extend type if the input is i32 while the output vector type is
+    // v2i64.
----------------
This is of course a minor nit, but please be mindful of the convention to make comments proper sentences with capitalization and punctuation.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9165

-      // We have handling for 4 and 8 byte elements.
-      unsigned ElementSize = LD->getMemoryVT().getScalarSizeInBits();
+      unsigned ElementSize = LD->getMemoryVT().getScalarSizeInBits() *
+                             ((NewOpcode == PPCISD::LD_SPLAT) ? 1 : 2);
----------------
This is not trivially obvious and therefore requires a comment. Something like:
```
// If the input load is an extending load, it will be an i32 -> i64
// extending load and isValidSplatLoad() will update NewOpcode.
```

Generally, I think this pattern is somewhat dangerous. It makes an assumption that a different function has specific behaviour that isn't clearly documented. There is little assurance that someone won't update `isValidSplatLoad()` to accept extending loads from `i16` to `i64`, etc. So I think you should add an assert here to ensure the types are what you expect them to be.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9176
+
+      // Execlude somes case where LD_SPLAT is worse than scalar_to_vector:
+      // Below cases should also happen for "lfiwzx/lfiwax + LE target + index
----------------
s/Execlude/Exclude

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9206
+      // these cases.
+      if (NumUsesOfInputLD == 1 &&
+          ((Op->getValueType(0) == MVT::v2i64 &&
----------------
jsji wrote:
> Can we split this into multiple early returns, with comments to corresponding ones.
> ```
>     //case 1 ...
>     if(...) 
>        return SDValue();
> 
>     //case 2 ...
>     if(...) 
>        return SDValue();
>     ...
> 
> ```
I agree with separating out the `LFIWAX/LFIWZX` case because the condition is very different. But for the `LXVRHX/LXVRBX`, the conditions are essentially the same. Those should be combined in an obvious way:
```
// case 2 - lxvr[bh]x
// 2.1: load result is at most 16 bits;
// 2.2: build a vector with above loaded value;
// 2.3: the vector has only one value at index 0, others are all undef;
// 2.4: on LE target, so that lxvr[bh]x does not need any permute.
if (NumUsesOfInputLD == 1 && Subtarget.isLittleEndian() &&
    Subtarget.isISA3_1() && Op->getValueType(0).getSizeInBits() <= 16)
```

================
Comment at: llvm/lib/Target/PowerPC/PPCInstrVSX.td:3556
+def : Pat<(v8i16 (PPCldsplat ForceXForm:$A)),
+          (v8i16 (VSPLTHs 3, (MTVSRD (INSERT_SUBREG (i64 (IMPLICIT_DEF)), (LHZX ForceXForm:$A), sub_32))))>;
+def : Pat<(v16i8 (PPCldsplat ForceXForm:$A)),
----------------
shchenz wrote:
> jsji wrote:
> > Do we need to predicate 64 bit before using `MTVSRD`?
> Thanks, this is a very good question. Yes, normally, we need to predicate 64 bit if MTVSRD emits any value in the high 32bits of the first double world.
> 
> However, for this case, seems we don't need the predicate, because the useful bits for this case are the lowest 16/8 bits, so even on the 32 bit arch, we still can ensure the lowest 16/8 bits are having the right value.
> 
> I add a comment to indicate why we don't need to predicate 64 bit.
I understand that we don't really care about the undefined bits because they'll be overwritten when we splat, but why not just use `MTVSRWZ` and not even need the comment?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106555/new/

https://reviews.llvm.org/D106555