[llvm] [PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types. (PR #80784)

Thu Apr 18 09:07:40 PDT 2024

================
@@ -15371,33 +15381,52 @@ SDValue PPCTargetLowering::combineVectorShuffle(ShuffleVectorSDNode *SVN,
     // the value into element zero. Since scalar size of LHS and RHS may differ
     // after isScalarToVec, this should be checked using their own sizes.
     if (SToVLHS) {
-      if (!IsLittleEndian && SToVLHS.getValueType().getScalarSizeInBits() >= 64)
+      int LHSScalarSize = SToVLHS.getValueType().getScalarSizeInBits();
+      if (!IsLittleEndian && LHSScalarSize >= 64)
         return Res;
       // Set up the values for the shuffle vector fixup.
-      LHSMaxIdx = NumEltsOut / NumEltsIn;
+      LHSNumValidElts =
+          LHSScalarSize / LHS.getValueType().getScalarSizeInBits();
+      // The last element that comes from the LHS. For example:
+      // (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
+      // The last element that comes from the LHS is actually 0, not 3
+      // because elements 1 and higher of a scalar_to_vector are undefined.
+      LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
----------------
amy-kwan wrote:

The `+ 1` is just an adjustment to the computation so that we get the correct last element.

For example,
```
          t36: i32,ch = load<(load (s32) from %ir.a)> t0, t2, undef:i64
        t37: v4i32 = scalar_to_vector t36
      t38: v16i8 = bitcast t37
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t39: i64,ch = load<(load (s64) from %ir.b)> t0, t4, undef:i64
        t40: v2i64 = scalar_to_vector t39
      t41: v16i8 = bitcast t40
    t17: v16i8 = vector_shuffle<0,1,16,17,18,19,20,21,22,23,u,u,u,u,u,u> t38, t41
```
For this, I believe we should have:
```
LHSFirstElt = 0
LHSLastElt = 3
RHSFirstElt = 16
RHSLastElt = 23
```
However, if we did `LHSLastElt = LHSScalarSize / ShuffleEltWidth;`, we would have `32 / 8 = 4`, which isn't correct: 4 is the number of valid elements, not the index of the last one. With `32 / (8 + 1) = 32 / 9`, integer division gives 3, which is what we expect.
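
To make the arithmetic concrete, here is a minimal standalone sketch of the integer division involved. It is not the code in the patch: the helper names are hypothetical, and the `- 1` variant is included only as a point of comparison.

```cpp
// Standalone illustration of the integer arithmetic discussed above.
// All helper names are hypothetical; none of them appear in the patch.

// Number of shuffle elements covered by the scalar (a count, not an index).
constexpr int numValidElts(int ScalarSizeInBits, int ShuffleEltWidth) {
  return ScalarSizeInBits / ShuffleEltWidth;
}

// Form used in the patch: divide by (width + 1) and let integer division
// truncate towards zero.
constexpr int lastEltViaPlusOne(int ScalarSizeInBits, int ShuffleEltWidth) {
  return ScalarSizeInBits / (ShuffleEltWidth + 1);
}

// Alternative form, for comparison only: count of valid elements minus one.
constexpr int lastEltViaMinusOne(int ScalarSizeInBits, int ShuffleEltWidth) {
  return ScalarSizeInBits / ShuffleEltWidth - 1;
}

// The example from this comment: i32 scalar, v16i8 shuffle (8-bit elements).
static_assert(numValidElts(32, 8) == 4, "32 / 8 is the element count");
static_assert(lastEltViaPlusOne(32, 8) == 3, "32 / 9 truncates to 3");

// The i64 scalar on the RHS covers 8 bytes, so its last local index is 7.
static_assert(lastEltViaPlusOne(64, 8) == 7, "64 / 9 truncates to 7");

// For these scalar/element width pairs the two forms agree.
static_assert(lastEltViaPlusOne(16, 8) == lastEltViaMinusOne(16, 8), "");
static_assert(lastEltViaPlusOne(32, 16) == lastEltViaMinusOne(32, 16), "");
static_assert(lastEltViaPlusOne(64, 16) == lastEltViaMinusOne(64, 16), "");
static_assert(lastEltViaPlusOne(64, 32) == lastEltViaMinusOne(64, 32), "");
static_assert(lastEltViaPlusOne(32, 32) == lastEltViaMinusOne(32, 32), "");
```

The two forms only agree while the ratio of scalar size to element width stays small. For instance, `128 / (8 + 1)` is 14 whereas `128 / 8 - 1` is 15, so the `+ 1` form relies on the scalar sizes that can actually reach this point in the combine, which this sketch does not establish.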

https://github.com/llvm/llvm-project/pull/80784