[PATCH] D130487: [PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types.

Nemanja Ivanovic via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Feb 24 04:36:42 PST 2023


nemanjai accepted this revision.
nemanjai added a comment.
This revision is now accepted and ready to land.

Although I have a fair number of comments, they're mostly stylistic comments that probably don't really require another revision. So LGTM and please address the comments prior to committing.



================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14989
 
+static bool isShuffleMaskInRange(SmallVectorImpl<int> &ShuffV, int HalfVec,
+                                 int LHSLastByteDefined,
----------------
I don't think we modify `ShuffV` so it should be a `const` reference.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14990-14991
+static bool isShuffleMaskInRange(SmallVectorImpl<int> &ShuffV, int HalfVec,
+                                 int LHSLastByteDefined,
+                                 int RHSLastByteDefined) {
+  for (int i = 0, elt = ShuffV.size(); i < elt; i++) {
----------------
These two values are actually the last *element* rather than *byte* aren't they? If so, please rename accordingly.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14997
+    // Handle first input vector of the vector_shuffle.
+    if (Index < HalfVec && LHSLastByteDefined >= 0) {
+      if (!(Index <= LHSLastByteDefined))
----------------
Why do we only check here if `LHSLastByteDefined >= 0` and similarly for the RHS below? Do we really want to pretend that a shuffle mask is in range if the last byte/element defined is undefined (i.e. presumably no bytes are defined)?


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15066-15067
+    // SCALAR_TO_VECTOR nodes.
+    unsigned LHSValidLaneWidth = HalfVec;
+    unsigned RHSValidLaneWidth = HalfVec;
 
----------------
The name `LaneWidth` is misleading here. I kept thinking the width (i.e. the number of bits) of the lane that contains a defined value. But it is actually the number of valid elements in the vector.
For a node:
```
(shuff (v4i32 s_to_v i32), arbitrary_v4i32), mask)
LHSValidLaneWidth = 1
RHSValidLaneWidth = 4
```
And for a node:
```
(shuff (v4i32 s_to_v i32), (bitcast (s_to_v i64), v4i32), mask)
LHSValidLaneWidth = 1
RHSValidLaneWidth = 2
```

If I'm interpreting it correctly, please rename them to something like `NumValidElts`.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15088
+                          LHS.getValueType().getScalarSizeInBits();
+      int LHSScalarSize = SToVLHS.getValueType().getScalarSizeInBits();
+      LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
----------------
I think we should have an early exit here if the valid lane width is zero:
```
if (LHSValidLaneWidth == 0)
  return false;
```
Since it is not really reasonable to do this transform if we are pulling in more bits than the original `scalar_to_vector` actually defined. Similarly with the RHS below.


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15089
+      int LHSScalarSize = SToVLHS.getValueType().getScalarSizeInBits();
+      LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
       SToVLHS = getSToVPermuted(SToVLHS, DAG, Subtarget);
----------------
Nit: maybe a comment to make this clearer:
```
// The last element that comes from the LHS. For example:
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the LHS is actually 0, not 3
// because elements 1 and higher of a scalar_to_vector are undefined.
```


================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15101
+      int RHSScalarSize = SToVRHS.getValueType().getScalarSizeInBits();
+      RHSLastElt = RHSScalarSize / (ShuffleEltWidth + 1) + RHSFirstElt;
       SToVRHS = getSToVPermuted(SToVRHS, DAG, Subtarget);
----------------
Similar nit as above. A comment along the lines of:
```
// The last element that comes from the RHS. For example:
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the RHS is actually 5, not 7
// because elements 1 and higher of a scalar_to_vector are undefined.
// It is also not 4 because the original scalar_to_vector is wider and
// actually contains two i32 elements.
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D130487/new/

https://reviews.llvm.org/D130487



More information about the llvm-commits mailing list