[PATCH] D130487: [PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types.
Nemanja Ivanovic via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Feb 24 04:36:42 PST 2023
nemanjai accepted this revision.
nemanjai added a comment.
This revision is now accepted and ready to land.
Although I have a fair number of comments, they're mostly stylistic comments that probably don't really require another revision. So LGTM and please address the comments prior to committing.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14989
+static bool isShuffleMaskInRange(SmallVectorImpl<int> &ShuffV, int HalfVec,
+ int LHSLastByteDefined,
----------------
I don't think we modify `ShuffV` so it should be a `const` reference.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14990-14991
+static bool isShuffleMaskInRange(SmallVectorImpl<int> &ShuffV, int HalfVec,
+ int LHSLastByteDefined,
+ int RHSLastByteDefined) {
+ for (int i = 0, elt = ShuffV.size(); i < elt; i++) {
----------------
These two values are actually the last *element* rather than *byte* aren't they? If so, please rename accordingly.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14997
+ // Handle first input vector of the vector_shuffle.
+ if (Index < HalfVec && LHSLastByteDefined >= 0) {
+ if (!(Index <= LHSLastByteDefined))
----------------
Why do we only check here if `LHSLastByteDefined >= 0` and similarly for the RHS below? Do we really want to pretend that a shuffle mask is in range if the last byte/element defined is undefined (i.e. presumably no bytes are defined)?
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15066-15067
+ // SCALAR_TO_VECTOR nodes.
+ unsigned LHSValidLaneWidth = HalfVec;
+ unsigned RHSValidLaneWidth = HalfVec;
----------------
The name `LaneWidth` is misleading here. I kept thinking the width (i.e. the number of bits) of the lane that contains a defined value. But it is actually the number of valid elements in the vector.
For a node:
```
(shuff (v4i32 s_to_v i32), arbitrary_v4i32), mask)
LHSValidLaneWidth = 1
RHSValidLaneWidth = 4
```
And for a node:
```
(shuff (v4i32 s_to_v i32), (bitcast (s_to_v i64), v4i32), mask)
LHSValidLaneWidth = 1
RHSValidLaneWidth = 2
```
If I'm interpreting it correctly, please rename them to something like `NumValidElts`.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15088
+ LHS.getValueType().getScalarSizeInBits();
+ int LHSScalarSize = SToVLHS.getValueType().getScalarSizeInBits();
+ LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
----------------
I think we should have an early exit here if the valid lane width is zero:
```
if (LHSValidLaneWidth == 0)
return false;
```
Since it is not really reasonable to do this transform if we are pulling in more bits than the original `scalar_to_vector` actually defined. Similarly with the RHS below.
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15089
+ int LHSScalarSize = SToVLHS.getValueType().getScalarSizeInBits();
+ LHSLastElt = LHSScalarSize / (ShuffleEltWidth + 1);
SToVLHS = getSToVPermuted(SToVLHS, DAG, Subtarget);
----------------
Nit: maybe a comment to make this clearer:
```
// The last element that comes from the LHS. For example:
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the LHS is actually 0, not 3
// because elements 1 and higher of a scalar_to_vector are undefined.
```
================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:15101
+ int RHSScalarSize = SToVRHS.getValueType().getScalarSizeInBits();
+ RHSLastElt = RHSScalarSize / (ShuffleEltWidth + 1) + RHSFirstElt;
SToVRHS = getSToVPermuted(SToVRHS, DAG, Subtarget);
----------------
Similar nit as above. A comment along the lines of:
```
// The last element that comes from the RHS. For example:
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the RHS is actually 5, not 7
// because elements 1 and higher of a scalar_to_vector are undefined.
// It is also not 4 because the original scalar_to_vector is wider and
// actually contains two i32 elements.
```
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D130487/new/
https://reviews.llvm.org/D130487
More information about the llvm-commits
mailing list