[PATCH] D77448: [PowerPC] Canonicalize shuffles to match more single-instruction masks on LE

Mon Jun 8 19:55:48 PDT 2020

nemanjai marked 5 inline comments as done.
nemanjai added inline comments.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:9617
+      // the base pointer. This happens with (splat (s_to_v_permuted (ld))).
+      if (LD->getMemoryVT().getSizeInBits() <= (IsFourByte ? 32 : 64))
+        Offset = 0;
----------------
RolandF wrote:
> I don't think it is safe to use a load and splat for a smaller than word sized load.  The larger load might now cross a page boundary.
I think this code is a bit confusing because of the `<=`. I actually don't really know why I used `<=` rather than `==` since the memory value type can never be narrower than the splat width since we are checking for normal (unindexed, non-extending) loads.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14037
+  // Find first non-undef input.
+  for (int i = 0, e = Op.getNumOperands(); i < e; i++) {
+    FirstOp = Op.getOperand(i);
----------------
steven.zhang wrote:
> lei wrote:
> > Can this and the for-loop below be a range based for-loop?
> > ```
> > for (auto FirstOp : Op->op_values())
> >   if (!FirstOp.isUndef())
> >     break;
> > ```
> Can we use llvm::any_of to make the code more compact ?
How? I need to find the first operand that is *not* `undef`.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14037
+  // Find first non-undef input.
+  for (int i = 0, e = Op.getNumOperands(); i < e; i++) {
+    FirstOp = Op.getOperand(i);
----------------
nemanjai wrote:
> steven.zhang wrote:
> > lei wrote:
> > > Can this and the for-loop below be a range based for-loop?
> > > ```
> > > for (auto FirstOp : Op->op_values())
> > >   if (!FirstOp.isUndef())
> > >     break;
> > > ```
> > Can we use llvm::any_of to make the code more compact ?
> How? I need to find the first operand that is *not* `undef`.
But then `FirstOp` is out of scope after the loop and we need it.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.cpp:14077
+  EVT VT = OrigSToV.getValueType();
+  SDValue SToVPerm =
+      DAG.getNode(PPCISD::S_TO_V_PERMUTED, dl, VT, OrigSToV.getOperand(0));
----------------
RolandF wrote:
> This code is only used on the early, conditional returns, and not on the final return.  The function could be refactored such that the final return case is moved first, or the early returns could return the result of a function that generates the instruction, to prevent unused code.
I am not sure this would be an improvement. The early exit conditions are the common case. The final return is only for the pattern:
`(<n x Ty> (scalar_to_vector (Ty (extract_elt <n x Ty> %a, C))))`.

I'll refactor it as per your suggestion and if it is more readable we'll go with that.
Unused nodes aren't really a problem - the SDAG will just get rid of them, but I agree that it is less than ideal to create nodes that will just be discarded.

================
Comment at: llvm/lib/Target/PowerPC/PPCISelLowering.h:228
+    /// place (i.e. does not involve a swap to put it at element zero on LE).
+    S_TO_V_PERMUTED,
+
----------------
RolandF wrote:
> lei wrote:
> > Maybe consider using existing naming used for scalar and vector ISD nodes:  SCALAR_TO_VEC_PERMUTED
> Can we rename this opcode?  S and V are too short to have meaning and permute is too general.  Maybe SCALAR_TO_VECTOR_BE or _RIGHT or _UPPER or something?
`SCALAR_TO_VECTOR_BE` is close but the problem is that `SCALAR_TO_VECTOR` assumes the scalar is placed into vector element zero. However, (almost) none of the instructions we have actually put the scalar into vector element zero for either endianness. The value always goes to the least significant portion of the most significant doubleword. It is very hard to encapsulate that level of weirdness in a name.

I think I will opt for either `SCALAR_TO_VECTOR_PERMUTED` or `SCALAR_TO_VECTOR_PPC` with a more detailed comments:

```
/// PowerPC instructions that have SCALAR_TO_VECTOR semantics tend to
/// place the value into the least significant element of the most significant
/// doubleword in the vector. This is not element zero for anything smaller
/// than a doubleword on either endianness. This node has the same semantics
/// as SCALAR_TO_VECTOR except that the value remains in the
/// aforementioned location in the vector register.
```

If you have a preference for either of those, please let me know.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77448/new/

https://reviews.llvm.org/D77448