[PATCH] D45522: [PowerPC] fix incorrect vectorization of abs() on POWER9

Tue Apr 17 17:40:25 PDT 2018

nemanjai added a comment.

I imagine the constant materialization and splatting direct moves would cost nothing beyond what I suggested in an amortized sense (i.e. if LICM hoists them out of the loop). However, I think it's still useful to produce code with a shorter path length.

Also, I can't think of a better way to handle the halfword version; I suspect there isn't one.
Finally, for consistency, I would just use `vxor` instead of the various add opcodes. I think that is semantically equivalent to the modulo adds (correct me if I'm wrong here).
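
One way to sanity-check the `vxor` versus modulo-add equivalence is a scalar model of a single lane. The sketch below is not from the patch: the helper names are hypothetical, and it assumes the constant being added is exactly the lane's sign bit (0x80 for bytes, 0x8000 for halfwords). Under that assumption the add can only toggle the top bit, because the carry out of it is discarded.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar models of one vector lane; not code from the patch.

// Flip the sign bit with XOR (models `vxor` against a sign-bit splat).
template <typename UIntT> UIntT flipSignXor(UIntT X) {
  const unsigned SignBit = 1u << (sizeof(UIntT) * 8 - 1);
  return static_cast<UIntT>(X ^ SignBit);
}

// Add the sign bit modulo 2^N (models a modulo add of the same splat).
template <typename UIntT> UIntT flipSignAdd(UIntT X) {
  const unsigned SignBit = 1u << (sizeof(UIntT) * 8 - 1);
  return static_cast<UIntT>(X + SignBit);
}

// Exhaustively compare both forms over every value of the lane type.
// Intended for 8-bit and 16-bit lanes only.
template <typename UIntT> bool xorMatchesModuloAdd() {
  const unsigned Count = 1u << (sizeof(UIntT) * 8);
  for (unsigned V = 0; V < Count; ++V) {
    const UIntT Lane = static_cast<UIntT>(V);
    if (flipSignXor(Lane) != flipSignAdd(Lane))
      return false;
  }
  return true;
}

// Check the two lane widths discussed in this review.
void checkByteAndHalfwordLanes() {
  assert(xorMatchesModuloAdd<uint8_t>());
  assert(xorMatchesModuloAdd<uint16_t>());
}
```

The check is exhaustive for bytes and halfwords, so it covers both lane widths where the add opcodes would otherwise be used.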

================
Comment at: lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4805
+    if (N->getOperand(0).getOpcode() == ISD::SUB) {
+      if (N->getOperand(0)->getOperand(0).getOpcode() == ISD::ZERO_EXTEND &&
+          N->getOperand(0)->getOperand(1).getOpcode() == ISD::ZERO_EXTEND)
----------------
Shouldn't this check `ZERO_EXTEND_VECTOR_INREG` as well? Or is that a node we can't have this late?

================
Comment at: lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4810
+
+    if (VecVT == MVT::v4i32) {
+      AbsOpcode = PPC::VABSDUW;
----------------
It seems that for the `v4i32` type, we should be able to just use `xvnegsp` rather than loading the immediate, moving and adding.
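
To illustrate why a floating-point negate can stand in for the integer sign flip, here is a minimal scalar sketch of one word lane. It is not from the patch: the function names are hypothetical, and it assumes the host `float` is IEEE binary32 and that `xvnegsp` does nothing but complement the sign bit of each word. NaN bit patterns are left out of the samples, since a host compiler is not guaranteed to preserve them the way the vector instruction is assumed to.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "sketch assumes a 32-bit float");

// Hypothetical scalar model of one word lane of `xvnegsp`: reinterpret the
// integer bits as a float, negate, and reinterpret the result as an integer.
uint32_t negateLaneAsFloat(uint32_t Bits) {
  float F;
  std::memcpy(&F, &Bits, sizeof(F));
  F = -F;
  std::memcpy(&Bits, &F, sizeof(F));
  return Bits;
}

// The integer operation we actually want: toggle bit 31 (the sign bit).
uint32_t flipSignBit(uint32_t Bits) { return Bits ^ 0x80000000u; }

// Compare both forms on a few non-NaN, non-denormal bit patterns.
void checkSampleLanes() {
  const uint32_t Samples[] = {0x00000000u, 0x12345678u, 0x3F800000u,
                              0x92345678u, 0xC0000000u};
  for (uint32_t S : Samples)
    assert(negateLaneAsFloat(S) == flipSignBit(S));
}
```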

================
Comment at: lib/Target/PowerPC/PPCISelDAGToDAG.cpp:4828
+    }
+    else if (VecVT == MVT::v16i8) {
+      AbsOpcode = PPC::VABSDUB;
----------------
We should just be able to do something like:
```
xxspltib 35, 128 # Mask
vxor 0, 3, 2    # Flip sign
vabsdub ...      # The actual absdiff
```
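
For reference, the sequence above can be modeled per byte lane in scalar code. This is only a sketch with hypothetical helper names, not code from the patch. It assumes both inputs get their sign bit flipped before the unsigned absolute difference (the snippet shows a single `vxor`), and it compares against |a - b| computed in a wider type, i.e. it assumes the signed subtraction does not overflow the lane.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// One byte lane of an unsigned absolute difference (scalar stand-in for the
// unsigned absolute-difference instruction on byte lanes).
uint8_t absDiffUnsigned(uint8_t A, uint8_t B) {
  return static_cast<uint8_t>(A > B ? A - B : B - A);
}

// Signed absolute difference computed as: flip both sign bits, then take the
// unsigned absolute difference.
uint8_t absDiffSignedViaFlip(int8_t A, int8_t B) {
  const uint8_t Mask = 0x80; // per-lane value of the `xxspltib ..., 128` splat
  const uint8_t FlippedA = static_cast<uint8_t>(static_cast<uint8_t>(A) ^ Mask);
  const uint8_t FlippedB = static_cast<uint8_t>(static_cast<uint8_t>(B) ^ Mask);
  return absDiffUnsigned(FlippedA, FlippedB);
}

// Reference: |A - B| in a wider type, so the subtraction cannot overflow.
unsigned absDiffSignedReference(int8_t A, int8_t B) {
  return static_cast<unsigned>(std::abs(int(A) - int(B)));
}

// Exhaustively compare the flip-based form with the reference.
bool flipMatchesReferenceForAllBytes() {
  for (int A = -128; A <= 127; ++A)
    for (int B = -128; B <= 127; ++B) {
      const int8_t LaneA = static_cast<int8_t>(A);
      const int8_t LaneB = static_cast<int8_t>(B);
      if (absDiffSignedViaFlip(LaneA, LaneB) !=
          absDiffSignedReference(LaneA, LaneB))
        return false;
    }
  return true;
}

void checkFlipTrick() { assert(flipMatchesReferenceForAllBytes()); }
```

Flipping the sign bit maps the signed range [-128, 127] onto the unsigned range [0, 255] while preserving order, which is why the unsigned difference of the flipped values matches the signed one.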

https://reviews.llvm.org/D45522