[PATCH] D105632: [LV] Use lookThroughAnd with logical reductions

Mon Jul 12 09:51:13 PDT 2021

dmgreen requested changes to this revision.
dmgreen added a comment.
This revision now requires changes to proceed.

I still don't think this code is making correct assumptions, at least about smin.

================
Comment at: llvm/test/Transforms/LoopVectorize/trunc-reductions.ll:186
+  %ext = zext i16 %load to i32
+  %icmp = icmp sgt i32 %sum.02, %ext
+  %min = select i1 %icmp, i32 %sum.02, i32 %ext
----------------
kmclaughlin wrote:
> dmgreen wrote:
> > Are we sure that smin/smax work correctly with this? We go from cutting off the top bits (potentially turning a signed number into an unsigned number) to not doing that. Or perhaps we go from a signed min/max to an unsigned one, judging by this test? I may be wrong, but I think a transform like this one needs to be valid for the op: https://alive2.llvm.org/ce/z/5XJ4ZU.
> > 
> > umin/umax may still be OK, if the initial value from the phi is within range. But it doesn't reduce the vector width.
> I think the smax/smin tests are working correctly, as we still perform an `and` on the phi before the load and compare instructions to mask the top bits. In this smax test I was previously using an initial value for the phi that was smaller than the mask, which meant instcombine removed the `and`. I've changed the initial value so that the test can check for the `and`, which hopefully makes it a bit clearer.
I'm still not sure this is correct. The issue is the trunc(reduce.smin.v8i8(..)) being performed on i8, not on i32 like the original. A value with bit 7 (the i8 sign bit) set changes from being treated as a large positive value to being treated as a negative one. The code seems to assume that the reduction can happen in the smaller type, even for signed min/max.
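To make the smin/smax concern concrete, here is a small sketch that exhaustively compares, for every pair of 8-bit inputs, a min/max done in i32 on zero-extended values (as in the original loop) against the same operation done in i8 and then zero-extended. It is a plain Python model of the integer semantics, not LoopVectorize code; all helper names are hypothetical, and the i8 reinterpretation is done explicitly because Python integers are unbounded.

```python
def as_signed_i8(x: int) -> int:
    """Reinterpret an 8-bit pattern (0..255) as a two's-complement i8 value."""
    return x - 256 if x >= 128 else x


def as_unsigned_i8(x: int) -> int:
    """Reinterpret an i8 value (-128..127 or 0..255) as its 8-bit pattern."""
    return x & 0xFF


# "Wide" versions: both inputs are zext'd to i32 first, so they are
# non-negative and signed/unsigned min/max agree with plain min/max.
def wide_smin(a: int, b: int) -> int:
    return min(a, b)


def wide_smax(a: int, b: int) -> int:
    return max(a, b)


def wide_umin(a: int, b: int) -> int:
    return min(a, b)


# "Narrow" versions: the operation is done on i8, then the result is zext'd.
def narrow_smin(a: int, b: int) -> int:
    return as_unsigned_i8(min(as_signed_i8(a), as_signed_i8(b)))


def narrow_smax(a: int, b: int) -> int:
    return as_unsigned_i8(max(as_signed_i8(a), as_signed_i8(b)))


def narrow_umin(a: int, b: int) -> int:
    # An unsigned i8 comparison orders the bit patterns 0..255 directly.
    return as_unsigned_i8(min(a, b))


def count_mismatches(wide_op, narrow_op) -> int:
    """Count 8-bit input pairs where the narrowed operation gives a different result."""
    return sum(
        1
        for a in range(256)
        for b in range(256)
        if wide_op(a, b) != narrow_op(a, b)
    )


if __name__ == "__main__":
    print(wide_smin(200, 100), narrow_smin(200, 100))  # 100 200
    print(count_mismatches(wide_smin, narrow_smin))     # 32768
    print(count_mismatches(wide_smax, narrow_smax))     # 32768
    print(count_mismatches(wide_umin, narrow_umin))     # 0
```

`wide_smin(200, 100)` is 100, but `narrow_smin(200, 100)` is 200, because 200 reads as -56 in i8. Every pair whose inputs differ in bit 7 disagrees for smin and smax, while umin sees the same ordering in both widths.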

As for the initial value,
```
  %sum.02p = phi i32 [ %min, %for.body ], [ 256, %entry ]
  %sum.02 = and i32 %sum.02p, 255
```
with only one use of the phi, is the same as using an input value of 0 (`0xff & 0x100 = 0`) from entry. I don't think any optimization will do that at the moment, but in principle one could.
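The masking argument can be checked with a small model of the test loop, assuming the loop body runs at least once and that the `and` is the phi's only use. The function name and its list-of-values interface are hypothetical; each list element stands in for one `%ext`, the zero-extended i16 load.

```python
def smax_loop(init: int, values: list[int]) -> int:
    """Model of the test loop; returns the value of %min after the last iteration.

    init   -- incoming phi value from %entry
    values -- one %ext per iteration (zext of an i16 load, so 0..65535)
    """
    if not values:
        raise ValueError("model assumes the loop body runs at least once")
    acc = init                      # %sum.02p = phi [ %min, ... ], [ init, ... ]
    for ext in values:
        masked = acc & 0xFF         # %sum.02 = and i32 %sum.02p, 255
        # icmp sgt + select: both operands are non-negative here, so a plain
        # comparison matches the signed i32 comparison.
        acc = masked if masked > ext else ext
    return acc


if __name__ == "__main__":
    print(0x100 & 0xFF)                                        # 0
    print(smax_loop(256, [3, 7, 5]), smax_loop(0, [3, 7, 5]))  # 7 7
```

Because the phi is only read through the `and`, the entry value reaches the comparison solely as `init & 0xFF`, so starting from 256 and starting from 0 give the same result for any input sequence.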


https://reviews.llvm.org/D105632