[PATCH] D105632: [LV] Use lookThroughAnd with logical reductions

Tue Jul 13 01:02:01 PDT 2021

david-arm added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/trunc-reductions.ll:186
+  %ext = zext i16 %load to i32
+  %icmp = icmp sgt i32 %sum.02, %ext
+  %min = select i1 %icmp, i32 %sum.02, i32 %ext
----------------
dmgreen wrote:
> kmclaughlin wrote:
> > dmgreen wrote:
> > > Are we sure that smin/smax work correctly with this? We go from cutting off the top bits (potentially turning a signed number into an unsigned number) to not doing that.  Or maybe going from a signed min/max to a unsigned one, from the look of this test? I may be wrong, but I think something like this transform needs to be valid for the op: https://alive2.llvm.org/ce/z/5XJ4ZU.
> > > 
> > > umin/umax may still be OK, if the initial value from the phi is within range. But it doesn't reduce the vector width.
> > I think the smax/smin tests are working correctly, as we are still performing an `and` on the phi before the load & compare instructions to mask the top bits. In this test for smax I was previously using an inital value for the phi which was smaller than the mask, which meant the `and` was removed by instcombine. I've changed the initial value to make sure we can check for the `and` in this test which hopefully makes it a bit clearer.
> I'm still not sure this is correct. It's about the trunc(reduce.smin.v8i8(..)) performed on an i8, not an i32 like the original. A value with the '7th' bit set will change from being treated like a large positive value to a negative value. It seems that the code is assuming that the reduction can happen in the smaller type, even with signed min/max.
> 
> As for the initial value,
> ```
>   %sum.02p = phi i32 [ %min, %for.body ], [ 256, %entry ]
>   %sum.02 = and i32 %sum.02p, 255
> ```
> with only one use on the phi is the same as using an input value of 0 (`= 0ff & 0x100`) from entry. I don't think any optimization will do that at the moment, but in principle it seems like it could.
Hi @dmgreen, ok I think I understand your concerns now with specific reference to the intrinsic after the loop. It wasn't obvious from your initial comment that's all. :) You're specifically worried not about the vector body (which I'm sure is correct), but the ordering of trunc and reduce.smin.v8i8. So basically for this to be correct it needs a minor tweak so that we do:

  trunc(reduce.smin.v8i32) -> i8

instead, right? I think that would ensure we get the right answer.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105632/new/

https://reviews.llvm.org/D105632