[PATCH] D91690: [LoopFlatten] Widen IV, cont'd

Thu Nov 19 04:33:13 PST 2020

dmgreen added inline comments.

================
Comment at: llvm/lib/Transforms/Scalar/LoopFlatten.cpp:563
+    if (FI.Widened)
+      if (Value *U = HasSZExtUser(V))
+        V = U;
----------------
SjoerdMeijer wrote:
> dmgreen wrote:
> > SjoerdMeijer wrote:
> > > dmgreen wrote:
> > > > SjoerdMeijer wrote:
> > > > > dmgreen wrote:
> > > > > > I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ?
> > > > > > 
> > > > > > Can you add a test where it (the o*I+i value) is not zext or sext?
> > > > > > I'm not sure I understand any more. Should we not be replacing it with trunc(OuterInductionPHI) ?
> > > > > 
> > > > > After widening we have e.g. this pattern:
> > > > > 
> > > > >   %indvar = phi i64 [ %indvar.next, %for.body3.us ], [ 0, %for.cond1.preheader.us ]
> > > > >   %3 = trunc i64 %indvar to i32
> > > > >   %add.us = add i32 %3, %mul.us
> > > > >   %idxprom.us = zext i32 %add.us to i64
> > > > > 
> > > > > The linear IV user in this example is:
> > > > > 
> > > > >   %add.us = add i32 %3, %mul.us
> > > > > 
> > > > > We don't want to be replacing this `%add.us` value which is a i32 value, because it will indeed be replaced by OuterInductionPhi, which is i64 value after widening. This was the assertion is was talking about. 
> > > > > After widening, the value that we should be replacing is zext user which is `%idxprom.us` in this case.
> > > > > After widening, we have this IV -> Trunc->LinearIV ->Ext pattern, which is what we are matching here.
> > > > > 
> > > > > I will add a comment to clarify this.
> > > > > 
> > > > > > Can you add a test where it (the o*I+i value) is not zext or sext?
> > > > > 
> > > > > I think these cases are present in the original test test/Transforms/LoopFlatten/loop-flatten.ll.
> > > > > 
> > > > > 
> > > > Hmm. But, don't we start with something that looks like:
> > > >   for i32 outer = 0..n
> > > >     for i32 inner = 0..m
> > > >       use(outer * m + inner)
> > > > We widen the IV's so they become:
> > > >   for i64 outer = 0..zext(n)
> > > >     for i64 inner = 0..zext(m)
> > > >       use(trunc(outer) * m + trunc(inner))
> > > > And we want to replace that with a single
> > > >   for i64 outer = 0..zext(n)*zext(m)
> > > >     use(trunc(outer))
> > > > We have not proved that the original did not overflow, so if it does we need to use the original truncated i32 value, not the i64 version of it directly.
> > > Do you mean overflow in the original
> > > 
> > >   outer * m + inner
> > > 
> > > Expression? Is that relevant?
> > > 
> > > Will check when I am back at my desk, but I think after widening we have:
> > > 
> > >   Use(outer)
> > > 
> > > Without the trunc.
> > Hmm. But Use in this case is an i32. We need to do something to outer (an i64) to get is back to an i32. This patch seems to be assuming that Use will be either a sext or a zext to the widened type. I think if it's not, it will still hit the assert (and if it is, would use the wrong value once `outer * m + inner` doesn't fit into the smaller type.)
> This pass is very restrictive in what it currently supports; it is pattern matching very specific patterns. If there are other users this pass will bail, for example 
> 
>   Found use of inner induction variable:   
>   Did not match expected pattern, bailing
> 
> or 
> 
>   Found use of outer induction variable
>   Did not match expected pattern, bailing
> 
> The assumptions are covered with checks. And when it comes to replacing values, we are safe because we are only replacing values in the loop update which have been widened.
What happens in the zext test if it is changed from:
  %arrayidx.us = getelementptr inbounds i16, i16* %A, i64 %idxprom.us
to
  %arrayidx.us = getelementptr inbounds i16, i16* %A, i32 %add.us

Also, consider a simpler example where we have:
  for i8 outer = 0..n
    for i8 inner = 0..m
      use(i32 zext(outer * m + inner))
If we widen the IV's to i32's for example, the call to use() should still get a value between [0..255], even if the n*m was higher than that (and so outer * m + inner overflows). It would wrap in the original, and still needs to wrap in the final version. For i32->i64 the values will be a lot higher, but the same principle applies.

I'm guessing that if it was a trunc(outer * m + inner) instead, the sext(trunc(..)) in the original case would not naturally simplify nicely?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91690/new/

https://reviews.llvm.org/D91690