[PATCH] D15559: [SCEVExpander] Make findExistingExpansion smarter

Wed Jan 27 18:00:43 PST 2016

mzolotukhin added inline comments.

================
Comment at: lib/Transforms/Utils/LoopUnrollRuntime.cpp:346-347
@@ -343,4 +345,4 @@
   //  extra iterations = run-time trip count % (loop unroll factor + 1)
   Value *TripCount = Expander.expandCodeFor(TripCountSC, TripCountSC->getType(),
                                             PreHeaderBR);
   Value *BECount = Expander.expandCodeFor(BECountSC, BECountSC->getType(),
----------------
sanjoy wrote:
> mzolotukhin wrote:
> > We can wait for Sanjoy's opinion too, but the patch can't be landed without this issue fixed first anyway.
> > 
> > I slightly modified your testcase to expose the issue:
> > ```
> > define i32 @test2(i64* %loc, i64 %conv7) {
> > entry:
> >   %rem0 = load i64, i64* %loc, align 8
> >   %ExpensiveComputation = udiv i64 %rem0, 42 ; <<< Extra computations are added to the trip-count expression
> >   br label %bb1
> > bb1:
> >   %div11 = udiv i64 %ExpensiveComputation, %conv7
> >   %cmp.i38 = icmp ugt i64 %div11, 1
> >   %div12 = select i1 %cmp.i38, i64 %div11, i64 1
> >   br label %for.body
> > for.body:
> >   %rem1 = phi i64 [ %rem0, %bb1 ], [ %rem2, %for.body ]
> >   %k1 = phi i64 [ %div12, %bb1 ], [ %dec, %for.body ]
> >   %mul1 = mul i64 %rem1, 48271
> >   %rem2 = urem i64 %mul1, 2147483647
> >   %dec = add i64 %k1, -1
> >   %cmp = icmp eq i64 %dec, 0
> >   br i1 %cmp, label %exit, label %for.body
> > exit:
> >   %rem3 = phi i64 [ %rem2, %for.body ]
> >   store i64 %rem3, i64* %loc, align 8
> >   ret i32 0
> > }
> > ```
> > 
> > If we apply the patch and run this test thru `opt`, the loop will be successfully unrolled, *but* we'll generate the code for "expensive" computations. This is IR after unrolling:
> > ```
> > ...
> > entry:
> >   %rem0 = load i64, i64* %loc, align 8
> >   %ExpensiveComputation = udiv i64 %rem0, 42
> >   br label %bb1
> > 
> > bb1:                                              ; preds = %entry
> >   %div11 = udiv i64 %ExpensiveComputation, %conv7
> >   %cmp.i38 = icmp ugt i64 %div11, 1
> >   %div12 = select i1 %cmp.i38, i64 %div11, i64 1
> >   %0 = udiv i64 %rem0, 42      ; <<<< This is our %ExpensiveComputation, and we just duplicated it!
> >   %1 = udiv i64 %0, %conv7     ; <<<
> >   %2 = icmp ugt i64 %1, 1
> >   %umax = select i1 %2, i64 %1, i64 1
> >   %3 = add i64 %umax, -1
> >   %xtraiter = and i64 %umax, 7
> >   %lcmp.mod = icmp ne i64 %xtraiter, 0
> >   br i1 %lcmp.mod, label %for.body.prol, label %bb1.split
> > ...
> > ```
> > Please note inserted instruction `%0` and `%1` - they're completely redundant (as we compute the same value, that we have in `%div11`).
> > 
> > So, as I said, it's not sufficient to just fix `findExistingExpransion`, we also need to fix `SCEVExpander::expand`.
> I think the original idea was that if we know that a certain value is already being computed, then we can emit code to compute it again, and rely on later passes to get rid of the redundancy (so in your example above we'd expect `%0` and `%1` to be CSE'ed with `%ExpensiveComputation` and `%div11`).
> 
> But I understand your concern that this is a pessimization unless we run some cleanup passes afterwards, and even then we rely on the cleanup passes being smart enough to undo our mess.  Perhaps `SCEVExpander::expand` can be fixed in a separate change (since the scope of that fix is bigger than what is addressed here) that goes in before this one?
My example was pretty simple, just to demonstrate the problem, and in this case indeed a simple clean-up would be sufficient to remove the introduced redundancy. But it's easy to construct a more complicated test, which existing clean-up passes might have problems to deal with, and since SCEV expressions might be very complicated, expanding all of them might be not a good idea, especially taking into account that some of them are actually expensive.

As for separating the patches - I'm entirely for it, but I just want to first make sure that there are no fundamental issues that can prevent us from fixing `expand`. My biggest concern is that this change leads to somewhat inconsistent state, where `findExistingExpansion` and `expand` have different assumptions. I'll feel much easier about it once I see the `expand` part of the patch.


http://reviews.llvm.org/D15559