[PATCH] IndVarSimplify: Don't let LFTR compare against a poison value

David Majnemer david.majnemer at gmail.com
Fri Sep 5 09:20:41 PDT 2014


On Fri, Sep 5, 2014 at 11:39 AM, Moritz Roth <moritz.roth at arm.com> wrote:

> Hi all,
>
> We've been seeing some performance regressions in our internal AArch64
> tests due to this patch. I'll preface this by saying that I'm not an expert
> in this matter at all so take everything I say with a grain of salt :)
>
> I've reduced the test case (this one is from LNT) to what's attached
> ({F154646}). Cleaned up, the IR looks something like this:
>
> ```
> for.cond1.preheader:                              ; preds = %middle.block,
> %entry
>   %indvars.iv17 = phi i64 [ 0, %entry ], [ %indvars.iv.next18,
> %middle.block ]
>   %indvars.iv.next18 = add nuw nsw i64 %indvars.iv17, 1
>   ...
>
> middle.block:                                     ; preds = %vector.body
>   %exitcond19 = icmp eq i64 %indvars.iv17, 255
>   br i1 %exitcond19, label %for.end10, label %for.cond1.preheader
> ```
> Using -target aarch64-linux-gnu -mcpu=cortex-a57 -O3, this compiles to the
> following assembly, also shortened:
>
> ```
> .LBB0_1:                                // %for.cond1.preheader
>                                         // =>This Loop Header: Depth=1
>                                         //     Child Loop BB0_2 Depth 2
>         add     x11, x10, #1            // =1
>         ...
> // BB#3:                                // %middle.block
>                                         //   in Loop: Header=BB0_1 Depth=1
>         cmp      x10, #255              // =255
>         mov      x10, x11
>         b.ne    .LBB0_1
> ```
> It seems like the exact problem the patch addresses occurs: since the
> calculation for the next indvar has NUW/NSW flags set, it can't be used to
> check the exit condition. This leads to an extra register being used for
> the induction variable.
>
> >>! In D5174#13, @atrick wrote:
> > I'll say LGTM because fixing a bug is an improvement. This is
> unfortunate though because LFTR does not care about overflow and we don't
> want to pessimize code just because SCEV was able to infer NSW. The problem
> is that we can't express that NSW applies to some uses an not others.
>
> Am I right in thinking that the behaviour in this specific test case is
> too conservative as a result of the patch? (An i64 won't overflow, since we
> know the trip count is only 256.)
>

A low trip count isn't enough, overflow may depend on where the IV starts
counting from.

The example that this fixes has a trip count of 13, the SCEV
is {-13,+,1}<nuw><nsw><%for.cond2.preheader>. On the very last iteration of
the loop, the *next* indvar will be poison because both NUW and NSW will be
violated.

I'll take a look.


>
> If yes, do you have any suggestions on how we could fix this? I'm assuming
> this might also affect other targets, at least similar ones such as
> ARM/AArch32 or Thumb. Any input would be welcome.
>
> Cheers
> Moritz
>
> REPOSITORY
>   rL LLVM
>
> http://reviews.llvm.org/D5174
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140905/d3882f4c/attachment.html>


More information about the llvm-commits mailing list