<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Sep 5, 2014 at 11:39 AM, Moritz Roth <span dir="ltr"><<a href="mailto:moritz.roth@arm.com" target="_blank">moritz.roth@arm.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Hi all,<br>

<br>

We've been seeing some performance regressions in our internal AArch64 tests due to this patch. I'll preface this by saying that I'm not an expert in this matter at all so take everything I say with a grain of salt :)<br>

<br>

I've reduced the test case (this one is from LNT) to what's attached ({F154646}). Cleaned up, the IR looks something like this:<br>

<br>

```<br>

for.cond1.preheader:                              ; preds = %middle.block, %entry<br>

  %indvars.iv17 = phi i64 [ 0, %entry ], [ %indvars.iv.next18, %middle.block ]<br>

  %indvars.iv.next18 = add nuw nsw i64 %indvars.iv17, 1<br>

  ...<br>

<br>

middle.block:                                     ; preds = %vector.body<br>

  %exitcond19 = icmp eq i64 %indvars.iv17, 255<br>

  br i1 %exitcond19, label %for.end10, label %for.cond1.preheader<br>

```<br>

Using -target aarch64-linux-gnu -mcpu=cortex-a57 -O3, this compiles to the following assembly, also shortened:<br>

<br>

```<br>

.LBB0_1:                                // %for.cond1.preheader<br>

                                        // =>This Loop Header: Depth=1<br>

                                        //     Child Loop BB0_2 Depth 2<br>

        add     x11, x10, #1            // =1<br>

        ...<br>

// BB#3:                                // %middle.block<br>

                                        //   in Loop: Header=BB0_1 Depth=1<br>

        cmp      x10, #255              // =255<br>

        mov      x10, x11<br>

        <a href="http://b.ne" target="_blank">b.ne</a>    .LBB0_1<br>

```<br>

It seems like the exact problem the patch addresses occurs: since the calculation for the next indvar has NUW/NSW flags set, it can't be used to check the exit condition. This leads to an extra register being used for the induction variable.<br>

<span class=""><br>

>>! In D5174#13, @atrick wrote:<br>

> I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses an not others.<br>

<br>

</span>Am I right in thinking that the behaviour in this specific test case is too conservative as a result of the patch? (An i64 won't overflow, since we know the trip count is only 256.)<br></blockquote><div><br></div><div>A low trip count isn't enough, overflow may depend on where the IV starts counting from.</div><div><br></div><div>The example that this fixes has a trip count of 13, the SCEV is {-13,+,1}<nuw><nsw><%for.cond2.preheader>. On the very last iteration of the loop, the *next* indvar will be poison because both NUW and NSW will be violated.</div><div><br></div><div>I'll take a look.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">

<br>

If yes, do you have any suggestions on how we could fix this? I'm assuming this might also affect other targets, at least similar ones such as ARM/AArch32 or Thumb. Any input would be welcome.<br>

<br>

Cheers<br>

<span class=""><font color="#888888">Moritz<br>

</font></span><div class=""><div class="h5"><br>

REPOSITORY<br>

  rL LLVM<br>

<br>

<a href="http://reviews.llvm.org/D5174" target="_blank">http://reviews.llvm.org/D5174</a><br>

<br>

<br>

</div></div></blockquote></div><br></div></div>