[PATCH] IndVarSimplify: Don't let LFTR compare against a poison value

Fri Sep 5 08:39:52 PDT 2014

Hi all,

We've been seeing some performance regressions in our internal AArch64 tests due to this patch. I'll preface this by saying that I'm not an expert in this matter at all so take everything I say with a grain of salt :)

I've reduced the test case (this one is from LNT) to what's attached ({F154646}). Cleaned up, the IR looks something like this:

```
for.cond1.preheader:                              ; preds = %middle.block, %entry                                                                           
  %indvars.iv17 = phi i64 [ 0, %entry ], [ %indvars.iv.next18, %middle.block ]                                      
  %indvars.iv.next18 = add nuw nsw i64 %indvars.iv17, 1                                                             
  ...                                                                                                                                                       

middle.block:                                     ; preds = %vector.body                                                                                            
  %exitcond19 = icmp eq i64 %indvars.iv17, 255                                                                      
  br i1 %exitcond19, label %for.end10, label %for.cond1.preheader
```
Using -target aarch64-linux-gnu -mcpu=cortex-a57 -O3, this compiles to the following assembly, also shortened:

```
.LBB0_1:                                // %for.cond1.preheader                                                     
                                        // =>This Loop Header: Depth=1                                              
                                        //     Child Loop BB0_2 Depth 2                                             
        add     x11, x10, #1            // =1
        ...
// BB#3:                                // %middle.block                                                            
                                        //   in Loop: Header=BB0_1 Depth=1                                                                                           
        cmp      x10, #255              // =255                                                                     
        mov      x10, x11                                                                                           
        b.ne    .LBB0_1    
```
It seems like the exact problem the patch addresses occurs: since the calculation for the next indvar has NUW/NSW flags set, it can't be used to check the exit condition. This leads to an extra register being used for the induction variable.

>>! In D5174#13, @atrick wrote:
> I'll say LGTM because fixing a bug is an improvement. This is unfortunate though because LFTR does not care about overflow and we don't want to pessimize code just because SCEV was able to infer NSW. The problem is that we can't express that NSW applies to some uses an not others.

Am I right in thinking that the behaviour in this specific test case is too conservative as a result of the patch? (An i64 won't overflow, since we know the trip count is only 256.)

If yes, do you have any suggestions on how we could fix this? I'm assuming this might also affect other targets, at least similar ones such as ARM/AArch32 or Thumb. Any input would be welcome.

Cheers
Moritz

REPOSITORY
  rL LLVM

http://reviews.llvm.org/D5174