[PATCH] D21620: [LV] Don't widen trivial induction variables

Wed Jun 22 14:56:31 PDT 2016

mssimpso added a comment.

Michael,

Thanks for the quick feedback. But I definitely wasn't trying to clean up after you! Your work creating vector phi's is certainly a step in the right direction. I just came across some examples where we were currently generating bad code (even before your changes). Please see my responses inline. Thanks!

Matt.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:2189
@@ +2188,3 @@
+
+  // We can't create a vector with less than two elements.
+  assert(VF > 1 && "VF should be greater than one");
----------------
mkuper wrote:
> This looks a bit weird to me.Do you have a test-case where this works, but instcombine fails to simplify the regular step vector?
> Perhaps it would be better to fix instcombine, I can look into it.
> 
> In any case, that shouldn't block the rest of the patch - (a) not widening the phi when the uses are trivial, and (b) using the scalarized step vector instead of the regular one should be independently good, right?
> This looks a bit weird to me.Do you have a test-case where this works, but instcombine fails to simplify the regular step vector?

Sure. Here's a simple example with interleaved access vectorization enabled.

In the test case below, if we use the regular step vector, we end up creating unneeded splat inserts and extracts inside the loop. A scalar induction variable is definitely preferable here since we don't do anything exciting with it.

```

; opt -S < %s -loop-vectorize -force-vector-width=2 -force-vector-interleave=2 -enable-interleaved-mem-accesses -instcombine
target datalayout = "e-m:e-i64:64-i128:128-n32:64-S128"
target triple = "aarch64--linux-gnu"

%pair = type { i64, i64 }
define void @interleaved(%pair *%p, i64 %y, i64 %n) {
entry:
  br label %for.body

for.body:
  %i  = phi i64 [ %i.next, %for.body ], [ 0, %entry ]
  %f1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
  %r0 = load i64, i64* %f1, align 8
  %r3 = xor i64 %r0, %y
  store i64 %r3, i64* %f1, align 8
  %i.next = add nuw nsw i64 %i, 1
  %cond = icmp slt i64 %i.next, %n
  br i1 %cond, label %for.body, label %for.end

for.end:
  ret void
}
```

I've seen other cases with what I've been working on that lead me to believe fixing this kind of thing in InstCombine is not going to be that fruitful. Things get very ugly when interleaved access vectorization is enabled with conditional stores vectorization, for example. I think it's better to do the right thing at the outset.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:4145
@@ +4144,3 @@
+
+  // Look at the users of the induction variable inside the loop, and determine
+  // if any of them are interesting. If we find interesting users (e.g., loads
----------------
mkuper wrote:
> We are already doing something very similar in collectValuesToIgnore(), for basically the same reason - to estimate whether we'll end up needing a vector IV or not (and when InstCombine can't do the cleanup, my patch breaks that logic as well).
> 
> Can you use collectValuesToIgnore() instead? Either by checking whether all the phi users are in ValuesToIgnore, or by marking the PHI itself as "should/should not be widened" during collectValuesToIgnore(), if that makes sense?
> 
Ah, right. I forgot we already checked for this in collectValuesToIgnore. Thanks for pointing this out! I will update the patch.

http://reviews.llvm.org/D21620