[PATCH] D19694: [LV] Only bail on interleaved accesses in predicated blocks

Tue May 3 09:35:09 PDT 2016

sbaranga added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5056
@@ -5055,3 +5055,3 @@
   collectConstStridedAccesses(StrideAccesses, Strides);
 
   if (StrideAccesses.empty())
----------------
mssimpso wrote:
> sbaranga wrote:
> > With that change, analyzeInterleaving won't see all the loads/stores (as the predicated ones get skipped). Would this have any impact on the correctness of the analysis (it might be ok, but it's worth thinking about it).
> Hey Silviu, thanks for commenting! Right, this prevents accesses from being placed into an interleaved group if they are in a predicated block, which seems like the right thing to do to me. This is what the test case demonstrates. Currently, no accesses are recognized as interleaved if the loop contains a single predicated block. If an access is non-consecutive and not in an interleaved group, it should be scalarized (unless gather/scatter is supported).
The idea generally makes sense to me.

However we also have some logic here to handle the Write-After-Write dependences (in fact more than just these) which seems to rely on seeing all accesses. For example, we seen to be able to get the vectorizer confused by doing something like (i was able to modify your example to do this):

1. load a[i]
2. load a[i+1]
3. conditional: store a[i]
4. load a[i]  -> it sees this as the first load, because we've ignored 3

Here is my modification of the example:

define void @load_gap_with_pred_store(%pair *%p, i64 %x, i64 %n) {
  entry:
    br label %for.body
  
  for.body:
    %i  = phi i64 [ %i.next, %if.merge ], [ 0, %entry ]
    %f1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
    %f2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
  
    %r0 = load i64, i64* %f1, align 8
    %r1 = and i64 %r0, %x
    %r2 = icmp eq i64 %r1, %x
    %r4 = add i64 %r1, 1
  
    br i1 %r2, label %if.then, label %if.merge
  
  if.then:
    store i64 0, i64* %f2, align 8
    br label %if.merge
  
  if.merge:
    %r3 = load i64, i64* %f2, align 8
  
    store i64 %r3, i64 *%f1, align 8 // Here we end up storing an element from the wide load, instead of having to reload
  
    %i.next = add nuw nsw i64 %i, 1
    %cond = icmp slt i64 %i.next, %n
    br i1 %cond, label %for.body, label %for.end
  
  for.end:
    ret void
  }





http://reviews.llvm.org/D19694