[PATCH] D19694: [LV] Only bail on interleaved accesses in predicated blocks

silviu.baranga@arm.com via llvm-commits llvm-commits at lists.llvm.org
Wed May 4 04:35:54 PDT 2016


sbaranga added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5056
@@ -5055,3 +5055,3 @@
   collectConstStridedAccesses(StrideAccesses, Strides);
 
   if (StrideAccesses.empty())
----------------
mssimpso wrote:
> mssimpso wrote:
> > sbaranga wrote:
> > > mssimpso wrote:
> > > > sbaranga wrote:
> > > > > With that change, analyzeInterleaving won't see all the loads/stores (as the predicated ones get skipped). Would this have any impact on the correctness of the analysis (it might be ok, but it's worth thinking about it).
> > > > Hey Silviu, thanks for commenting! Right, this prevents accesses from being placed into an interleaved group if they are in a predicated block, which seems like the right thing to do to me. This is what the test case demonstrates. Currently, no accesses are recognized as interleaved if the loop contains a single predicated block. If an access is non-consecutive and not in an interleaved group, it should be scalarized (unless gather/scatter is supported).
> > > The idea generally makes sense to me.
> > > 
> > > However, we also have some logic here to handle the Write-After-Write dependences (in fact more than just these), which seems to rely on seeing all accesses. For example, we seem to be able to get the vectorizer confused by doing something like this (I was able to modify your example to do this):
> > > 
> > > 1. load a[i]
> > > 2. load a[i+1]
> > > 3. conditional: store a[i]
> > > 4. load a[i]  -> it sees this as the first load, because we've ignored 3
> > > 
> > > Here is my modification of the example:
> > > 
> > > define void @load_gap_with_pred_store(%pair *%p, i64 %x, i64 %n) {
> > >   entry:
> > >     br label %for.body
> > >   
> > >   for.body:
> > >     %i  = phi i64 [ %i.next, %if.merge ], [ 0, %entry ]
> > >     %f1 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 1
> > >     %f2 = getelementptr inbounds %pair, %pair* %p, i64 %i, i32 0
> > >   
> > >     %r0 = load i64, i64* %f1, align 8
> > >     %r1 = and i64 %r0, %x
> > >     %r2 = icmp eq i64 %r1, %x
> > >     %r4 = add i64 %r1, 1
> > >   
> > >     br i1 %r2, label %if.then, label %if.merge
> > >   
> > >   if.then:
> > >     store i64 0, i64* %f2, align 8
> > >     br label %if.merge
> > >   
> > >   if.merge:
> > >     %r3 = load i64, i64* %f2, align 8
> > >   
> > >     store i64 %r3, i64* %f1, align 8 ; Here we end up storing an element from the wide load, instead of having to reload
> > >   
> > >     %i.next = add nuw nsw i64 %i, 1
> > >     %cond = icmp slt i64 %i.next, %n
> > >     br i1 %cond, label %for.body, label %for.end
> > >   
> > >   for.end:
> > >     ret void
> > >   }
> > > 
> > Thanks for the counterexample, Silviu. I see what you're getting at. Let me think over this change a little more carefully. We will definitely need to do a little more work here.
> Hi Silviu,
> 
> I think there may be a bug in the interleaved access analysis, independent of this patch, that your test case led me to. It seems to be related to the same issue of reusing a loaded value instead of reloading due to an intervening store. I've created PR27626 to track the issue.
Thanks for digging further! Yes, this appears to also be a problem.


http://reviews.llvm.org/D19694
