[PATCH] D25276: [LoopVectorizer] Interleaved-mem-accesses analysis and getPtrStride
email@example.com via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 24 02:45:26 PDT 2016
sbaranga added inline comments.
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:5872
+ Value *Ptr = getPointerOperand(Member);
+ int Stride = getPtrStride(PSE, Ptr, TheLoop, Strides, /*Assume=*/false,
> sbaranga wrote:
> > dorit wrote:
> > > dorit wrote:
> > > > sbaranga wrote:
> > > > > I think there is another case here when we don't need to check the wrapping: if the group has the first and last pointers, and we know they both don't wrap.
> > > > >
> > > > > We should be able to apply this for versioning later and have at most two no-wrap checks per interleaved group.
> > > > So, trying to summarize:
> > > > 1) Initial check with Assume=true, ShouldCheckWrap=false;
> > > > 2) Revisit only groups with gaps as follows:
> > > > - if first and last elements are accessed: skip further checking.
> > > > - otherwise – check the first and last pointers of the group with Assume=false (for now), ShouldCheckWrap=true;
> > > >
> > > > Silviu, is that what you meant?
> > > >
> > > 3 and 4 above were meant to be sub-bullets of item 2...:
> > >
> > > 1.Initial check with Assume=true, ShouldCheckWrap=false;
> > > 2.Revisit only groups with gaps as follows:
> > > - if first and last elements are accessed: skip further checking.
> > > - otherwise – check the first and last pointers of the group with Assume=false (for now), ShouldCheckWrap=true;
> > >
> > I agree with point 1)
> > For point 2) I meant something slightly different:
> > - if first and last elements are accessed *and* we know they don't wrap, skip further checking (because it implies that all the pointers in the group don't wrap).
> > - if the group is full, skip further checking
> > - otherwise (as you said) check the first and last pointers of the group with Assume=false (for now), ShouldCheckWrap=true;
> > I think optimizing this is a bit open-ended, but we could definitely go further (feel free to omit this for this commit):
> > If we are peeling the loop and we know that the first pointer doesn't wrap then we can deduce that all pointers in the group don't wrap.
> > This means that we can forcefully peel the loop in order to only have to check the first pointer for no-wrap. When
> > we'll change to use Assume=true,ShouldCheckWrap=true we'll only need at most one runtime check per interleaved group (I think this is probably optimal).
> about “*and* we know they don't wrap”….:
> 1) exactly what do you mean by “know they don’t wrap”? that calling getPtrStride with Assume=false, ShouldCheckWrap=true is successful?
> 2) I would think that “group is full” and “first and last elements are accessed” are equivalent in terms of the effect of the transformation on creating new potentially-wrapping accesses (in both cases the transformation will not make things worse than the original code in that respect); we shouldn’t care if the accesses to the first/last elements wrap or not because the wide loads that we’ll create will have the same wrap-around behavior as the original code, no? (in short, why the extra "*and* we know they don't wrap"...?)
> (this is assuming that potential wrapping that may be caused by extra accesses in the vector code created due to misalignment are accounted for else-where).
1) Correct, in this case I mean Assume=false, ShouldCheckWrap=true. Although when switching to using Assume = true it is preferable to call getPtrStride on just the two accesses instead of on all the pointers.
2). Here is my current reasoning:
The justification for why the "group is full" case doesn't require checking is that if the wide load does wrap we would do an access to nullptr in the non-transformed code (and therefore have undefined behaviour). This reasoning doesn't hold when the group is not full (since we could be skipping the access to nullptr).
I didn't fully understand the part about alignment, but AFAICT considering here that the accesses all the accesses are aligned to the element type should be enough.
More information about the llvm-commits