[PATCH] D19984: [LV] Preserve order of dependences in interleaved accesses analysis

Wed Jun 15 10:03:10 PDT 2016

mssimpso added inline comments.

================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5135-5142
@@ -5134,10 +5228,2 @@
   //   3. B belongs to the group according to the distance.
-  //
-  // The bottom-up order can avoid breaking the Write-After-Write dependences
-  // between two pointers of the same base.
-  // E.g.  A[i]   = a;   (1)
-  //       A[i]   = b;   (2)
-  //       A[i+1] = c    (3)
-  // We form the group (2)+(3) in front, so (1) has to form groups with accesses
-  // above (1), which guarantees that (1) is always above (2).
   for (auto I = StrideAccesses.rbegin(), E = StrideAccesses.rend(); I != E;
----------------
mssimpso wrote:
> anemet wrote:
> > Is my understanding correct, that you removed this because we no longer support this case?  I.e. the dep 1->2 does not currently allow for this.
> > 
> > If this is true, we should probably add a FIXME.
> > 
> > Also, I thought the WAW case was the only reason for the bottom-up ordering.  That is a bit confusing now too.
> No, we do still support this case, for essentially the reason the existing comment states. When the outer loop of the analysis is on (3) and we visit (2) and (1) in the inner loop, we will form the (3,2) group. (1) can't be added to (3,2) because a member already exists in group (3,2) with the same offset. When the outer loop is on (2), we will again visit (1) in the inner loop. This time we will see the dependence between (1) and (2) and give up trying to add additional accesses to (2)'s group. (1) is not yet in a group, so the outer loop moves on to (1). Thus, (1) must form a group with accesses that precede it and won't be sunk, as the existing comment says.
> 
> I removed this comment because I thought it was somewhat misleading. The bottom-up ordering is not //sufficient// to prevent us from breaking WAW dependencies. We still have to explicitly check for them. For example, if we had something like the following for a factor 2 group:
> 
> ```
> A[i]   = x  (1)
> A[i+2] = y  (2)
> A[i+1] = z  (3)
> ```
> 
> The analysis would proceed as before. When the outer loop is on (3): this time (2) is //not// added to (3)'s group because its index is too large. (1) is temporarily added to (3)'s group, creating a (3,1) group. When the outer loop is on (2): we see the dependence between (1) and (2). As before, we stop trying to add additional instructions to (2)'s group. But now since (1) is already in a group, and we now know it shouldn't be reordered, we release the (3,1) group. So the bottom-up ordering isn't sufficient as we still have to check for the dependence.
> 
> It's probably worth adding a comment that clarifies why the analysis is bottom-up.
I got the second example here slightly wrong, but the idea is the same. I was intending to describe something like:

```
A[i-1] = x  (1)
A[i-3] = y  (2)
A[i]   = z  (3)
```

Now, the dependence between (1) and (2) will cause the (3,1) group to be released, as I originally described. I'm adding test cases for both of these WAW examples. 


================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5254-5257
@@ -5162,1 +5253,6 @@
 
+      // If A and B cannot be reordered, we can't add additional instructions
+      // to A's group since this may cause illegal code motion. If B is already
+      // in a group release it.
+      if (!canReorderMemAccesses(&*II, &*I)) {
+        if (isInterleaved(B)) {
----------------
mssimpso wrote:
> anemet wrote:
> > anemet wrote:
> > > mssimpso wrote:
> > > > anemet wrote:
> > > > > Matt, I am not sure I follow, which of these accesses is not constant distance in my example?
> > > > > 
> > > > > I just tried this example:
> > > > > 
> > > > > 
> > > > > ```
> > > > > void
> > > > > f(char *a, char * __restrict c, int n) {
> > > > >   for (int i = 0; i < n; i+=3) {
> > > > >     c[i] = a[i];
> > > > >     a[i+4] = c[i+4];
> > > > >     c[i+1] = a[i+1];
> > > > >   }
> > > > > }
> > > > > 
> > > > > ```
> > > > > If you compile this on x86_64 with:
> > > > > 
> > > > > ```
> > > > > -O3 -mllvm -enable-load-pre=0 -mllvm -store-to-load-forwarding-conflict-detection=0 -mllvm -enable-interleaved-mem-accesses -mllvm -force-vector-width=2
> > > > > ```
> > > > > 
> > > > > Then the loads from 'a' will be merged which breaks the forward dependence between  the store of a[i+4] and the load of a[i+1].
> > > > > 
> > > > > No?
> > > > Adam,
> > > > 
> > > > We may be talking past each other a bit. I think your original request (correct me if I'm wrong) was for a test case showing that the existing analysis doesn't consider checking memory accesses that are not "candidates" for interleaved groups. Candidate accesses are the ones collected in StrideAccesses prior to the analysis. However, every access in your example actually is a candidate access because they are all strided. They are not ignored. Yes, we currently do the wrong thing here, but that's because the current analysis isn't correctly checking dependences between the actual candidate accesses, not because it isn't considering a dependence between a candidate access and a non-candidate access.
> > > > 
> > > > Are you asking for something different?
> > > Ah, the disconnect was that I didn't realize that accesses with offsets larger than the stride were considered as candidates.  The idea was to have these mimic non-strided accesses.
> > > 
> > > I guess it makes sense to consider these candidates as well because the offset is not really an offset but rather the distance between two accesses.
> > > 
> > > So I guess we can't have a testcase for this case at the moment...
> > > 
> > > I'll continue reviewing the patch.
> > It would be nice to use A and B related iterators in this part because that is what you mention in the comment.  E.g. AI, BI or IA, IB.
> I agree. I think we should also swap A and B in this function to match what we did to the canReorderMemAccesses function. There, A precedes B, but here B precedes A. They should be consistent.
Actually, swapping A with B in this function will make the review very difficult. I think I'd rather swap A and B in canReorderMemAccesses to make things consistent. This will mean B precedes A both places. We can then swap them both in an NFC patch if we want A to precede B.


http://reviews.llvm.org/D19984