[PATCH] D19984: [LV] Preserve order of dependences in interleaved accesses analysis
Adam Nemet via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 14 14:39:43 PDT 2016
anemet added a comment.
This is very close I just want to take extra care to update/improve the comments while this is all fresh in our memory.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:967-968
@@ +966,4 @@
+
+ /// \brief Returns true if memory accesses \p A and \p B can be reordered.
+ ///
+ /// At least one of the accesses should be strided, and \p A must precede \B
----------------
Actually sorry, let's make this even more precise. At this point the function is called with *any* accesses. (The function later checks that if both accesses are non-strided we don't bother checking deps since we would never reorder those.)
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:971
@@ +970,3 @@
+ /// in program order.
+ bool canReorderMemAccesses(StrideEntry *A, StrideEntry *B) const {
+
----------------
We should probably have some qualification in the name, something like canReorderAccessesForInterleavedGroups or something like that. This is not a general canReorder predicate but takes into consideration the actual code motion strategy.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5135-5142
@@ -5134,10 +5228,2 @@
// 3. B belongs to the group according to the distance.
- //
- // The bottom-up order can avoid breaking the Write-After-Write dependences
- // between two pointers of the same base.
- // E.g. A[i] = a; (1)
- // A[i] = b; (2)
- // A[i+1] = c (3)
- // We form the group (2)+(3) in front, so (1) has to form groups with accesses
- // above (1), which guarantees that (1) is always above (2).
for (auto I = StrideAccesses.rbegin(), E = StrideAccesses.rend(); I != E;
----------------
Is my understanding correct, that you removed this because we no longer support this case? I.e. the dep 1->2 does not currently allow for this.
If this is true, we should probably add a FIXME.
Also, I thought the WAW case was the only reason for the bottom-up ordering. That is a bit confusing now too.
================
Comment at: lib/Transforms/Vectorize/LoopVectorize.cpp:5254-5257
@@ -5162,1 +5253,6 @@
+ // If A and B cannot be reordered, we can't add additional instructions
+ // to A's group since this may cause illegal code motion. If B is already
+ // in a group release it.
+ if (!canReorderMemAccesses(&*II, &*I)) {
+ if (isInterleaved(B)) {
----------------
anemet wrote:
> mssimpso wrote:
> > anemet wrote:
> > > Matt, I am not sure I follow, which of these accesses is not constant distance in my example?
> > >
> > > I just tried this example:
> > >
> > >
> > > ```
> > > void
> > > f(char *a, char * __restrict c, int n) {
> > > for (int i = 0; i < n; i+=3) {
> > > c[i] = a[i];
> > > a[i+4] = c[i+4];
> > > c[i+1] = a[i+1];
> > > }
> > > }
> > >
> > > ```
> > > If you compile this on x86_64 with:
> > >
> > > ```
> > > -O3 -mllvm -enable-load-pre=0 -mllvm -store-to-load-forwarding-conflict-detection=0 -mllvm -enable-interleaved-mem-accesses -mllvm -force-vector-width=2
> > > ```
> > >
> > > Then the loads from 'a' will be merged which breaks the forward dependence between the store of a[i+4] and the load of a[i+1].
> > >
> > > No?
> > Adam,
> >
> > We may be talking past each other a bit. I think your original request (correct me if I'm wrong) was for a test case showing that the existing analysis doesn't consider checking memory accesses that are not "candidates" for interleaved groups. Candidate accesses are the ones collected in StrideAccesses prior to the analysis. However, every access in your example actually is a candidate access because they are all strided. They are not ignored. Yes, we currently do the wrong thing here, but that's because the current analysis isn't correctly checking dependences between the actual candidate accesses, not because it isn't considering a dependence between a candidate access and a non-candidate access.
> >
> > Are you asking for something different?
> Ah, the disconnect was that I didn't realize that accesses with offsets larger than the stride were considered as candidates. The idea was to have these mimic non-strided accesses.
>
> I guess it makes sense to consider these candidates as well because the offset is not really an offset but rather the distance between two accesses.
>
> So I guess we can't have a testcase for this case at the moment...
>
> I'll continue reviewing the patch.
It would be nice to use A and B related iterators in this part because that is what you mention in the comment. E.g. AI, BI or IA, IB.
http://reviews.llvm.org/D19984
More information about the llvm-commits
mailing list