[LLVMdev] SIV tests in LoopDependence Analysis, Sanjoy's patch

Preston Briggs preston.briggs at gmail.com
Mon Apr 2 22:25:14 PDT 2012


Hi Sanjoy,

I wondered:
>> In LoopDependenceAnalysis::AnalyzePair, what's going to happen if we
>> have something like this
>>
>> for (i = 0; i < n; i++)
>>   for (j = 0; j < n; j++)
>>     A[i][j]++;
>>
>> versus
>>
>> for (i = 0; i < n; i++)
>>   for (j = 0; j < n; j++)
>>     A[j][i]++;

> I think this can be fixed by ordering the subscripts on the loop
> nesting.  This should be doable, since the add recurrences remember
> the loop whose induction variable they're representing.

Yes, we need to record the results of the individual subscript tests in
a vector indexed by the loop-nesting depth.
The order of the subscripts is irrelevant.

> Secondly, I think we can specialize on dealing only with linear add
> recurrences and constants (symbolic or actual constants) at this
> point.  Since LLVM represents indices like
>
> for (i = 0; i < n; i++)
>  for (j = 0; j < n; j++)
>    a[i + j]++
>
> quite nicely as add recurrences of add recurrences, we should be
> able to get quite far along with this approach.

Yes, this will be fine. Indeed, I don't think we'll ever be able to do more.

>> And what's going to happen here?
>>
>> for (i = 0; i < n; i++)
>>     A[i+1][i+2] = A[i][i];
>
> I left out handling coupled subscripts at this point because ignoring
> them only results in more conservative (but safe) inferences about
> dependence.  More sophisticated analysis like this can be added
> incrementally, I think.

Yes, though my point was that you are currently recording
two different results, when there's actually only one loop level.

>> And here?
>>
>> for (i = 0; i < n; i++)
>>   for (j = 0; j < n; j++)
>>     A[i] = A[i] + B[i][j];
>
> I'm not too clear about this.  Assuming A and B point to two different
> locations, there seems to be only one loop independent dependence --
> from A[i] to A[i].

Again, I'm trying to point out that the number and order of subscripts have
nothing to do with the loop levels. What we want to find here is that
there are 3 dependences:
1) a loop-carried anti dependence from the load of A to the store,
with a direction vector of [= S|<]
2) a loop-carried flow dependence from the store to the load of A,
with direction vector [= S|0]
3) a loop-carried output dependence from the store to the store, with
direction vector [= S|0]
(see Section 8.2.3 in Allen & Kennedy)

> I think the following would be a good start at modelling dependencies
> between two Values:
>
> class Dependence {
> public:
>  struct Level {
>    enum { LT  = 1, EQ  = 2, GT  = 4, ALL = 7 } direction;
>    bool scalar;
>    const SCEV *distance; // NULL implies no distance available
>    const Loop *loop;
>  };

What is the loop value used for?
Seems redundant, in that we can get to the statements given the
source and destination pointers, and from there to the loop nest.
The relative position in the loop nest is given by the index of
the levels vector.


>  enum Kind { Flow, Anti, Output, Input };
>
>  Kind getKind() const {
>    return kind;
>  }
>
>  const Value *getSource() const {
>    return source;
>  }
>
>  const Value *getDestination() const {
>    return destination;
>  }
>
>  typedef SmallVector<const Level *, 4>::const_iterator const_iterator;
>  const_iterator begin() const { return levels.begin(); }
>  const_iterator end() const { return levels.end(); }
>
> private:
>  Value *source, *destination;
>  Kind kind;
>  SmallVector<const Level *, 4> levels;

I'd malloc an ordinary vector of the appropriate length,
since we know the length at allocation time.

>  friend class LoopDependenceAnalysis;
> };
>
> More methods, like `isConsistent` and `isConfused`, can be added easily
> to this interface, since the raw data required to compute them is
> available.

You'll also need flags for loop-independent, consistent, and confused.
I thought earlier that the latter two could be computed, but I was wrong.
We can't compute them when the source or destination isn't in the loop.

When a dependence is completely confused, we needn't represent the
direction vector (what you call levels), so we can perhaps save a bit of
storage in that common case.
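
To make the idea concrete, here's a minimal sketch pulling these points
together; the names are mine, not LLVM's: the three extra flags, a
fixed-length levels array allocated once at the known common nesting depth,
and no levels array at all for a completely confused dependence.

```cpp
#include <cassert>
#include <memory>

struct Level {
  unsigned direction; // bitmask: LT=1, EQ=2, GT=4, ALL=7
  bool scalar;
};

// Illustrative sketch, not LLVM's API.
struct Dep {
  bool loopIndependent = false;    // src and dst in the same iteration
  bool consistent = false;         // a fixed distance at every level
  bool confused = false;           // analysis gave up entirely
  unsigned numLevels = 0;          // common nesting depth; 0 when confused
  std::unique_ptr<Level[]> levels; // null when confused: storage saved

  static Dep confusedDep() {
    Dep d;
    d.confused = true;             // no direction vector stored at all
    return d;
  }

  static Dep withLevels(unsigned n) {
    Dep d;
    d.numLevels = n;
    d.levels = std::make_unique<Level[]>(n); // length known up front
    return d;
  }
};
```

A plain fixed-length allocation like this is all we need, since the
number of levels is known when the Dependence is built.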

> Given two Values, then, we can first break them up into their
> respective indices.  We can then order them based on their loop
> nesting.  To actually compute the dependence, we can traverse down the
> subscript lists of the two Values and keep checking individual
> subscript pairs.

Sort of.  I would break them up into two lists of subscripts.
Don't sort these lists!  Instead, find the common nesting depth for
the 2 statements and allocate direction vectors of the appropriate length.
Then, as we get results from testing individual subscript pairs,
we record them in the direction vector at the appropriate level.

Initialize each level in the direction vector with ALL.
Set the Scalar flag for each level.

Imagine we're starting simple, with no attempt to take advantage of
coupled subscripts.
Examine each pair of subscripts in sequence.
If a pair is ZIV, we test hoping to prove independence.
If we can't, we move on to the next subscript.
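
The ZIV case is the simplest one; a sketch, assuming plain integer
constants (real subscripts would be SCEVs, possibly symbolic, where we
can only sometimes tell):

```cpp
#include <cassert>

// ZIV (zero index variable) test: both subscripts are loop invariant,
// so independence is proved exactly when the two constants differ.
bool zivIndependent(long src, long dst) {
  return src != dst; // different constants can never refer to the same slot
}
```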

If a pair is MIV, we ignore it and move on to the next subscript.

If a pair is SIV, we check to see if it's something we can handle
(strongSIV, weakzeroSIV, ...). If not, skip to next subscript.
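
As an illustration, here's a sketch of the classic strong-SIV test from
Allen & Kennedy; the function name and signature are mine. For a subscript
pair <a*i + c1> (source) versus <a*i + c2> (destination) in the same loop,
a dependence requires a*i1 + c1 == a*i2 + c2, so the distance
d = i2 - i1 = (c1 - c2)/a must be an integer smaller than the trip count.

```cpp
#include <cassert>
#include <cstdlib>

// Returns true when a dependence may exist, filling in *dist.
bool strongSIVtest(long a, long c1, long c2, long tripCount, long *dist) {
  long diff = c1 - c2;
  if (a == 0)
    return diff == 0; // degenerate ZIV case: dependence iff constants match
  if (diff % a != 0)
    return false;     // non-integer distance: no dependence
  long d = diff / a;
  if (std::labs(d) >= tripCount)
    return false;     // distance exceeds the loop's iteration range
  *dist = d;
  return true;
}
```

For the store/load pair in a[i+1] = a[i], this gives a flow dependence
with distance 1, as expected.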

If we can handle it, compute direction (if StrongSIV, compute distance).
Find the appropriate level based on the index variable, turn off the
Scalar flag,
and intersect the new direction with whatever is already stored there.
If the result is NONE, then there's no dependence and we're done.
If we have a distance, set it (if a distance already exists, compare to
see if they are the same. If not, then no dependence exists, and we're done).
Probably don't bother to record symbolic distances.
Move on to the next subscript.
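
The bookkeeping above can be sketched in a few lines (names and types are
illustrative, not LLVM's): each level starts at ALL with its Scalar flag
set, results are intersected in, and an empty direction or a conflicting
distance proves independence.

```cpp
#include <cassert>

// Direction bits form a small lattice; intersecting down to NONE
// proves independence.
enum Direction : unsigned { NONE = 0, LT = 1, EQ = 2, GT = 4, ALL = 7 };

struct LevelState {
  unsigned direction = ALL; // initialized to ALL, as described above
  bool scalar = true;       // cleared once a subscript mentions this loop
  bool hasDistance = false;
  long distance = 0;
};

// Intersect a newly computed direction into the level for this loop's
// index variable. Returns false when the result is NONE: independence.
bool intersectDirection(LevelState &level, unsigned newDir) {
  level.scalar = false;     // this loop appears in some subscript
  level.direction &= newDir;
  return level.direction != NONE;
}

// Record a strong-SIV distance at this level. Two different distances
// for the same level also prove independence.
bool recordDistance(LevelState &level, long d) {
  if (level.hasDistance && level.distance != d)
    return false;
  level.hasDistance = true;
  level.distance = d;
  return true;
}
```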

That's the general idea.
I'll spend some time over the next week or so
trying to get it written up in more detail on my
website: https://sites.google.com/site/parallelizationforllvm/dependence-test

Does this make sense?
If not, please get back to me.

> I'm not sure on how to deal with cases such as these:
>
> for (i = 0; i < n; i++) {
>  for (j = 0; j < n; j++) {
>    t = a[i][j];
>    a[i + 1][j + 1] = t;
>  }
>  for (j = 0; j < n; j++) {
>    t = a[i][j];
>    a[i + 1][j + 1] = t;
>  }
> }
>
> My gut feeling is that we should stop checking individual subscripts
> when the loop nesting have diverged.  But don't know how to concretely
> represent such dependencies.

We can compute dependence for imperfectly nested loops like this;
indeed, we must if we want to try and parallelize the outer loop.
But there's also the possibility of fusing the two inner loops.
If we want to do that (and we often do), then we need to look
for dependences between them.

Generally, dependences between 2 statements only have to do with
the common loops. So dependences between these two inner loops
should pay attention to subscripts that mention "i".
So we'd say that values flow from the store in the first loop to the
load in the second loop, with a distance vector of [1|<].

However, in this special case, where we have two loops next to each
other with identical loop bounds, we'd like to fuse the loops, if it's legal.
Fusion is often very useful as a way to save loads and cache misses.
Now some other part of the compiler would notice the potential
for fusing (the adjacent loops with identical headers), but it would rely
on the dependence analyzer to make sure fusion was legal, i.e., that
no dependences would be violated. To do so, we need to pretend
that the fusion has already taken place, and check subscripts for
both the i and j loops.

Does this make sense?

I'd arrange for it by providing an extra entry point with an extra parameter
specifying how many loop levels to check.
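
A hypothetical shape for that interface; every name and signature here is
illustrative only, with stand-ins for the real LLVM types:

```cpp
#include <cassert>

struct Instruction {}; // stand-in for llvm::Instruction
struct Dependence {};  // stand-in for the class discussed above

unsigned commonNestingDepth(Instruction *, Instruction *) {
  return 1; // placeholder: real code would walk the two loop nests
}

Dependence *depends(Instruction *src, Instruction *dst, unsigned levels);

// Default entry point: check only the loops common to src and dst.
Dependence *depends(Instruction *src, Instruction *dst) {
  return depends(src, dst, commonNestingDepth(src, dst));
}

// Extra entry point: a fusion pass can pretend the two adjacent loops
// are already fused by asking for one more level than the statements
// actually share.
Dependence *depends(Instruction *, Instruction *, unsigned) {
  return nullptr; // analysis elided in this sketch
}
```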


> I'm assuming in cases such as these:
>
> for (i = 0; i < n; i++)
>  for (j = 0; j < n; j++)
>    a[i][0] = a[j][i]
>
> we can have a dependence edge [S U | 0].

Scalar directions occur when a loop level isn't mentioned in any subscript.
For example,

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i + 1] = a[i] + 1;

Here we see two dependences
1) a loop-carried flow dependence from the store of A to the load,
with dv of [1 S|0]
2) a loop-carried output dependence from the store to the store, with
dv of [0 S|0]
I don't think there's an anti dependence in this case.

Your example is more complicated. Let's look at it in detail

for (i = 0; i < n; i++)
 for (j = 0; j < n; j++)
   a[i][0] = a[j][i]

Comparing the store against itself, we find a loop-carried output
dependence with dv of [0 S|0].
Do you see how it happens? We have two stores with subscripts [i][0] and [i][0].
The j loop isn't mentioned, so its level is S.
The i loop is mentioned in the first subscript, so that level is 0 (or =).
And since we're talking about the same instruction, it can't be a
loop-independent dependence.

Comparing the load against the store, we see [j][i] and [i][0].
The subscripts are coupled, with an MIV and an SIV.
Looking at the SIV, we see that there can only be a dependence when i = 0.
Propagating that fact into the other subscript pair,
we find that the dependence can only exist when j = i = 0.
So there can't be a loop-carried dependence, so no flow dependence.
On the other hand, we can have an inconsistent loop-independent anti
dependence from the load to the store, with dv [0 0|<].

Preston



