RFC: Modeling horizontal vector reductions

Thu Sep 12 06:59:38 PDT 2013

On Sep 11, 2013, at 7:17 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> 
> On Sep 11, 2013, at 6:24 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> 
>> 
>> On Sep 11, 2013, at 5:54 PM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>>> On Wed, Sep 11, 2013 at 3:49 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>>> 
>>> On Sep 11, 2013, at 5:30 PM, Chandler Carruth <chandlerc at google.com> wrote:
>>> 
>>>> 
>>>> On Wed, Sep 11, 2013 at 3:17 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
>>>> Therefore, I would like to model horizontal reductions as either version, depending on which the cost model deems cheaper.
>>>> 
>>>> What would make the first pattern cheaper? I'd like to better understand why we don't just always do the second form…
>>> 
>>> Fewer shuffles (because shuffle vec, <0, 1, undef, undef> is free), so when you don’t have pairwise vector operations the first pattern is preferable.
>>> 
>>> I thought so, but thanks for confirming. I don't trust myself entirely on the cost models here.
>>> 
>>>> It is a bit unfortunate to not have one canonical form but I don’t think this justifies adding fast-math flags to isel (which will eventually go away).
>>>> 
>>>> I don't really understand this part.
>>>> 
>>>> We have some reason at the IR level to know that we can choose either association and get equivalent results. Why isn't the correct answer to pick a canonical form, but preserve that information long enough to reassociate when it is needed?
>>> 
>>> We want to pick the right form at ISel time - which is too late to reassociate.
>>> 
>>> At some point - before ISel - we have to reassociate because at ISel time we don’t have the fast-math flags that would tell us that it is legal to reassociate.
>>> 
>>> So, we could for example reassociate in CodeGen prepare. We would still need an interface to tell us when to do so.
>>> 
>>> Why don't you want to propagate the flags through isel? That really seems like the correct long-term solution: that ISel looks at the pattern, knows that it would be cheaper to use the horizontal instructions and emits the code that way. It doesn't even have to actually *do* the reassociation, it can match the reduction pattern and implicitly re-associate by forming the horizontal instruction pattern. It just needs to know that this is allowed.
>> 
>> Yes sure. You only actually have to reassociate if you don’t have the flags.
>> 
>> 
>>> You mentioned not wanting to thread these flags through because some of the machinery is slowly going away, but I think that "slowly" is going to be a *lot* of time. I think threading the flags through is a much better interim cost (fixed, no design overhead) than having 2 patterns in the IR for the same vector operation.
> 
> I don’t think there is an extra cost that I would incur. Even if we don’t have unsafe-math flags, we want to generate either of those two patterns anyway:
> 
> Say somebody really wrote:
> 
> v0 = 7 * A[i];
> v1 = 7 * A[i+1];
> v2 = 7 * A[i+2];
> v3 = 7 * A[i+3];
> r += (v0 + v1) + (v2 + v3);
> 
> 
> VS.
> 
> v0 = 7 * A[i];
> v1 = 7 * A[i+1];
> v2 = 7 * A[i+2];
> v3 = 7 * A[i+3];
> r += (v0 + v2) + (v1 + v3);
> 
> In this case the order dictates which pattern to use. It is just in the fast-math case that the order does not matter.

To summarize the point I am trying to make here:

- Without unsafe-math (no “fadd fast …”), that is, when floating-point arithmetic must not be reassociated, the SLP vectorizer would (if the cost model deemed it beneficial) generate either of those patterns, depending on the input addition tree. Subsequent passes therefore had better know how to deal with them (they probably do not touch them anyway because of the shuffles).

- With unsafe-math (“fadd fast …”), we will have two forms that represent the same semantics (within an “unsafe math universe”). This is a little ugly, but given the previous point I don’t think there is a cost to it.
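
To make the first point concrete, here is a minimal sketch in C of why the association order matters without fast-math. The helper names reduce_adjacent and reduce_strided are hypothetical, chosen only for illustration, and the sketch assumes the code is compiled without -ffast-math or a similar option, so the compiler keeps the written order:

```c
/* Hypothetical helper: sums four lanes as (v0 + v1) + (v2 + v3). */
static float reduce_adjacent(const float v[4]) {
    float lo = v[0] + v[1];   /* lanes 0 and 1 */
    float hi = v[2] + v[3];   /* lanes 2 and 3 */
    return lo + hi;
}

/* Hypothetical helper: sums four lanes as (v0 + v2) + (v1 + v3). */
static float reduce_strided(const float v[4]) {
    float even = v[0] + v[2]; /* lanes 0 and 2 */
    float odd  = v[1] + v[3]; /* lanes 1 and 3 */
    return even + odd;
}
```

For the lanes {1e30f, 1.0f, -1e30f, 1.0f}, reduce_adjacent returns 0 because each 1.0f is absorbed by its large neighbour, while reduce_strided returns 2 because the large values cancel first. The two orders are only interchangeable once reassociation is allowed.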

What do you think?

I agree that in a perfect world I would go and fix ISel, but I am not sure I have time to do this. And I don’t think I am making things worse by not doing so (I am just not improving things in ISel land).

Best,
Arnold