RFC: Modeling horizontal vector reductions

Wed Sep 11 16:24:02 PDT 2013

On Sep 11, 2013, at 5:54 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Wed, Sep 11, 2013 at 3:49 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> 
> On Sep 11, 2013, at 5:30 PM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> >
> > On Wed, Sep 11, 2013 at 3:17 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:
> > Therefore, I would like to model horizontal reductions as either version, depending on which the cost model deems cheaper.
> >
> > What would make the first pattern cheaper? I'd like to better understand why we don't just always do the second form…
> 
> Fewer shuffles (because shuffle vec, <0,1, undef, undef> is free), so when you don’t have pairwise vector operations the first pattern is preferable.
> 
> I thought so, but thanks for confirming. I don't trust myself entirely on the cost models here.
>
> > It is a bit unfortunate to not have one canonical form but I don’t think this justifies adding fast-math flags to isel (which will eventually go away).
> >
> > I don't really understand this part.
> >
> > We have some reason at the IR level to know that we can choose either association and get equivalent results. Why isn't the correct answer to pick a canonical form, but preserve that information long enough to reassociate when it is needed?
> 
> We want to pick the right form at ISel time - which is too late to reassociate.
> 
> At some point - before ISel - we have to reassociate, because at ISel time we don’t have the fast-math flags that would tell us that it is legal to reassociate.
> 
> So, we could for example reassociate in CodeGen prepare. We would still need an interface to tell us when to do so.
> 
> Why don't you want to propagate the flags through isel? That really seems like the correct long-term solution: that ISel looks at the pattern, knows that it would be cheaper to use the horizontal instructions and emits the code that way. It doesn't even have to actually *do* the reassociation, it can match the reduction pattern and implicitly re-associate by forming the horizontal instruction pattern. It just needs to know that this is allowed.

Yes, sure. You only actually have to reassociate if you don’t have the flags.
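
For anyone following along, the association difference between the two reduction shapes can be sketched in a few lines of Python. This is only an illustration, not LLVM code: the function names are made up, plain lists stand in for vectors, Python floats are doubles rather than <4 x float>, and it assumes the first pattern folds the upper half of the vector onto the lower half while the second pattern adds adjacent pairs. Both helpers assume a power-of-two length.

```python
def reduce_split_halves(v):
    """Assumed 'first pattern': add the upper half onto the lower half, repeat."""
    v = list(v)
    while len(v) > 1:
        half = len(v) // 2
        v = [v[i] + v[i + half] for i in range(half)]
    return v[0]


def reduce_pairwise(v):
    """Assumed 'second pattern': add adjacent pairs, repeat."""
    v = list(v)
    while len(v) > 1:
        v = [v[i] + v[i + 1] for i in range(0, len(v), 2)]
    return v[0]


# (v0 + v2) + (v1 + v3) versus (v0 + v1) + (v2 + v3)
v = [1e16, 1.0, -1e16, 1.0]
print(reduce_split_halves(v), reduce_pairwise(v))
```

On integer inputs the two helpers agree; on the floating-point input above they do not, which is why picking one form over the other is only legal when the fast-math flags permit reassociation.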

> You mentioned not wanting to thread these flags through because some of the machinery is slowly going away, but I think that "slowly" is going to be a *lot* of time. I think threading the flags through is a much better interim cost (fixed, no design overhead) than having 2 patterns in the IR for the same vector operation.

I am not so sure that threading those flags to ISel is that cheap to implement. If there is no other client, I am not sure it is worth doing.

CC’ing Owen, who I have been told might have started this at one point. Owen, is my thinking that threading fast-math flags into ISel is a bigger undertaking a correct estimate, or is it really just “creating and maintaining a BinOp->FastMathFlag map”? I am guessing the maintaining part might be hairy.
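
To make the "maintaining" worry concrete, here is a toy Python sketch of such a side table. Everything in it is hypothetical (the Node class, the helper names, and the dict keyed by node identity are made up for illustration and are not how LLVM represents anything); it only shows that any rewrite which creates a new node silently loses the flags unless it remembers to copy the entry.

```python
class Node:
    """Toy stand-in for a binary-operator node; not an LLVM class."""

    def __init__(self, opcode, lhs, rhs):
        self.opcode, self.lhs, self.rhs = opcode, lhs, rhs


def commute_dropping_flags(node, flags):
    # A rewrite that does not know about the side table: flags are lost.
    return Node(node.opcode, node.rhs, node.lhs)


def commute_keeping_flags(node, flags):
    # The same rewrite, but it has to carry the table entry over by hand.
    new = Node(node.opcode, node.rhs, node.lhs)
    if node in flags:
        flags[new] = flags[node]
    return new


add = Node("fadd", "a", "b")
flags = {add: {"fast"}}  # hypothetical BinOp -> fast-math-flags map
```

Every transform that replaces a node would need the second form, which is presumably where the hairiness comes from.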