[llvm-commits] Reassociating for vectors

Sat May 26 08:15:21 PDT 2012

On Sat, 26 May 2012 15:46:10 +0200
Duncan Sands <baldrick at free.fr> wrote:

> Hi Hal,
> 
> On 26/05/12 15:13, Hal Finkel wrote:
> > On Sat, 26 May 2012 00:06:29 -0700
> > Andrew Trick <atrick at apple.com> wrote:
> >
> >> On May 25, 2012, at 7:38 PM, Hal Finkel wrote:
> >>> This interests me because I also need some procedure for
> >>> reassociating in order to have basic-block vectorization do
> >>> something interesting for reductions. To start, I'd want
> >>> a+b+c+d+e+f+g+h, regardless of the original association, to be
> >>> transformed into: (a+b)+(c+d)+(e+f)+(g+h) or (a+b+c+d)+(e+f+g+h)
> >>> [the number of groups should depend on the target's vector length,
> >>> and maybe some other things as well].
> >>>
> >>> I'm not sure whether I should try to bake this into Reassociate,
> >>> or refactor Reassociate so that parts of it can be used by
> >>> BBVectorize, or something else. Do you have an opinion?
> >>
> >> This sounds to me like something BBVectorize should do only after
> >> determining the expression is vectorizable.
> >
> > It seems like, in general, such a transformation exposes
> > ILP, and that would be good even if the result did not vectorize. Do
> > you think I'm wrong about that?
> 
> it would be easy enough to have reassociate always write things out as
> some kind of balanced tree rather than linearly,

Duncan,

Can you sketch out how this would be done? It looks like you'd just
want to change the recursion used by Reassociate::RewriteExprTree. Is
that right?
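
To make the question concrete, here is a rough sketch of the two
recursions I have in mind. It is Python rather than C++, it is not the
actual Reassociate code, and every name in it (linear, balanced, depth,
show) is made up for illustration. It assumes the operands have already
been collected into a flat list sorted by rank.

```python
def linear(ops):
    """Left-leaning chain: ((a+b)+c)+d ... (one add per remaining operand)."""
    expr = ops[0]
    for op in ops[1:]:
        expr = ("+", expr, op)
    return expr

def balanced(ops):
    """Split the operand list in half and recurse on each half."""
    if len(ops) == 1:
        return ops[0]
    mid = len(ops) // 2
    return ("+", balanced(ops[:mid]), balanced(ops[mid:]))

def depth(expr):
    """Number of dependent adds on the longest path to a leaf."""
    if isinstance(expr, str):
        return 0
    _, lhs, rhs = expr
    return 1 + max(depth(lhs), depth(rhs))

def show(expr):
    """Render the tree with explicit parentheses."""
    if isinstance(expr, str):
        return expr
    _, lhs, rhs = expr
    return "(" + show(lhs) + "+" + show(rhs) + ")"

ops = list("abcdefgh")
print(show(linear(ops)), depth(linear(ops)))
print(show(balanced(ops)), depth(balanced(ops)))
```

For eight operands the chain has seven dependent adds while the halved
tree has three, which is the ILP argument in a nutshell.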

> though I'm not sure
> what the best output order is.  For example in a+b+c+d+e+f+g+h with
> these being ever more complicated (i.e. ordered by increasing 'rank'
> as computed by reassociate), how should they be grouped?  Should it
> be
>    ((a+b)+(c+d))+((e+f)+(g+h))
> ?  That would put high rank elements together (eg g+h).  Or should it
> be something like
>    ((a+e)+(c+g))+((d+h)+(b+f))
> which tries to spread low rank around.  Or something else?

I'm not sure there is a best answer here; we'd probably want some kind
of relative-imbalance metric to decide. It probably also depends on
what else is going on: if there is other work to be scheduled
concurrently, then spreading the low-rank operands around would
probably be good because it leaves plenty of interspersed spare cycles
in which to put other things. Does instruction scheduling also
understand associativity? If it does, then the choice may not matter in
the same way.

If it is easy enough to choose, we could implement both and then
experiment.
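
As a starting point for such an experiment, here is a toy model of the
two groupings (Python, not LLVM code; all names are hypothetical). The
"spread" variant pairs operand i with operand i + n/2, which is only
one possible reading of the second grouping above and does not
reproduce its exact pair order. The model assumes each operand becomes
available at a time equal to its position in rank order and that every
add takes one cycle. The imbalance measure (summing, over all adds, how
long one input waits for the other) is an invented candidate, not an
established metric.

```python
def balance(items):
    """Build a balanced tree over items by recursive halving."""
    if len(items) == 1:
        return items[0]
    mid = len(items) // 2
    return ("+", balance(items[:mid]), balance(items[mid:]))

def adjacent(ops):
    """Neighbours in rank order share a subtree: ((a+b)+(c+d))+..."""
    return balance(ops)

def spread(ops):
    """Pair operand i with operand i + n/2, then balance the pairs."""
    half = len(ops) // 2
    pairs = [("+", lo, hi) for lo, hi in zip(ops[:half], ops[half:])]
    if len(ops) % 2:
        pairs.append(ops[-1])  # odd count: last operand stays unpaired
    return balance(pairs)

def evaluate(expr, ready, add_latency=1):
    """Return (finish_time, total_imbalance) for expr.

    ready maps each leaf to the time its value becomes available.
    """
    if isinstance(expr, str):
        return ready[expr], 0
    _, lhs, rhs = expr
    lt, li = evaluate(lhs, ready, add_latency)
    rt, ri = evaluate(rhs, ready, add_latency)
    return add_latency + max(lt, rt), li + ri + abs(lt - rt)

ops = list("abcdefgh")
ready = {name: i for i, name in enumerate(ops)}  # assumed: rank == ready time
print("adjacent:", evaluate(adjacent(ops), ready))
print("spread:  ", evaluate(spread(ops), ready))
```

In this model both groupings finish at the same time, because in a
fully balanced tree every leaf sits at the same depth and the latest
operand determines the result. Only the imbalance number differs, so
critical-path length alone would not be enough to pick a winner.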

Also, do we need to be careful about changing the current default
ordering, since doing so might hurt CSE?
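
Here is the kind of case I am worried about, as a toy Python sketch
(hypothetical names, not LLVM code, and no claim about what CSE in LLVM
actually matches). It counts the subexpressions that two sums have in
common under each output shape, assuming both sums have their operands
in the same rank order.

```python
def linear(ops):
    """Left-leaning chain: ((a+b)+c)+d ..."""
    expr = ops[0]
    for op in ops[1:]:
        expr = ("+", expr, op)
    return expr

def balanced(ops):
    """Balanced tree built by recursive halving."""
    if len(ops) == 1:
        return ops[0]
    mid = len(ops) // 2
    return ("+", balanced(ops[:mid]), balanced(ops[mid:]))

def subexprs(expr):
    """Set of all non-leaf subexpressions of expr, including expr itself."""
    if isinstance(expr, str):
        return set()
    _, lhs, rhs = expr
    return {expr} | subexprs(lhs) | subexprs(rhs)

x = list("abcd")  # a+b+c+d
y = list("abce")  # a+b+c+e, same three-operand prefix

shared_linear = subexprs(linear(x)) & subexprs(linear(y))
shared_balanced = subexprs(balanced(x)) & subexprs(balanced(y))
print(len(shared_linear), len(shared_balanced))
```

With the chain form the two sums share both a+b and (a+b)+c, whereas
the balanced form only shares a+b, so a common prefix is less likely to
survive as a reusable value.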

Thanks again,
Hal

> 
> Ciao, Duncan.

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory