RFC: Modeling horizontal vector reductions

Renato Golin renato.golin at linaro.org
Thu Sep 12 06:52:07 PDT 2013


On 12 September 2013 01:17, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> Say somebody really wrote:
>
> v0 = 7 * A[i];
> v1 = 7 * A[i+1];
> v2 = 7 * A[i+2];
> v3 = 7 * A[i+3];
> r += (v0 + v1) +  (v2 + v3);
>
> VS.
>
> v0 = 7 * A[i];
> v1 = 7 * A[i+1];
> v2 = 7 * A[i+2];
> v3 = 7 * A[i+3];
> r += (v0 + v2) +  (v1 + v3);
>
> In this case the order dictates which pattern to use. It is just in the
> fast-math case that the order does not matter.
>

Arnold,

When you compared the two IR pieces in your original email, I thought the
second was only more explicit, but that both should generate the same
machine code in the end, i.e. the back-end should see that the instruction
is redundant and either not emit it or remove it afterwards.

Your example above reinforces that point: the user could write any number
of combinations, all of them correct and only some of them redundant, and
it would be a shame not to vectorize most of those patterns just because
the vectorizer treats them as free-or-nothing.

So maybe there could be a DCE pass that looks at
"shuffle vec, <0,1,undef,undef>", knows that it's free, and emits nothing,
just aliasing the register, no?

Does any of that make sense?

cheers,
--renato


More information about the llvm-commits mailing list