RFC: Modeling horizontal vector reductions

Stephen Canon scanon at apple.com
Thu Sep 12 08:18:37 PDT 2013


On Sep 11, 2013, at 3:17 PM, Arnold Schwaighofer <aschwaighofer at apple.com> wrote:

> However, this form cannot be matched to the shortest sequence of instructions on a platform where we have pairwise vector fadds (haddps - intel, VPADD - arm, faddp - aarch64) because we don’t have fast-math instruction flags at the selection dag level and therefore cannot reassociate the tree:

The general thrust of this work is great.  I do want to point out that, for horizontal reductions, using HADDPS is a win only in code size; the fastest idiom is actually two shuffles + two adds (reciprocal throughput of 2 cycles vs. 4 for HADDPS).  So we really do want to have a means to generate both of these forms, not only the pairwise ops.
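
To make the two shapes concrete, here is a rough scalar model in portable C of a 4-lane float reduction done both ways.  This is only a sketch: the vec4 type and the *_model helpers are hypothetical stand-ins for a vector register, HADDPS, ADDPS and a lane shuffle, not real intrinsics and not anything in LLVM.  The point it shows is that the two idioms associate the adds differently, which is why picking between them requires permission to reassociate.

```c
#include <assert.h>
#include <stdio.h>

/* Four float lanes standing in for one 128-bit vector register
   (hypothetical type, for illustration only). */
typedef struct { float lane[4]; } vec4;

/* Models a horizontal add such as HADDPS: adjacent pairs of a fill the
   low two lanes, adjacent pairs of b fill the high two lanes. */
static vec4 hadd_model(vec4 a, vec4 b) {
    vec4 r = {{ a.lane[0] + a.lane[1], a.lane[2] + a.lane[3],
                b.lane[0] + b.lane[1], b.lane[2] + b.lane[3] }};
    return r;
}

/* Models an ordinary vertical (lane-by-lane) vector add. */
static vec4 add_model(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Models a single-source lane shuffle: result lane k = a.lane[ik]. */
static vec4 shuffle_model(vec4 a, int i0, int i1, int i2, int i3) {
    vec4 r = {{ a.lane[i0], a.lane[i1], a.lane[i2], a.lane[i3] }};
    return r;
}

/* Pairwise idiom: two horizontal adds, computing (v0+v1)+(v2+v3). */
float reduce_pairwise(vec4 v) {
    vec4 t = hadd_model(v, v);   /* [v0+v1, v2+v3, v0+v1, v2+v3] */
    t = hadd_model(t, t);        /* lane 0 = (v0+v1)+(v2+v3)     */
    return t.lane[0];
}

/* Shuffle idiom: two shuffles + two adds, computing (v0+v2)+(v1+v3). */
float reduce_shuffle(vec4 v) {
    vec4 hi  = shuffle_model(v, 2, 3, 2, 3);  /* bring high half down */
    vec4 s   = add_model(v, hi);              /* [v0+v2, v1+v3, ...]  */
    vec4 odd = shuffle_model(s, 1, 1, 1, 1);  /* bring lane 1 down    */
    vec4 r   = add_model(s, odd);             /* lane 0 = full sum    */
    return r.lane[0];
}

/* Prints both reductions for an exactly summable input and for one
   where float rounding makes the association order visible. */
void demo(void) {
    vec4 exact = {{ 1.0f, 2.0f, 3.0f, 4.0f }};
    vec4 lossy = {{ 1e8f, 1.0f, -1e8f, 1.0f }};
    /* Small integers sum exactly, so both trees must agree here. */
    assert(reduce_pairwise(exact) == reduce_shuffle(exact));
    printf("exact: pairwise=%g shuffle=%g\n",
           reduce_pairwise(exact), reduce_shuffle(exact));
    printf("lossy: pairwise=%g shuffle=%g\n",
           reduce_pairwise(lossy), reduce_shuffle(lossy));
}
```

Calling demo() from a main() prints both results side by side.  For inputs that sum exactly the two trees agree, but for something like {1e8, 1, -1e8, 1} they round differently in single precision, so a backend can only swap one form for the other when reassociation is allowed.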

– Steve

