[Patch] [SLPVectorizer] Vectorize Horizontal Reductions from Consecutive Loads
Nadav Rotem
nrotem at apple.com
Thu Dec 11 09:02:10 PST 2014
Hi Suyog,
The change looks good to me. I think it would be a good idea to run the LLVM test suite and check whether there are any performance regressions.
Thanks,
Nadav
> On Dec 11, 2014, at 4:38 AM, Suyog Kamal Sarda <suyog.sarda at samsung.com> wrote:
>
> Hi All,
>
> This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders the operands so that they can be bundled into a vector of loads,
> and vectorizes the expression. As discussed earlier in LLVM mail threads, we previously didn't vectorize such horizontal reductions.
>
> Test case :
>
> float hadd(float* a) {
>   return (a[0] + a[1]) + (a[2] + a[3]);
> }
>
>
> AArch64 assembly before patch :
>
> ldp s0, s1, [x0]
> ldp s2, s3, [x0, #8]
> fadd s0, s0, s1
> fadd s1, s2, s3
> fadd s0, s0, s1
> ret
>
> AArch64 assembly after patch :
>
> ldp d0, d1, [x0]
> fadd v0.2s, v0.2s, v1.2s
> faddp s0, v0.2s
> ret
>
> Recognizing (+(+(+ v0, v1) v2) v3) still remains to be done. I will address it in another patch.
>
> Please help review the patch. No 'make-check' failures were observed with this patch.
>
> (I would have preferred Phabricator, but it's not working, hence sending via e-mail.)
>
> Regards,
> Suyog <SLP1.patch>
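
For readers following the thread, the two expression shapes mentioned above can be written out in plain C, together with a scalar model of the "after patch" assembly. This is only a sketch: the function names hadd_tree, hadd_lanes and hadd_chain are made up for illustration and are not part of the patch, and hadd_lanes is one reading of the three-instruction AArch64 sequence quoted in the mail, not output from the vectorizer.

```c
/* The shape the patch recognizes: a balanced tree of adds over four
   consecutive loads, as in the test case from the mail. */
float hadd_tree(const float *a) {
    return (a[0] + a[1]) + (a[2] + a[3]);
}

/* Scalar model of the quoted "after patch" sequence (an assumption
   based on reading the assembly, not taken from the patch itself):
     ldp d0, d1, [x0]   -> lo = {a[0], a[1]}, hi = {a[2], a[3]}
     fadd v0.2s, ...    -> lane-wise add: {a[0]+a[2], a[1]+a[3]}
     faddp s0, v0.2s    -> add the two lanes together */
float hadd_lanes(const float *a) {
    float lo[2]  = { a[0], a[1] };
    float hi[2]  = { a[2], a[3] };
    float sum[2] = { lo[0] + hi[0], lo[1] + hi[1] };
    return sum[0] + sum[1];
}

/* The shape the mail lists as remaining work: a linear chain of adds. */
float hadd_chain(const float *a) {
    return ((a[0] + a[1]) + a[2]) + a[3];
}
```

For inputs whose sums are exactly representable, such as {1, 2, 3, 4}, all three functions return the same value. The lane-wise form pairs the elements differently from the source expression, so for general float inputs the results may differ in rounding; the mail does not say under which floating-point flags the transformation applies.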