[Patch] [SLPVectorizer] Vectorize Horizontal Reductions from Consecutive Loads

Fri Dec 12 05:03:44 PST 2014

Committed in r224119. 

Thanks a lot for the review.

- Suyog

------- Original Message -------
Sender : Suyog Kamal Sarda<suyog.sarda at samsung.com> Senior Software Engineer/SRI-Bangalore-TZN/Samsung Electronics
Date : Dec 12, 2014 21:38 (GMT+09:00)
Title : Re: Re: [Patch] [SLPVectorizer] Vectorize Horizontal Reductions from Consecutive Loads

Hi Nadav,

I ran LNT on x86 with 10 iterations and saw only one performance regression, in the 
test case MultiSource/Benchmarks/Prolangs-C++/fsm/fsm.

However, this test case does not appear to be affected by my vectorization patch 
(I checked the test case), so I am ignoring it and going ahead with the commit 
as you suggested.

Attaching screenshots of the LNT results.

Regards,
Suyog

------- Original Message -------
Sender : suyog sarda
Date : Dec 12, 2014 04:10 (GMT+09:00)
Title : Re: [Patch] [SLPVectorizer] Vectorize Horizontal Reductions from Consecutive Loads

Hi Nadav,

Thanks for reviewing the patch. I will upload the performance results by tomorrow.

Just to be sure, you meant LNT test suite performance results, right?

On Thu, Dec 11, 2014 at 10:32 PM, Nadav Rotem wrote:

Hi Suyog,

The change looks good to me.  I think that it would be a good idea to run the LLVM test suite and check if there are any performance regressions.

Thanks,
Nadav

> On Dec 11, 2014, at 4:38 AM, Suyog Kamal Sarda wrote:
>
> Hi All,
>
> This patch recognizes the pattern (+ (+ v0, v1) (+ v2, v3)), reorders the operands so that the loads can be bundled into a vector of loads,
> and vectorizes it. As discussed earlier in LLVM mail threads, we previously did not vectorize such horizontal reductions.
>
> Test case :
>
>       float hadd(float* a) {
>           return (a[0] + a[1]) + (a[2] + a[3]);
>       }
>
>
> AArch64 assembly before patch :
>
>       ldp     s0, s1, [x0]
>       ldp     s2, s3, [x0, #8]
>       fadd    s0, s0, s1
>       fadd    s1, s2, s3
>       fadd    s0, s0, s1
>       ret
>
> AArch64 assembly after patch :
>
>       ldp     d0, d1, [x0]
>       fadd    v0.2s, v0.2s, v1.2s
>       faddp   s0, v0.2s
>       ret
>
> Recognizing (+(+(+ v0, v1) v2) v3) still remains to be done. I will address it in a separate patch.
>
> Please help review the patch. No 'make-check' failures were observed with this patch.
>
> (I would have preferred Phabricator, but it's not working, hence sending via e-mail.)
>
> Regards,
> Suyog
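
For readers following the quoted assembly, here is a rough scalar sketch in C of what the two sequences appear to compute. It is only an illustration, not code from the patch: the names hadd_tree, hadd_lanes, hadd_chain and hadd_selfcheck are made up here, and the lane mapping is my reading of the ldp/fadd/faddp sequence. Note that the lane form groups the additions as (a[0] + a[2]) + (a[1] + a[3]), which can round differently from the source expression for general floats, so the sketch assumes such reassociation is permitted (e.g. fast-math style flags).

```c
#include <assert.h>

/* Balanced tree from the quoted test case: (+ (+ v0, v1) (+ v2, v3)). */
float hadd_tree(const float *a) {
    return (a[0] + a[1]) + (a[2] + a[3]);
}

/* Hypothetical scalar model of the "after patch" sequence:
 *   ldp   d0, d1, [x0]  -> d0 = {a[0], a[1]}, d1 = {a[2], a[3]}
 *   fadd  v0.2s, ...    -> lane-wise sum {a[0]+a[2], a[1]+a[3]}
 *   faddp s0, v0.2s     -> add the two lanes together
 */
float hadd_lanes(const float *a) {
    float lane0 = a[0] + a[2];
    float lane1 = a[1] + a[3];
    return lane0 + lane1;
}

/* Linear chain (+(+(+ v0, v1) v2) v3), the shape the original mail
 * says is left for a follow-up patch. */
float hadd_chain(const float *a) {
    return ((a[0] + a[1]) + a[2]) + a[3];
}

/* With exactly representable inputs, all three groupings agree. */
void hadd_selfcheck(void) {
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    assert(hadd_tree(a) == 10.0f);
    assert(hadd_lanes(a) == 10.0f);
    assert(hadd_chain(a) == 10.0f);
}
```

Calling hadd_selfcheck() from a small main is enough to exercise the sketch. The inputs are chosen to be exact in single precision; with arbitrary values the three functions may differ in the last bits.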

_______________________________________________
llvm-commits mailing list
llvm-commits at cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-- 

With regards,
Suyog Sarda