[Patch] [SLPVectorizer] Vectorize Horizontal Reductions from Consecutive Loads

Thu Dec 11 09:02:10 PST 2014

Hi Suyog,

The change looks good to me. I think it would be a good idea to run the LLVM test suite and check whether there are any performance regressions.

Thanks,
Nadav

> On Dec 11, 2014, at 4:38 AM, Suyog Kamal Sarda <suyog.sarda at samsung.com> wrote:
> 
> Hi All,
> 
> This patch recognizes (+ (+ v0, v1) (+ v2, v3)), reorders the operands so they can be bundled into a vector of loads,
> and vectorizes the expression. As discussed earlier on the LLVM mailing lists, we previously did not vectorize such horizontal reductions.
> 
> Test case:
> 
>       float hadd(float* a) {
>           return (a[0] + a[1]) + (a[2] + a[3]);
>       }
> 
> 
> AArch64 assembly before the patch:
> 
>         ldp     s0, s1, [x0]
>         ldp     s2, s3, [x0, #8]
>         fadd    s0, s0, s1
>         fadd    s1, s2, s3
>         fadd    s0, s0, s1
>         ret
> 
> AArch64 assembly after the patch:
> 
>         ldp     d0, d1, [x0]
>         fadd    v0.2s, v0.2s, v1.2s
>         faddp   s0, v0.2s
>         ret
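
As a rough illustration (not part of the patch), the after-patch sequence can be modelled in scalar C. The `v2f` type and the `load_v2f`, `add_v2f`, `addp_v2f`, and `hadd_vectorized` helpers are hypothetical stand-ins for a two-lane float register and the `ldp`, `fadd`, and `faddp` instructions shown above. Read lane by lane, the vector code evaluates `(a[0] + a[2]) + (a[1] + a[3])`, so the four loaded values are regrouped relative to the source expression.

```c
/* Hypothetical two-lane vector type standing in for a 64-bit d/v register. */
typedef struct { float lane[2]; } v2f;

/* Models one half of "ldp d0, d1, [x0]": load two adjacent floats. */
static v2f load_v2f(const float *p) {
    v2f v = { { p[0], p[1] } };
    return v;
}

/* Models "fadd v0.2s, v0.2s, v1.2s": lane-wise addition. */
static v2f add_v2f(v2f x, v2f y) {
    v2f r = { { x.lane[0] + y.lane[0], x.lane[1] + y.lane[1] } };
    return r;
}

/* Models "faddp s0, v0.2s": add the two lanes of one vector together. */
static float addp_v2f(v2f v) {
    return v.lane[0] + v.lane[1];
}

/* Scalar emulation of the vectorized code. */
float hadd_vectorized(const float *a) {
    v2f lo = load_v2f(a);       /* {a[0], a[1]} */
    v2f hi = load_v2f(a + 2);   /* {a[2], a[3]} */
    v2f sum = add_v2f(lo, hi);  /* {a[0] + a[2], a[1] + a[3]} */
    return addp_v2f(sum);       /* (a[0] + a[2]) + (a[1] + a[3]) */
}

/* The original scalar test case, for comparison. */
float hadd(const float *a) {
    return (a[0] + a[1]) + (a[2] + a[3]);
}
```

For inputs such as `{1, 2, 3, 4}`, where every partial sum is exact, `hadd` and `hadd_vectorized` both return 10. Because IEEE-754 float addition is not associative, the two groupings are not guaranteed to agree for arbitrary inputs.
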
> 
> Recognizing the linear form (+(+(+ v0, v1) v2) v3) still remains to be done. I will address it in a separate patch.
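
The two reduction shapes can be contrasted with a small C sketch; `sum_tree` and `sum_chain` are hypothetical names. `sum_tree` is the balanced form the patch handles, and `sum_chain` is the linear form left for later. Intermediates are stored in `float` variables so that each addition is rounded to single precision.

```c
/* Balanced tree: (a[0] + a[1]) + (a[2] + a[3]), the shape the patch handles. */
float sum_tree(const float *a) {
    float lo = a[0] + a[1];
    float hi = a[2] + a[3];
    return lo + hi;
}

/* Linear chain: ((a[0] + a[1]) + a[2]) + a[3], the shape left for future work. */
float sum_chain(const float *a) {
    float s = a[0] + a[1];
    s = s + a[2];
    s = s + a[3];
    return s;
}
```

With IEEE-754 single precision and no excess intermediate precision, `{1e8f, 1.0f, -1e8f, 1.0f}` gives 0 from `sum_tree` but 1 from `sum_chain`, because `1e8f + 1.0f` rounds back to `1e8f`. Rewriting one shape into the other is therefore generally only safe when floating-point reassociation is permitted (for example under `-ffast-math`).
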
> 
> Please review the patch. No 'make check' failures were observed with it.
> 
> (I would have preferred Phabricator, but it is not working, so I am sending this via e-mail.)
> 
> Regards,
> Suyog
> <SLP1.patch>