[PATCH] Extend SLPVectorizer for cases where insertelement instructions must be rescheduled

Sun Mar 23 14:09:30 PDT 2014

Hi Arch, 

Thanks for working on this. It is not clear to me what the difference is between the InsertElement and the IgnoreScheduling list that is used by reduction variables. The InsertElements that we care about build a vector, which is a kind of reduction. Is there anything unique about the build vector, or is it just another reduction?

Also, please split the cost-model fixes and the typo fix into separate patches. 

Thanks,
Nadav

On Mar 21, 2014, at 3:02 PM, Arch D. Robison <arch.robison at intel.com> wrote:

> 
>  Yes, it vectorizes Matt Arsenault's example.  Here's a transcript:
> 
>    $ opt -S -slp-vectorizer /tmp/ex.ll
>    ; ModuleID = '/tmp/ex.ll'
>    target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
>    target triple = "x86_64-apple-macosx10.8.0"
> 
>    ; Function Attrs: nounwind readnone ssp uwtable
>    define <4 x float> @simple_select(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
>    entry:
>      %0 = icmp ne <4 x i32> %c, zeroinitializer
>      %1 = select <4 x i1> %0, <4 x float> %a, <4 x float> %b
>      %2 = extractelement <4 x float> %1, i32 0
>      %3 = insertelement <4 x float> undef, float %2, i32 0
>      %4 = extractelement <4 x float> %1, i32 1
>      %5 = insertelement <4 x float> %3, float %4, i32 1
>      %6 = extractelement <4 x float> %1, i32 2
>      %7 = insertelement <4 x float> %5, float %6, i32 2
>      %8 = extractelement <4 x float> %1, i32 3
>      %9 = insertelement <4 x float> %7, float %8, i32 3
>      ret <4 x float> %9
>    }
> 
> http://llvm-reviews.chandlerc.com/D3143