[PATCH] Extend SLPVectorizer for cases where insertelement instructions must be rescheduled
Nadav Rotem
nrotem at apple.com
Sun Mar 23 14:09:30 PDT 2014
Hi Arch,
Thanks for working on this. It is not clear to me what the difference is between the InsertElement and the IgnoreScheduling list that is used by reduction variables. The InsertElements that we care about build a vector, which is a kind of reduction. Is there anything unique about the build vector, or is it just another reduction?
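To make the comparison concrete, here is a minimal sketch of the two kinds of chain in question. It is purely illustrative: the function and value names are hypothetical, and the code is not taken from the patch or from Matt's example.

```llvm
; Hypothetical illustration only; names are made up.

; Build vector: each insertelement consumes the previous partial vector.
define <4 x i32> @build_vector(i32 %a, i32 %b, i32 %c, i32 %d) {
entry:
  %v0 = insertelement <4 x i32> undef, i32 %a, i32 0
  %v1 = insertelement <4 x i32> %v0, i32 %b, i32 1
  %v2 = insertelement <4 x i32> %v1, i32 %c, i32 2
  %v3 = insertelement <4 x i32> %v2, i32 %d, i32 3
  ret <4 x i32> %v3
}

; Scalar reduction: each add consumes the previous partial sum.
define i32 @reduce_add(i32 %a, i32 %b, i32 %c, i32 %d) {
entry:
  %s0 = add i32 %a, %b
  %s1 = add i32 %s0, %c
  %s2 = add i32 %s1, %d
  ret i32 %s2
}
```

In both functions every instruction depends on the previous partial result. The chain in the first function ends in a vector and the chain in the second ends in a scalar.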
Also, please split the cost-model fixes and the typo fix into separate patches.
Thanks,
Nadav
On Mar 21, 2014, at 3:02 PM, Arch D. Robison <arch.robison at intel.com> wrote:
>
> Yes, it vectorizes Matt Arsenault's example. Here's a transcript:
>
> $ opt -S -slp-vectorizer /tmp/ex.ll
> ; ModuleID = '/tmp/ex.ll'
> target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"
> target triple = "x86_64-apple-macosx10.8.0"
>
> ; Function Attrs: nounwind readnone ssp uwtable
> define <4 x float> @simple_select(<4 x float> %a, <4 x float> %b, <4 x i32> %c) #0 {
> entry:
> %0 = icmp ne <4 x i32> %c, zeroinitializer
> %1 = select <4 x i1> %0, <4 x float> %a, <4 x float> %b
> %2 = extractelement <4 x float> %1, i32 0
> %3 = insertelement <4 x float> undef, float %2, i32 0
> %4 = extractelement <4 x float> %1, i32 1
> %5 = insertelement <4 x float> %3, float %4, i32 1
> %6 = extractelement <4 x float> %1, i32 2
> %7 = insertelement <4 x float> %5, float %6, i32 2
> %8 = extractelement <4 x float> %1, i32 3
> %9 = insertelement <4 x float> %7, float %8, i32 3
> ret <4 x float> %9
> }
>
> http://llvm-reviews.chandlerc.com/D3143