[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?
Arnold Schwaighofer
aschwaighofer at apple.com
Thu Oct 16 15:04:05 PDT 2014
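I quickly took another look. InstCombine is dropping the fast-math flag from the reduction's fadd (compare the two marked lines in the dumps below). Without that flag the floating-point additions may not be reassociated, so the loop vectorizer leaves the reduction alone.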
bin/clang -O3 -ffast-math -mllvm -extra-vectorizer-passes aobench.cpp -emit-llvm -S -o aobench.2.ll -mllvm -debug-only=loop-vectorize -mllvm -print-before-all -mllvm -print-after-all 2> aobench.2.debug.ll
*** IR Dump Before Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
...
br label %for.body
for.body: ; preds = %for.inc.for.body_crit_edge, %entry
%8 = phi float [ %conv.i, %entry ], [ %.pre, %for.inc.for.body_crit_edge ]
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
%occlusion.017 = phi float [ 0.000000e+00, %entry ], [ %conv4, %for.inc.for.body_crit_edge ]
%arrayidx = getelementptr inbounds [8 x float]* %rand1, i64 0, i64 %indvars.iv
%9 = load float* %arrayidx, align 4, !tbaa !5
%mul3 = fmul fast float %8, %9
%mul.i.i = fmul fast float %mul3, %mul3
%cmp.i = fcmp olt float %mul.i.i, 0x3C670EF540000000
%conv4 = fadd fast float %occlusion.017, 1.000000e+00 <============= FAST
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%exitcond = icmp eq i64 %indvars.iv, 63
br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge
for.inc.for.body_crit_edge: ; preds = %for.body
%arrayidx2.phi.trans.insert = getelementptr inbounds [8 x float]* %rand2, i64 0, i64 %indvars.iv.next
%.pre = load float* %arrayidx2.phi.trans.insert, align 4, !tbaa !5
br label %for.body
for.end: ; preds = %for.body
%t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
store float %conv4, float* %t5, align 4, !tbaa !7
ret void
}
*** IR Dump After Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
br label %for.body
for.body: ; preds = %for.inc.for.body_crit_edge, %entry
%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
%occlusion.017 = phi float [ 1.000000e+00, %entry ], [ %phitmp, %for.inc.for.body_crit_edge ]
%exitcond = icmp eq i64 %indvars.iv, 63
br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge
for.inc.for.body_crit_edge: ; preds = %for.body
%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
%phitmp = fadd float %occlusion.017, 1.000000e+00 <=============== NOT FAST
br label %for.body
for.end: ; preds = %for.body
%t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
store float %occlusion.017, float* %t5, align 4, !tbaa !1
ret void
}
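To illustrate why the missing flag matters, here is a minimal C++ sketch. It is not taken from aobench; the function names and the lane count of 4 are assumptions made for illustration only. It compares a strict in-order float sum, which is the order an fadd without fast-math flags has to preserve, with a sum built from interleaved partial sums, which is roughly the order a vectorized reduction would use.

```cpp
#include <cstddef>
#include <vector>

// Strict left-to-right sum: the evaluation order that a plain
// (non-fast) fadd reduction must preserve.
float sum_in_order(const std::vector<float>& v) {
    float acc = 0.0f;
    for (float x : v) {
        acc += x;
    }
    return acc;
}

// Four interleaved partial sums combined at the end. This mimics the
// reassociation a 4-wide vectorized reduction performs (the width of 4
// is an assumption for this sketch, not something from the thread).
float sum_four_lanes(const std::vector<float>& v) {
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (std::size_t i = 0; i < v.size(); ++i) {
        lane[i % 4] += v[i];
    }
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With IEEE-754 single-precision arithmetic and no fast-math, the input {1e8f, 1.0f, -1e8f, 1.0f} sums to 1.0f in order but to 0.0f by lanes, because 1e8f + 1.0f rounds back to 1e8f. The two orders are therefore not interchangeable unless the fadd carries the fast flag, which is the permission the vectorizer is looking for.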
> On Oct 16, 2014, at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
>
>
> On Thu, Oct 16, 2014 at 11:38 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> He had posted one earlier here:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
> (Arnold had posted some analysis here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html - which I imagine you saw)
>
> Doh, sorry. I saw Arnold's analysis but (wrongly) assumed that a complete working test case wasn't available which is why Arnold expected loop-rotate to fix this when it didn't. My bad.