[LLVMdev] RFC: Should we have (something like) -extra-vectorizer-passes in -O2?

Arnold Schwaighofer aschwaighofer at apple.com
Thu Oct 16 15:04:05 PDT 2014


I took another quick look. InstCombine is dropping the fast-math flag from the reduction's fadd, so the vectorizer does not touch it.

bin/clang -O3 -ffast-math -mllvm -extra-vectorizer-passes aobench.cpp -emit-llvm -S -o aobench.2.ll -mllvm -debug-only=loop-vectorize -mllvm -print-before-all -mllvm -print-after-all 2> aobench.2.debug.ll

*** IR Dump Before Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
  ...
  br label %for.body

for.body:                                         ; preds = %for.inc.for.body_crit_edge, %entry
  %8 = phi float [ %conv.i, %entry ], [ %.pre, %for.inc.for.body_crit_edge ]
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
  %occlusion.017 = phi float [ 0.000000e+00, %entry ], [ %conv4, %for.inc.for.body_crit_edge ]
  %arrayidx = getelementptr inbounds [8 x float]* %rand1, i64 0, i64 %indvars.iv
  %9 = load float* %arrayidx, align 4, !tbaa !5
  %mul3 = fmul fast float %8, %9
  %mul.i.i = fmul fast float %mul3, %mul3
  %cmp.i = fcmp olt float %mul.i.i, 0x3C670EF540000000
  %conv4 = fadd fast float %occlusion.017, 1.000000e+00  <============= FAST
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %exitcond = icmp eq i64 %indvars.iv, 63
  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge
for.inc.for.body_crit_edge:                       ; preds = %for.body
  %arrayidx2.phi.trans.insert = getelementptr inbounds [8 x float]* %rand2, i64 0, i64 %indvars.iv.next
  %.pre = load float* %arrayidx2.phi.trans.insert, align 4, !tbaa !5
  br label %for.body

for.end:                                          ; preds = %for.body
  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
  store float %conv4, float* %t5, align 4, !tbaa !7
  ret void
}

*** IR Dump After Combine redundant instructions ***
; Function Attrs: noinline nounwind ssp uwtable
define void @_Z21ambient_occlusion_vecP6_IsectR5vrandILm8EE(%struct._Isect* nocapture %isect, %class.vrand* nocapture readonly dereferenceable(32) %rng) #0 {
entry:
  br label %for.body

for.body:                                         ; preds = %for.inc.for.body_crit_edge, %entry
  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.inc.for.body_crit_edge ]
  %occlusion.017 = phi float [ 1.000000e+00, %entry ], [ %phitmp, %for.inc.for.body_crit_edge ]
  %exitcond = icmp eq i64 %indvars.iv, 63
  br i1 %exitcond, label %for.end, label %for.inc.for.body_crit_edge

for.inc.for.body_crit_edge:                       ; preds = %for.body
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
  %phitmp = fadd float %occlusion.017, 1.000000e+00 <=============== NOT FAST
  br label %for.body

for.end:                                          ; preds = %for.body
  %t5 = getelementptr inbounds %struct._Isect* %isect, i64 0, i32 0
  store float %occlusion.017, float* %t5, align 4, !tbaa !1
  ret void
}
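
For context on why the missing flag matters: without 'fast', the fadd has to be evaluated in source order, and a vectorized reduction reorders the additions into per-lane partial sums. Below is a minimal C++ sketch of that effect, not taken from aobench. The helper names (sum_in_order, sum_two_lanes) and the sample data (kSample) are hypothetical, and the sketch assumes float arithmetic is evaluated in single precision and that the file is compiled without -ffast-math.

```cpp
#include <cstddef>

// Hypothetical helper: adds the elements strictly left to right, the order
// that must be preserved when the fadd carries no fast-math flags.
float sum_in_order(const float *v, std::size_t n) {
  float acc = 0.0f;
  for (std::size_t i = 0; i < n; ++i)
    acc += v[i];
  return acc;
}

// Hypothetical helper: emulates a 2-wide vectorized reduction. Even and odd
// elements accumulate into separate partial sums that are combined at the end.
float sum_two_lanes(const float *v, std::size_t n) {
  float lane0 = 0.0f, lane1 = 0.0f;
  std::size_t i = 0;
  for (; i + 1 < n; i += 2) {
    lane0 += v[i];
    lane1 += v[i + 1];
  }
  if (i < n)  // leftover element when n is odd
    lane0 += v[i];
  return lane0 + lane1;
}

// Sample input (an assumption for illustration) chosen so that the two
// summation orders disagree in single precision.
static const float kSample[4] = {1.0e8f, 1.0f, -1.0e8f, 1.0f};
```

Under those assumptions, sum_in_order(kSample, 4) yields 1.0f because the first 1.0f is absorbed when added to 1.0e8f, while sum_two_lanes(kSample, 4) yields 2.0f. The results differ, which is why a reduction whose fadd has lost its fast-math flag cannot be reassociated into vector lanes.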


> On Oct 16, 2014, at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
> 
> 
> On Thu, Oct 16, 2014 at 11:38 AM, Hal Finkel <hfinkel at anl.gov> wrote:
> He had posted one earlier here:
> http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/238660.html
> (Arnold had posted some analysis here: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20141006/239144.html - which I imagine you saw)
> 
> Doh, sorry. I saw Arnold's analysis but (wrongly) assumed that a complete working test case wasn't available, which is why Arnold expected loop-rotate to fix this when it didn't. My bad.




