[LLVMbugs] [Bug 23645] New: pow calls are not vectorised on Windows
bugzilla-daemon at llvm.org
Sun May 24 13:06:51 PDT 2015
https://llvm.org/bugs/show_bug.cgi?id=23645
Bug ID: 23645
Summary: pow calls are not vectorised on Windows
Product: new-bugs
Version: 3.6
Hardware: PC
OS: Windows NT
Status: NEW
Severity: normal
Priority: P
Component: new bugs
Assignee: unassignedbugs at nondot.org
Reporter: nick at indigorenderer.com
CC: llvmbugs at cs.uiuc.edu
Classification: Unclassified
The Auto-Vectorization documentation appears to claim that pow() calls will be
vectorised:
http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls
However, some pow calls I'm making in a loop aren't being vectorised.
Platform: Windows 8 64-bit, host compiler VS2012, LLVM 3.6. I'm JITing LLVM
code.
Optimised IR:
-----------------------------------------------------------------
; Function Attrs: nounwind
define internal void @work_function([268435456 x float]* noalias nocapture
align 32, [268435456 x float]* noalias nocapture readonly align 32, float
(float)* nocapture readnone, i64, i64) #0 {
entry:
%backedge.overflow = icmp eq i64 %4, 0
br i1 %backedge.overflow, label %loop, label %overflow.checked
overflow.checked: ; preds = %entry
%n.vec = and i64 %4, -4
%cmp.zero = icmp eq i64 %n.vec, 0
br i1 %cmp.zero, label %middle.block, label %vector.body
vector.body: ; preds = %overflow.checked, %vector.body
%index = phi i64 [ %index.next, %vector.body ], [ 0, %overflow.checked ]
%induction14 = or i64 %index, 1
%induction25 = or i64 %index, 2
%induction36 = or i64 %index, 3
%5 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %index
%6 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction14
%7 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction25
%8 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction36
%9 = load float* %5, align 16
%10 = load float* %6, align 4
%11 = load float* %7, align 8
%12 = load float* %8, align 4
%13 = call float @llvm.pow.f32(float %9, float 0x40019999A0000000)
%14 = call float @llvm.pow.f32(float %10, float 0x40019999A0000000)
%15 = call float @llvm.pow.f32(float %11, float 0x40019999A0000000)
%16 = call float @llvm.pow.f32(float %12, float 0x40019999A0000000)
%17 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %index
%18 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction14
%19 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction25
%20 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction36
store float %13, float* %17, align 16
store float %14, float* %18, align 4
store float %15, float* %19, align 8
store float %16, float* %20, align 4
%index.next = add i64 %index, 4
%21 = icmp eq i64 %index.next, %n.vec
br i1 %21, label %middle.block, label %vector.body, !llvm.loop !0
middle.block: ; preds = %vector.body, %overflow.checked
%resume.val = phi i64 [ 0, %overflow.checked ], [ %n.vec, %vector.body ]
%cmp.n = icmp eq i64 %resume.val, %4
br i1 %cmp.n, label %afterloop, label %loop
loop: ; preds = %entry, %middle.block, %loop
%loop_index_var = phi i64 [ %next_var, %loop ], [ 0, %entry ], [ %resume.val, %middle.block ]
%22 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %loop_index_var
%23 = load float* %22, align 4
%24 = tail call float @llvm.pow.f32(float %23, float 0x40019999A0000000) #0
%25 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %loop_index_var
store float %24, float* %25, align 4
%next_var = add i64 %loop_index_var, 1
%loopcond = icmp eq i64 %next_var, %4
br i1 %loopcond, label %afterloop, label %loop, !llvm.loop !3
afterloop: ; preds = %loop, %middle.block
ret void
}
----------------------------------------------------------------
Autovectorisation is enabled.
A similar loop with calls to e.g. the sqrt intrinsic generates the expected
vectorised sqrt instructions.
d0k says on IRC: "calling the vc++ implementation would be the right way, but
that's not implemented"
VC++ has a vectorised pow implementation: __vdecl_powf4, which calls
__sse2_powf4, see also https://msdn.microsoft.com/en-us/library/dt5dakze.aspx