<html>
<head>
<base href="https://llvm.org/bugs/" />
</head>
<body><table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>Bug ID</th>
<td><a class="bz_bug_link
bz_status_NEW " title="NEW --- - pow calls are not vectorised on Windows" href="https://llvm.org/bugs/show_bug.cgi?id=23645">23645</a>
</td>
</tr>
<tr>
<th>Summary</th>
<td>pow calls are not vectorised on Windows
</td>
</tr>
<tr>
<th>Product</th>
<td>new-bugs
</td>
</tr>
<tr>
<th>Version</th>
<td>3.6
</td>
</tr>
<tr>
<th>Hardware</th>
<td>PC
</td>
</tr>
<tr>
<th>OS</th>
<td>Windows NT
</td>
</tr>
<tr>
<th>Status</th>
<td>NEW
</td>
</tr>
<tr>
<th>Severity</th>
<td>normal
</td>
</tr>
<tr>
<th>Priority</th>
<td>P
</td>
</tr>
<tr>
<th>Component</th>
<td>new bugs
</td>
</tr>
<tr>
<th>Assignee</th>
<td>unassignedbugs@nondot.org
</td>
</tr>
<tr>
<th>Reporter</th>
<td>nick@indigorenderer.com
</td>
</tr>
<tr>
<th>CC</th>
<td>llvmbugs@cs.uiuc.edu
</td>
</tr>
<tr>
<th>Classification</th>
<td>Unclassified
</td>
</tr></table>
<p>
<div>
<pre>The Auto-Vectorization documentation appears to claim that pow() calls will be
vectorised:
<a href="http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls">http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls</a>
However, some pow calls I'm making in a loop aren't being vectorised.
Platform: Windows 8 64-bit, host compiler VS2012, LLVM 3.6. I'm JIT-compiling
LLVM code.
Optimised IR:
-----------------------------------------------------------------
; Function Attrs: nounwind
define internal void @work_function([268435456 x float]* noalias nocapture align 32, [268435456 x float]* noalias nocapture readonly align 32, float (float)* nocapture readnone, i64, i64) #0 {
entry:
  %backedge.overflow = icmp eq i64 %4, 0
  br i1 %backedge.overflow, label %loop, label %overflow.checked

overflow.checked:                                 ; preds = %entry
  %n.vec = and i64 %4, -4
  %cmp.zero = icmp eq i64 %n.vec, 0
  br i1 %cmp.zero, label %middle.block, label %vector.body

vector.body:                                      ; preds = %overflow.checked, %vector.body
  %index = phi i64 [ %index.next, %vector.body ], [ 0, %overflow.checked ]
  %induction14 = or i64 %index, 1
  %induction25 = or i64 %index, 2
  %induction36 = or i64 %index, 3
  %5 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %index
  %6 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction14
  %7 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction25
  %8 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction36
  %9 = load float* %5, align 16
  %10 = load float* %6, align 4
  %11 = load float* %7, align 8
  %12 = load float* %8, align 4
  %13 = call float @llvm.pow.f32(float %9, float 0x40019999A0000000)
  %14 = call float @llvm.pow.f32(float %10, float 0x40019999A0000000)
  %15 = call float @llvm.pow.f32(float %11, float 0x40019999A0000000)
  %16 = call float @llvm.pow.f32(float %12, float 0x40019999A0000000)
  %17 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %index
  %18 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction14
  %19 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction25
  %20 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction36
  store float %13, float* %17, align 16
  store float %14, float* %18, align 4
  store float %15, float* %19, align 8
  store float %16, float* %20, align 4
  %index.next = add i64 %index, 4
  %21 = icmp eq i64 %index.next, %n.vec
  br i1 %21, label %middle.block, label %vector.body, !llvm.loop !0

middle.block:                                     ; preds = %vector.body, %overflow.checked
  %resume.val = phi i64 [ 0, %overflow.checked ], [ %n.vec, %vector.body ]
  %cmp.n = icmp eq i64 %resume.val, %4
  br i1 %cmp.n, label %afterloop, label %loop

loop:                                             ; preds = %entry, %middle.block, %loop
  %loop_index_var = phi i64 [ %next_var, %loop ], [ 0, %entry ], [ %resume.val, %middle.block ]
  %22 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %loop_index_var
  %23 = load float* %22, align 4
  %24 = tail call float @llvm.pow.f32(float %23, float 0x40019999A0000000) #0
  %25 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %loop_index_var
  store float %24, float* %25, align 4
  %next_var = add i64 %loop_index_var, 1
  %loopcond = icmp eq i64 %next_var, %4
  br i1 %loopcond, label %afterloop, label %loop, !llvm.loop !3

afterloop:                                        ; preds = %loop, %middle.block
  ret void
}
----------------------------------------------------------------
Autovectorisation is enabled.
A similar loop that calls e.g. the sqrt intrinsic generates the expected
vectorised sqrt instructions.
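For comparison, a rough sketch of the kind of vector body one might expect if
the pow call were widened instead of being unrolled into four scalar calls.
This is hand-written, not compiler output. It assumes a vectorisation factor
of 4 and the overloaded intrinsic llvm.pow.v4f32; the function name
@pow4_sketch, its parameters, and all value names are made up for illustration.
-----------------------------------------------------------------
declare &lt;4 x float&gt; @llvm.pow.v4f32(&lt;4 x float&gt;, &lt;4 x float&gt;)

; Hypothetical function, LLVM 3.6 typed-pointer syntax to match the IR above.
; Assumption: %n.vec is a non-zero multiple of 4.
define void @pow4_sketch(float* noalias %dst, float* noalias %src, i64 %n.vec) {
entry:
  br label %vector.body

vector.body:
  %index = phi i64 [ 0, %entry ], [ %index.next, %vector.body ]
  ; Load four consecutive floats as one vector.
  %src.elt = getelementptr inbounds float* %src, i64 %index
  %src.vec = bitcast float* %src.elt to &lt;4 x float&gt;*
  %wide.load = load &lt;4 x float&gt;* %src.vec, align 4
  ; One widened call in place of the four scalar @llvm.pow.f32 calls.
  %wide.pow = call &lt;4 x float&gt; @llvm.pow.v4f32(&lt;4 x float&gt; %wide.load, &lt;4 x float&gt; &lt;float 0x40019999A0000000, float 0x40019999A0000000, float 0x40019999A0000000, float 0x40019999A0000000&gt;)
  ; Store the four results as one vector.
  %dst.elt = getelementptr inbounds float* %dst, i64 %index
  %dst.vec = bitcast float* %dst.elt to &lt;4 x float&gt;*
  store &lt;4 x float&gt; %wide.pow, &lt;4 x float&gt;* %dst.vec, align 4
  %index.next = add i64 %index, 4
  %done = icmp eq i64 %index.next, %n.vec
  br i1 %done, label %exit, label %vector.body

exit:
  ret void
}
-----------------------------------------------------------------
How the backend would then lower such a widened call on Windows (to a vector
library routine, or back to scalar calls) is a separate question.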
d0k says on IRC: "calling the vc++ implementation would be the right way, but
that's not implemented"
VC++ has a vectorised pow implementation, __vdecl_powf4, which calls
__sse2_powf4; see also <a href="https://msdn.microsoft.com/en-us/library/dt5dakze.aspx">https://msdn.microsoft.com/en-us/library/dt5dakze.aspx</a></pre>
</div>
</p>
</body>
</html>