<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 
          bz_status_NEW " title="NEW --- - pow calls are not vectorised on Windows" href="https://llvm.org/bugs/show_bug.cgi?id=23645">23645</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>pow calls are not vectorised on Windows

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>3.6

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Windows NT

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>nick@indigorenderer.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>The Auto-Vectorization documentation lists pow() among the function calls that can be vectorised:
<a href="http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls">http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls</a>

However, the pow calls in a loop I'm generating are not being vectorised. In the optimised IR below, the loop vectorizer unrolls (interleaves) the loop by four, but each iteration of %vector.body still makes four scalar calls to @llvm.pow.f32.

Platform: Windows 8 64-bit, host compiler VS2012, LLVM 3.6. I'm JITing the LLVM code.

Optimised IR:
-----------------------------------------------------------------
; Function Attrs: nounwind
define internal void @work_function([268435456 x float]* noalias nocapture align 32, [268435456 x float]* noalias nocapture readonly align 32, float (float)* nocapture readnone, i64, i64) #0 {
entry:
  %backedge.overflow = icmp eq i64 %4, 0
  br i1 %backedge.overflow, label %loop, label %overflow.checked

overflow.checked:                                 ; preds = %entry
  %n.vec = and i64 %4, -4
  %cmp.zero = icmp eq i64 %n.vec, 0
  br i1 %cmp.zero, label %middle.block, label %vector.body

vector.body:                                      ; preds = %overflow.checked, %vector.body
  %index = phi i64 [ %index.next, %vector.body ], [ 0, %overflow.checked ]
  %induction14 = or i64 %index, 1
  %induction25 = or i64 %index, 2
  %induction36 = or i64 %index, 3
  %5 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %index
  %6 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction14
  %7 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction25
  %8 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %induction36
  %9 = load float* %5, align 16
  %10 = load float* %6, align 4
  %11 = load float* %7, align 8
  %12 = load float* %8, align 4
  %13 = call float @llvm.pow.f32(float %9, float 0x40019999A0000000)
  %14 = call float @llvm.pow.f32(float %10, float 0x40019999A0000000)
  %15 = call float @llvm.pow.f32(float %11, float 0x40019999A0000000)
  %16 = call float @llvm.pow.f32(float %12, float 0x40019999A0000000)
  %17 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %index
  %18 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction14
  %19 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction25
  %20 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %induction36
  store float %13, float* %17, align 16
  store float %14, float* %18, align 4
  store float %15, float* %19, align 8
  store float %16, float* %20, align 4
  %index.next = add i64 %index, 4
  %21 = icmp eq i64 %index.next, %n.vec
  br i1 %21, label %middle.block, label %vector.body, !llvm.loop !0

middle.block:                                     ; preds = %vector.body, %overflow.checked
  %resume.val = phi i64 [ 0, %overflow.checked ], [ %n.vec, %vector.body ]
  %cmp.n = icmp eq i64 %resume.val, %4
  br i1 %cmp.n, label %afterloop, label %loop

loop:                                             ; preds = %entry, %middle.block, %loop
  %loop_index_var = phi i64 [ %next_var, %loop ], [ 0, %entry ], [ %resume.val, %middle.block ]
  %22 = getelementptr inbounds [268435456 x float]* %1, i64 0, i64 %loop_index_var
  %23 = load float* %22, align 4
  %24 = tail call float @llvm.pow.f32(float %23, float 0x40019999A0000000) #0
  %25 = getelementptr inbounds [268435456 x float]* %0, i64 0, i64 %loop_index_var
  store float %24, float* %25, align 4
  %next_var = add i64 %loop_index_var, 1
  %loopcond = icmp eq i64 %next_var, %4
  br i1 %loopcond, label %afterloop, label %loop, !llvm.loop !3

afterloop:                                        ; preds = %loop, %middle.block
  ret void
}
----------------------------------------------------------------
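
The second argument to every @llvm.pow.f32 call above, 0x40019999A0000000, is how LLVM IR typically spells a float literal that has no exact short decimal form: the bit pattern of the same value widened to a 64-bit double. As an illustration added for readers (not part of the original reproduction, and with hypothetical helper names), this short Python sketch decodes the constant and checks that the exponent is 2.2f, so each iteration computes %0[i] = pow(%1[i], 2.2f):

```python
import struct

# Hex constant copied from the @llvm.pow.f32 calls in the IR above.
POW_EXPONENT_HEX = "40019999A0000000"


def decode_ir_float(hex_bits: str) -> float:
    """Interpret a 16-digit hex string as a big-endian IEEE-754 double."""
    return struct.unpack(">d", bytes.fromhex(hex_bits))[0]


def round_to_f32(value: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


exponent = decode_ir_float(POW_EXPONENT_HEX)

# The decoded double is exactly representable as an f32 ...
assert round_to_f32(exponent) == exponent
# ... and it equals the f32 nearest to 2.2, i.e. what C spells as 2.2f.
assert exponent == round_to_f32(2.2)

print(exponent)
```

Only the constant is decoded here; it does not reproduce the JIT setup or the vectorizer's behaviour.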

Autovectorisation is enabled.

A similar loop with calls to, e.g., the sqrt intrinsic generates the expected vectorised sqrt instructions.

d0k says on IRC: "calling the vc++ implementation would be the right way, but that's not implemented"

VC++ has a vectorised pow implementation, __vdecl_powf4, which calls __sse2_powf4; see also <a href="https://msdn.microsoft.com/en-us/library/dt5dakze.aspx">https://msdn.microsoft.com/en-us/library/dt5dakze.aspx</a></pre>

        </div>

      </p>


    </body>

</html>