<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - clang/llvm 3.3 produces much slower loops than gcc 4.7.2"

   href="http://llvm.org/bugs/show_bug.cgi?id=16358">16358</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>clang/llvm 3.3 produces much slower loops than gcc 4.7.2

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>clang

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>LLVM Codegen

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedclangbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>conradsand.arma@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=10696" name="attach_10696" title="addspeed.cpp">attachment 10696</a> <a href="attachment.cgi?id=10696&action=edit" title="addspeed.cpp">[details]</a></span>

addspeed.cpp

When using the Armadillo template matrix library (<a href="http://arma.sourceforge.net">http://arma.sourceforge.net</a>),
gcc 4.7.2 consistently produces faster code than clang/llvm 3.3.

I've attached a simple program which demonstrates the problem. Below is a
relevant extract:

// 'size' is specified on the command line
mat A; A.randu(size,size);
mat B; B.randu(size,size);
mat C; C.zeros(size,size);
C = 0.1*A + 0.2*B;
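
For triage without Armadillo, here is a minimal sketch of a reduced test
case. It assumes (not verified against the Armadillo sources) that the
statement above boils down to one element-wise pass over the SIZE*SIZE
contiguously stored elements of each matrix; scaled_add is a hypothetical
name.

```cpp
// Hypothetical reduced test case, no Armadillo required.
// Assumption: C = 0.1*A + 0.2*B becomes one element-wise pass over
// n = size*size contiguously stored doubles.
// unsigned int mirrors the 32-bit loop counters (%esi, %ecx) that appear
// in the assembly listings of this report.
void scaled_add(const double* a, const double* b, double* c, unsigned int n)
{
    for (unsigned int i = 0; i != n; ++i)
        c[i] = 0.1 * a[i] + 0.2 * b[i];
}
```

Compiling this with -O3 -S under gcc 4.7.2 and clang 3.3 and comparing the
inner loops would show whether the reduction reproduces the gap; that has
not been checked here.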

gcc 4.7.2 (using -O3) compiles the inner loop generated for the line
"C = 0.1*A + 0.2*B" into the following x86-64 assembly code:

.L132:
    movapd    (%rcx,%rax), %xmm0
    addl    $1, %esi
    movapd    (%rdi,%rax), %xmm3
    mulpd    %xmm2, %xmm0
    mulpd    %xmm1, %xmm3
    addpd    %xmm3, %xmm0
    movapd    %xmm0, (%rdx,%rax)
    addq    $16, %rax
    cmpl    %r9d, %esi
    jb    .L132

In contrast, clang/llvm 3.3 (also using -O3) converts the inner loop to this code:

.LBB0_45:
    leal    -1(%rcx), %ebp
    movsd    .LCPI0_1(%rip), %xmm0
    movsd    (%rsi,%rbp,8), %xmm3
    mulsd    %xmm0, %xmm3
    movsd    .LCPI0_2(%rip), %xmm1
    movsd    (%rdi,%rbp,8), %xmm2
    mulsd    %xmm1, %xmm2
    addsd    %xmm3, %xmm2
    movl    %ecx, %ecx
    mulsd    (%rsi,%rcx,8), %xmm0
    mulsd    (%rdi,%rcx,8), %xmm1
    movsd    %xmm2, (%rax,%rbp,8)
    addsd    %xmm0, %xmm1
    movsd    %xmm1, (%rax,%rcx,8)
    addl    $2, %ecx
    cmpl    %edx, %ecx
    jb    .LBB0_45

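The gcc loop is vectorised: each iteration processes two doubles with the
packed SSE2 instructions mulpd and addpd, and both multipliers stay in
registers (%xmm1, %xmm2). The clang loop is unrolled by two but uses the
scalar instructions mulsd and addsd, and it reloads both multipliers from
the constant pool (.LCPI0_1, .LCPI0_2) on every iteration.

Even when using -O2, gcc still produces a more efficient loop.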
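
To make the packed-versus-scalar distinction concrete, the following is an
illustration only, not a benchmarked workaround. It writes the computation
with the GCC/Clang vector extension (the vector_size attribute), where one
multiply or add on a two-lane v2df value computes what a single mulpd or
addpd computes in the gcc listing above. v2df and scaled_add_pairs are
hypothetical names.

```cpp
// Illustration only: a two-lane double vector via the GCC/Clang vector
// extension. v2df and scaled_add_pairs are hypothetical names.
typedef double v2df __attribute__((vector_size(16)));

// Computes c[i] = 0.1*a[i] + 0.2*b[i], two elements per step.
// Each v2df multiply or add acts on both lanes at once, like one
// mulpd or addpd; an odd trailing element is handled with scalar code.
void scaled_add_pairs(const double* a, const double* b, double* c,
                      unsigned int n)
{
    const v2df k1 = {0.1, 0.1};   // multiplier for A, in both lanes
    const v2df k2 = {0.2, 0.2};   // multiplier for B, in both lanes
    for (unsigned int p = 0; p != n / 2; ++p) {
        const unsigned int i = 2 * p;
        const v2df va = {a[i], a[i + 1]};
        const v2df vb = {b[i], b[i + 1]};
        const v2df vc = k1 * va + k2 * vb;   // two-lane multiply and add
        c[i] = vc[0];
        c[i + 1] = vc[1];
    }
    if (n % 2 != 0)   // odd n: one scalar tail element
        c[n - 1] = 0.1 * a[n - 1] + 0.2 * b[n - 1];
}
```

Whether a given compiler turns this into movapd/mulpd/addpd depends on the
target and optimisation level; that has not been checked here.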

Below are timing results for the attached program on an Intel Core 2 Duo
E8600 @ 3.33GHz, compiled with various optimisation flags. Each build was
run via:

  time ./addspeed 50 2000000

where 50 is the matrix size and 2000000 is the number of repetitions.

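User time in seconds (the "u" figure reported by time):

         gcc 4.7.2   clang 3.3
-Os      6.265       6.299
-O1      6.275       9.030
-O2      5.134       6.347
-O3      4.282       6.295

At -O3, the clang build takes about 1.47 times as long as the gcc build
(6.295 / 4.282).</pre>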

        </div>

      </p>


    </body>

</html>