<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Missed optimization: inefficient codegen for __builtin_addc"

   href="https://bugs.llvm.org/show_bug.cgi?id=36243">36243</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Missed optimization: inefficient codegen for __builtin_addc

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Scalar Optimizations

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>koriakin@0x04.net

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>clang has __builtin_addc* functions, which are supposed to emit hardware

add-with-carry instructions.  However, there is no corresponding intrinsic on

the LLVM side, so clang emits a sequence of instructions that is only recognized

and folded to a single hw instruction in two cases:

- carry input is 0, or

- carry output is unused

This means that any carry chain longer than 2 results in inefficient code, because the middle links have both a nonzero carry input and a used carry output:

void add3(

    unsigned long long *restrict a,

    unsigned long long *restrict b,

    unsigned long long *restrict c

) {

    unsigned long long cf = 0;

    c[0] = __builtin_addcll(a[0], b[0], cf, &cf);

    c[1] = __builtin_addcll(a[1], b[1], cf, &cf);

    c[2] = __builtin_addcll(a[2], b[2], cf, &cf);

}

Compiles to:

add3:                                   # @add3

        .cfi_startproc

# BB#0:

        movq    (%rdi), %rax

        movq    (%rsi), %r8

        leaq    (%rax,%r8), %rcx

        movq    %rcx, (%rdx)

        movq    8(%rdi), %rcx

        addq    8(%rsi), %rcx

        setb    %r9b

        addq    %r8, %rax

        adcq    $0, %rcx

        setb    %al

        orb     %r9b, %al

        movzbl  %al, %eax

        movq    %rcx, 8(%rdx)

        movq    16(%rsi), %rcx

        addq    16(%rdi), %rcx

        addq    %rax, %rcx

        movq    %rcx, 16(%rdx)

        retq
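
For reference, the setb / adcq $0 / setb / orb pattern above appears to mirror
what clang expands each __builtin_addcll call into: two wrapping additions whose
carry bits are ORed together.  A minimal portable C sketch of that expansion,
assuming this reading of the output is right (addc_model and u64 are
illustrative names, not clang or LLVM identifiers):

```c
/* Hypothetical model of one __builtin_addcll call as it reaches the
   optimizer: two wrapping adds, each with its own carry bit, ORed.
   The names addc_model and u64 are made up for this illustration. */
typedef unsigned long long u64;

u64 addc_model(u64 x, u64 y, u64 carry_in, u64 *carry_out) {
    u64 partial = x + y;          /* first add, wraps modulo 2^64 */
    u64 c1 = (x > partial);       /* carry out of x + y */
    u64 sum = partial + carry_in; /* second add folds in the carry input */
    u64 c2 = (partial > sum);     /* carry out of partial + carry_in */
    *carry_out = c1 | c2;         /* at most one of c1, c2 can be set */
    return sum;
}
```

When carry_in is known to be 0 the second add disappears, and when *carry_out
is unused both comparisons disappear, which matches the two cases listed above
that fold to a single instruction.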

I suppose we're going to need a new target-independent generic intrinsic,

say { iX, i1 } @llvm.uadd.with.overflow.carry.iX(iX, iX, i1) (and a

corresponding one for subtraction as well) and map it to ISD::ADDE /

ISD::ADDCARRY.</pre>
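        <pre>To make the proposed semantics concrete, here is a rough C model of the
suggested intrinsic for X = 64, with add3 rewritten on top of it.  This is only
a sketch: the intrinsic does not exist, and addcarry_result, uadd_with_carry64
and add3_model are hypothetical names chosen for the illustration.

```c
/* Hypothetical C model of the proposed
   { i64, i1 } @llvm.uadd.with.overflow.carry.i64(i64, i64, i1).
   Nothing here is an existing LLVM or clang API. */
typedef unsigned long long u64;

struct addcarry_result {
    u64 sum;        /* low 64 bits of x + y + carry_in */
    unsigned carry; /* 1 if the full sum does not fit in 64 bits */
};

struct addcarry_result uadd_with_carry64(u64 x, u64 y, unsigned carry_in) {
    struct addcarry_result r;
    u64 partial = x + y;               /* wraps modulo 2^64 */
    r.sum = partial + (carry_in & 1u); /* carry input is a single bit */
    r.carry = (x > partial) | (partial > r.sum);
    return r;
}

/* add3 from the report, expressed as one carry chain over the model. */
void add3_model(const u64 *a, const u64 *b, u64 *c) {
    unsigned cf = 0;
    for (int i = 0; i != 3; ++i) {
        struct addcarry_result r = uadd_with_carry64(a[i], b[i], cf);
        c[i] = r.sum;
        cf = r.carry;
    }
}
```

With a single operation per link, a backend could in principle select each
call to one add or adc, which is the mapping to ISD::ADDE / ISD::ADDCARRY
suggested above.</pre>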

        </div>

      </p>


    </body>

</html>