<html>

    <head>

      <base href="https://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Inline asm expr argument order affects optimization"

   href="https://llvm.org/bugs/show_bug.cgi?id=25389">25389</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Inline asm expr argument order affects optimization

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>new-bugs

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>3.7

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>new bugs

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>irony42@me.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>I've encountered a case where LLVM makes an incorrect (or at least
inconsistent) optimization choice.

The following two LLVM IR functions are functionally equivalent and textually
almost identical. They differ only on their second-to-last line, where the
operands of the asm call are listed in a different order. However, when
compiled for x86 using the commands below, the first produces significantly
better code than the second.

Function A: optimized well; compiles to "mov ecx, dword ptr [ecx]; add ecx,
11223344; ret"

echo 'define void @main() naked minsize noinline nounwind {
%regs = tail call { i32, i32, i32, i32, i32, i32, i32 } asm sideeffect "",
  "={eax},={ecx},={edx},={ebx},={esi},={edi},={ebp}"()
%eax = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 0
%ecx = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 1
%edx = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 2
%ebx = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 3
%esi = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 4
%edi = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 5
%ebp = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 6
%x.ptr = inttoptr i32 %ecx to i32*
%x = load i32, i32* %x.ptr
%y = add i32 %x, 11223344
tail call void asm sideeffect "",
  "{eax},{ecx},{edx},{ebx},{esi},{edi},{ebp}"(i32 %eax, i32 %y, i32 %edx,
  i32 %ebx, i32 %esi, i32 %edi, i32 %ebp)
ret void
}' | llvm-as | opt -Os | llc -march=x86 -mcpu=core2 -mattr=-rdrnd -O2 \
  -filetype=obj -x86-asm-syntax=intel -enable-pie -relocation-model=pic - -o - |
  llvm-objdump -disassemble -x86-asm-syntax=intel -

Function B: not optimized well. It should produce the same code as Function A,
but the resulting code unnecessarily spills to the stack: "mov dword ptr [esp],
eax; mov eax, 11223344; add eax, dword ptr [ecx]; mov ecx, dword ptr [esp];
ret"

echo 'define void @main() naked minsize noinline nounwind {
%regs = tail call { i32, i32, i32, i32, i32, i32, i32 } asm sideeffect "",
  "={eax},={ecx},={edx},={ebx},={esi},={edi},={ebp}"()
%eax = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 0
%ecx = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 1
%edx = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 2
%ebx = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 3
%esi = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 4
%edi = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 5
%ebp = extractvalue { i32, i32, i32, i32, i32, i32, i32 } %regs, 6
%x.ptr = inttoptr i32 %ecx to i32*
%x = load i32, i32* %x.ptr
%y = add i32 %x, 11223344
tail call void asm sideeffect "",
  "{ecx},{eax},{edx},{ebx},{esi},{edi},{ebp}"(i32 %y, i32 %eax, i32 %edx,
  i32 %ebx, i32 %esi, i32 %edi, i32 %ebp)
ret void
}' | llvm-as | opt -Os | llc -march=x86 -mcpu=core2 -mattr=-rdrnd -O2 \
  -filetype=obj -x86-asm-syntax=intel -enable-pie -relocation-model=pic - -o - |
  llvm-objdump -disassemble -x86-asm-syntax=intel -

I believe that the above demonstrates a bug (probably in some optimization pass
within the x86 backend) in which certain optimizations are not applied,
depending on the order of the operands to an asm expression.

These test cases can be reduced by removing edx through ebp (leaving only eax
and ecx); in that reduced case, Function A stays the same whereas Function B
uses an extra scratch register. I did not reduce them that far because I
thought that unnecessarily spilling to the stack was stronger evidence of a bug
than merely using an extra register.
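
For reference, the reduced variants can be generated with a short script. The
following Python sketch is illustrative only: build_ir is a hypothetical helper
(not part of any LLVM tooling), and it assumes the LLVM 3.7 textual IR syntax
used above. It emits the eax/ecx-only versions of Functions A and B, which
differ only in the operand order of the final asm call.

```python
# Sketch: generate the reduced (eax/ecx only) variants of Functions A and B.
# build_ir and its parameters are illustrative assumptions, not LLVM tooling.

def build_ir(out_order, regs=("eax", "ecx")):
    """Return IR text; out_order is the register order of the final asm call."""
    struct = "{ " + ", ".join(["i32"] * len(regs)) + " }"
    outputs = ",".join("={" + r + "}" for r in regs)
    inputs = ",".join("{" + r + "}" for r in out_order)
    # ecx receives the computed value %y; every other register is passed through.
    args = ", ".join("i32 %" + ("y" if r == "ecx" else r) for r in out_order)
    lines = ["define void @main() naked minsize noinline nounwind {"]
    lines.append("%regs = tail call " + struct
                 + ' asm sideeffect "", "' + outputs + '"()')
    for index, r in enumerate(regs):
        lines.append("%" + r + " = extractvalue " + struct
                     + " %regs, " + str(index))
    lines.append("%x.ptr = inttoptr i32 %ecx to i32*")
    lines.append("%x = load i32, i32* %x.ptr")
    lines.append("%y = add i32 %x, 11223344")
    lines.append('tail call void asm sideeffect "", "' + inputs
                 + '"(' + args + ")")
    lines.append("ret void")
    lines.append("}")
    return "\n".join(lines)

func_a = build_ir(("eax", "ecx"))  # reduced Function A
func_b = build_ir(("ecx", "eax"))  # reduced Function B

# Indices of the IR lines on which the two variants differ.
diff = [i for i, (a, b) in enumerate(zip(func_a.splitlines(),
                                         func_b.splitlines())) if a != b]

print(func_a)
print()
print(func_b)
```

The printed IR can be piped through the same llvm-as, opt, llc and llvm-objdump
commands shown above. Passing all seven register names for both parameters
should give back the full test cases, apart from line wrapping.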

I'm using LLVM 3.7, although with s/load i32,/load/ it should reproduce on at
least 3.5 and 3.6 as well.</pre>

        </div>

      </p>


    </body>

</html>