<html>

    <head>

      <base href="http://llvm.org/bugs/" />

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - Misoptimization of combining movl+addl into leal"

   href="http://llvm.org/bugs/show_bug.cgi?id=20776">20776</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Misoptimization of combining movl+addl into leal

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>normal

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Backend: X86

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>wujingyue@gmail.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvmbugs@cs.uiuc.edu

          </td>

        </tr>

        <tr>

          <th>Classification</th>

          <td>Unclassified

          </td>

        </tr></table>

      <p>

        <div>

        <pre>The symptom is similar to <a class="bz_bug_link 

          bz_status_NEW "

   title="NEW --- - TwoAddressInstructionPass fails to optimize mov+add to lea"

   href="show_bug.cgi?id=20701">http://llvm.org/bugs/show_bug.cgi?id=20701</a> where the

backend misses the opportunity of combining addl+movl into leal. However, I

feel the root cause is different, so I decided to file a separate bug. 

The test case is reduced from loop-strength-reduce8.ll:

@G = external global float*                                                     

declare i32* @foo()                                                             

define i32* @bar(i1 %cond) {                                                    

entry:                                                                          

  %v0 = call i32* @foo() ; v0 = eax                                             

  %v1 = bitcast i32* %v0 to float* ; v1 = v0                                    

  br i1 %cond, label %then, label %merge                                        

then:                                                                           

  %v2 = getelementptr float* %v1, i64 16                                        

  store float* %v2, float** @G ; just use %v2. doesn't have to be a store

  br label %merge                                                               

merge:                                                                          

  ret i32* %v0                                                                  

}

Running "llc -mtriple=i386-apple-darwin" on it gives the following machine

code:

## BB#0:                                ## %entry

        subl    $12, %esp

Ltmp0:

        .cfi_def_cfa_offset 16

        calll   L_foo$stub

        testb   $1, 16(%esp)

        je      LBB0_2

## BB#1:                                ## %then

        movl    %eax, %ecx

        addl    $64, %ecx

        movl    L_G$non_lazy_ptr, %edx

        movl    %ecx, (%edx)

LBB0_2:                                 ## %merge

        addl    $12, %esp

        retl

where

movl %eax, %ecx

addl $64, %ecx

could have been combined into

leal 64(%eax), %ecx
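
To make the equivalence concrete, here is a minimal sketch. It is a toy Python
model, not LLVM code, and the helper names mov_then_add and lea are made up for
illustration. It treats registers as 32-bit integers and checks that both
sequences leave the same value in %ecx. One difference the model ignores: addl
writes EFLAGS while leal does not, so the rewrite is only safe when the flags
set by addl are not read afterwards.

```python
# Toy model (assumption: registers are plain 32-bit integers in a dict).
MASK = 2**32


def mov_then_add(regs, src, dst, imm):
    """Model of: movl %src, %dst ; addl $imm, %dst"""
    out = dict(regs)
    out[dst] = out[src]                 # movl %src, %dst
    out[dst] = (out[dst] + imm) % MASK  # addl $imm, %dst (wraps at 32 bits)
    return out


def lea(regs, src, dst, imm):
    """Model of: leal imm(%src), %dst"""
    out = dict(regs)
    out[dst] = (out[src] + imm) % MASK  # address arithmetic, also wraps
    return out


# The two sequences agree, including on wraparound near the top of the range.
for eax in (0, 0x1000, 0xFFFFFFF0):
    regs = {"eax": eax, "ecx": 0}
    assert mov_then_add(regs, "eax", "ecx", 64) == lea(regs, "eax", "ecx", 64)
```

In both versions %eax is left untouched, which matters here because %eax still
holds the return value of @bar.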

The only place I am aware of that could combine movl+addl into leal is in

TwoAddressInstructionPass at around Line 1157. 

if (!regBKilled || isProfitableToConv3Addr(regA, regB)) {

  if (convertInstTo3Addr(...)) {

But both heuristics (i.e., !regBKilled and isProfitableToConv3Addr) failed. 

The pseudo machine code before the TwoAddressInstructionPass is

vreg0 = %eax

vreg1 = vreg0

if (cond) {

  vreg2 = vreg1 + 64

  use(vreg2)

}

%eax = vreg0

isProfitableToConv3Addr failed because vreg1 is not a direct copy of %eax.

!regBKilled failed because "vreg2 = vreg1 + 64" is the last use of vreg1, and

the pass seems to assume RegisterCoalescer will coalesce vreg1 and vreg2 and

end up with simply vreg1/vreg2 += 64. However, while RegisterCoalescer does

coalesce them later, it cannot coalesce vreg0 and vreg2 because vreg0 is used

after the if-then:

vreg0 = %eax

if (cond) {

  vreg2 = vreg0

  vreg2 += 64

  use(vreg2)

}

%eax = vreg0

leaving

vreg2 = vreg0

vreg2 += 64

not combined. 

I am not sure which part of the backend should be responsible for this

missed optimization. Bob Wilson mentioned it could be an issue with

RegisterCoalescer, but its result seems optimal on this particular example.

Should TwoAddressInstructionPass use a better heuristic? Is it a phase-ordering

issue, i.e., should part of TwoAddressInstructionPass run after

RegisterCoalescer? Or should we run a peephole-optimization pass looking for

the movl+addl pattern after register coalescing?
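
The last option could look roughly like the following sketch. It is a toy
Python pass over textual AT&amp;T-style assembly, not LLVM's MachineInstr API;
the function name combine_mov_add and the two regexes are hypothetical. It
assumes the EFLAGS written by addl are dead, which a real pass would have to
verify before rewriting.

```python
import re

# Register-to-register movl and immediate addl, AT and T operand order.
MOV = re.compile(r"movl\s+(%\w+),\s*(%\w+)$")
ADD = re.compile(r"addl\s+\$(-?\d+),\s*(%\w+)$")


def combine_mov_add(lines):
    """Rewrite 'movl %a, %b ; addl $imm, %b' into 'leal imm(%a), %b'.

    Assumption: flags produced by the addl are not read later.
    """
    out = []
    i = 0
    while i != len(lines):
        cur = lines[i].strip()
        nxt = lines[i + 1].strip() if i + 1 != len(lines) else ""
        m, a = MOV.match(cur), ADD.match(nxt)
        # Both instructions must target the same register, and the copy
        # must not be a self-copy.
        if m and a and m.group(2) == a.group(2) and m.group(1) != m.group(2):
            out.append("leal {}({}), {}".format(a.group(1), m.group(1), m.group(2)))
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


then_block = [
    "movl    %eax, %ecx",
    "addl    $64, %ecx",
    "movl    L_G$non_lazy_ptr, %edx",
    "movl    %ecx, (%edx)",
]
print("\n".join(combine_mov_add(then_block)))
```

On the %then block above this folds the first two instructions into a single
leal and leaves the load of L_G$non_lazy_ptr and the store unchanged, since
neither is a register-to-register movl.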

Any thoughts?  

Thanks,

Jingyue</pre>

        </div>

      </p>


    </body>

</html>