<html>
    <head>
      <base href="http://llvm.org/bugs/" />
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - Misoptimization of combining movl+addl int leal"
   href="http://llvm.org/bugs/show_bug.cgi?id=20776">20776</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Misoptimization of combining movl+addl into leal
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Backend: X86
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>wujingyue@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvmbugs@cs.uiuc.edu
          </td>
        </tr>

        <tr>
          <th>Classification</th>
          <td>Unclassified
          </td>
        </tr></table>
      <p>
        <div>
        <pre>The symptom is similar to <a class="bz_bug_link 
          bz_status_NEW "
   title="NEW --- - TwoAddressInstructionPass fails to optimize mov+add to lea"
   href="show_bug.cgi?id=20701">http://llvm.org/bugs/show_bug.cgi?id=20701</a> where the
backend misses the opportunity to combine movl+addl into leal. However, I
feel the root cause is different, so I decided to file a separate bug. 

The test case below is reduced from loop-strength-reduce8.ll:

@G = external global float*

declare i32* @foo()

define i32* @bar(i1 %cond) {
entry:
  %v0 = call i32* @foo() ; v0 = eax
  %v1 = bitcast i32* %v0 to float* ; v1 = v0
  br i1 %cond, label %then, label %merge

then:
  %v2 = getelementptr float* %v1, i64 16 ; 16 floats * 4 bytes = 64 bytes
  store float* %v2, float** @G ; any use of %v2 works; it doesn't have to be a store
  br label %merge

merge:
  ret i32* %v0
}

Running "llc -mtriple=i386-apple-darwin" on it produces the following machine
code:

## BB#0:                                ## %entry
        subl    $12, %esp
Ltmp0:
        .cfi_def_cfa_offset 16
        calll   L_foo$stub
        testb   $1, 16(%esp)
        je      LBB0_2
## BB#1:                                ## %then
        movl    %eax, %ecx
        addl    $64, %ecx
        movl    L_G$non_lazy_ptr, %edx
        movl    %ecx, (%edx)
LBB0_2:                                 ## %merge
        addl    $12, %esp
        retl

where

movl %eax, %ecx
addl $64, %ecx

could have been combined into

leal 64(%eax), %ecx
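To make the equivalence concrete, here is a minimal C++ model of the two sequences on 32-bit registers. It is not LLVM code: the Regs struct and the function names are hypothetical. Both sequences leave eax + 64 in ecx and leave eax untouched. The model ignores EFLAGS. In real hardware, addl writes the flags and leal does not, so the fold is only legal when nothing reads the flags set by addl. That appears to hold in the output above.

```cpp
// Hypothetical model of the two x86-32 sequences; only the registers involved
// are tracked, and EFLAGS is deliberately ignored (addl writes it, leal does not).
struct Regs {
  unsigned eax;
  unsigned ecx;
};

// movl %eax, %ecx ; addl $64, %ecx
constexpr Regs movThenAdd(Regs r) {
  r.ecx = r.eax;  // movl %eax, %ecx
  r.ecx += 64u;   // addl $64, %ecx (wraps modulo 2^32, like the hardware)
  return r;
}

// leal 64(%eax), %ecx
constexpr Regs lea(Regs r) {
  r.ecx = r.eax + 64u;  // address arithmetic only; no flags are written
  return r;
}

// The two sequences agree if both modeled registers match afterwards.
constexpr bool sameResult(Regs in) {
  return movThenAdd(in).eax == lea(in).eax &&
         movThenAdd(in).ecx == lea(in).ecx;
}
```

For example, starting from eax = 0x1000, both sequences leave ecx = 0x1040.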

The only place I am aware of that can combine movl+addl into leal is
TwoAddressInstructionPass, at around line 1157:

if (!regBKilled || isProfitableToConv3Addr(regA, regB)) {
  if (convertInstTo3Addr(...)) {
    ...
  }
}

But both heuristics (i.e., !regBKilled and isProfitableToConv3Addr) failed. 
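The decision quoted above can be sketched in C++ as follows. The field and function names are hypothetical stand-ins for facts the pass computes; this is not the actual TwoAddressInstructionPass code. isProfitableToConv3Addr is modeled only on the reason given in this report: regB must be a direct copy of a physical register. The real function may check more. For "vreg2 = vreg1 + 64", vreg1 is killed and is not a copy of %eax, so both conditions fail.

```cpp
// Hypothetical summary of what the pass knows about "regA = regB + imm"
// when deciding whether to rewrite it as a 3-address instruction.
struct TwoAddrFacts {
  bool regBKilled;          // this instruction is the last use of regB
  bool regBCopiedFromPhys;  // regB is a direct copy of a physreg such as %eax
};

// Stand-in for isProfitableToConv3Addr, modeled only on this report's
// explanation (assumption: the real check may involve more conditions).
constexpr bool profitableToConv3Addr(TwoAddrFacts f) {
  return f.regBCopiedFromPhys;
}

// Mirrors: if (!regBKilled || isProfitableToConv3Addr(regA, regB)) { ... }
constexpr bool tryConvertTo3Addr(TwoAddrFacts f) {
  return !f.regBKilled || profitableToConv3Addr(f);
}
```

With TwoAddrFacts{true, false}, which describes this bug's instruction, tryConvertTo3Addr returns false. The pass therefore emits a copy followed by an addl instead of a leal.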

The pseudo machine code before the TwoAddressInstructionPass is

vreg0 = %eax
vreg1 = vreg0
if (cond) {
  vreg2 = vreg1 + 64
  use(vreg2)
}
%eax = vreg0

isProfitableToConv3Addr failed because vreg1 is not a direct copy of %eax.
!regBKilled failed because "vreg2 = vreg1 + 64" is the last use of vreg1, and
the pass seems to assume that RegisterCoalescer will coalesce vreg1 and vreg2,
ending up with simply vreg1/vreg2 += 64. However, although RegisterCoalescer
does coalesce them later, it cannot coalesce vreg0 and vreg2, because vreg0 is
still used after the if-then:

vreg0 = %eax
if (cond) {
  vreg2 = vreg0
  vreg2 += 64
  use(vreg2)
}
%eax = vreg0

leaving

vreg2 = vreg0
vreg2 += 64

not combined. 
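The coalescer's constraint can be illustrated with a toy liveness check. The sketch assumes a hypothetical linear numbering of the pseudo code above and a simplified join rule: "dst = COPY src" can be removed only if src is dead once dst is redefined with a different value. Real LiveIntervals and RegisterCoalescer track much more, including value numbers and subranges. LiveRange, liveAfter and canJoinCopy are invented names.

```cpp
// Hypothetical linear numbering of the pseudo code after TwoAddressInstructionPass:
//   0: vreg0 = %eax
//   1: if (cond) {
//   2:   vreg2 = vreg0
//   3:   vreg2 += 64
//   4:   use(vreg2)
//      }
//   5: %eax = vreg0

struct LiveRange {
  int def;      // index of the defining instruction
  int lastUse;  // index of the last instruction that reads the value
};

// The value is still needed after instruction p when p is in [def, lastUse).
constexpr bool liveAfter(LiveRange r, int p) {
  return p >= r.def && p < r.lastUse;
}

// Simplified join rule for "dst = COPY src" followed by one redefinition of dst:
// dst and src can share a register only if src is dead once dst takes a new value.
constexpr bool canJoinCopy(LiveRange src, int dstRedefAt) {
  return !liveAfter(src, dstRedefAt);
}

// vreg0 is defined at 0 and last read at 5; vreg2 is redefined at 3.
constexpr LiveRange kVreg0 = {0, 5};
```

canJoinCopy(kVreg0, 3) is false, which matches the surviving "vreg2 = vreg0" copy. If vreg0 were last read at the copy itself (lastUse = 2), the join would be allowed.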

I am not sure which part of the backend should be responsible for this
misoptimization. Bob Wilson mentioned it could be an issue with
RegisterCoalescer, but the coalescer's decisions seem optimal on this
particular example. Should TwoAddressInstructionPass use a better heuristic?
Is it a phase-ordering issue, i.e. should part of TwoAddressInstructionPass
run after RegisterCoalescer? Or should we run a peephole-optimization pass
after register coalescing that looks for the movl+addl pattern?
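The third option could look roughly like the toy C++ peephole below. It runs over a hypothetical instruction list; Op, Inst, Block and foldCopyAdd are invented for illustration and are not LLVM's MachineInstr API. It rewrites an adjacent "dst = COPY src; dst += imm" pair into "dst = LEA src + imm". A real pass would also have to prove that the EFLAGS written by the add are dead, and the flagsUsed field stands in for that query.

```cpp
// Hypothetical toy instruction forms; not LLVM's MachineInstr API.
enum class Op { Copy, AddImm, Lea, Nop };

struct Inst {
  Op op;
  int dst;         // destination register number
  int src;         // source register (Copy, Lea); unused for AddImm (dst += imm)
  int imm;         // immediate (AddImm, Lea)
  bool flagsUsed;  // true if a later instruction reads the flags this one sets
};

constexpr int kMaxInsts = 8;

struct Block {
  Inst insts[kMaxInsts];
  int size;
};

// Fold "dst = COPY src; dst += imm" into "dst = LEA src + imm" when the add's
// flags are unused. The add's slot becomes a Nop so indices stay stable.
constexpr Block foldCopyAdd(Block b) {
  for (int i = 0; i + 1 < b.size; ++i) {
    Inst c = b.insts[i];
    Inst a = b.insts[i + 1];
    if (c.op == Op::Copy && a.op == Op::AddImm && a.dst == c.dst &&
        !a.flagsUsed) {
      b.insts[i] = Inst{Op::Lea, c.dst, c.src, a.imm, false};
      b.insts[i + 1] = Inst{Op::Nop, 0, 0, 0, false};
    }
  }
  return b;
}

// The pair left over in this report: vreg2 = vreg0; vreg2 += 64.
constexpr Block kBugExample = {
    {Inst{Op::Copy, 2, 0, 0, false}, Inst{Op::AddImm, 2, 0, 64, false}}, 2};
```

Applied to kBugExample, the first slot becomes a Lea of register 0 plus 64 and the add becomes a Nop. If the add's flags were live, the pair would be left alone.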

Any thoughts?  

Thanks,
Jingyue</pre>
        </div>
      </p>
    </body>
</html>