<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Performance regression with rL308142"
   href="https://bugs.llvm.org/show_bug.cgi?id=33954">33954</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Performance regression with rL308142
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>tools
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>All
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>llc
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>anna@azul.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>Created <span class=""><a href="attachment.cgi?id=18851" name="attach_18851" title="IR file to run with llc">attachment 18851</a> <a href="attachment.cgi?id=18851&action=edit" title="IR file to run with llc">[details]</a></span>
IR file to run with llc

rL308142 introduced an optimization that converts cmov instructions to
branches when profitable.

We noticed a couple of regressions (around 2-5%) on Skylake hardware on
internal benchmarks. Performance returned to normal when
-x86-cmov-converter=false was supplied.

I've tried to reduce the IR as much as possible and added a main method to
it. However, with the reduced test the regression (measured with the time
command) is not clearly visible; the difference is within noise.

I've attached the IR and included the steps used to generate the assembly.
Perhaps something will jump out about the heuristic, which seems to have a
performance cliff: we convert a cmov to a branch when the gain is greater
than 25% of the misprediction penalty.
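
To make the cliff concrete, here is a toy model of the threshold as described
above. It is only a sketch: the function name "decide" and the cycle counts
are hypothetical and are not taken from the LLVM sources.

```shell
#!/bin/sh
# Toy model of the threshold described above. All numbers are hypothetical
# cycle counts, not values taken from LLVM.

# decide GAIN PENALTY
# Prints "branch" when GAIN is strictly greater than 25% of PENALTY,
# otherwise "cmov". Multiplying GAIN by 4 avoids fractional arithmetic.
decide() {
    gain=$1
    penalty=$2
    if [ $((gain * 4)) -gt "$penalty" ]; then
        echo branch
    else
        echo cmov
    fi
}

# With a hypothetical 20-cycle penalty the cutoff sits at a gain of 5 cycles:
decide 5 20   # prints "cmov"
decide 6 20   # prints "branch"
```

A one-cycle change in the estimated gain flips the decision, so a small
estimation error near the cutoff could plausibly account for a regression of
this kind.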


Reproduce as:
llc -mcpu=skylake \
  -mattr=+sse2,+cx16,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,+mmx,+rdseed,+clflushopt,+xsave,+avx,+rtm,+fma,+bmi,+rdrnd,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,+f16c,+ssse3,+sgx,+cmov,+movbe,+xsaveopt,+adx,+sse3, \
  -O3 -x86-cmov-converter=false test.ll
mv test.s test.falsenative.s
gcc test.falsenative.s -o test.falsenative

llc -mcpu=skylake \
  -mattr=+sse2,+cx16,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,+mmx,+rdseed,+clflushopt,+xsave,+avx,+rtm,+fma,+bmi,+rdrnd,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,+f16c,+ssse3,+sgx,+cmov,+movbe,+xsaveopt,+adx,+sse3, \
  -O3 -x86-cmov-converter=true test.ll
mv test.s test.truenative.s
gcc test.truenative.s -o test.truenative

echo "time for x86-cmov-converter=true"
time ./test.truenative > chk2 

echo "time for x86-cmov-converter=false" 
time ./test.falsenative > chk 

time for x86-cmov-converter=true

real    0m0.155s
user    0m0.065s
sys     0m0.007s
time for x86-cmov-converter=false

real    0m0.154s
user    0m0.064s
sys     0m0.008s</pre>
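        <pre>Since a single run finishes in roughly 0.15s, a 2-5% difference is easy to
lose in noise. One possible way to get a steadier measurement is to time many
back-to-back runs of each binary. The helper below is only a sketch: "run_n"
is a hypothetical name, the count of 20 is arbitrary, and timing a shell
function this way assumes bash, where "time" is a keyword.

```shell
#!/bin/bash
# run_n COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times in a row and stops at the first failure.
run_n() {
    count=$1
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@" || return 1
        i=$((i + 1))
    done
}

# Hypothetical usage with the binaries built above (20 is an arbitrary count):
#   time run_n 20 ./test.truenative > chk2
#   time run_n 20 ./test.falsenative > chk
```

Note that chk and chk2 would then hold the output of all 20 runs rather than
one.</pre>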
        </div>
      </p>


    </body>
</html>