<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - Performance regression with rrL308142"

   href="https://bugs.llvm.org/show_bug.cgi?id=33954">33954</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>Performance regression with rL308142

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>tools

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>All

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>llc

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>anna@azul.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

        <pre>Created <span class=""><a href="attachment.cgi?id=18851" name="attach_18851" title="IR file to run with llc">attachment 18851</a> <a href="attachment.cgi?id=18851&action=edit" title="IR file to run with llc">[details]</a></span>

IR file to run with llc

rL308142 introduced an optimization that converts CMOV instructions to branches
when doing so is considered profitable.

We noticed a couple of regressions (around 2-5%) on internal benchmarks running
on Skylake hardware. Performance returned to normal when
-x86-cmov-converter=false was passed.

I've reduced the IR as much as possible and added a main function to it.
However, with this reduced test case the regression is not clearly visible
when measured with the time command; the difference looks like noise.

I've attached the IR along with the commands used to generate the assembly.
Perhaps something will jump out with respect to the heuristic, which seems to
have a performance cliff: a CMOV is converted to a branch when the gain is
greater than 25% of the misprediction penalty.
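
To make the cliff concrete, here is a minimal shell sketch of the rule as
stated above. It is not the code from rL308142. decide_cmov is a hypothetical
helper, gains are assumed to be whole cycles, and the 14-cycle misprediction
penalty is an assumed value chosen only for illustration:

```shell
#!/bin/sh
# Sketch of the profitability rule described above (not LLVM's actual code):
# a CMOV is converted to a branch when the gain is greater than 25% of the
# misprediction penalty. The test is written as 4*gain > penalty so it stays
# in integer shell arithmetic.
# decide_cmov is a hypothetical helper; both arguments are whole cycles.
decide_cmov() {
  gain=$1
  penalty=$2
  if [ $((gain * 4)) -gt "$penalty" ]; then
    echo branch
  else
    echo cmov
  fi
}

# Assumed misprediction penalty of 14 cycles, for illustration only.
for gain in 2 3 4 5; do
  echo "gain=$gain cycles: $(decide_cmov "$gain" 14)"
done
```

With these assumed numbers, gains of 2 and 3 cycles keep the CMOV, while 4 and
5 cycles turn it into a branch. The decision flips on a single-cycle
difference with no intermediate option, which is the kind of cliff mentioned
above.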

Reproduce as:

llc -mcpu=skylake \
    -mattr=+sse2,+cx16,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,+mmx,+rdseed,+clflushopt,+xsave,+avx,+rtm,+fma,+bmi,+rdrnd,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,+f16c,+ssse3,+sgx,+cmov,+movbe,+xsaveopt,+adx,+sse3 \
    -O3 -x86-cmov-converter=false test.ll -o test.falsenative.s
gcc test.falsenative.s -o test.falsenative

llc -mcpu=skylake \
    -mattr=+sse2,+cx16,+prfchw,+bmi2,+xsavec,+fsgsbase,+popcnt,+aes,+xsaves,+mmx,+rdseed,+clflushopt,+xsave,+avx,+rtm,+fma,+bmi,+rdrnd,+sse4.1,+sse4.2,+avx2,+sse,+lzcnt,+pclmul,+f16c,+ssse3,+sgx,+cmov,+movbe,+xsaveopt,+adx,+sse3 \
    -O3 -x86-cmov-converter=true test.ll -o test.truenative.s
gcc test.truenative.s -o test.truenative

echo "time for x86-cmov-converter=true"
time ./test.truenative > chk2
echo "time for x86-cmov-converter=false"
time ./test.falsenative > chk

time for x86-cmov-converter=true
real    0m0.155s
user    0m0.065s
sys     0m0.007s

time for x86-cmov-converter=false
real    0m0.154s
user    0m0.064s
sys     0m0.008s</pre>

        </div>

      </p>


    </body>

</html>