<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - Performance regression in SLPVectorize between llvm 10.0 and 11.0"
   href="https://bugs.llvm.org/show_bug.cgi?id=48486">48486</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>Performance regression in SLPVectorize between llvm 10.0 and 11.0
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>new-bugs
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>11.0
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>normal
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>new bugs
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>code.optimizer@gmail.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>htmldeveloper@gmail.com, llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>With llvm 11.0, a change to the heuristics and/or instruction costs used in
SLPVectorize.cpp (opt) has caused a 30% regression in overall application
performance in routine __nv_MorphologyPrimitive_F1L2849_2 in the attached
morphology.ll, as measured on an Intel Skylake 40-core Xeon server.

With llvm 10.0, SLPVectorize widens some of the loops from xmm (128-bit)
packed-double operations to ymm (256-bit) packed-double operations. Those same
transformations do not happen with llvm 11.0.

Attached in SLPV.tar are:
morphology.ll (used as input for llvm opt releases 10 and 11)
morphology-10.llvm (output of opt with --opt-bisect-limit=778, i.e. just after
the SLP pass), produced with exactly:

lim=778
opt -O2 -mcpu=skylake-avx512 --enable-unsafe-fp-math --enable-no-nans-fp-math
--enable-no-infs-fp-math --enable-no-signed-zeros-fp-math
--opt-bisect-limit=${lim} ./obj/magick/morphology.ll -S -o
./obj/magick/morphology-10.llvm

morphology-11.llvm
morphology-10.s output from llc invoked with:
-mcpu=skylake-avx512 -O2 --enable-unsafe-fp-math --enable-no-nans-fp-math
--enable-no-infs-fp-math --enable-no-signed-zeros-fp-math -fast-isel=0
-non-global-value-max-name-size=4294967295 -x86-cmov-converter=0 -filetype=obj

perf-10.lst and perf-11.lst: snapshots of perf report for the most costly loop
in routine __nv_MorphologyPrimitive_F1L2849_2</pre>
        </div>
      </p>


    </body>
</html>