<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><table border="1" cellspacing="0" cellpadding="8">

        <tr>

          <th>Bug ID</th>

          <td><a class="bz_bug_link 

          bz_status_NEW "

   title="NEW - [SLP Vectorizer] Performance degradation"

   href="https://bugs.llvm.org/show_bug.cgi?id=48155">48155</a>

          </td>

        </tr>

        <tr>

          <th>Summary</th>

          <td>[SLP Vectorizer] Performance degradation

          </td>

        </tr>

        <tr>

          <th>Product</th>

          <td>libraries

          </td>

        </tr>

        <tr>

          <th>Version</th>

          <td>trunk

          </td>

        </tr>

        <tr>

          <th>Hardware</th>

          <td>PC

          </td>

        </tr>

        <tr>

          <th>OS</th>

          <td>Linux

          </td>

        </tr>

        <tr>

          <th>Status</th>

          <td>NEW

          </td>

        </tr>

        <tr>

          <th>Severity</th>

          <td>enhancement

          </td>

        </tr>

        <tr>

          <th>Priority</th>

          <td>P

          </td>

        </tr>

        <tr>

          <th>Component</th>

          <td>Loop Optimizer

          </td>

        </tr>

        <tr>

          <th>Assignee</th>

          <td>unassignedbugs@nondot.org

          </td>

        </tr>

        <tr>

          <th>Reporter</th>

          <td>paulsson@linux.vnet.ibm.com

          </td>

        </tr>

        <tr>

          <th>CC</th>

          <td>llvm-bugs@lists.llvm.org

          </td>

        </tr></table>

      <p>

        <div>

<pre>It appears that commit 6403009 ("SLP: honor requested max vector size merging
PHIs") has had a very bad impact on a benchmark on SystemZ (imagick: 10-15%
increased runtime).

The original discussion of that patch mentions that vectorizing as wide as
possible, past the size of the physical vector registers, is supposed to be
beneficial, which is certainly true in this case. I think this includes the
extensions of i16 to double: it is much better to do a vector zero-extension
followed by a vector conversion than to keep everything scalar, even if that
later translates to two separate physical vector registers/operations (which
does not automatically mean spilling).
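
For illustration only, the kind of source pattern in question might look like
the sketch below. widen4 is a hypothetical function, not code from imagick.
Conceptually, each statement zero-extends a 16-bit value and then converts it
to floating point. Whether the SLP vectorizer merges the four scalar
conversions into vector operations depends on the target and its cost model.

```cpp
#include <cstdint>

// Hypothetical example (not taken from imagick): widen four adjacent
// unsigned 16-bit values to double in straight-line code. The four
// statements are independent and access adjacent memory, which is the
// shape of code an SLP vectorizer looks for.
void widen4(const std::uint16_t *in, double *out) {
  out[0] = static_cast<double>(in[0]);
  out[1] = static_cast<double>(in[1]);
  out[2] = static_cast<double>(in[2]);
  out[3] = static_cast<double>(in[3]);
}
```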

Is there some other way to avoid the original problem of register spilling?
Or should this limit be optional per target? Could some other cost function
regulate it in a better way?

Reverting that patch in a clumsy way:

--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -7566,8 +7566,8 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
       }
       while (SameTypeIt != E &&
-             (*SameTypeIt)->getType() == EltTy &&
-             static_cast<unsigned>(SameTypeIt - IncIt) < MaxNumElts) {
+             (*SameTypeIt)->getType() == EltTy) { //  &&
+        //             static_cast<unsigned>(SameTypeIt - IncIt) < MaxNumElts) {
         VisitedInstrs.insert(*SameTypeIt);
         ++SameTypeIt;
       }

(hot function: imagick/morphology.s/MorphologyApply)
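
To illustrate what the cap in the diff above does, here is a minimal
standalone sketch of the grouping loop. It is not the LLVM code: TypeId,
groupSameType and maxNumElts are hypothetical stand-ins, and types are
modelled as plain integers. It only shows how the extra condition splits a
run of same-typed values into smaller groups.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an LLVM type: equal ids mean "same type".
using TypeId = int;

// Walk `types` and collect runs of consecutive equal elements, never
// letting a run grow beyond `maxNumElts`. Returns the length of each run.
std::vector<std::size_t> groupSameType(const std::vector<TypeId> &types,
                                       std::size_t maxNumElts) {
  if (maxNumElts == 0)
    maxNumElts = 1; // Guard: every group must make progress.

  std::vector<std::size_t> groups;
  auto incIt = types.begin();
  const auto e = types.end();
  while (incIt != e) {
    const TypeId eltTy = *incIt;
    auto sameTypeIt = incIt;
    // Same shape as the loop in the diff: stop at a type change or
    // once the group has reached the cap.
    while (sameTypeIt != e && *sameTypeIt == eltTy &&
           static_cast<std::size_t>(sameTypeIt - incIt) < maxNumElts) {
      ++sameTypeIt;
    }
    groups.push_back(static_cast<std::size_t>(sameTypeIt - incIt));
    incIt = sameTypeIt;
  }
  return groups;
}
```

With eight values of one type followed by two of another, a cap of 4 yields
groups of 4, 4 and 2, while a very large cap (which behaves like the reverted
code) yields groups of 8 and 2.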

I do not yet have a representative test case.

One observation so far is that in some of those PHIs, all but 10 inputs were
constant zero. That perhaps makes them a bit special compared to ordinary
arithmetic.</pre>

        </div>

      </p>


    </body>

</html>