<html>
    <head>
      <base href="https://bugs.llvm.org/">
    </head>
    <body><table border="1" cellspacing="0" cellpadding="8">
        <tr>
          <th>Bug ID</th>
          <td><a class="bz_bug_link 
          bz_status_NEW "
   title="NEW - [SLP Vectorizer] Performance degradation"
   href="https://bugs.llvm.org/show_bug.cgi?id=48155">48155</a>
          </td>
        </tr>

        <tr>
          <th>Summary</th>
          <td>[SLP Vectorizer] Performance degradation
          </td>
        </tr>

        <tr>
          <th>Product</th>
          <td>libraries
          </td>
        </tr>

        <tr>
          <th>Version</th>
          <td>trunk
          </td>
        </tr>

        <tr>
          <th>Hardware</th>
          <td>PC
          </td>
        </tr>

        <tr>
          <th>OS</th>
          <td>Linux
          </td>
        </tr>

        <tr>
          <th>Status</th>
          <td>NEW
          </td>
        </tr>

        <tr>
          <th>Severity</th>
          <td>enhancement
          </td>
        </tr>

        <tr>
          <th>Priority</th>
          <td>P
          </td>
        </tr>

        <tr>
          <th>Component</th>
          <td>Loop Optimizer
          </td>
        </tr>

        <tr>
          <th>Assignee</th>
          <td>unassignedbugs@nondot.org
          </td>
        </tr>

        <tr>
          <th>Reporter</th>
          <td>paulsson@linux.vnet.ibm.com
          </td>
        </tr>

        <tr>
          <th>CC</th>
          <td>llvm-bugs@lists.llvm.org
          </td>
        </tr></table>
      <p>
        <div>
        <pre>It appears that 6403009 "SLP: honor requested max vector size merging PHIs" has
had a very bad impact on a benchmark on SystemZ (imagick: 10-15% increased
runtime).

The original discussion (against that patch) mentions that vectorizing as wide
as possible past the size of the physical vector registers is supposed to be
beneficial, which is certainly true in this case. I think this includes the
extensions of i16 to double, and it is much better to do a vector
zero-extension and then a vector conversion than keeping everything scalar,
even if that later translates to two separate physical vector regs/operations
(which does not automatically mean spilling).
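
As a rough illustration of that last point, here is a minimal sketch of the
arithmetic, assuming 128-bit vector registers (as on SystemZ). The helper
numRegParts is hypothetical and is not an LLVM API:

```cpp
// Hypothetical helper (not an LLVM API): number of physical vector registers
// that a vector of NumElts elements, each EltBits wide, gets split into.
// RegBits = 128 is an assumption matching the SystemZ vector registers.
constexpr unsigned numRegParts(unsigned NumElts, unsigned EltBits,
                               unsigned RegBits = 128) {
  return (NumElts * EltBits + RegBits - 1) / RegBits; // round up
}

// <4 x i16> fits in one register; after the conversion, <4 x double>
// occupies two registers, i.e. two vector operations instead of four
// scalar ones.
static_assert(numRegParts(4, 16) == 1, "4 x i16 fits in one register");
static_assert(numRegParts(4, 64) == 2, "4 x double needs two registers");
```

Needing two registers for one value only raises register pressure; whether
that leads to spilling depends on what else is live at that point.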

Is there some other way to avoid the original problem of register spilling?

Or should this limit be optional per target? Could some other cost function
regulate it in a better way?

Reverting that patch in a clumsy way:

--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -7566,8 +7566,8 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
       }

       while (SameTypeIt != E &&
-             (*SameTypeIt)->getType() == EltTy &&
-             static_cast<unsigned>(SameTypeIt - IncIt) < MaxNumElts) {
+             (*SameTypeIt)->getType() == EltTy) { //  &&
+        //             static_cast<unsigned>(SameTypeIt - IncIt) < MaxNumElts) {
         VisitedInstrs.insert(*SameTypeIt);
         ++SameTypeIt;
       }
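
To make the effect of that condition concrete, here is a standalone sketch of
the grouping loop (not LLVM code). Each int stands in for the type of one PHI,
runLength is a hypothetical helper, and the cap of two elements is an
assumption (e.g. 128-bit registers holding 64-bit doubles):

```cpp
#include <cstddef>

// Standalone sketch, not LLVM code. Returns how many consecutive same-typed
// entries starting at Begin end up in one vectorization attempt.
// MaxNumElts == 0 means "no cap"; that convention is specific to this sketch
// and mimics the behaviour with the condition commented out.
constexpr std::size_t runLength(const int *Types, std::size_t N,
                                std::size_t Begin, std::size_t MaxNumElts) {
  std::size_t I = Begin;
  while (I != N && Types[I] == Types[Begin] &&
         (MaxNumElts == 0 || I - Begin < MaxNumElts))
    ++I;
  return I - Begin;
}

// Sixteen PHIs of the same type (think: double).
constexpr int Doubles[16] = {};
static_assert(runLength(Doubles, 16, 0, 2) == 2, "capped: two PHIs at a time");
static_assert(runLength(Doubles, 16, 0, 0) == 16, "uncapped: the whole run");
```

With the cap, a long run of same-typed PHIs is only ever offered to the
vectorizer in register-sized pieces; without it, the whole run is tried at
once.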

(hot function: imagick/morphology.s/MorphologyApply)

I do not yet have a representative test case.

One observation so far is that in some of those PHIs all but 10 inputs were
constant zero, which perhaps makes them a bit special compared to ordinary
arithmetic.</pre>
        </div>
      </p>


    </body>
</html>