[llvm-bugs] [Bug 48155] New: [SLP Vectorizer] Performance degradation

Wed Nov 11 10:01:34 PST 2020

https://bugs.llvm.org/show_bug.cgi?id=48155

            Bug ID: 48155
           Summary: [SLP Vectorizer] Performance degradation
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Loop Optimizer
          Assignee: unassignedbugs at nondot.org
          Reporter: paulsson at linux.vnet.ibm.com
                CC: llvm-bugs at lists.llvm.org

It appears that 6403009 "SLP: honor requested max vector size merging PHIs" has
had a very bad impact on a benchmark on SystemZ (imagick: 10-15% increased
runtime).

The original discussion (against that patch) mentions that vectorizing as wide
as possible past the size of the physical vector registers is supposed to be
beneficial, which is certainly true in this case. I think this includes the
extensions of i16 to double, and it is much better to do a vector
zero-extension and then a vector conversion than keeping everything scalar,
even if that later translates to two separate physical vector regs/operations
(which does not automatically mean spilling).

Is there some other way to avoid the original problem of register spilling?

Or should this limit be optional per target? Could some other cost function
regulate it in a better way?

Reverting that patch in a clumsy way:

--- a/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
+++ b/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
@@ -7566,8 +7566,8 @@ bool SLPVectorizerPass::vectorizeChainsInBlock(BasicBlock *BB, BoUpSLP &R) {
       }

       while (SameTypeIt != E &&
-             (*SameTypeIt)->getType() == EltTy &&
-             static_cast<unsigned>(SameTypeIt - IncIt) < MaxNumElts) {
+             (*SameTypeIt)->getType() == EltTy) { //  &&
+        //             static_cast<unsigned>(SameTypeIt - IncIt) < MaxNumElts) {
         VisitedInstrs.insert(*SameTypeIt);
         ++SameTypeIt;
       }

(hot function: imagick/morphology.s/MorphologyApply)
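
To make the effect of the removed bound easier to see, here is a minimal
standalone sketch of the grouping loop. It is not the SLPVectorizer code: the
Phi struct and the groupSizes function are hypothetical stand-ins, and the cap
is simply passed in as a parameter (in the real pass it is presumably derived
from the maximum vector register size and the element size).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a PHI node: only its type matters here.
struct Phi {
  std::string type;
};

// Walk `phis` (assumed to be sorted by type already) and return the size of
// each group that would be handed to the vectorizer. A group ends when the
// type changes or when it reaches `maxNumElts` entries, which mirrors the
// bound discussed above. Passing a very large cap models the revert.
std::vector<std::size_t> groupSizes(const std::vector<Phi> &phis,
                                    std::size_t maxNumElts) {
  assert(maxNumElts > 0 && "cap must be positive");
  std::vector<std::size_t> sizes;
  std::size_t inc = 0;
  while (inc < phis.size()) {
    std::size_t same = inc;
    while (same < phis.size() && phis[same].type == phis[inc].type &&
           same - inc < maxNumElts)
      ++same;
    sizes.push_back(same - inc);
    inc = same;
  }
  return sizes;
}
```

With eight PHIs of type double and a cap of 2 (for example, 128-bit registers
holding 64-bit elements), this yields four groups of two. With the cap
effectively removed it yields a single group of eight, which is the wider
vectorization that the revert restores.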

I do not yet have a representative test case.

One observation so far is that in some of those PHIs all but 10 inputs were
constant zero. That makes them a bit special compared to ordinary arithmetic,
perhaps.
