<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/58705>58705</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            SLP vectorizer is slow on some large input
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          gezalore
      </td>
    </tr>
</table>

<pre>
    Attached is a large input file that shows compilation-time issues in the SLP vectorizer. (The input file is from machine generated C++ emitted by Verilator.). Observe:

```
% time opt -O3 slowSLP.bc -disable-output
opt -O3 slowSLP.bc -disable-output  64.34s user 0.02s system 99% cpu 1:04.37 total
% time opt -O3 slowSLP.bc -disable-output -vectorize-slp=0
opt -O3 slowSLP.bc -disable-output -vectorize-slp=0  2.62s user 0.02s system 99% cpu 2.648 total
```

Worse yet I don't think we actually end up vectorizing anything, so we spend a lot of time analyzing the code to no effect.

LLVM version (main at time of writing): 1258747a59db5200112fca7c6140d184f3b8748e

I did some digging and there are at least two sections that have big impact, but I don't think I will have time in the near term to immerse myself enough to provide a fix:

---

First is the building of the `ValueToTEs` map in `BoUpSLP::isGatherShuffledEntry` (50%+ of time):
https://github.com/llvm/llvm-project/blob/a784de783af5096e593c5e214c2c78215fe303f5/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L7568-L7577

We call this function in a loop from `getTreeCost` via `getElementCost`:
https://github.com/llvm/llvm-project/blob/a784de783af5096e593c5e214c2c78215fe303f5/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L7358
https://github.com/llvm/llvm-project/blob/a784de783af5096e593c5e214c2c78215fe303f5/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L6222-L6223

I *think* (and might be wrong) that `VectorizableTree` does not change during that loop in `getTreeCost`, so lazy/pre-computing `ValueToTEs` once might be an easy win?

---

Another significant performance hog (20%+ of total runtime) is this loop in `clusterSortPtrAccesses`:
https://github.com/llvm/llvm-project/blob/a784de783af5096e593c5e214c2c78215fe303f5/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L3723-L3743

This gets called both from `reorderTopToBottom` and `reorderBottomToTop` with this input. The passed `VL` is ~16 entries long and AFAICT none of the pointers can be clustered, eventually hitting the early `return` in the loop, but evaluating `getPointersDiff` repeatedly appears expensive:
https://github.com/llvm/llvm-project/blob/a784de783af5096e593c5e214c2c78215fe303f5/llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp#L3738

[slowSLP.bc.gz](https://github.com/llvm/llvm-project/files/9900888/slowSLP.bc.gz)

</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJzVVkuPm0gQ_jX4UjKCxjx88MHzWkWaVSLFO3tuoIDebWjU3Xji-fVbBXbGTqTs5raRMNDuouqrrx5dpalPu733suqwBuVAgpa2RVDDOHlolEbwnfTgOvPqoDL9qLT0ygxrr3oSc25CR9IkhfD5-RMcsfLGqje0IQSiOHQ3ushCY00PPRlUA0KLA1rpyfZ9IO7oAuyV53V5ghe0bMzYMBDbED6WDu0Rg2QfRA9BdLln0flaliKFGZkZPaw_JuC0eSVcYVnBulZOlhrXZvKEaPng3-UAsk2YbBxMZB-iMBIO3Ml57GG7ZXvVOEFMsCISy8EbL_VPg4H1V-LWTo9B8hD9Z3zffwogwkz8GDFJbIobtN8wOd__NNYhnNDDB6jNEIjcU6jV8De8IsjKT1LrE-BQwzR-jb0aWpDDieXaQNyDMyztRhajBDMeTLMQIwepT7M8509lako3A4MBbBpSFl5DeX5--Z1MWEfZx6nVS0o7Ss2F4QZerfKzwS3FAmKRFvkml-m2LlMRRXEsmkrmVRZvojouNk1S0n6B1xbIRVUTWtJXq7Zd3KgZmiWo_POgUTqy-WrAEUCC4pYC6eQRoVQtqH4kXtjtcvqetQ_wqrRepJcKWkpnQGnBo-3Zf9X3yLT3J4e6IXbN1Ha8MVpzVMSRpGr68k0hrNfr6-WTsoRTuVl7OSldszvMO60pyC9ST3gwh0dHC6rHkZHQ6535Y6Q8Y-XJXrnfJHv_uZuaRmP9OHh7YnmiP40ok7hiz7FceF-Md96Pjlfiia5W-W4qQ2oetND6eHmsyZ2_kLl6KrUp6SHzYlNjXiSySaNthuk2qVIU8aYSVV6IOG0wiZImvdKj-LuDlYNrjO0dLV4u1UDv5MnLe0OqxjEQyXOeZsWa7nl-k-mUf5TMHCdqUtMwB5dJ4Yw149K3yPcW_cEi3hvnmYmjkud_HzX2OPjzxq_CRZIWvwLOTAix5ntyW7CB2M-FRU9OSi7XXrWdhxKpIZi5HSwFyjl_VsvNk2PI8asNnV8DtaSqkwMdfPVkl3bEtc5xX8riNurnpqbl24kAjxbXfDJO3H6-ry0zVPgOSg5AHeREbWAIkqcfFPCeQFHpgVPtoBpVycHDiJYZk6yxMy17LK7LkJs52Gk41-NS_nS7cqTSEx0E9rOx_pO3-6pC59D9Ogmb5CJZ031zkwgH9pJi5OYa5vGByPtashaNrdEezHgwd8Z78oXiwrnyvrn8T0EzI2--ktsLd_P0EgIPMqMkruaPXp5ZiHaD_DHOqEN7q5B5Ph8Z-6f9h_sD5dWAl547GjUQ8Yxw4EQ4BwJrTiY8korlLO1o_rkciHQo0D8zSD_ZYba5nBcc0csZg0fKN3lJPmLh09nUg2oa_sbiiDxkkS450iuBwC90Gjt1NU7938OeFDdjX3r3PhGF7VuQPlAx_LQPPJSy_e02ioqioLdbrXSmzeZWuIuzLBdxIbJsVe-Septs5corr3F3O_ZyVrASqvtllLgaqFeT1bufBrkM2fRCE02UrrpdtGmaLI-EaDZRQlRv87gUVSU3dYoJFnKlZYna7Ygk4mWldoJk4yihGSiOoywss23aJFWeZ2nciFoEmwhpmtIhGw6NbVd2N2Mop9bRplbOu_dNqgLqSYgX_XLynbG7Ft-kNhZXM97dDPYfijD1Yw">