[PATCH] D122148: [SLP] Peak into loads when hitting the RecursionMaxDepth

Tue Mar 29 04:25:08 PDT 2022

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3998
+  if (Depth >= RecursionMaxDepth &&
+      !(S.MainOp && match(S.MainOp, m_ZExtOrSExtOrSelf(m_Load(m_Value()))))) {
     LLVM_DEBUG(dbgs() << "SLP: Gathering due to max recursion depth.\n");
----------------
dmgreen wrote:
> ABataev wrote:
> > dmgreen wrote:
> > > ABataev wrote:
> > > > It might be not profitable, e.g. if vector extension is not free, while scalar extension is free, and loads are not vectorizable. Also, what if you have an extension of a load and some other instructions, not loads/extractelements, etc.?
> > > Im not sure I understand what you mean by unprofitable? This just stops zext(load being forced to be gather if it hits the max depth. It should just mean that those node are either better (not gathers) or the same and shouldn't lead to regressions. It was previously hitting an arbitrary limit - you could say the same where any arbitrary limit causes arbitrary problems. Giving the loads the ability to order nicely should be a bigger win.
> > > 
> > > For the second part - do you mean a AltOp? If so then that makes sense, we can add a check for that, making sure it is the same as the MainOp.
> > 1. The cost of vector sext/zext is larger than the cost of scalar sext/zext (which might be free in many cases).
> > 2. If S.MainOp is zext/sext(load), it does not mean that all values are zext/sext(load), they might be sext/sext(load,extract,binop,etc.), since you're checking only the mainop.  
> Can't that be true for any limit though? We have an arbitrary limit of 12 at the moment. Decreasing the limit to 11 will mean some zext are treated like gathers, not vectorized, and the cost of zexting the loads may be cheaper for scalars than it is for vectors. The same would be true for decreasing the limit to 10 or 9. We would end up picking the limit where we most expect to find zext(load, which is probably very low. (Or just never vectorizing zext(load if the load is gather).
> 
> But in general if the loads can be vectorized nicely (either continuously or in clusters) then it should be a gain. The better vectorization of the load would overcome the difference in cost between the scalar and vector zext. We should expect for all the code out there for this to improve performance more than it happens to decrease it.
> 
> For the second point, do you have any suggestions? As a simple heuristic, this seemed like nice enough check to me, to balance the complexity of the check vs the expected outcome. Should I make it an all_of?
> Can't that be true for any limit though? We have an arbitrary limit of 12 at the moment. Decreasing the limit to 11 will mean some zext are treated like gathers, not vectorized, and the cost of zexting the loads may be cheaper for scalars than it is for vectors. The same would be true for decreasing the limit to 10 or 9. We would end up picking the limit where we most expect to find zext(load, which is probably very low. (Or just never vectorizing zext(load if the load is gather).

Yes, but we have already some optimizations/numbers/data for this limit. Yes, this limit is not optimal but it is stable, lots of apps shows that it is good enough. Any changes here are very sensitive.

> 
> But in general if the loads can be vectorized nicely (either continuously or in clusters) then it should be a gain. The better vectorization of the load would overcome the difference in cost between the scalar and vector zext. We should expect for all the code out there for this to improve performance more than it happens to decrease it.

The problem here is that you're not checking for loads, you just check that you have a single load. It will work for just loads, but for zext/sext it is not.

> 
> For the second point, do you have any suggestions? As a simple heuristic, this seemed like nice enough check to me, to balance the complexity of the check vs the expected outcome. Should I make it an all_of?

Yes, better to have all_of here, at least for zex/sext. For loads themselves it is enough just to check S.MainOp

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122148/new/

https://reviews.llvm.org/D122148