[PATCH] D142255: [WIP] Loop peeling opportunity for identity operators

Fri Jan 27 09:53:04 PST 2023

jamieschmeiser added a comment.

@nikic, thank you for your comments.  I too have concerns about this proposal.  I agree that the benefits are marginal but I would like to address some of your comments.

In D142255#4082028 <https://reviews.llvm.org/D142255#4082028>, @nikic wrote:

> Strong -1 on this change as implemented. There's a lot of red flags here, for example that you are trying to artificially limit this transform so as not to completely break LoopVectorize (indicating that you're breaking a canonical form), and the number of times you have to suppress the transform in tests, where it looks like the transform would be clearly non-profitable if it were actually applied.

How this affects other transforms, LoopVectorize in particular, is one of my biggest concerns, but for different reasons.  I don't think that it is breaking a canonical form, but rather that it may cause different remainders after vectorizing.  This could be beneficial in some cases, harmful in others, depending on how it happens to line up.  This was one of the things we discussed in the loop-opt group meeting and there was some question as to why peeling was being done in the full looop unroll pass before vectorizing as well as after in the loop unroll pass.  I investigated this and found that peeling was specifically turned on in loop full unroll to aid vectorizing (commit 35b3989a30eefa66cd6edca4c6e1ec061c05ad96 <https://reviews.llvm.org/rG35b3989a30eefa66cd6edca4c6e1ec061c05ad96>) because, as you indicated, the existing peeling strategies tend to eliminate phis, which can help the vectorizer.  A proposed solution from this discussion was to not do this new peeling strategy in full loop unrolling but only after vectorization.  This also removes the need for that bit of code for limiting the peel.
I disagree that the number of times I suppressed it shows it is not profitable.  In fact, the frequency that it occurs shows its usefulness, but I suppressed it because it isn't germane to the tests in question and obfuscates their meaning and purpose.

> There might be something to the general idea here, but this would need entirely different cost modelling from the current peeling transform. Loop peeling is a tradeoff between simplifying the loop body vs increasing code size. In the cases where current peeling transforms are performed, we expect some significant simplification of the loop, like the elimination of branches that are constant after some iterations. For the transform proposed here the benefit is very marginal, you're basically saving a single operation. This //might// make sense if that operation is pretty much the only thing the loop does, but not if you need to duplicate a large loop body that also does many other things.

The peeling code already considers the size of code growth and establishes limits to the amount of peeling that can occur.  The proposed code does not change that and only suggests the peel when it is within the established limits.

I am preparing a revision of this code that adds a new peeling flag to the peeling control struct called allow-aggressive-peeling.  This would control opportunities that do not necessarily simplify phi structure of control flow but exist for other opportunities such as this one.  It would be set to false for full loop unroll (before vectorizing) but on for the loop unroll pass.  Like other flags in this struct, it will have an option for controlling it and it will be suppressed using the option in the loop unrolling tests for the reasons given above but not specified for other ones.  Does this approach address your concerns?  I recognize that the change would need to be examined before any decision is made...

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142255/new/

https://reviews.llvm.org/D142255