[PATCH] D115261: [LV] Disable runtime unrolling for vectorized loops.

Dave Green via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Dec 13 01:04:16 PST 2021


dmgreen added a comment.

> I believe many targets already disable runtime unrolling for loops that contain vector instructions. For example AArch64 does that, though X86 currently does not. This is the principal alternative I would see, to move that logic up into the generic unroll preferences. It would be the difference between not unrolling loops that LLVM vectorized and not unrolling vector loops in general -- I assume the preference would be the former, as this patch does?

In ARM-MVE we do already, both for vectorized code from the vectorizer and vector code from the frontend (intrinsics for example). That is largely to do with the way MVE handles tail-predication and hardware loops, which makes runtime unrolling generally detrimental.

For AArch64 we inherit some of the unrolling preferences from the base class, so I would expect this to alter in-order and out of order unrolling decisions to some degree. The main reasons we disabled the extra unrolling of loops with vectors was to not over-unroll loops past the point that it is useful - as it if you have a loop acting on i8's that is vectorized x16, then "interleaved" x2 or x4, then further runtime unrolled, you end up with a fast loop body that handles 64 or 128 or even 256 elements at a time. For too many tripcounts you just don't enter that loop or do most of the processing in the remainder loop. (Also we didn't want to break a lot of the hand-crafted intrinsic code that is out there that already unrolls the loop to carefully fit into the number of registers available).

> Note that we already add metadata to disable runtime unrolling to the scalar loop after vectorization.

We only currently add no-unroll metadata to the remainder when there are no runtime-checks added. I wouldn't be surprised if a lot of the codesize gain from this patch is due to this patch adding no-unroll to the remainder, not to the vector body. It is quite easy to construct cases where the remainder really should be allowed to unroll.

LLVM has never had very good end-to-end testing for this kind of thing and has relied upon benchmarks like the llvm-test-suite to fill that gap. The noise makes it difficult, but I would expect a descent amount of benchmarking to prove a change like this is better overall and to catch a lot of the cases where it is not.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115261/new/

https://reviews.llvm.org/D115261



More information about the llvm-commits mailing list