[PATCH] D152366: [LoopVectorize] Allow inner loop runtime checks to be hoisted above an outer loop

Wed Jun 28 03:47:45 PDT 2023

david-arm added a comment.

In D152366#4450222 <https://reviews.llvm.org/D152366#4450222>, @fhahn wrote:

> Does this require the `extra-vectorizer-passes` to hoist the checks out of the loop in the full pipeline? Do you know if we already have sufficient runtime tests in llvm-test-suite for this kind of code? If not, it would be good to add some, perhaps to https://github.com/llvm/llvm-test-suite/blob/main/SingleSource/UnitTests/Vectorizer/runtime-checks.cpp.
>
> Also, could you clean up the block & value names in the tests a bit so they are easier to read and land them separately?

Hi @fhahn, so I can see there are tests in the LLVM test suite that have examples of nested loops where the inner loop runtime checks could be hoisted out with this patch. For example, see MicroBenchmarks/ImageProcessing/Dither/floydDitherKernel.c and MicroBenchmarks/ImageProcessing/Dither/orderedDitherKernel.c. However, I am looking at trying to add some specific tests to SingleSource/UnitTests/Vectorizer/runtime-checks.cpp as well. Thanks for the suggestion!

================
Comment at: llvm/test/Transforms/LoopVectorize/runtime-checks-hoist.ll:147
+;   }
+; }
+; We decide to do full runtime checks here (as opposed to diff checks) due to
----------------
Ayal wrote:
> This raises a thought, which may deserve a separate patch:
> In this example the runtime checks should be optimized into a form independent of the enclosing i loop and hoisted out of it w/o expansion - and w/o asking a flag, as an obvious loop invariant unswitching opportunity.
> 
> I.e., instead of checking if intervals dst[i*n : i*n+n) and src[i*n : i*n + n) intersect, suffice to check if dst[0 : n) and src[0 : n) intersect. When checking if the end SCEV point of one precedes the start SCEV of the other, try to cancel common addends, or check if the SCEV of their difference is positive - subject to wrapping?
> 
> This also holds for the motivating memcopy example w/o the additional load, which is also worth adding a test for? Perhaps the first diff_checks() test above can serve as a better example where expansion is indeed necessary in order to loop unswitch a (non invariant) runtime check.
> 
> Wonder if the SPEC2017 significant cases are invariant(?)
Hi @Ayal, that's a good spot and thanks for pointing that out! The tests I wrote don't really bear any resemblance to the specific loops in x264 that benefit from this optimisation, where the inner loop runtime checks are not loop invariant. For example, one loop looks like a bit like this in x264:

    for(int i = 0; i < m; i++) {
        for(int j = 0; j < n; j++)
            out[j] = ... do something with in1[j] + in2[j] ...;
        out += out_stride;
        in1 += in1_stride;
        in2 += in2_stride;
    }

However, what you said makes me wonder if the tests I added in this patch are unreliable, because like you said they may get optimised away in future because they are genuinely invariant. I'll look into adding another test.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152366/new/

https://reviews.llvm.org/D152366