[PATCH] D112552: [LoopVectorize] When tail-folding, don't always predicate uniform loads

Wed Oct 27 05:34:47 PDT 2021

sdesmalen added a comment.

In D112552#3090066 <https://reviews.llvm.org/D112552#3090066>, @david-arm wrote:

> In D112552#3088838 <https://reviews.llvm.org/D112552#3088838>, @fhahn wrote:
>
>> Does the test need to be aarch64 specific or could it be target independent?
>
> Hi @fhahn, I think that's a really good idea. The main problem I had when trying this is that tail-folding requires masked load/store support, so when you make this target-independent we end up scalarising the loads/stores and never expose the path I changed in this patch. If there is a way to force masked loads to be legal then I could try that?

>From what I can see, there doesn't seem to be a way to force masked loads to be legal, and personally I don't think it is a big-enough deal to create a new option for it.
Maybe you can add a comment to the test describing that the test isn't really target-specific (other than the need for `TTI.isLegalMaskedLoad` to return true) ?

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9071
+           "Uniform load is unexpectedly marked as predicated!");
+    LLVM_DEBUG(dbgs() << "LV: Scalarizing uniform load in predicated loop:"
+                      << *I << "\n");
----------------
I see you've followed the message above, but I'm confused by the terminology here. From the test it does the opposite of scalarising (e.g. load scalar value + splat), which I personally associate more with 'widening'.  So perhaps `"Replicating uniform load in predicated loop:"` would be more appropriate? (assuming that //replicating// can be done in multiple ways, not necessarily by //scalarizing//)

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll:48
+; and the original condition.
+define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {
+; CHECK-LABEL: @cond_uniform_load(
----------------
The output of this test is unchanged by your code. Is it worth adding this test as a NFC change ("adding tests for uniform memops") and then rebasing your patch? That way it's clear what your patch has changed from the test.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D112552/new/

https://reviews.llvm.org/D112552