[PATCH] D112552: [LoopVectorize] When tail-folding, don't always predicate uniform loads
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Oct 27 05:34:47 PDT 2021
sdesmalen added a comment.
In D112552#3090066 <https://reviews.llvm.org/D112552#3090066>, @david-arm wrote:
> In D112552#3088838 <https://reviews.llvm.org/D112552#3088838>, @fhahn wrote:
>
>> Does the test need to be aarch64 specific or could it be target independent?
>
> Hi @fhahn, I think that's a really good idea. The main problem I had when trying this is that tail-folding requires masked load/store support, so when you make this target-independent we end up scalarising the loads/stores and never expose the path I changed in this patch. If there is a way to force masked loads to be legal then I could try that?
>From what I can see, there doesn't seem to be a way to force masked loads to be legal, and personally I don't think it is a big-enough deal to create a new option for it.
Maybe you can add a comment to the test describing that the test isn't really target-specific (other than the need for `TTI.isLegalMaskedLoad` to return true) ?
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:9071
+ "Uniform load is unexpectedly marked as predicated!");
+ LLVM_DEBUG(dbgs() << "LV: Scalarizing uniform load in predicated loop:"
+ << *I << "\n");
----------------
I see you've followed the message above, but I'm confused by the terminology here. From the test it does the opposite of scalarising (e.g. load scalar value + splat), which I personally associate more with 'widening'. So perhaps `"Replicating uniform load in predicated loop:"` would be more appropriate? (assuming that //replicating// can be done in multiple ways, not necessarily by //scalarizing//)
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/tail-fold-uniform-memops.ll:48
+; and the original condition.
+define void @cond_uniform_load(i32* nocapture %dst, i32* nocapture readonly %src, i32* nocapture readonly %cond, i64 %n) #0 {
+; CHECK-LABEL: @cond_uniform_load(
----------------
The output of this test is unchanged by your code. Is it worth adding this test as a NFC change ("adding tests for uniform memops") and then rebasing your patch? That way it's clear what your patch has changed from the test.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D112552/new/
https://reviews.llvm.org/D112552
More information about the llvm-commits
mailing list