[llvm-branch-commits] [llvm] a57d981 - [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
Tom Stellard via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Mon Aug 16 11:33:21 PDT 2021
Author: David Sherwood
Date: 2021-08-16T11:32:41-07:00
New Revision: a57d98111e63d390686f30b07fd1b1227ab99086
URL: https://github.com/llvm/llvm-project/commit/a57d98111e63d390686f30b07fd1b1227ab99086
DIFF: https://github.com/llvm/llvm-project/commit/a57d98111e63d390686f30b07fd1b1227ab99086.diff
LOG: [LoopVectorize] Improve vectorisation of some intrinsics by treating them as uniform
This patch adds more instructions to the Uniforms list, for example certain
intrinsics that are uniform by definition or whose operands are loop invariant.
This list includes:
1. The intrinsics 'experimental.noalias.scope.decl' and 'sideeffect', which
are always uniform by definition.
2. If intrinsics 'lifetime.start', 'lifetime.end' and 'assume' have
loop invariant input operands then these are also uniform too.
Also, in VPRecipeBuilder::handleReplication we check if an instruction is
uniform based purely on whether or not the instruction lives in the Uniforms
list. However, there are certain cases where calls to some intrinsics can
be effectively treated as uniform too. Therefore, we now also treat the
following cases as uniform for scalable vectors:
1. If the 'assume' intrinsic's operand is not loop invariant, then we
are free to treat this as uniform anyway since it's only a performance
hint. We will get the benefit for the first lane.
2. When the input pointers for 'lifetime.start' and 'lifetime.end' are loop
variant then for scalable vectors we assume these still ultimately come
from the broadcast of an alloca. We do not support scalable vectorisation
of loops containing alloca instructions, hence the alloca itself would
be invariant. If the pointer does not come from an alloca then the
intrinsic itself has no effect.
I have updated the assume test for fixed width, since we now treat it
as uniform:
Transforms/LoopVectorize/assume.ll
I've also added new scalable vectorisation tests for other intriniscs:
Transforms/LoopVectorize/scalable-assume.ll
Transforms/LoopVectorize/scalable-lifetime.ll
Transforms/LoopVectorize/scalable-noalias-scope-decl.ll
Differential Revision: https://reviews.llvm.org/D107284
(cherry picked from commit 3fd96e1b2e129b981f1bc1be2615486187e74687)
Added:
Modified:
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
llvm/test/Transforms/LoopVectorize/assume.ll
Removed:
################################################################################
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index f24ae6b100d57..671bc6b5212b5 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -5433,6 +5433,21 @@ void LoopVectorizationCostModel::collectLoopUniforms(ElementCount VF) {
// lane 0 demanded or b) are uses which demand only lane 0 of their operand.
for (auto *BB : TheLoop->blocks())
for (auto &I : *BB) {
+ if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(&I)) {
+ switch (II->getIntrinsicID()) {
+ case Intrinsic::sideeffect:
+ case Intrinsic::experimental_noalias_scope_decl:
+ case Intrinsic::assume:
+ case Intrinsic::lifetime_start:
+ case Intrinsic::lifetime_end:
+ if (TheLoop->hasLoopInvariantOperands(&I))
+ addToWorklistIfAllowed(&I);
+ break;
+ default:
+ break;
+ }
+ }
+
// If there's no pointer operand, there's nothing to do.
auto *Ptr = getLoadStorePointerOperand(&I);
if (!Ptr)
@@ -8916,6 +8931,37 @@ VPBasicBlock *VPRecipeBuilder::handleReplication(
bool IsPredicated = LoopVectorizationPlanner::getDecisionAndClampRange(
[&](ElementCount VF) { return CM.isPredicatedInst(I); }, Range);
+ // Even if the instruction is not marked as uniform, there are certain
+ // intrinsic calls that can be effectively treated as such, so we check for
+ // them here. Conservatively, we only do this for scalable vectors, since
+ // for fixed-width VFs we can always fall back on full scalarization.
+ if (!IsUniform && Range.Start.isScalable() && isa<IntrinsicInst>(I)) {
+ switch (cast<IntrinsicInst>(I)->getIntrinsicID()) {
+ case Intrinsic::assume:
+ case Intrinsic::lifetime_start:
+ case Intrinsic::lifetime_end:
+ // For scalable vectors if one of the operands is variant then we still
+ // want to mark as uniform, which will generate one instruction for just
+ // the first lane of the vector. We can't scalarize the call in the same
+ // way as for fixed-width vectors because we don't know how many lanes
+ // there are.
+ //
+ // The reasons for doing it this way for scalable vectors are:
+ // 1. For the assume intrinsic generating the instruction for the first
+ // lane is still be better than not generating any at all. For
+ // example, the input may be a splat across all lanes.
+ // 2. For the lifetime start/end intrinsics the pointer operand only
+ // does anything useful when the input comes from a stack object,
+ // which suggests it should always be uniform. For non-stack objects
+ // the effect is to poison the object, which still allows us to
+ // remove the call.
+ IsUniform = true;
+ break;
+ default:
+ break;
+ }
+ }
+
auto *Recipe = new VPReplicateRecipe(I, Plan->mapToVPValues(I->operands()),
IsUniform, IsPredicated);
setRecipe(I, Recipe);
diff --git a/llvm/test/Transforms/LoopVectorize/assume.ll b/llvm/test/Transforms/LoopVectorize/assume.ll
index 10cb67fd4e6a0..b1cb79efa2245 100644
--- a/llvm/test/Transforms/LoopVectorize/assume.ll
+++ b/llvm/test/Transforms/LoopVectorize/assume.ll
@@ -49,12 +49,8 @@ define void @test2(%struct.data* nocapture readonly %d) {
; CHECK: vector.body:
; CHECK: tail call void @llvm.assume(i1 [[MASKCOND]])
; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND]])
; CHECK: tail call void @llvm.assume(i1 [[MASKCOND4]])
; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
-; CHECK-NEXT: tail call void @llvm.assume(i1 [[MASKCOND4]])
; CHECK: for.body:
entry:
%b = getelementptr inbounds %struct.data, %struct.data* %d, i64 0, i32 1
More information about the llvm-branch-commits
mailing list