[llvm] [LoopVectorizer] Prune VFs based on plan register pressure (PR #132190)
David Sherwood via llvm-commits
llvm-commits at lists.llvm.org
Thu Mar 20 06:54:56 PDT 2025
================
@@ -35,16 +35,25 @@ define void @arm_mean_q7(ptr noundef %pSrc, i32 noundef %blockSize, ptr noundef
; CHECK-NEXT: [[AND:%.*]] = and i32 [[BLOCKSIZE]], 15
; CHECK-NEXT: [[CMP2_NOT15:%.*]] = icmp eq i32 [[AND]], 0
; CHECK-NEXT: br i1 [[CMP2_NOT15]], label [[WHILE_END5:%.*]], label [[MIDDLE_BLOCK:%.*]]
-; CHECK: middle.block:
-; CHECK-NEXT: [[ACTIVE_LANE_MASK:%.*]] = tail call <16 x i1> @llvm.get.active.lane.mask.v16i1.i32(i32 0, i32 [[AND]])
-; CHECK-NEXT: [[WIDE_MASKED_LOAD:%.*]] = tail call <16 x i8> @llvm.masked.load.v16i8.p0(ptr [[PSRC_ADDR_0_LCSSA]], i32 1, <16 x i1> [[ACTIVE_LANE_MASK]], <16 x i8> poison)
-; CHECK-NEXT: [[TMP4:%.*]] = sext <16 x i8> [[WIDE_MASKED_LOAD]] to <16 x i32>
-; CHECK-NEXT: [[TMP5:%.*]] = select <16 x i1> [[ACTIVE_LANE_MASK]], <16 x i32> [[TMP4]], <16 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP6:%.*]] = tail call i32 @llvm.vector.reduce.add.v16i32(<16 x i32> [[TMP5]])
-; CHECK-NEXT: [[TMP7:%.*]] = add i32 [[TMP6]], [[SUM_0_LCSSA]]
-; CHECK-NEXT: br label [[WHILE_END5]]
+; CHECK: vector.ph:
----------------
david-arm wrote:
This looks like a possible regression to me. The previous code was optimal because it removed the loop entirely and hence there was no register pressure due to PHIs.
https://github.com/llvm/llvm-project/pull/132190
More information about the llvm-commits
mailing list