[llvm] [LV] Enable CSA for RISCV EVL tail-folding with scalable vector. (PR #184068)
Elvis Wang via llvm-commits
llvm-commits at lists.llvm.org
Sun Mar 15 19:22:34 PDT 2026
================
@@ -84,17 +80,12 @@ define i32 @non_speculatable_find_last_reduction(ptr noalias %a, ptr noalias %b,
; CHECK-NEXT: [[TMP0:%.*]] = phi <vscale x 4 x i1> [ zeroinitializer, %[[EXIT]] ], [ [[TMP11:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[AVL:%.*]] = phi i64 [ [[N]], %[[EXIT]] ], [ [[AVL_NEXT:%.*]], %[[VECTOR_BODY]] ]
; CHECK-NEXT: [[TMP1:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 4, i1 true)
-; CHECK-NEXT: [[BROADCAST_SPLATINSERT3:%.*]] = insertelement <vscale x 4 x i32> poison, i32 [[TMP1]], i64 0
-; CHECK-NEXT: [[BROADCAST_SPLAT4:%.*]] = shufflevector <vscale x 4 x i32> [[BROADCAST_SPLATINSERT3]], <vscale x 4 x i32> poison, <vscale x 4 x i32> zeroinitializer
-; CHECK-NEXT: [[TMP2:%.*]] = call <vscale x 4 x i32> @llvm.stepvector.nxv4i32()
-; CHECK-NEXT: [[TMP3:%.*]] = icmp ult <vscale x 4 x i32> [[TMP2]], [[BROADCAST_SPLAT4]]
; CHECK-NEXT: [[A_ADDR:%.*]] = getelementptr inbounds nuw i32, ptr [[A]], i64 [[IV]]
; CHECK-NEXT: [[VP_OP_LOAD:%.*]] = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 [[A_ADDR]], <vscale x 4 x i1> splat (i1 true), i32 [[TMP1]])
; CHECK-NEXT: [[TMP5:%.*]] = icmp sgt <vscale x 4 x i32> [[VP_OP_LOAD]], [[BROADCAST_SPLAT]]
-; CHECK-NEXT: [[TMP6:%.*]] = select <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i1> [[TMP5]], <vscale x 4 x i1> zeroinitializer
+; CHECK-NEXT: [[TMP8:%.*]] = call <vscale x 4 x i1> @llvm.vp.merge.nxv4i1(<vscale x 4 x i1> splat (i1 true), <vscale x 4 x i1> [[TMP5]], <vscale x 4 x i1> zeroinitializer, i32 [[TMP1]])
; CHECK-NEXT: [[TMP7:%.*]] = getelementptr i32, ptr [[B]], i64 [[IV]]
-; CHECK-NEXT: [[VP_OP_LOAD5:%.*]] = call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 [[TMP7]], <vscale x 4 x i1> [[TMP5]], i32 [[TMP1]])
-; CHECK-NEXT: [[TMP8:%.*]] = select <vscale x 4 x i1> [[TMP3]], <vscale x 4 x i1> [[TMP6]], <vscale x 4 x i1> zeroinitializer
+; CHECK-NEXT: [[VP_OP_LOAD5:%.*]] = call <vscale x 4 x i32> @llvm.masked.load.nxv4i32.p0(ptr align 4 [[TMP7]], <vscale x 4 x i1> [[TMP8]], <vscale x 4 x i32> poison)
----------------
ElvisWang123 wrote:
> Oh right so in this PR you're trying to remove all uses of the header mask, not just the AnyOf one?
Initially I tried replacing the LogicalAnd only for the user of the AnyOf. But yes, we can remove all of the LogicalAnds in a second pass so that more loops benefit from this.
> I think what we really want to do is replace all (LogicalAnd HeaderMask, X) --> vp.merge(True, HeaderMask, False, EVL). But if we do that eagerly then we break the m_RemoveMask pattern, which is why you need to check the users right?
Yes, we cannot do this in `optimizeMaskToEVL`, since once the `LogicalAnd` has been replaced with `vp.merge`, the `m_RemoveMask` pattern match can no longer strip the header mask and convert recipes to VP recipes.
Thanks for the suggestion. Updated to support replacing all left-over LogicalAnds.
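For anyone following along, the equivalence this rewrite relies on: under EVL tail-folding the header mask is `step < EVL`, so `select(header_mask, X, false)` produces the same lanes as `vp.merge(all-true, X, false, EVL)`, which is what the diff above shows. A minimal lane-wise model in Python (a hypothetical sketch of the semantics, not LLVM code):

```python
def header_mask(evl, vl):
    # Header mask under EVL tail-folding: lane i is active iff i < EVL.
    return [i < evl for i in range(vl)]

def select_with_header_mask(x, evl):
    # select(header_mask, X, false) -- the LogicalAnd pattern being replaced.
    mask = header_mask(evl, len(x))
    return [xi if mi else False for xi, mi in zip(x, mask)]

def vp_merge_all_true(x, evl):
    # vp.merge(all-true, X, false, EVL): lanes below EVL take X, the tail is false.
    return [x[i] if i < evl else False for i in range(len(x))]

# The two forms agree lane-by-lane for any EVL.
x = [True, False, True, True]
for evl in range(len(x) + 1):
    assert select_with_header_mask(x, evl) == vp_merge_all_true(x, evl)
```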
https://github.com/llvm/llvm-project/pull/184068
More information about the llvm-commits
mailing list