[llvm] [LV] Transform to handle exits in the scalar loop (PR #148626)

David Sherwood via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 10 08:31:34 PST 2025


================
@@ -355,6 +451,70 @@ define void @inner_loop_trip_count_depends_on_outer_iv(ptr align 8 dereferenceab
 ; CHECK:       exit:
 ; CHECK-NEXT:    ret void
 ;
+; EE-SCALAR-LABEL: define void @inner_loop_trip_count_depends_on_outer_iv(
+; EE-SCALAR-SAME: ptr align 8 dereferenceable(1792) [[THIS:%.*]], ptr [[DST:%.*]]) {
+; EE-SCALAR-NEXT:  entry:
+; EE-SCALAR-NEXT:    [[GEP_SRC:%.*]] = getelementptr i8, ptr [[THIS]], i64 1000
+; EE-SCALAR-NEXT:    br label [[OUTER_HEADER:%.*]]
+; EE-SCALAR:       outer.header:
+; EE-SCALAR-NEXT:    [[OUTER_IV:%.*]] = phi i64 [ 0, [[ENTRY:%.*]] ], [ [[OUTER_IV_NEXT:%.*]], [[OUTER_LATCH:%.*]] ]
+; EE-SCALAR-NEXT:    [[C_1:%.*]] = icmp eq i64 [[OUTER_IV]], 0
+; EE-SCALAR-NEXT:    br i1 [[C_1]], label [[THEN:%.*]], label [[INNER_HEADER_PREHEADER:%.*]]
+; EE-SCALAR:       inner.header.preheader:
+; EE-SCALAR-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[OUTER_IV]], 4
+; EE-SCALAR-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; EE-SCALAR:       vector.ph:
+; EE-SCALAR-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[OUTER_IV]], 4
+; EE-SCALAR-NEXT:    [[N_VEC:%.*]] = sub i64 [[OUTER_IV]], [[N_MOD_VF]]
+; EE-SCALAR-NEXT:    [[WIDE_LOAD:%.*]] = load <4 x ptr>, ptr [[GEP_SRC]], align 8
+; EE-SCALAR-NEXT:    [[TMP0:%.*]] = icmp eq <4 x ptr> [[WIDE_LOAD]], zeroinitializer
+; EE-SCALAR-NEXT:    [[TMP1:%.*]] = freeze <4 x i1> [[TMP0]]
+; EE-SCALAR-NEXT:    [[TMP2:%.*]] = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> [[TMP1]])
+; EE-SCALAR-NEXT:    br i1 [[TMP2]], label [[SCALAR_PH]], label [[VECTOR_PH_SPLIT:%.*]]
+; EE-SCALAR:       vector.ph.split:
+; EE-SCALAR-NEXT:    br label [[VECTOR_BODY:%.*]]
+; EE-SCALAR:       vector.body:
+; EE-SCALAR-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH_SPLIT]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; EE-SCALAR-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 4
+; EE-SCALAR-NEXT:    [[TMP3:%.*]] = getelementptr ptr, ptr [[GEP_SRC]], i64 [[INDEX_NEXT]]
+; EE-SCALAR-NEXT:    [[UNCOUNTABLE_EXIT_MASK:%.*]] = call <4 x i1> @llvm.get.active.lane.mask.v4i1.i64(i64 [[INDEX_NEXT]], i64 [[N_VEC]])
----------------
david-arm wrote:

It might be safer to simply select between an all-true and an all-false mask based on whether `index == n_vec - 4`. That way you also avoid having to worry about wrapping.
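
To illustrate the suggestion, here is a small Python model (not LLVM code) comparing the two ways of forming the mask. It assumes, based on my reading of the CHECK lines above, that VF is 4, that `n_vec` is a positive multiple of 4, and that `index` steps by 4 from 0, so the lane mask computed from `index_next` is only ever all-true or all-false. The helper names `active_lane_mask`, `select_mask` and `masks_agree` are hypothetical and exist only for this sketch.

```python
VF = 4  # vector width used in the test above (assumption taken from the CHECK lines)


def active_lane_mask(base, n, vf=VF):
    # Models llvm.get.active.lane.mask: lane i is active iff base + i < n.
    # Python integers are unbounded, so the addition here never wraps.
    return [base + i < n for i in range(vf)]


def select_mask(index, n_vec, vf=VF):
    # Suggested alternative: all-false on the final vector iteration,
    # all-true on every earlier one. Inside the vector loop n_vec >= vf,
    # so n_vec - vf does not go below zero.
    is_last = index == n_vec - vf
    return [not is_last] * vf


def masks_agree(n_vec, vf=VF):
    # Check that both forms give the same mask on every vector iteration,
    # where the lane mask is computed from index_next = index + vf.
    return all(
        active_lane_mask(index + vf, n_vec, vf) == select_mask(index, n_vec, vf)
        for index in range(0, n_vec, vf)
    )
```

Under those assumptions the two forms agree on every iteration, and the select form only needs an equality compare against `n_vec - 4` rather than a per-lane comparison.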

https://github.com/llvm/llvm-project/pull/148626