[llvm] [LoadStoreVectorizer] Fill gaps in load/store chains to enable vectorization (PR #159388)

Drew Kersnar via llvm-commits llvm-commits at lists.llvm.org
Wed Dec 3 09:35:51 PST 2025


================
@@ -22,17 +57,36 @@ define void @ldg_f16(ptr nocapture align 16 %rd0) {
   store <2 x half> %s4, ptr %in4, align 4
   ret void
 
-; CHECK-LABEL: @ldg_f16
-; CHECK: %[[LD:.*]] = load <8 x half>, ptr
-; CHECK: shufflevector <8 x half> %[[LD]], <8 x half> poison, <2 x i32> <i32 0, i32 1>
-; CHECK: shufflevector <8 x half> %[[LD]], <8 x half> poison, <2 x i32> <i32 2, i32 3>
-; CHECK: shufflevector <8 x half> %[[LD]], <8 x half> poison, <2 x i32> <i32 4, i32 5>
-; CHECK: shufflevector <8 x half> %[[LD]], <8 x half> poison, <2 x i32> <i32 6, i32 7>
-; CHECK: store <8 x half>
 }
 
 define void @no_nonpow2_vector(ptr nocapture align 16 %rd0) {
-  %load1 = load <3 x half>, ptr %rd0, align 4
+; CHECK-LABEL: define void @no_nonpow2_vector(
+; CHECK-SAME: ptr align 16 captures(none) [[RD0:%.*]]) {
+; CHECK-NEXT:    [[TMP1:%.*]] = call <8 x half> @llvm.masked.load.v8f16.p0(ptr align 16 [[RD0]], <8 x i1> <i1 true, i1 true, i1 true, i1 true, i1 true, i1 true, i1 false, i1 false>, <8 x half> poison)
+; CHECK-NEXT:    [[LOAD13:%.*]] = shufflevector <8 x half> [[TMP1]], <8 x half> poison, <3 x i32> <i32 0, i32 1, i32 2>
+; CHECK-NEXT:    [[LOAD24:%.*]] = shufflevector <8 x half> [[TMP1]], <8 x half> poison, <3 x i32> <i32 3, i32 4, i32 5>
+; CHECK-NEXT:    [[EXTEND5:%.*]] = extractelement <8 x half> [[TMP1]], i32 6
+; CHECK-NEXT:    [[EXTEND26:%.*]] = extractelement <8 x half> [[TMP1]], i32 7
+; CHECK-NEXT:    [[P1:%.*]] = fcmp ogt <3 x half> [[LOAD13]], zeroinitializer
+; CHECK-NEXT:    [[S1:%.*]] = select <3 x i1> [[P1]], <3 x half> [[LOAD13]], <3 x half> zeroinitializer
+; CHECK-NEXT:    store <3 x half> [[S1]], ptr [[RD0]], align 16
+; CHECK-NEXT:    [[IN2:%.*]] = getelementptr half, ptr [[RD0]], i64 3
+; CHECK-NEXT:    [[P2:%.*]] = fcmp ogt <3 x half> [[LOAD24]], zeroinitializer
+; CHECK-NEXT:    [[S2:%.*]] = select <3 x i1> [[P2]], <3 x half> [[LOAD24]], <3 x half> zeroinitializer
+; CHECK-NEXT:    store <3 x half> [[S2]], ptr [[IN2]], align 4
+; CHECK-NEXT:    [[IN3:%.*]] = getelementptr half, ptr [[RD0]], i64 6
+; CHECK-NEXT:    [[LOAD3:%.*]] = load <3 x half>, ptr [[IN3]], align 4
+; CHECK-NEXT:    [[P3:%.*]] = fcmp ogt <3 x half> [[LOAD3]], zeroinitializer
+; CHECK-NEXT:    [[S3:%.*]] = select <3 x i1> [[P3]], <3 x half> [[LOAD3]], <3 x half> zeroinitializer
+; CHECK-NEXT:    store <3 x half> [[S3]], ptr [[IN3]], align 4
+; CHECK-NEXT:    [[IN4:%.*]] = getelementptr half, ptr [[RD0]], i64 9
+; CHECK-NEXT:    [[LOAD4:%.*]] = load <3 x half>, ptr [[IN4]], align 4
+; CHECK-NEXT:    [[P4:%.*]] = fcmp ogt <3 x half> [[LOAD4]], zeroinitializer
+; CHECK-NEXT:    [[S4:%.*]] = select <3 x i1> [[P4]], <3 x half> [[LOAD4]], <3 x half> zeroinitializer
+; CHECK-NEXT:    store <3 x half> [[S4]], ptr [[IN4]], align 4
+; CHECK-NEXT:    ret void
+;
+  %load1 = load <3 x half>, ptr %rd0, align 16
----------------
dakersnar wrote:

I updated this test to match what we expect to see after InferAlignment runs and puts an alignment of 16 on this `load <3 x half>`. This is more representative of the input the vectorizer should expect. With that alignment plus the masked load change, this test now extends the first two `load <3 x half>` instructions into a single `<8 x half>` masked load.
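
As a rough illustration of the lane arithmetic behind the new CHECK lines (not the LoadStoreVectorizer's actual code), the sketch below assumes a 16-byte vector, taken from the `<8 x half>` type in the expected output, and uses a hypothetical helper `gap_filled_mask` that is not an LLVM API. It computes which lanes of the widened load are really read and which are padding.

```python
def gap_filled_mask(element_counts, elem_bytes, vector_bytes):
    """Lane mask for a chain of contiguous loads padded out to one vector.

    element_counts: number of elements in each original load, in address order.
    elem_bytes:     size of one element (2 for half).
    vector_bytes:   size of the widened vector (16 is an assumption, see above).
    """
    lanes = vector_bytes // elem_bytes
    used = sum(element_counts)
    if used > lanes:
        raise ValueError("chain does not fit in one vector")
    # Lanes covered by the original loads are enabled; the tail is masked off.
    return [True] * used + [False] * (lanes - used)


def shuffle_indices(element_counts):
    """Lane indices that recover each original load from the widened vector."""
    result, start = [], 0
    for count in element_counts:
        result.append(list(range(start, start + count)))
        start += count
    return result


# Two <3 x half> loads at element offsets 0 and 3, widened to <8 x half>.
mask = gap_filled_mask([3, 3], elem_bytes=2, vector_bytes=16)
indices = shuffle_indices([3, 3])
print(mask)
print(indices)
```

The six enabled lanes followed by two disabled lanes correspond to the `<8 x i1>` mask operand in the `@llvm.masked.load.v8f16.p0` CHECK line, and the two index groups correspond to the `shufflevector` masks for `LOAD13` and `LOAD24`.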

https://github.com/llvm/llvm-project/pull/159388

