[PATCH] D122145: [SLP] Cluster ordering for loads

Tue Apr 5 04:03:35 PDT 2022

ABataev added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp:3399-3400
+  for (auto &Base : Bases) {
+    for (auto &T : Base.second)
+      SortedIndices.push_back(std::get<2>(T));
+  }
----------------
What if each cluster has a non-power-of-2 number of elements?
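
To make the concern concrete, here is a minimal standalone sketch of the flattening loop quoted above. It is not the patch's code: `Entry`, `Clusters`, `flattenClusters`, and `allClustersPowerOf2` are hypothetical names, and the container types are assumed stand-ins for whatever `Bases` actually holds. It only shows that clusters of size 3 and 5 still flatten to 8 indices, so a check on the total count alone would not catch odd-sized clusters.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <tuple>
#include <vector>

// Hypothetical stand-in for a cluster entry: (pointer id, offset, original index).
// Only the third field is read by the flattening loop.
using Entry = std::tuple<int, int, unsigned>;
// Hypothetical stand-in for `Bases`: base id -> entries sorted by offset.
using Clusters = std::map<int, std::vector<Entry>>;

// Mirrors the quoted loop: concatenate the original indices cluster by cluster.
std::vector<unsigned> flattenClusters(const Clusters &Bases) {
  std::vector<unsigned> SortedIndices;
  for (const auto &Base : Bases)
    for (const auto &T : Base.second)
      SortedIndices.push_back(std::get<2>(T));
  return SortedIndices;
}

bool isPowerOf2(std::size_t N) { return N != 0 && (N & (N - 1)) == 0; }

// One possible guard (an assumption, not something the patch is known to do):
// report whether every cluster has a power-of-2 number of elements.
bool allClustersPowerOf2(const Clusters &Bases) {
  for (const auto &Base : Bases)
    if (!isPowerOf2(Base.second.size()))
      return false;
  return true;
}
```

With one cluster of 3 loads and one of 5, `flattenClusters` returns 8 indices while `allClustersPowerOf2` returns false, which is the case the question is asking about.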

================
Comment at: llvm/test/Transforms/SLPVectorizer/AArch64/loadorder.ll:354-355
 ; CHECK-NEXT:    [[TMP7:%.*]] = load <4 x i16>, <4 x i16>* [[TMP6]], align 2
-; CHECK-NEXT:    [[TMP8:%.*]] = load i16, i16* [[ARRAYIDX20]], align 2
-; CHECK-NEXT:    [[TMP9:%.*]] = load i16, i16* [[ARRAYIDX23]], align 2
-; CHECK-NEXT:    [[TMP10:%.*]] = load i16, i16* [[ARRAYIDX26]], align 2
-; CHECK-NEXT:    [[TMP11:%.*]] = load i16, i16* [[ARRAYIDX29]], align 2
-; CHECK-NEXT:    [[TMP12:%.*]] = shufflevector <4 x i16> [[TMP7]], <4 x i16> poison, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT:    [[TMP13:%.*]] = insertelement <8 x i16> [[TMP12]], i16 [[TMP9]], i64 4
-; CHECK-NEXT:    [[TMP14:%.*]] = insertelement <8 x i16> [[TMP13]], i16 [[TMP8]], i64 5
-; CHECK-NEXT:    [[TMP15:%.*]] = insertelement <8 x i16> [[TMP14]], i16 [[TMP11]], i64 6
-; CHECK-NEXT:    [[TMP16:%.*]] = insertelement <8 x i16> [[TMP15]], i16 [[TMP10]], i64 7
-; CHECK-NEXT:    [[TMP17:%.*]] = shufflevector <4 x i16> [[TMP1]], <4 x i16> poison, <8 x i32> <i32 1, i32 0, i32 3, i32 2, i32 undef, i32 undef, i32 undef, i32 undef>
-; CHECK-NEXT:    [[TMP18:%.*]] = insertelement <8 x i16> [[TMP17]], i16 [[TMP3]], i64 4
-; CHECK-NEXT:    [[TMP19:%.*]] = insertelement <8 x i16> [[TMP18]], i16 [[TMP2]], i64 5
-; CHECK-NEXT:    [[TMP20:%.*]] = insertelement <8 x i16> [[TMP19]], i16 [[TMP5]], i64 6
-; CHECK-NEXT:    [[TMP21:%.*]] = insertelement <8 x i16> [[TMP20]], i16 [[TMP4]], i64 7
-; CHECK-NEXT:    [[TMP22:%.*]] = mul <8 x i16> [[TMP16]], [[TMP21]]
-; CHECK-NEXT:    [[TMP23:%.*]] = call i16 @llvm.vector.reduce.add.v8i16(<8 x i16> [[TMP22]])
-; CHECK-NEXT:    ret i16 [[TMP23]]
+; CHECK-NEXT:    [[TMP8:%.*]] = mul <4 x i16> [[TMP5]], [[TMP1]]
+; CHECK-NEXT:    [[TMP9:%.*]] = mul <4 x i16> [[TMP7]], [[TMP3]]
+; CHECK-NEXT:    [[TMP10:%.*]] = shufflevector <4 x i16> [[TMP8]], <4 x i16> [[TMP9]], <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>
----------------
This looks like a regression and is worth investigating.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122145/new/

https://reviews.llvm.org/D122145