[PATCH] D138353: [Passes][VectorCombine] enable early run generally and try load folds

Sat Nov 19 08:15:39 PST 2022

spatel added inline comments.

================
Comment at: llvm/lib/Passes/PassBuilderPipelines.cpp:618-619

-  // The matrix extension can introduce large vector operations early, which can
-  // benefit from running vector-combine early on.
-  if (EnableMatrix)
-    FPM.addPass(VectorCombinePass(/*TryEarlyFoldsOnly=*/true));
+  // Try vectorization/scalarization transforms that are likely to be reduced by
+  // GVN and InstCombine.
+  FPM.addPass(VectorCombinePass(/*TryEarlyFoldsOnly=*/true));
----------------
lebedev.ri wrote:
> What does "reduced" here mean? "obscured"?
No, "reduced" was supposed to mean "simplified further": running VectorCombine early enables more folds in GVN and InstCombine.
In the motivating example from #17113, we have:

```
  %2 = load float, ptr %0, align 16
  %3 = insertelement <4 x float> undef, float %2, i64 0
  %4 = getelementptr inbounds [4 x float], ptr %0, i64 0, i64 1
  %5 = load float, ptr %4, align 4
```
VectorCombine can widen the first load (subject to legality/profitability checks):

```
  %2 = load <4 x float>, ptr %0, align 16
  %3 = shufflevector <4 x float> %2, <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %4 = getelementptr inbounds [4 x float], ptr %0, i64 0, i64 1
  %5 = load float, ptr %4, align 4
```
GVN then replaces the redundant second load with bit operations on the widened value:

```
  %2 = load <4 x float>, ptr %0, align 16
  %3 = shufflevector <4 x float> %2, <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
  %4 = getelementptr inbounds [4 x float], ptr %0, i64 0, i64 1
  %5 = bitcast <4 x float> %2 to i128
  %6 = lshr i128 %5, 32
  %7 = trunc i128 %6 to i32
  %8 = bitcast i32 %7 to float
```
InstCombine then manages to remove all of those extra instructions.
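
For intuition on why that cleanup is possible: on a little-endian target, the bitcast/lshr/trunc/bitcast chain just reads lane 1 of the widened vector, which is the same float that the original second load fetched. Below is a minimal C++ sketch of that equivalence. It is not LLVM code; the helper name `extractViaShift` is hypothetical, and it assumes a little-endian host with 32-bit IEEE floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (illustration only, not part of LLVM).
// Mimics the scalar sequence that GVN emits:
//   bitcast <4 x float> -> i128, lshr 32, trunc to i32, bitcast to float
// Assumes a little-endian host. Only bits [32, 64) of the i128 survive the
// shift and truncate, so the low 64 bits are enough to model it.
float extractViaShift(const float (&vec)[4]) {
  // Low 64 bits of the wide integer: lanes 0 and 1 on little-endian.
  std::uint64_t low;
  std::memcpy(&low, vec, sizeof(low));

  // lshr 32 followed by trunc to i32 keeps the bits of lane 1.
  std::uint32_t bits = static_cast<std::uint32_t>(low >> 32);

  // bitcast i32 -> float.
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}
```

Calling `extractViaShift` on an array `{a, b, c, d}` should return `b` under those assumptions, i.e. the element at index 1 that the GEP in the example addresses.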

https://reviews.llvm.org/D138353