[PATCH] D106399: [VectorCombine] Widening of partial vector loads

Wed Jul 21 08:03:08 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:290
+
+  // Okay, we currently load less than full worth of the legalized vectors.
+  // If we'd widen the load, would that be more costly than the current load?
----------------
spatel wrote:
> worth -> width
This is not a typo, but okay.

================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:312-313
+  std::iota(Mask.begin(), Mask.end(), 0);
+  Value *SmallVec =
+      Builder.CreateShuffleVector(WideVecLd, PoisonValue::get(WideVecTy), Mask);
+  replaceValue(I, *SmallVec);
----------------
spatel wrote:
> Use the unary variant of `CreateShuffleVector` here.
> Could call this value `ExtractSubvector` or state that in the code comment. Can we always assume that the extract op is free, or should we add that potential cost into the equation?
That subvector we're extracting here will be legalized (widened) into the full vector we've just loaded,
so the shuffle just pretends that a few high elements of that legal vector don't exist.
That is the whole assumption behind the transformation here.
There isn't any actual subvector extraction here.
So i'm not really seeing why we'd need to check the cost of that.

================
Comment at: llvm/test/Transforms/VectorCombine/X86/load-inseltpoison.ll:590
 ; CHECK-NEXT:    [[TMP1:%.*]] = bitcast <2 x float>* [[P:%.*]] to <4 x float>*
-; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 16
-; CHECK-NEXT:    [[R:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <4 x i32> <i32 0, i32 undef, i32 undef, i32 undef>
+; CHECK-NEXT:    [[TMP2:%.*]] = load <4 x float>, <4 x float>* [[TMP1]], align 4
+; CHECK-NEXT:    [[L:%.*]] = shufflevector <4 x float> [[TMP2]], <4 x float> poison, <2 x i32> <i32 0, i32 1>
----------------
spatel wrote:
> Can we preserve the better alignment?
Some other transformation does this, i'll take a look.

================
Comment at: llvm/test/Transforms/VectorCombine/X86/load-widening.ll:4
 ; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=avx2 --data-layout="e" | FileCheck %s --check-prefixes=CHECK
 ; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=sse2 --data-layout="E" | FileCheck %s --check-prefixes=CHECK
 ; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=avx2 --data-layout="E" | FileCheck %s --check-prefixes=CHECK
----------------
spatel wrote:
> Hmm...I don't think I've ever tried that experiment. :)
> Did you confirm that the layout "wins" over the target to provide the coverage you expected?
https://alive2.llvm.org/ce/z/yAGpMV

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106399/new/

https://reviews.llvm.org/D106399