[PATCH] D106399: [VectorCombine] Widening of partial vector loads

Wed Jul 21 08:28:46 PDT 2021

spatel added inline comments.

================
Comment at: llvm/lib/Transforms/Vectorize/VectorCombine.cpp:312-313
+  std::iota(Mask.begin(), Mask.end(), 0);
+  Value *SmallVec =
+      Builder.CreateShuffleVector(WideVecLd, PoisonValue::get(WideVecTy), Mask);
+  replaceValue(I, *SmallVec);
----------------
lebedev.ri wrote:
> spatel wrote:
> > Use the unary variant of `CreateShuffleVector` here.
> > Could call this value `ExtractSubvector` or state that in the code comment. Can we always assume that the extract op is free, or should we add that potential cost into the equation?
> That subvector we're extracting here will be legalized (widened) into the full vector we've just loaded,
> so the shuffle just pretends that a few high elements of that legal vector don't exist.
> That is the whole assumption behind the transformation here.
> There isn't any actual subvector extraction here.
> So i'm not really seeing why we'd need to check the cost of that.
Ok - please add a code comment with that explanation then. That way, we'll document why we don't factor cost of the shuffle.

================
Comment at: llvm/test/Transforms/VectorCombine/X86/load-widening.ll:194
 ; We can't tell if we can load more than 256 bits.
 define <9 x float> @vec_with_9elts_256bits(<9 x float>* align 32 dereferenceable(32) %p) {
 ; CHECK-LABEL: @vec_with_9elts_256bits(
----------------
Can you explain the difference between this test and vec_with_7elts_256bits for an SSE target? It's not obvious to me why we are ok widening to 256-bit if that's not legal, but not wider than that.

================
Comment at: llvm/test/Transforms/VectorCombine/X86/load-widening.ll:4
 ; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=avx2 --data-layout="e" | FileCheck %s --check-prefixes=CHECK
 ; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=sse2 --data-layout="E" | FileCheck %s --check-prefixes=CHECK
 ; RUN: opt < %s -vector-combine -S -mtriple=x86_64-- -mattr=avx2 --data-layout="E" | FileCheck %s --check-prefixes=CHECK
----------------
lebedev.ri wrote:
> spatel wrote:
> > Hmm...I don't think I've ever tried that experiment. :)
> > Did you confirm that the layout "wins" over the target to provide the coverage you expected?
> https://alive2.llvm.org/ce/z/yAGpMV
I'm not questioning the logic, just that there is no big-endian x86 target, so I don't know what is happening internally in LLVM in this situation.

I think it would be better to add a different test file for a target that actually does support both modes (AArch64 or PowerPC?). We still want an x86 test file to verify that SSE vs. AVX is behaving as expected though.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106399/new/

https://reviews.llvm.org/D106399