[PATCH] D100018: [WebAssembly] Add shuffles as an option for lowering BUILD_VECTOR

Thu Apr 8 17:11:30 PDT 2021

tlively added inline comments.

================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1640
+  // If the lane is extracted from another vector at a constant index, return
+  // that vector. The source vector must not have more lanes than the dest.
+  auto GetShuffleSrc = [&](const SDValue &Lane) {
----------------
aheejin wrote:
> > The source vector must not have more lanes than the dest.
> 
> Why? Shuffles can only include half of elements anyway, no?
If the source vector has more lanes than the destination, then those lanes would be narrower. The shufflevector SDNode uses indices based on the wider lanes of the destination type, so it cannot express the extension and use of smaller lanes in one of the source vectors.

In contrast, if the source vector has fewer lanes than the destination, then those lanes would be wider. The extract_vector_elt operand to the BUILD_VECTOR node would therefore be doing an implicit truncate, and by scaling up the indices for the smaller destination lanes, the truncated portions of the wider source lanes can be correctly pulled in.

I'll expand on this comment a bit.

================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1693
+      AddCount(ShuffleCounts, ShuffleSrc);
+    if (CanSwizzle) {
       auto SwizzleSrcs = GetSwizzleSrcs(I, Lane);
----------------
aheejin wrote:
> dschuff wrote:
> > Unrelated to this CL but what exactly is the difference between a swizzle and a shuffle?
> - Shuffle: https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#shuffling-using-immediate-indices
> - Swizzle: https://github.com/WebAssembly/simd/blob/master/proposals/simd/SIMD.md#swizzling-using-variable-indices
In a single sentence: Shuffling uses a static array of indices to draw lanes from two source vectors while swizzling uses the lanes of one source vector as indices into a second source vector.

================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1711-1729
+  // Shuffles can draw from up to two vectors, so find the two most common
+  // sources.
+  SDValue ShuffleSrc1, ShuffleSrc2;
+  size_t NumShuffleLanes = 0;
+  if (ShuffleCounts.size()) {
+    std::tie(ShuffleSrc1, NumShuffleLanes) = GetMostCommon(ShuffleCounts);
+    ShuffleCounts.erase(std::remove_if(ShuffleCounts.begin(),
----------------
aheejin wrote:
> Does this handle when it is most beneficial to use two same vectors as both sources?
The shuffle can draw an arbitrary number of lanes from each source, so it is never necessary to use the same source for both operands. That being said, if there is only one available source, ShuffleSrc2 will be set to `undef` here and will be later made a copy of the first operand, for lack of any better vector to use there.

================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1735
   SDValue Result;
-  // Prefer swizzles over vector consts over splats
-  if (NumSwizzleLanes >= NumSplatLanes && NumSwizzleLanes >= NumConstantLanes) {
+  // Prefer swizzles over shuffles over vector consts over splats
+  if (NumSwizzleLanes >= NumShuffleLanes &&
----------------
aheejin wrote:
> Can we have some more comments on why we prefer this order?
It's more or less arbitrary, but now that I'm thinking about it, it would probably be better to prefer the simpler/smaller operations like splat over the more complex operations like shuffles and swizzles. I'll change this order in a follow-up PR and add a comment there.

================
Comment at: llvm/test/CodeGen/WebAssembly/simd-concat.ll:75
+; CHECK-NEXT:    local.get 1
+; CHECK-NEXT:    i8x16.shuffle 0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27
+; CHECK-NEXT:    # fallthrough-return
----------------
aheejin wrote:
> I might be mistaken, but isn't this loss of data? v2i32 is a full 128bits vector and the result of `shufflevector` contain the whole two vectors but the result of `i8x16.shuffle` contains only half of it.
v2i32 is only 64 bits, but it is represented in Wasm using the low 32 bits of each lane in an i64x2 vector. The i8x16.shuffle here pulls in those low 32 bits from each lane and leaves the unused high 32 bits.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D100018/new/

https://reviews.llvm.org/D100018