[PATCH] D107502: [WebAssembly] Legalize vector types by widening

Thomas Lively via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 17 20:54:36 PDT 2021


tlively added inline comments.


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.h:121-123
+    // Boolean values should be promoted to full lanes.
+    if (VT.getVectorElementType() == MVT::i1)
+      return TypePromoteInteger;
----------------
aheejin wrote:
> Why is there an exception here?
The target-independent code that figures out how to legalize types[1] tries the strategies in this order:

1. PromoteInteger (i.e. make the individual lane types wider)
2. WidenVector (i.e. keep the lane types but add more lanes)
3. Split or Scalarize

The preferred vector action says which strategy to start with, but if that fails, subsequent strategies will still be tried. The important part here is that strategies prior to the preferred strategy will not be tried. So if we set the preferred action to be WidenVector for all vector types, whenever that fails the vector will be split or scalarized instead of integer-promoted. Since we don't have any legal i1 vector types, this means that all i1 vectors would be scalarized without this exception, leading to terrible codegen.

Thinking on this more, it seems that we should actually be more selective and opt into WidenVector as the preferred action only for i8, i16, i32, i64, f32, and f64 vectors, since those are the only vectors for which WidenVector can possibly succeed.

[1] https://github.com/llvm/llvm-project/blob/a7ebc4d145892fd22442832549cb12c4b6920dea/llvm/lib/CodeGen/TargetLoweringBase.cpp#L1403-L1499


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyInstrSIMD.td:199-205
+def load_splat_scalar :
+  PatFrag<(ops node:$addr), (scalar_to_vector (i64 (load $addr)))>;
+defm : LoadPatNoOffset<v2i64, load_splat_scalar, "LOAD64_SPLAT">;
+defm : LoadPatImmOff<v2i64, load_splat_scalar, regPlusImm, "LOAD64_SPLAT">;
+defm : LoadPatImmOff<v2i64, load_splat_scalar, or_is_add, "LOAD64_SPLAT">;
+defm : LoadPatOffsetOnly<v2i64, load_splat_scalar, "LOAD64_SPLAT">;
+defm : LoadPatGlobalAddrOffOnly<v2i64, load_splat_scalar, "LOAD64_SPLAT">;
----------------
aheejin wrote:
> - If what we want is to insert the element at index 0 of the new vector, why is the splat necessary?
> - Why is only `load64_splat` necessary? What happens when we widen i8x4 into i32x4? In that case do we need `load32_splat`? Does widening only happen by a factor of 2?
> - If what we want is to insert the element at index 0 of the new vector, why is the splat necessary?

Good point, I guess we could use v128.load64_zero here instead.

> - Why is only `load64_splat` necessary? What happens when we widen i8x4 into i32x4? In that case do we need `load32_splat`? Does widening only happen by a factor of 2?

We don't have any tests that cover those cases right now, but yes, I think we would need load32_splat or load32_zero in that case. Note that we could go even smaller and have e.g. v2i8. Unfortunately load16_zero does not exist, so load16_splat would probably be the best option there. We could also use load16_lane, but that would require an extra zero input vector. Since we don't have test cases that depend on these additional patterns right now, I would like to address them in a follow-up.


================
Comment at: llvm/test/CodeGen/WebAssembly/simd-load-store-alignment.ll:297
 ; CHECK-NEXT:    local.get 0
-; CHECK-NEXT:    i16x8.load8x8_u 0:p2align=0
+; CHECK-NEXT:    v128.load64_splat 0:p2align=0
 ; CHECK-NEXT:    # fallthrough-return
----------------
aheejin wrote:
> It looks like this actually changes the result being returned compared to the previously generated code. Is that fine?
This is expected because the way we are representing these vector types has changed. It is an ABI break for vector code, but I hope that will be ok in practice. Perhaps it's worth mentioning in the LLVM release notes?

@dschuff, do you have thoughts here?

@srj, will that cause problems for Halide?


================
Comment at: llvm/test/CodeGen/WebAssembly/simd-offset.ll:926
 ; CHECK-NEXT:    local.get 0
-; CHECK-NEXT:    v128.and
-; CHECK-NEXT:    local.get 0
-; CHECK-NEXT:    i8x16.narrow_i16x8_u
-; CHECK-NEXT:    i64x2.extract_lane 0
-; CHECK-NEXT:    i64.store 0
+; CHECK-NEXT:    v128.store64_lane 0, 0
 ; CHECK-NEXT:    # fallthrough-return
----------------
aheejin wrote:
> Now that we are doing this, are the narrowing store patterns added in D84377 necessary?
Nice catch! It looks like that can all be removed now.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107502/new/

https://reviews.llvm.org/D107502
