[PATCH] D68527: [WebAssembly] v8x16.swizzle and rewrite BUILD_VECTOR lowering

Thomas Lively via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Oct 8 13:31:15 PDT 2019


tlively added a comment.

In D68527#1699832 <https://reviews.llvm.org/D68527#1699832>, @aheejin wrote:

> - I remember before we had a somewhat complicated logic to calculate the number of bytes of total instructions of each case of the case we use `v128.const` and vs. when we use splats. Don't we need that anymore? Can we make the decision solely based the number of swizzles / consts / and splats?


Minimizing code size is not as important for SIMD as maximizing performance, so I dumped that complicated logic. I am led to believe that `v128.const` will be faster than splats once it is implemented, but we have no way to measure yet. We can make the decision based on whatever heuristic we want, but minimizing number of instructions seems like a good metric for now until we can run experiments to tune the selection algorithm.

> - Is the performance of `v128.const` better than splats? How is performance of swizzles compared to `v128.const`?

Yes, I believe `v128.const` will be faster than splats. I don't know how swizzles and `v128.const` compare, but I do know that emulating swizzles requires a lot of instructions per lane but emulating a v128.const only requires a single `replace_lane` and constant per lane. So it makes sense to prefer swizzles over `v128.const`s for now.



================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1350
     }
+  };
+
----------------
aheejin wrote:
> Would using `count_if` in place of `find_if` be simpler?
No, I need to get the iterator to the proper entry since I'm using a vector as an associative array here. If I used `count_if` and it returned 1, I would still need to find the entry to increment the count, so `find_if` is simpler because it gives me that entry directly.


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1388
+    std::forward_as_tuple(std::tie(SwizzleSrc, SwizzleIndices),
+                          NumSwizzleLanes) = GetMostCommon(SwizzleCounts);
+
----------------
aheejin wrote:
> Is using `forward_as_tuple` any different from using `tie` again in this case, given that this is not passed as an argument to a function?
It turns out you can't nest `std::tie`. I have no idea why, but I got this solution from https://stackoverflow.com/questions/21298732/can-we-do-deep-tie-with-a-c1y-stdtie-like-function.


================
Comment at: llvm/lib/Target/WebAssembly/WebAssemblyISelLowering.cpp:1426
+        (SplattedLoad = dyn_cast<LoadSDNode>(SplatValue)) &&
+        SplattedLoad->getMemoryVT() == VecT.getVectorElementType()) {
+      Result = DAG.getNode(WebAssemblyISD::LOAD_SPLAT, DL, VecT, SplatValue);
----------------
aheejin wrote:
> It's not in this CL, but is there a case this condition is not satisfied?
Yes, for example when doing a sign extending load of an i8 to an i32 then splatting that i32.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68527/new/

https://reviews.llvm.org/D68527





More information about the llvm-commits mailing list