[PATCH] D138203: [AArch64][InstCombine] Simplify repeated complex patterns in dupqlane

Matt Devereau via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 3 08:19:52 PST 2023


MattDevereau added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1425-1428
+  // Check for the pattern (y, x, y, x) or (y, x, y, x, y, x, y, x)
+  for (uint64_t i = 0; i < NumElements - 2; i++)
+    if (RSequence[i] != RSequence[i + 2])
+      return None;
----------------
sdesmalen wrote:
> This is a little bit restrictive, because it could also work for e.g.
> 
> <16 x i8>  <a, b, c, d, a, b, c, d, a, b, c, d, a, b, c, d>
> 
> where only `<a, b, c, d>` would need to be splat.
> 
> It might help instead to find the 'minimum' set by recursively halving the vector and seeing if all elements match. e.g.
> 
>   <a, b, a, b, a, b, a, b>
>   => <a, b, a, b> == <a, b, a, b>
>   => <a, b> == <a, b>
> 
> so that the minimum set to splat is `<a, b>`
I've implemented a recursive function which now handles `<a, b, c, d, a, b, c, d>`.
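
A minimal sketch of the halving idea from the quoted comment, written as a standalone loop rather than as the recursive function in the patch. The name `minimalSplatLength` and the use of `std::vector` are assumptions made for illustration; the real code works on the operands of the insertelement chain, not on a plain vector.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not the code from D138203.
// Returns the length of the shortest prefix of Seq that reproduces Seq when
// repeated, found by halving the candidate while its two halves are equal.
template <typename T>
std::size_t minimalSplatLength(const std::vector<T> &Seq) {
  std::size_t Len = Seq.size();
  while (Len > 1 && Len % 2 == 0) {
    std::size_t Half = Len / 2;
    bool HalvesMatch = true;
    for (std::size_t I = 0; I < Half; ++I) {
      if (Seq[I] != Seq[I + Half]) {
        HalvesMatch = false;
        break;
      }
    }
    if (!HalvesMatch)
      break;    // Cannot shrink any further.
    Len = Half; // The first half alone is enough; try to halve it again.
  }
  return Len;
}
```

With this sketch, `<a, b, a, b, a, b, a, b>` reduces to a length of 2 and `<a, b, c, d, a, b, c, d>` to a length of 4, while a sequence with no repetition keeps its full length.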


================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:86
+
+define dso_local <vscale x 8 x half> @dupq_f16_complex_missing_middle(half %a, half %b) {
+; CHECK-LABEL: @dupq_f16_complex_missing_middle(
----------------
sdesmalen wrote:
> This case could still work right? If the two elements that are missing are both undef, they could be anything including `a, b`.
The case can definitely work. However, when re-implementing this patch as a recursive algorithm I ran into a few headaches trying to integrate null pointers/poison values into the recursion. It is entirely possible to do this, but if the gain is minimal for the time required, I'd suggest it be done as a separate patch.

Some of the cases I had trouble handling:
`<a, b, c, nullptr, a, b, c, d>`, where nullptr represents poison elements. Logically you'd want to pick the right half as the pattern, as it has no undefined values, but things start getting complicated with cases such as `<a, b, nullptr, nullptr, nullptr, nullptr, nullptr, d>` and `<a, nullptr, a, nullptr, nullptr, b, nullptr, b>`. It should be possible to simplify these, however I suspect it would be easier to write a separate algorithm from what I've done here to handle poison cases.
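
One possible shape for such a separate algorithm, purely as a sketch and not something implemented in this patch: treat a poison lane as a wildcard and merge the two halves lane by lane, keeping whichever side is defined. Here `std::nullopt` stands in for the nullptr/poison entries, and `minimalPatternWithPoison` is a hypothetical name.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical sketch, not part of D138203.
// std::nullopt models a poison lane, which may take any value.
template <typename T>
std::vector<std::optional<T>>
minimalPatternWithPoison(std::vector<std::optional<T>> Seq) {
  while (Seq.size() > 1 && Seq.size() % 2 == 0) {
    std::size_t Half = Seq.size() / 2;
    std::vector<std::optional<T>> Merged(Half);
    bool Compatible = true;
    for (std::size_t I = 0; I < Half && Compatible; ++I) {
      const std::optional<T> &Lo = Seq[I];
      const std::optional<T> &Hi = Seq[I + Half];
      if (Lo && Hi && *Lo != *Hi)
        Compatible = false;       // Two defined lanes disagree.
      else
        Merged[I] = Lo ? Lo : Hi; // Keep the defined lane, if there is one.
    }
    if (!Compatible)
      break; // Seq is untouched; it is the smallest pattern found.
    Seq = std::move(Merged);
  }
  return Seq;
}
```

Under these assumptions, `<a, b, c, nullptr, a, b, c, d>` merges to `<a, b, c, d>`, `<a, nullptr, a, nullptr, nullptr, b, nullptr, b>` reduces to `<a, b>`, and `<a, b, nullptr, nullptr, nullptr, nullptr, nullptr, d>` stops at `<a, b, nullptr, d>` because `b` and `d` conflict. Whether a result that still contains poison lanes is worth splatting is a separate decision this sketch does not make.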


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138203/new/

https://reviews.llvm.org/D138203


