[PATCH] D138203: [AArch64][InstCombine] Simplify repeated complex patterns in dupqlane

Matt Devereau via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 4 08:58:26 PST 2023


MattDevereau marked 2 inline comments as done.
MattDevereau added inline comments.


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1437
 
+static SmallVector<Value *> SimplifyValuePattern(SmallVector<Value *> Vec) {
+  std::size_t VecSize = Vec.size();
----------------
sdesmalen wrote:
> It's better to take and return an ArrayRef<Value*>, rather than a SmallVector object.
With the addition of allowing inserts into poison vectors to match patterns, the `SmallVector` now has to be mutable in order to replace poison values. 


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1443
+
+  SmallVector<Value *> Lhs(Vec.begin(), Vec.begin() + HalfVecSize);
+  SmallVector<Value *> Rhs(Vec.begin() + HalfVecSize, Vec.end());
----------------
sdesmalen wrote:
> if the code is not going to modify the content of the array, it's better to use ArrayRef to avoid creating a new object and copying (existing) data.
With the addition of allowing inserts into poison vectors to match patterns, the `SmallVector` now has to be mutable in order to replace poison values. 


================
Comment at: llvm/lib/Target/AArch64/AArch64TargetTransformInfo.cpp:1476-1477
+        dyn_cast<InsertElementInst>(CurrentInsertElt);
+    if (!InsertElt)
+      break;
+    auto Idx = cast<ConstantInt>(InsertElt->getOperand(2));
----------------
sdesmalen wrote:
> When the loop `break`s here, it will leave the other elements in InsertEltVec as `nullptr` from which you seem to infer `poison` or `undef` in SimplifyValuePattern.
> That's not always correct, because the IR could be inserting into some other vector, e.g. `<4 x i32> %arg` where these values are defined (it would be good to have a test for this)
This should now be ok with the inserting into poison logic added. I added the test `dupq_f16_ab_pattern_no_end_indices_not_poison` which I believe should cover this.


================
Comment at: llvm/test/Transforms/InstCombine/AArch64/sve-intrinsic-dupqlane.ll:86
+
+define dso_local <vscale x 8 x half> @dupq_f16_complex_missing_middle(half %a, half %b) {
+; CHECK-LABEL: @dupq_f16_complex_missing_middle(
----------------
sdesmalen wrote:
> MattDevereau wrote:
> > sdesmalen wrote:
> > > This case could still work right? If the two elements that are missing are both undef, they could be anything including `a, b`.
> > The case can definitely work, however when re-implenting this patch as a recursive algorithm I ran into a few headaches when trying to integrate null pointers/poision values into the recursion. It is entirely possible to do this, however if there is minimal gain for the time required this I'd suggest it be done as a separate patch.
> > 
> > Some cases I ran into issues with were how to handle cases such as:
> > `<a, b, c, nullptr, a, b, c, d>`, where nullptr respresents poison elements.  Logically you'd want to pick the right half as a pattern as it has no undefined values, but things start getting complicated with cases such as `<a, b, nullptr, nullptr, nullptr, nullptr, nullptr, d>`, and `<a, nullptr, a, nullptr, nullptr, b, nullptr, b>` It should be possible to simplify these, however I suspect it would be easier to write a separate algorithm from what I've done here to handle poison cases. 
> I wouldn't expect the undef/poison case to be that difficult to handle. 
> 
> E.g. for `<a, b, _, _, _, _, _, d>` it could compare `<a, b, _, _>` with `<_, _, _, d>`, which would match the most specific values `<a, b, _, d>`.
> For `<a, _, a, _, _, b, _, b>` it could compare `<a, _, a, _>` with `<_, b, _, b>` which would match `<a, b, a, b>`, which could then be further broken down to `<a, b>`.
I had an "aha" moment when reading this comment. In the previous diff revision I was doing the recursive algorithm from the bottom up - i.e. splitting the vectors into two, passing them down the recursion chain, doing the comparison logic, and then passing it back up the recursion chain for further comparison which was causing a lot of strife with more complicated patterns and poison values. Instead it is much simpler to approach it from a top-down angle where the nullptr/poison/_ values get replaced immediately and the recursion call is in the return statement. The tests `dupq_f16_ab_pattern_no_front_indices`, `dupq_f16_ab_pattern_no_end_indices` and `dupq_f16_ab_pattern_no_end_indices` should show this in action.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D138203/new/

https://reviews.llvm.org/D138203



More information about the llvm-commits mailing list