[PATCH] D101357: [CodeGen][AArch64][SVE] Substitute [rdffr, ptest] => rdffrs

Tue May 4 03:39:20 PDT 2021

peterwaller-arm added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:1385-1388
+    case AArch64::RDFFR_P: {
+      // rdffr   p1.b               <--- Definition of Pred
+      // ptest   Mask=p0, Pred=p1.b <--- Can always optimize this to
+      //                                 rdffrs p1.b, p0/z
----------------
paulwalker-arm wrote:
> Is this true? I don't see how you know the value of `p0` just prior to `rdffr`.  There's nothing to say `p0` has been defined at this point nor that it contains the same value as it will by the time the `ptest` is hit.
> 
> Although not exactly the same this is what the
> ```
> auto *RDFFRMask = MRI->getUniqueVRegDef(RDFFR->getOperand(1).getReg());
> if (Mask != RDFFRMask)
>   return false;
> ```
> code for the `AArch64::RDFFR_PPz` case is protecting against.
> 
> Perhaps this case is worth a separate patch?
Great catch, agree that seems problematic.

If the only use of p1.b after the rdffr is at the ptest, can we think of it as optimizing the ptest instead (as the comment is written)? e.g. hasOneNonDBGUser(rdffr.Operand) && getUniqueVRegDef(ptest.Pred) == rdffr.

If that is a simple enough solution and seems obviously correct to you, I'll keep it in-patch, otherwise split.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:1410-1412
+      PTest->eraseFromParent();
+      RDFFR->eraseFromParent();
+      return true;
----------------
paulwalker-arm wrote:
> Not saying there's anything wrong but I just wondered why you cannot fall through here?  Is the following code specific to the other case blocks? I only ask because `RDFFR_PPz->RDFFRS_PPz` looks very similar to `BRKN_PPzP->BRKNS_PPzP`.
I think you are correct that this could fall through.

It ended up like this because I first implemented RDFFR_P, which AIUI cannot fall through because it grows an operand. Then I used the same implementation strategy for RDFFR_PP. When I came to consider falling through I concluded that the non-fall-through code was more straightforward to read/reason about. It didn't seem to me a too much extra code (we 'spend 4 extra lines of obvious code' to 'get to a `return true` 30 lines of code earlier' which seemed a worthwhile trade).

I'm guessing that updating the existing MI is done to avoid additional allocations and reduce the total code slightly? Additionally it has to add NZCV as an operand and mark it live, none of which looks necessary if the instruction has been built.

What is your preference in this case? Shall I change to fall-through?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101357/new/

https://reviews.llvm.org/D101357