[PATCH] D110841: [AArch64] Remove redundant ORRWrs which is generated by zero-extend

Tue Oct 5 02:58:52 PDT 2021

dmgreen added a comment.

> At instruction selection level, we check the operations which do not zero-out the high half of the 64-bit register using `isDef32` function as below.
>
>   // Any instruction that defines a 32-bit result zeros out the high half of the
>   // register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
>   // be copying from a truncate. But any other 32-bit operation will zero-extend
>   // up to 64 bits. AssertSext/AssertZext aren't saying anything about the upper
>   // 32 bits, they're probably just qualifying a CopyFromReg.
>   static inline bool isDef32(const SDNode &N) {
>     unsigned Opc = N.getOpcode();
>     return Opc != ISD::TRUNCATE && Opc != TargetOpcode::EXTRACT_SUBREG &&
>            Opc != ISD::CopyFromReg && Opc != ISD::AssertSext &&
>            Opc != ISD::AssertZext && Opc != ISD::AssertAlign &&
>            Opc != ISD::FREEZE;
>   }
>
> As you can see, the `isDef32` checks fundamentally `ISD::TRUNCATE` within a basic block because the below pattern is matched with the `ISD::TRUNCATE` and `EXTRACT_SUBREG` does not guarantee the high 32 bits are zero.
>
>   // To truncate, we can simply extract from a subregister.
>   def : Pat<(i32 (trunc GPR64sp:$src)),
>             (i32 (EXTRACT_SUBREG GPR64sp:$src, sub_32))>;
>
> This patch checks the `ORRWrs` has `EXTRACT_SUBREG` in different basic block as operand because the existing pattern with `isDef32` resolves case in which the operand is in same basic block.

That explains things in terms of DAG-ISel, but there are other instruction selectors and different optimization between then and here. (Plus the isDef32 has had so many bugs it's difficult to trust!)

We know that all (?) instructions that generate a W register under AArch64 will zero the upper bits of the X register. We seems to say in this patch that certain EXTRACT_SUBREG are not valid, COPY and PHI are currently excluded. Is that really all we have to worry about? Do we know that the top bits are always 0 for all other grp32 sources?

================
Comment at: llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp:215
+  MachineInstr *SrcMI = MRI->getUniqueVRegDef(MI.getOperand(2).getReg());
+  if (SrcMI->getParent() == MI.getParent())
+    return false;
----------------
jaykang10 wrote:
> dmgreen wrote:
> > Do we need to avoid when they are in the same block? It would be easier for testing if we didn't.
> If the pattern is in same block, I thought it has already been handled by existing pattern with `isDef32` and we do not need to check it.
> ```
> def def32 : PatLeaf<(i32 GPR32:$src), [{
>   return isDef32(*N);
> }]>;
> 
> // In the case of a 32-bit def that is known to implicitly zero-extend,
> // we can use a SUBREG_TO_REG.
> def : Pat<(i64 (zext def32:$src)), (SUBREG_TO_REG (i64 0), GPR32:$src, sub_32)>;
> ```
But, is that needed for this pass? I can see it is how DAG ISel works, but if there are this pattern of code in the same block, it should still work fine, shouldn't it?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110841/new/

https://reviews.llvm.org/D110841