[PATCH] D110841: [AArch64] Remove redundant ORRWrs which is generated by zero-extend

Tue Oct 5 06:30:04 PDT 2021

jaykang10 added a comment.

In D110841#3042058 <https://reviews.llvm.org/D110841#3042058>, @dmgreen wrote:

>> At instruction selection level, we check the operations which do not zero-out the high half of the 64-bit register using `isDef32` function as below.
>>
>>   // Any instruction that defines a 32-bit result zeros out the high half of the
>>   // register. Truncate can be lowered to EXTRACT_SUBREG. CopyFromReg may
>>   // be copying from a truncate. But any other 32-bit operation will zero-extend
>>   // up to 64 bits. AssertSext/AssertZext aren't saying anything about the upper
>>   // 32 bits, they're probably just qualifying a CopyFromReg.
>>   static inline bool isDef32(const SDNode &N) {
>>     unsigned Opc = N.getOpcode();
>>     return Opc != ISD::TRUNCATE && Opc != TargetOpcode::EXTRACT_SUBREG &&
>>            Opc != ISD::CopyFromReg && Opc != ISD::AssertSext &&
>>            Opc != ISD::AssertZext && Opc != ISD::AssertAlign &&
>>            Opc != ISD::FREEZE;
>>   }
>>
>> As you can see, the `isDef32` checks fundamentally `ISD::TRUNCATE` within a basic block because the below pattern is matched with the `ISD::TRUNCATE` and `EXTRACT_SUBREG` does not guarantee the high 32 bits are zero.
>>
>>   // To truncate, we can simply extract from a subregister.
>>   def : Pat<(i32 (trunc GPR64sp:$src)),
>>             (i32 (EXTRACT_SUBREG GPR64sp:$src, sub_32))>;
>>
>> This patch checks the `ORRWrs` has `EXTRACT_SUBREG` in different basic block as operand because the existing pattern with `isDef32` resolves case in which the operand is in same basic block.
>
> That explains things in terms of DAG-ISel, but there are other instruction selectors and different optimization between then and here. (Plus the isDef32 has had so many bugs it's difficult to trust!)
>
> We know that all (?) instructions that generate a W register under AArch64 will zero the upper bits of the X register. We seems to say in this patch that certain EXTRACT_SUBREG are not valid, COPY and PHI are currently excluded. Is that really all we have to worry about? Do we know that the top bits are always 0 for all other grp32 sources?

I agree with you. I am also not sure about it...

As you mentioned, If AArch64's 32-bit form of instruction defines the source operand of zero-extend, we do not need the zero-extend.

  From https://developer.arm.com/documentation/dui0801/b/BABBGCAC
  When you use the 32-bit form of an instruction, the upper 32 bits of the source registers are ignored and the upper 32 bits of the destination register are set to zero.

We need to check the zero-extend's source operands which do not come from the destination register of AArch64's 32-bit form instructions. I thought we are not sure the upper 32-bits set to zero in below two cases.

1. Function argument - There would be copy instructions from physical register to virtual register.
2. EXTRACT_SUBREG

If you feel there are more cases, please let me know. Let me update code with the cases.

For keeping track of PHI and COPY's operands, if possible, I would like to handle them after we clearly understanding the cases which does not guarantee the upper 32-bit set to zero. I am sorry for that...

================
Comment at: llvm/lib/Target/AArch64/AArch64MIPeepholeOpt.cpp:215
+  MachineInstr *SrcMI = MRI->getUniqueVRegDef(MI.getOperand(2).getReg());
+  if (SrcMI->getParent() == MI.getParent())
+    return false;
----------------
dmgreen wrote:
> jaykang10 wrote:
> > dmgreen wrote:
> > > Do we need to avoid when they are in the same block? It would be easier for testing if we didn't.
> > If the pattern is in same block, I thought it has already been handled by existing pattern with `isDef32` and we do not need to check it.
> > ```
> > def def32 : PatLeaf<(i32 GPR32:$src), [{
> >   return isDef32(*N);
> > }]>;
> > 
> > // In the case of a 32-bit def that is known to implicitly zero-extend,
> > // we can use a SUBREG_TO_REG.
> > def : Pat<(i64 (zext def32:$src)), (SUBREG_TO_REG (i64 0), GPR32:$src, sub_32)>;
> > ```
> But, is that needed for this pass? I can see it is how DAG ISel works, but if there are this pattern of code in the same block, it should still work fine, shouldn't it?
Yep, I agree with you. Let me remove the check for different block.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D110841/new/

https://reviews.llvm.org/D110841