[PATCH] D132325: [AArch64][CodeGen] Fold the mov and lsl into ubfiz

Tue Aug 30 06:40:07 PDT 2022

Allen added inline comments.

================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.td:7348
 def : Pat<(shl (i64 (sext GPR32:$Rn)), (i64 imm0_63:$imm)),
-          (SBFMXri (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$Rn, sub_32),
+          (SBFMXri (SUBREG_TO_REG (i64 0), GPR32:$Rn, sub_32),
+                   (i64 (i64shift_a        imm0_63:$imm)),
----------------
Allen wrote:
> paulwalker-arm wrote:
> > Allen wrote:
> > > efriedma wrote:
> > > > paulwalker-arm wrote:
> > > > > @efriedma / @Allen : Sorry for the naive question but I'm not hugely familiar with this node and the documentation says:
> > > > > ```
> > > > > /// All other bits are
> > > > > /// assumed to be equal to the bits in the immediate integer constant in the
> > > > > /// first operand. This instruction just communicates information; No code
> > > > > /// should be generated.
> > > > > ```
> > > > > So is this safe? I mean if no code is generated then in this instance how can we be sure `$Rn` has it's top 32bits zeroed? I'm kind of assuming this is why the original code is using `INSERT_SUBREG`?
> > > > > 
> > > > > The emitted code is valid, but could something query the `SUBREG_TO_REG` post isel and reuse/transform it in an invalid way?
> > > > Oh, I didn't notice it was changed from using INSERT_SUBREG to use SUBREG_TO_REG.  I just glanced over the patterns and assumed it was just using the same pattern we were already using.  Not sure why it was changed.  We normally only use SUBREG_TO_REG if we know the operand actually clears the high bits.
> > > > 
> > > > Not sure there's really any practical consequence here; there isn't very much code that's actually aware of the semantics of SUBREG_TO_REG.
> > > hi @paulwalker-arm @efriedma
> > >     According https://developer.arm.com/documentation/ddi0602/2021-12/Base-Instructions/UBFIZ--Unsigned-Bitfield-Insert-in-Zero--an-alias-of-UBFM-?lang=en, the insn ubfiz  copies specified bits from the source $Rn and clears other unspecified bits. 
> > >    In this pattern, the top 32bits will be clean, so I think it is save.
> > > 
> > > If we use the original **INSERT_SUBREG**, I find the test case **@loop2** in file llvm/test/CodeGen/AArch64/tbl-loops.ll regressses.
> > > ```
> > > +++ b/llvm/test/CodeGen/AArch64/tbl-loops.ll
> > > @@ -151,7 +151,8 @@ define void @loop2(i8* noalias nocapture noundef writeonly %dst, float* nocaptur
> > >  ; CHECK-NEXT:    cmp w8, #2
> > >  ; CHECK-NEXT:    b.ls .LBB1_4
> > >  ; CHECK-NEXT:  // %bb.2: // %vector.memcheck
> > > -; CHECK-NEXT:    ubfiz x9, x8, #1, #32
> > > +; CHECK-NEXT:    mov w9, w8
> > > +; CHECK-NEXT:    ubfiz x9, x9, #1, #32
> > > ```
> > > I think this is because the extra **%37: grp64all = IMPLICIT_DEF** need alloc a register for %37, which is not always specified the same register of %36
> > > **%36:gpr64 = INSERT_SUBREG %37：grp64all (tied-def 0), %24:gpr32common, %subreg.sub_32**
> > I can see the emitted code is correct but my concern relates to a scenario where a later pass decides to reuse the result of the `SUBREG_TO_REG` MachineInstr in isolation, with the understanding that the top 32bits will be zero.  In this instance we don't know how `$Rn` was produced so we don't know anything about its top 32bits.
> > 
> > Within PeepholeOptimizer.cpp there's this comment:
> > ```
> >     // It's an error to translate this:
> >     //
> >     //    %reg1025 = <sext> %reg1024
> >     //     ...
> >     //    %reg1026 = SUBREG_TO_REG 0, %reg1024, 4
> >     //
> >     // into this:
> >     //
> >     //    %reg1025 = <sext> %reg1024
> >     //     ...
> >     //    %reg1027 = COPY %reg1025:4
> >     //    %reg1026 = SUBREG_TO_REG 0, %reg1027, 4
> >     //
> >     // The problem here is that SUBREG_TO_REG is there to assert that an
> >     // implicit zext occurs. It doesn't insert a zext instruction. If we allow
> >     // the COPY here, it will give us the value after the <sext>, not the
> >     // original value of %reg1024 before <sext>.
> > ```
> > This articulates what I mean.  An optimisation is prevented that would otherwise break `SUBREG_TO_REG`'s contract.  Presumably this is for a good reason and suggests we should only use `SUBREG_TO_REG` when we can honour its contract.
> > 
> > That said, if Eli is happy then I shalt worry about it, but to me this just looks unsafe.
> > 
> > Perhaps we're missing a MachineInstr transformation for `INSERT_SUBREG (undef), $GPR` -> `SUBREG_TO_REG 0, $GPR` for the cases where we know the top bits of `$GPR` will be zero?
> Thanks @paulwalker-arm for detail explanation, and now I think your worry is make sense. 
> I'll try to take a look at the MachineInstr transformation as your suggestion.
Address with D132939.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132325/new/

https://reviews.llvm.org/D132325