[PATCH] D91279: [PowerPC] DForm instructions should be preferred when using zero register

Tue Nov 17 13:29:01 PST 2020

stefanp added a comment.

In D91279#2390160 <https://reviews.llvm.org/D91279#2390160>, @shchenz wrote:

> Using dform with offset 0 can save one register r0/X0, this is benefit for register allocation? But adding it in `PPCPreEmitPeephole` pass which is after register allocation will make the benefit gone.
> Maybe we need to do it before register allocation? For example at the place where the x-form with zero register is generated.
>
> I checked one example `loadConstant` in  `test/CodeGen/PowerPC/f128-passByValue.ll`.
> We generate `LXVX $zero8, ` in ISEL because we meet the worst case and we don't have d-form choice for the instruction selection. so we have to use x-form and in x-form selection, we have to use zero/zero8 as the base and use load address as the index. See `PPCTargetLowering::SelectAddressRegRegOnly`.
>
> I guess most cases are with same reason for generating x-form + zero register, we meet the worst case in ISEL, so we have to use x-form + zero register form, with this form, we can always select a powerpc load/store instruction.
>
> For me, a better solution should be change the worst case handling in ISEL, it is before RA and it is also transparent for types like STXVX/LXVX/ and also LDX/STDX, LFDX/STFDX...

I'm just going to jump in to give a little more background. The initial reason we wanted to do this was to enable an optimization that actually happens in the linker after the code is emitted.
To get the idea you can look at this test:

  /llvm/test/CodeGen/PowerPC/pcrel-linkeropt.ll

Which contains this section:

  ; FIXME: we should always convert X-Form instructions that use
  ; PPC::ZERO[8] to the corresponding D-Form so we can perform this opt.
  define dso_local void @ReadWrite128() local_unnamed_addr #0 {
  ; CHECK-LABEL: ReadWrite128:
  ; CHECK:       # %bb.0: # %entry
  ; CHECK-NEXT:    pld r3, input128 at got@pcrel(0), 1
  ; CHECK-NEXT:    lxvx vs0, 0, r3
  ; CHECK-NEXT:    pld r3, output128 at got@pcrel(0), 1
  ; CHECK-NEXT:    stxvx vs0, 0, r3
  ; CHECK-NEXT:    blr
  entry:
    %0 = load i128, i128* @input128, align 16
    store i128 %0, i128* @output128, align 16
    ret void
  }

When we have a GOT access like this it is possible for the compiler to mark the instruction with `R_PPC64_PCREL_OPT` and then the linker merges the two instructions into one and replaces the second instruction with a `nop`. The problem is that this opt can only be done if the second instruction is DForm. We noticed that when we implemented this optimization we could not catch all of the cases because in some situations (like the one above) we use the XForm instead of the DForm.

Having said that, we should try to do this before the PreEmitPeephole. The optimization that adds the `R_PPC64_PCREL_OPT` relocation is also in the PreEmitPeephole and I'm not sure if it will be detected if we do both things at the same time (both as in convert the XForm to a DForm and then have the same opt use that DForm to add the relocation).

I agree that ISel is a better place for this. If we cannot do this in ISel then we should still try to do this before we get to the PreEmitPeephole or at least make sure that both the DForm is present and that the `R_PPC64_PCREL_OPT` relocation is added as we expected in the same pass.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D91279/new/

https://reviews.llvm.org/D91279