[PATCH] D38128: Handle COPYs of physregs better (regalloc hints)
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 16 14:09:24 PDT 2017
efriedma added inline comments.
================
Comment at: test/CodeGen/ARM/swifterror.ll:350
+; CHECK-APPLE: mov r0, r8
+; CHECK-APPLE: cmp r0, #0
; Access part of the error object and save it to error_ref
----------------
jonpa wrote:
> efriedma wrote:
> > This is... not really great. I mean, it's the same number of instructions, but you're increasing the latency by making the cmp refer to r0 rather than r8. Do you know why this is happening?
> master patched
>
> Register allocation input:
>
>
> ```
> ********** MACHINEINSTRS ********** ********** MACHINEINSTRS **********
> ...
> 224B BL_pred <ga:@foo_vararg>, pred:14, pred: 224B BL_pred <ga:@foo_vararg>, pred:14, pred:
> 240B ADJCALLSTACKUP 0, 0, pred:14, pred:%nore 240B ADJCALLSTACKUP 0, 0, pred:14, pred:%nore
> 256B %vreg0<def> = COPY %R8<kill>; GPR:%vreg0 256B %vreg0<def> = COPY %R8<kill>; GPR:%vreg0
> 304B CMPri %vreg0, 0, pred:14, pred:%noreg, % 304B CMPri %vreg0, 0, pred:14, pred:%noreg, %
> 320B Bcc <BB#2>, pred:1, pred:%CPSR<kill> 320B Bcc <BB#2>, pred:1, pred:%CPSR<kill>
> 336B B <BB#1> 336B B <BB#1>
> Successors according to CFG: BB#2(0x50000000 Successors according to CFG: BB#2(0x50000000
>
> 352B BB#1: derived from LLVM BB %cont 352B BB#1: derived from LLVM BB %cont
> Predecessors according to CFG: BB#0 Predecessors according to CFG: BB#0
> 368B %vreg10<def> = LDRBi12 %vreg0, 8, pred:1 368B %vreg10<def> = LDRBi12 %vreg0, 8, pred:1
> 384B STRBi12 %vreg10, %vreg1, 0, pred:14, pre 384B STRBi12 %vreg10, %vreg1, 0, pred:14, pre
> Successors according to CFG: BB#2(?%) Successors according to CFG: BB#2(?%)
>
> 400B BB#2: derived from LLVM BB %handler 400B BB#2: derived from LLVM BB %handler
> Predecessors according to CFG: BB#0 BB#1 Predecessors according to CFG: BB#0 BB#1
> 416B ADJCALLSTACKDOWN 0, 0, pred:14, pred:%no 416B ADJCALLSTACKDOWN 0, 0, pred:14, pred:%no
> 432B %R0<def> = COPY %vreg0; GPR:%vreg0 432B %R0<def> = COPY %vreg0; GPR:%vreg0
> 448B BL <ga:@free>, <regmask %LR %D8 %D9 %D10 448B BL <ga:@free>, <regmask %LR %D8 %D9 %D10
> ...
> ```
>
>
> ```
> selectOrSplit GPR:%vreg0 [256r,432r:0) 0 at 256r w=5.92840 selectOrSplit GPR:%vreg0 [256r,432r:0) 0 at 256r w=5.92840
> hints: %R8 | hints: %R0 %R8
> assigning %vreg0 to %R8: R8 [256r,432r:0) 0 at 256r | assigning %vreg0 to %R0: R0 [256r,432r:0) 0 at 256r
>
> ```
> %vreg0 now has two COPY hints, and I am guessing that they have the same weight, but for no apparent reason %R0 is now hinted before %R8, while on master only %R8 is hinted.
>
>
> ```
> ********** REWRITE VIRTUAL REGISTERS ********** ********** REWRITE VIRTUAL REGISTERS **********
> ********** Function: caller4 ********** Function: caller4
> ********** REGISTER MAP ********** ********** REGISTER MAP **********
> [%vreg0 -> %R8] GPR | [%vreg0 -> %R0] GPR
> ...
>
> ```
> I am not sure it is obvious to the allocator that coalescing with %R8 is generally better here than coalescing with %R0.
>
>
> ```
> # After Thumb2 instruction size reduction pass: # After Thumb2 instruction size reduction pass:
>
> BB#0: derived from LLVM BB %entry BB#0: derived from LLVM BB %entry
> Live Ins: %R0 %R8 %R4 %LR Live Ins: %R0 %R8 %R4 %LR
> ...
> %R2<def> = MOVi 12, pred:14, pred:%noreg, opt:%n %R2<def> = MOVi 12, pred:14, pred:%noreg, opt:%n
> BL_pred <ga:@foo_vararg>, pred:14, pred:%noreg, | BL_pred <ga:@foo_vararg>, pred:14, pred:%noreg,
> CMPri %R8, 0, pred:14, pred:%noreg, %CPSR<imp-de | %R0<def> = MOVr %R8<kill>, pred:14, pred:%noreg,
> > CMPri %R0, 0, pred:14, pred:%noreg, %CPSR<imp-de
> Bcc <BB#2>, pred:1, pred:%CPSR<kill> Bcc <BB#2>, pred:1, pred:%CPSR<kill>
> Successors according to CFG: BB#2(0x50000000 / 0x800 Successors according to CFG: BB#2(0x50000000 / 0x800
>
> BB#1: derived from LLVM BB %cont BB#1: derived from LLVM BB %cont
> Live Ins: %R4 %R8 | Live Ins: %R0 %R4
> Predecessors according to CFG: BB#0 Predecessors according to CFG: BB#0
> %R0<def> = LDRBi12 %R8, 8, pred:14, pred:%noreg; | %R1<def> = LDRBi12 %R0, 8, pred:14, pred:%noreg;
> STRBi12 %R0<kill>, %R4<kill>, 0, pred:14, pred:% | STRBi12 %R1<kill>, %R4<kill>, 0, pred:14, pred:%
> Successors according to CFG: BB#2(?%) Successors according to CFG: BB#2(?%)
>
> BB#2: derived from LLVM BB %handler BB#2: derived from LLVM BB %handler
> Live Ins: %R8 | Live Ins: %R0
> Predecessors according to CFG: BB#0 BB#1 Predecessors according to CFG: BB#0 BB#1
> %R0<def> = MOVr %R8<kill>, pred:14, pred:%noreg, <
> BL <ga:@free>, <regmask %LR %D8 %D9 %D10 %D11 %D BL <ga:@free>, <regmask %LR %D8 %D9 %D10 %D11 %D
> ...
>
> # After If Converter: # After If Converter:
>
> BB#0: derived from LLVM BB %entry BB#0: derived from LLVM BB %entry
> ...
> %R2<def> = MOVi 12, pred:14, pred:%noreg, opt:%n %R2<def> = MOVi 12, pred:14, pred:%noreg, opt:%n
> BL_pred <ga:@foo_vararg>, pred:14, pred:%noreg, BL_pred <ga:@foo_vararg>, pred:14, pred:%noreg,
> CMPri %R8, 0, pred:14, pred:%noreg, %CPSR<imp-de <
> %R0<def> = LDRBi12 %R8, 8, pred:0, pred:%CPSR; m <
> STRBi12 %R0<kill>, %R4<kill>, 0, pred:0, pred:%C <
> %R0<def> = MOVr %R8<kill>, pred:14, pred:%noreg, %R0<def> = MOVr %R8<kill>, pred:14, pred:%noreg,
> > CMPri %R0, 0, pred:14, pred:%noreg, %CPSR<imp-de
> > %R1<def> = LDRBi12 %R0, 8, pred:0, pred:%CPSR; m
> > STRBi12 %R1<kill>, %R4<kill>, 0, pred:0, pred:%C
> BL <ga:@free>, <regmask %LR %D8 %D9 %D10 %D11 %D BL <ga:@free>, <regmask %LR %D8 %D9 %D10 %D11 %D
> %R0<def> = MOVi 1065353216, pred:14, pred:%noreg %R0<def> = MOVi 1065353216, pred:14, pred:%noreg
> %SP<def> = ADDri %SP<kill>, 16, pred:14, pred:%n %SP<def> = ADDri %SP<kill>, 16, pred:14, pred:%n
> %SP<def,tied1> = LDMIA_RET %SP<tied0>, pred:14, %SP<def,tied1> = LDMIA_RET %SP<tied0>, pred
>
> _caller4: _caller4:
> @ BB#0: @ BB#0:
> push {r4, r8, lr} push {r4, r8, lr}
> sub sp, sp, #16 sub sp, sp, #16
> mov r4, r0 mov r4, r0
> mov r0, #11 mov r0, #11
> str r0, [sp, #4] str r0, [sp, #4]
> mov r0, #10 mov r0, #10
> str r0, [sp, #8] str r0, [sp, #8]
> mov r0, #12 mov r0, #12
> str r0, [sp] str r0, [sp]
> mov r8, #0 mov r8, #0
> mov r0, #10 mov r0, #10
> mov r1, #11 mov r1, #11
> mov r2, #12 mov r2, #12
> bl _foo_vararg bl _foo_vararg
> cmp r8, #0 <
> ldrbeq r0, [r8, #8] <
> strbeq r0, [r4] <
> mov r0, r8 mov r0, r8
> > cmp r0, #0
> > ldrbeq r1, [r0, #8]
> > strbeq r1, [r4]
> bl _free bl _free
> mov r0, #1065353216 mov r0, #1065353216
> add sp, sp, #16 add sp, sp, #16
> pop {r4, r8, pc} pop {r4, r8, pc}
> ```
>
mov+cmp generally has worse latency than cmp+mov on superscalar CPUs, because the cmp becomes data-dependent on the mov; the exception is something like very recent x86 CPUs, which can eliminate register-register moves at rename and hide the cost.
I don't know enough about the register allocator to say if there's some existing code that's supposed to handle this sort of thing.
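To make the latency point concrete, here is a minimal toy model of the two schedules. The 1-cycle latency and the issue-width assumptions are hypothetical and only for illustration; the point is that in the patched output the cmp reads the register the mov just defined, so the two instructions serialize, whereas on master they are independent:

```python
def critical_path(instrs):
    """instrs: list of (name, dest, sources). Returns the cycle at which
    each instruction finishes, assuming unlimited issue width, 1-cycle
    latency per op (an assumption), and true data dependencies."""
    ready = {}   # register -> cycle its value becomes available
    finish = {}
    for name, dest, srcs in instrs:
        start = max((ready.get(s, 0) for s in srcs), default=0)
        done = start + 1
        if dest:
            ready[dest] = done
        finish[name] = done
    return finish

# master: cmp reads r8, mov is independent -> both can start at cycle 0
master = [("cmp", "flags", ["r8"]), ("mov", "r0", ["r8"])]
# patched: cmp reads r0, which the mov defines -> the pair serializes
patched = [("mov", "r0", ["r8"]), ("cmp", "flags", ["r0"])]

print(critical_path(master))   # flags ready at cycle 1
print(critical_path(patched))  # flags ready at cycle 2
```

Under these assumptions the branch's flags are available one cycle later in the patched sequence, which is the regression the swifterror.ll diff shows.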
https://reviews.llvm.org/D38128