Another problem with "Recommit r265547, and r265610,r265639,r265657"

Mikael Holmén via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 29 02:11:32 PDT 2016


Hi Wei,

On 04/28/2016 08:24 PM, Wei Mi wrote:
>> Ok, I tried to make LiveVariables run again after RegisterCoalescer but
>> hit a fatal error in LiveVariables since we're not on SSA anymore:
>>
>>    if (!MRI->isSSA())
>>      report_fatal_error("regalloc=... not currently supported with -O0");
>>
>> However, I managed to bring out the big hammer and "repair" LIS at the end
>> of RegisterCoalescer, and then the code compiles successfully!
>
> That is an interesting result!

It is! Unfortunately I saw I broke several of the basic tests with this 
(I think that my "repair" function has previously only been used later 
in the compiler, and now when run as early as the coalescer it breaks 
something sometimes), but at least it points in the direction that we 
can potentially solve this by doing something in RegisterCoalescer.

>
>> However, I'm still curious... do you say that RegisterCoalescer should work
>> and keep LiveVariables properly updated even without kill/dead-flags? Or is
>> already the input to the coalescer broken since kill flags are missing?
>>
>> Is it a bug that LiveVariables removes the kill-flags when it's run
>> somewhere prior to RegisterCoalescer or is this ok?
>
> Before Live Interval Analysis
> 624B            %vreg7<def> = COPY %vreg29<kill>; aN32_rN_Pairs:%vreg7,%vreg29
>
> After Live Interval Analysis
> 624B            %vreg7<def> = COPY %vreg29; aN32_rN_Pairs:%vreg7,%vreg29
>
> I managed to reproduce the same behavior using foo.red.ll locally so
> now I can see how the kill flag is removed by Live Interval Analysis.
>
> It is done deliberately by LiveInterval Analysis in
> LiveRangeCalc::extendToUses. Here is the related comment:
>    // Clear all kill flags. They will be reinserted after register allocation
>    // by LiveIntervalAnalysis::addKillFlags().
>
> This looks reasonable because after LiveInterval is set up, the following
> passes mostly depend on LiveInterval instead of dead/kill flags.

Ok!

>
>>
>> So when kill and dead flags should be present and who should update them is
>> still quite unclear to me.
>
> Operand kill and dead flags are initially set by LiveVariables
> Analysis. Seems only TwoAddressInstructionPass and Register Scavenger
> requires that. For the other regalloc related passes including
> RegisterCoalescer, they get dead and kill information directly using
> LiveInterval, but they may update Operand kill and dead flags. I guess
> at two points kill and dead flags should be properly set -- before
> TwoAddressInstructionPass and after regalloc - for scavenger. Other
> points, it is not mandatory.

Ok, thanks.

>
>>
>> And if the register allocator has assumptions about how the input code
>> should look, shouldn't there be some checks that those assumptions are
>> fulfilled rather than ending up with invalid live ranges after regalloc that
>> the verifier shouts about?
>
> I agree with you. Ideally verifier should catch the non-meaningful
> live interval about %vreg29 early -- after RegisterCoalescing when we
> turn on -verify-coalescing. Existing verifier doesn't have it maybe
> because that means to recompute and verify if current LiveInterval
> needs to be updated, and it is costly.
>
>
> A still important question is: what is wrong with the LiveInterval
> update in RegisterCoalescer. Although I can reproduce how the
> kill flag is cleared in LiveInterval Analysis, I haven't reproduced the
> LiveInterval update problem in RegisterCoalescer successfully yet.

Yes, I've dug into this some more and I'm beginning to figure out 
what the problem is.

In the input code there is a lot of copying between several vregs in the 
infinite loop. Some value is materialized, and then it's just copied 
around and around between several vregs and is never actually used. This 
is exposed by the coalescer and yes, it fails to update the live range 
accordingly in one step.

After a few vregs have been coalesced we have:

a0h 
[0B,16r:0)[480r,512r:1)[656r,656d:5)[752r,768r:2)[880r,896r:3)[896r,896d:4) 
  0 at 0B-phi 1 at 480r 2 at 752r 3 at 880r 4 at 896r 5 at 656r
%vreg0 [64r,144r:0)[176B,208r:0)  0 at 64r
%vreg1 [224r,336B:0)[720B,976B:0)  0 at 224r
%vreg2 [208r,336B:0)[720B,976B:0)  0 at 208r
%vreg3 [304r,336B:0)[720B,976B:0)  0 at 304r
%vreg4 [288r,336B:0)[720B,976B:0)  0 at 288r
%vreg6 [560r,576r:0)  0 at 560r
%vreg7 [624r,688r:0)  0 at 624r
%vreg8 [16r,32r:0)  0 at 16r
%vreg9 [32r,128B:0)[176B,240r:0)  0 at 32r
%vreg10 [48r,64r:0)  0 at 48r
%vreg11 [80r,96r:0)  0 at 80r
%vreg12 [368r,400r:0)  0 at 368r
%vreg13 [512r,528r:0)  0 at 512r
%vreg14 [544r,560r:0)  0 at 544r
%vreg16 [192r,224r:0)  0 at 192r
%vreg17 [256r,288r:0)  0 at 256r
%vreg18 [272r,304r:0)  0 at 272r
%vreg19 [736r,944r:0)  0 at 736r
%vreg20 [768r,800r:0)  0 at 768r
%vreg21 [784r,816r:0)  0 at 784r
%vreg22 [800r,848r:0)  0 at 800r
%vreg23 [816r,864r:0)  0 at 816r
%vreg25 [928r,944r:0)  0 at 928r
%vreg26 [240r,256r:0)  0 at 240r
%vreg27 [528r,544r:0)  0 at 528r
%vreg29 
[144r,176B:3)[336B,448B:0)[576r,608B:1)[608B,624r:2)[688r,720B:4) 
0 at 336B-phi 1 at 576r 2 at 608B-phi 3 at 144r 4 at 688r
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function f3 (#0): Properties: <Post SSA, tracking 
liveness, HasVRegs>
Function Live Ins: %a0h in %vreg8

0B	BB#0: derived from LLVM BB %0
	    Live Ins: %a0h
16B		%vreg8<def> = COPY %a0h; aNh_0_7:%vreg8
32B		%vreg9<def> = COPY %vreg8; aNh_0_7:%vreg9,%vreg8
48B		%vreg10<def> = mv32Imm_pseudo 32768, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg10
64B		%vreg0<def> = COPY %vreg10; aN32_0_7:%vreg0,%vreg10
80B		%vreg11<def> = mv_0_ar16_noLo 0, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aNh_0_7:%vreg11
96B		cmp_nimm16_a16 %vreg11, 0, pred:0, pred:%noreg, pred:0, 
%CCReg<imp-def>; aNh_0_7:%vreg11
112B		brr_cond <BB#1>, pred:2, pred:%CCReg<kill>, pred:0
	    Successors according to CFG: BB#1 BB#6

128B	BB#6:
	    Predecessors according to CFG: BB#0
144B		%vreg29<def> = COPY %vreg0; aN32_rN_Pairs:%vreg29 aN32_0_7:%vreg0
160B		brr_uncond <BB#2>
	    Successors according to CFG: BB#2(?%)

176B	BB#1: derived from LLVM BB %bb24.preheader
	    Predecessors according to CFG: BB#0
192B		%vreg16<def> = shfts_a32_nimm7_a32 %vreg0, -31, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>, %cuc<imp-use>; 
aN32_0_7:%vreg16,%vreg0
208B		%vreg2<def> = COPY %vreg0; aN32_0_7:%vreg2,%vreg0
224B		%vreg1<def> = COPY %vreg16; aN32_0_7:%vreg1,%vreg16
240B		%vreg26<def> = mv_ar16_ar16_lo16In32 %vreg9, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg26 aNh_0_7:%vreg9
256B		%vreg17<def> = COPY %vreg26; aN32_0_7:%vreg17,%vreg26
272B		%vreg18<def> = shfts_a32_nimm7_a32 %vreg17, -31, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>, %cuc<imp-use>; 
aN32_0_7:%vreg18,%vreg17
288B		%vreg4<def> = COPY %vreg17; aN32_0_7:%vreg4,%vreg17
304B		%vreg3<def> = COPY %vreg18; aN32_0_7:%vreg3,%vreg18
320B		brr_uncond <BB#5>
	    Successors according to CFG: BB#5(?%)

336B	BB#2: derived from LLVM BB %bb2
	    Predecessors according to CFG: BB#4 BB#6
368B		%vreg12<def> = mv_0_ar16_noLo 0, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aNh_0_7:%vreg12
400B		cmp_nimm16_a16 %vreg12, 0, pred:0, pred:%noreg, pred:0, 
%CCReg<imp-def>; aNh_0_7:%vreg12
416B		brr_cond <BB#4>, pred:3, pred:%CCReg<kill>, pred:0
432B		brr_uncond <BB#3>
	    Successors according to CFG: BB#4 BB#3

448B	BB#3: derived from LLVM BB %bb3
	    Predecessors according to CFG: BB#2
464B		ADJCALLSTACKDOWN 0, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
480B		callr <ga:@f1>, pred:0, pred:%noreg, pred:0, %a0_40<imp-def,dead>, 
%a1_40<imp-def,dead>, %sp<imp-def>, %CCReg<imp-def,dead>, 
%cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, %a0h<imp-def>, ...
496B		ADJCALLSTACKUP 0, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
512B		%vreg13<def> = COPY %a0h; aNh_0_7:%vreg13
528B		%vreg27<def> = mv_ar16_ar16_lo16In32 %vreg13, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg27 aNh_0_7:%vreg13
544B		%vreg14<def> = COPY %vreg27; aN32_0_7:%vreg14,%vreg27
560B		%vreg6<def> = COPY %vreg14; aN32_rN_Pairs:%vreg6 aN32_0_7:%vreg14
576B		%vreg29<def> = COPY %vreg6; aN32_rN_Pairs:%vreg29,%vreg6
592B		brr_uncond <BB#4>
	    Successors according to CFG: BB#4(?%)

608B	BB#4: derived from LLVM BB %bb13
	    Predecessors according to CFG: BB#2 BB#3
624B		%vreg7<def> = COPY %vreg29; aN32_rN_Pairs:%vreg7,%vreg29
640B		ADJCALLSTACKDOWN 0, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
656B		callr <ga:@f1>, pred:0, pred:%noreg, pred:0, %a0_40<imp-def,dead>, 
%a1_40<imp-def,dead>, %sp<imp-def>, %CCReg<imp-def,dead>, 
%cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, ...
672B		ADJCALLSTACKUP 0, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
688B		%vreg29<def> = COPY %vreg7; aN32_rN_Pairs:%vreg29,%vreg7
704B		brr_uncond <BB#2>
	    Successors according to CFG: BB#2(?%)

720B	BB#5: derived from LLVM BB %bb24
	    Predecessors according to CFG: BB#1 BB#5
736B		%vreg19<def> = clearAcc32 pred:0, pred:%noreg, pred:0; 
aN32_0_7:%vreg19
752B		libcall_CRT_ll_div_r %vreg19, %vreg19, %vreg2, %vreg1, 
%a0_32<imp-def>, %a1_32<imp-def>, %CCReg<imp-def,dead>, ...; 
aN32_0_7:%vreg19,%vreg2,%vreg1
768B		%vreg20<def> = COPY %a0_32; aN32_0_7:%vreg20
784B		%vreg21<def> = COPY %a1_32<kill>; aN32_0_7:%vreg21
800B		%vreg22<def> = xor_a32_a32_a32 %vreg20, %vreg4, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>; aN32_0_7:%vreg22,%vreg20,%vreg4
816B		%vreg23<def> = xor_a32_a32_a32 %vreg21, %vreg3, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>; aN32_0_7:%vreg23,%vreg21,%vreg3
832B		ADJCALLSTACKDOWN 4, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
848B		push_any32 %vreg22, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>; aN32_0_7:%vreg22
864B		push_any32 %vreg23, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>; aN32_0_7:%vreg23
880B		%a0_32<def> = COPY %vreg19; aN32_0_7:%vreg19
896B		callr <ga:@f2>, pred:0, pred:%noreg, pred:0, %a0_32, 
%a0_40<imp-def,dead>, %a1_40<imp-def,dead>, %sp<imp-def>, 
%CCReg<imp-def,dead>, %cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, ...
912B		ADJCALLSTACKUP 4, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
928B		%vreg25<def> = mv16Sym_noLo <ga:@g>, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; rN:%vreg25
944B		mv_a32_r16_rmod1 %vreg19, %vreg25, pred:0, pred:%noreg, pred:0, 
%dc0<imp-use>, %dc1<imp-use>, %ac0<imp-use>, %ac1<imp-use>; 
mem:ST2[@g](align=1) aN32_0_7:%vreg19 rN:%vreg25
960B		brr_uncond <BB#5>
	    Successors according to CFG: BB#5(?%)

# End machine code for function f3.

Notice that vreg29 is defined in a few places, and then it's copied to 
vreg7, which is then copied back to vreg29. Stupid useless code, but as 
far as I can tell the live ranges are ok here.

Then the coalescer says

bb13:
624B	%vreg7<def> = COPY %vreg29; aN32_rN_Pairs:%vreg7,%vreg29
	Considering merging to aN32_rN_Pairs with %vreg7 in %vreg29
		RHS = %vreg7 [624r,688r:0)  0 at 624r
		LHS = %vreg29 
[144r,176B:3)[336B,448B:0)[576r,608B:1)[608B,624r:2)[688r,720B:4) 
0 at 336B-phi 1 at 576r 2 at 608B-phi 3 at 144r 4 at 688r
		merge %vreg7:0 at 624r into %vreg29:2 at 608B --> @608B
		merge %vreg29:4 at 688r into %vreg7:0 at 624r --> @608B
		erased:	688r	%vreg29<def> = COPY %vreg7; aN32_rN_Pairs:%vreg29,%vreg7
		erased:	624r	%vreg7<def> = COPY %vreg29; aN32_rN_Pairs:%vreg7,%vreg29
	Success: %vreg7 -> %vreg29
	Result = %vreg29 [144r,176B:3)[336B,448B:0)[576r,608B:1)[608B,720B:2) 
0 at 336B-phi 1 at 576r 2 at 608B-phi 3 at 144r

and the code now looks like:

********** INTERVALS **********
a0h 
[0B,16r:0)[480r,512r:1)[656r,656d:5)[752r,768r:2)[880r,896r:3)[896r,896d:4) 
  0 at 0B-phi 1 at 480r 2 at 752r 3 at 880r 4 at 896r 5 at 656r
%vreg0 [64r,144r:0)[176B,208r:0)  0 at 64r
%vreg1 [224r,336B:0)[720B,976B:0)  0 at 224r
%vreg2 [208r,336B:0)[720B,976B:0)  0 at 208r
%vreg3 [304r,336B:0)[720B,976B:0)  0 at 304r
%vreg4 [288r,336B:0)[720B,976B:0)  0 at 288r
%vreg6 [560r,576r:0)  0 at 560r
%vreg8 [16r,32r:0)  0 at 16r
%vreg9 [32r,128B:0)[176B,240r:0)  0 at 32r
%vreg10 [48r,64r:0)  0 at 48r
%vreg11 [80r,96r:0)  0 at 80r
%vreg12 [368r,400r:0)  0 at 368r
%vreg13 [512r,528r:0)  0 at 512r
%vreg14 [544r,560r:0)  0 at 544r
%vreg16 [192r,224r:0)  0 at 192r
%vreg17 [256r,288r:0)  0 at 256r
%vreg18 [272r,304r:0)  0 at 272r
%vreg19 [736r,944r:0)  0 at 736r
%vreg20 [768r,800r:0)  0 at 768r
%vreg21 [784r,816r:0)  0 at 784r
%vreg22 [800r,848r:0)  0 at 800r
%vreg23 [816r,864r:0)  0 at 816r
%vreg25 [928r,944r:0)  0 at 928r
%vreg26 [240r,256r:0)  0 at 240r
%vreg27 [528r,544r:0)  0 at 528r
%vreg29 [144r,176B:3)[336B,448B:0)[576r,608B:1)[608B,720B:2)  0 at 336B-phi 
1 at 576r 2 at 608B-phi 3 at 144r
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function f3 (#0): Properties: <Post SSA, tracking 
liveness, HasVRegs>
Function Live Ins: %a0h in %vreg8

0B	BB#0: derived from LLVM BB %0
	    Live Ins: %a0h
16B		%vreg8<def> = COPY %a0h; aNh_0_7:%vreg8
32B		%vreg9<def> = COPY %vreg8; aNh_0_7:%vreg9,%vreg8
48B		%vreg10<def> = mv32Imm_pseudo 32768, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg10
64B		%vreg0<def> = COPY %vreg10; aN32_0_7:%vreg0,%vreg10
80B		%vreg11<def> = mv_0_ar16_noLo 0, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aNh_0_7:%vreg11
96B		cmp_nimm16_a16 %vreg11, 0, pred:0, pred:%noreg, pred:0, 
%CCReg<imp-def>; aNh_0_7:%vreg11
112B		brr_cond <BB#1>, pred:2, pred:%CCReg<kill>, pred:0
	    Successors according to CFG: BB#1 BB#6

128B	BB#6:
	    Predecessors according to CFG: BB#0
144B		%vreg29<def> = COPY %vreg0; aN32_rN_Pairs:%vreg29 aN32_0_7:%vreg0
160B		brr_uncond <BB#2>
	    Successors according to CFG: BB#2(?%)

176B	BB#1: derived from LLVM BB %bb24.preheader
	    Predecessors according to CFG: BB#0
192B		%vreg16<def> = shfts_a32_nimm7_a32 %vreg0, -31, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>, %cuc<imp-use>; 
aN32_0_7:%vreg16,%vreg0
208B		%vreg2<def> = COPY %vreg0; aN32_0_7:%vreg2,%vreg0
224B		%vreg1<def> = COPY %vreg16; aN32_0_7:%vreg1,%vreg16
240B		%vreg26<def> = mv_ar16_ar16_lo16In32 %vreg9, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg26 aNh_0_7:%vreg9
256B		%vreg17<def> = COPY %vreg26; aN32_0_7:%vreg17,%vreg26
272B		%vreg18<def> = shfts_a32_nimm7_a32 %vreg17, -31, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>, %cuc<imp-use>; 
aN32_0_7:%vreg18,%vreg17
288B		%vreg4<def> = COPY %vreg17; aN32_0_7:%vreg4,%vreg17
304B		%vreg3<def> = COPY %vreg18; aN32_0_7:%vreg3,%vreg18
320B		brr_uncond <BB#5>
	    Successors according to CFG: BB#5(?%)

336B	BB#2: derived from LLVM BB %bb2
	    Predecessors according to CFG: BB#4 BB#6
368B		%vreg12<def> = mv_0_ar16_noLo 0, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aNh_0_7:%vreg12
400B		cmp_nimm16_a16 %vreg12, 0, pred:0, pred:%noreg, pred:0, 
%CCReg<imp-def>; aNh_0_7:%vreg12
416B		brr_cond <BB#4>, pred:3, pred:%CCReg<kill>, pred:0
432B		brr_uncond <BB#3>
	    Successors according to CFG: BB#4 BB#3

448B	BB#3: derived from LLVM BB %bb3
	    Predecessors according to CFG: BB#2
464B		ADJCALLSTACKDOWN 0, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
480B		callr <ga:@f1>, pred:0, pred:%noreg, pred:0, %a0_40<imp-def,dead>, 
%a1_40<imp-def,dead>, %sp<imp-def>, %CCReg<imp-def,dead>, 
%cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, %a0h<imp-def>, ...
496B		ADJCALLSTACKUP 0, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
512B		%vreg13<def> = COPY %a0h; aNh_0_7:%vreg13
528B		%vreg27<def> = mv_ar16_ar16_lo16In32 %vreg13, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg27 aNh_0_7:%vreg13
544B		%vreg14<def> = COPY %vreg27; aN32_0_7:%vreg14,%vreg27
560B		%vreg6<def> = COPY %vreg14; aN32_rN_Pairs:%vreg6 aN32_0_7:%vreg14
576B		%vreg29<def> = COPY %vreg6; aN32_rN_Pairs:%vreg29,%vreg6
592B		brr_uncond <BB#4>
	    Successors according to CFG: BB#4(?%)

608B	BB#4: derived from LLVM BB %bb13
	    Predecessors according to CFG: BB#2 BB#3
640B		ADJCALLSTACKDOWN 0, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
656B		callr <ga:@f1>, pred:0, pred:%noreg, pred:0, %a0_40<imp-def,dead>, 
%a1_40<imp-def,dead>, %sp<imp-def>, %CCReg<imp-def,dead>, 
%cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, ...
672B		ADJCALLSTACKUP 0, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
704B		brr_uncond <BB#2>
	    Successors according to CFG: BB#2(?%)

720B	BB#5: derived from LLVM BB %bb24
	    Predecessors according to CFG: BB#1 BB#5
736B		%vreg19<def> = clearAcc32 pred:0, pred:%noreg, pred:0; 
aN32_0_7:%vreg19
752B		libcall_CRT_ll_div_r %vreg19, %vreg19, %vreg2, %vreg1, 
%a0_32<imp-def>, %a1_32<imp-def>, %CCReg<imp-def,dead>, ...; 
aN32_0_7:%vreg19,%vreg2,%vreg1
768B		%vreg20<def> = COPY %a0_32; aN32_0_7:%vreg20
784B		%vreg21<def> = COPY %a1_32<kill>; aN32_0_7:%vreg21
800B		%vreg22<def> = xor_a32_a32_a32 %vreg20, %vreg4, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>; aN32_0_7:%vreg22,%vreg20,%vreg4
816B		%vreg23<def> = xor_a32_a32_a32 %vreg21, %vreg3, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>; aN32_0_7:%vreg23,%vreg21,%vreg3
832B		ADJCALLSTACKDOWN 4, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
848B		push_any32 %vreg22, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>; aN32_0_7:%vreg22
864B		push_any32 %vreg23, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>; aN32_0_7:%vreg23
880B		%a0_32<def> = COPY %vreg19; aN32_0_7:%vreg19
896B		callr <ga:@f2>, pred:0, pred:%noreg, pred:0, %a0_32, 
%a0_40<imp-def,dead>, %a1_40<imp-def,dead>, %sp<imp-def>, 
%CCReg<imp-def,dead>, %cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, ...
912B		ADJCALLSTACKUP 4, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
928B		%vreg25<def> = mv16Sym_noLo <ga:@g>, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; rN:%vreg25
944B		mv_a32_r16_rmod1 %vreg19, %vreg25, pred:0, pred:%noreg, pred:0, 
%dc0<imp-use>, %dc1<imp-use>, %ac0<imp-use>, %ac1<imp-use>; 
mem:ST2[@g](align=1) aN32_0_7:%vreg19 rN:%vreg25
960B		brr_uncond <BB#5>
	    Successors according to CFG: BB#5(?%)

# End machine code for function f3.

Since the only use of vreg29 before the coalescing was the copy to 
vreg7, and now vreg7 and vreg29 have been merged, there are no uses left 
of vreg29!

But the live range is still

%vreg29 [144r,176B:3)[336B,448B:0)[576r,608B:1)[608B,720B:2) 0 at 336B-phi 
1 at 576r 2 at 608B-phi 3 at 144r
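
To make the mismatch concrete, here is a minimal self-contained C++ sketch of the kind of check that would flag it. This is a toy model and not LLVM's LiveInterval: Segment, hasStaleLiveness and the plain integer slots are all made up for illustration, and a dead def (like [144r,144d)) is represented as a segment with start == end.

```cpp
#include <cassert>
#include <vector>

// Toy model, not LLVM's LiveInterval: a segment covers the slots from
// `start` up to `end`. A def that is never read is stored with
// start == end, standing in for a dead def such as [144r,144d).
struct Segment {
  unsigned start;
  unsigned end;
};

// Returns true if the register has no reads left at all but some segment
// still extends past its def. That is the state %vreg29 is in after the
// merge above.
bool hasStaleLiveness(const std::vector<Segment> &segments,
                      const std::vector<unsigned> &useSlots) {
  if (!useSlots.empty())
    return false; // A read remains; a real check would compare positions.
  for (const Segment &s : segments) {
    assert(s.end >= s.start && "segment must not run backwards");
    if (s.end > s.start)
      return true; // Live past the def although nothing reads the value.
  }
  return false;
}
```

Feeding it the four segments of %vreg29 (144-176, 336-448, 576-608, 608-720) together with an empty use list reports the range as stale, while two dead defs at 144 and 576 pass.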

So some shrinking of the live range is needed in this case. I saw that 
there is some code in the coalescer to do this:

   if (ShrinkMainRange) {
     LiveInterval &LI = LIS->getInterval(CP.getDstReg());
     shrinkToUses(&LI);
   }

but ShrinkMainRange is false all the time so that code isn't run. 
However, if I force ShrinkMainRange to be true I see:

Shrink: %vreg29 [144r,176B:3)[336B,448B:0)[576r,608B:1)[608B,720B:2) 
0 at 336B-phi 1 at 576r 2 at 608B-phi 3 at 144r
Dead PHI at 336B may separate interval
Dead PHI at 608B may separate interval
Shrunk: %vreg29 [144r,144d:3)[576r,576d:1)  0 at x 1 at 576r 2 at x 3 at 144r
   Split 2 components: %vreg29 [144r,144d:3)[576r,576d:1)  0 at x 1 at 576r 
2 at x 3 at 144r
	Success: %vreg7 -> %vreg29
	Result = %vreg29 [144r,144d:2)  0 at x 1 at x 2 at 144r

so the range for %vreg29 is shrunk, and also the two (now dead) defs are 
split into %vreg29 and %vreg30.
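
For intuition about what the shrinking does, here is a small self-contained C++ sketch. It is a deliberately simplified model that assumes a single straight-line block with plain integer slots, so it has no PHI defs, no CFG and no splitting into connected components like the real shrinkToUses. The names Segment and shrinkToUsesToy are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only: slots are plain integers in one straight-line block.
struct Segment {
  unsigned start; // slot of the def
  unsigned end;   // slot of the last read of that def, == start if dead
};

// Rebuild a register's segments from its def and use slots. Each def
// lives until the last use that comes before (or at) the next def. A def
// with no such use collapses to a dead def (start == end).
std::vector<Segment> shrinkToUsesToy(std::vector<unsigned> defs,
                                     std::vector<unsigned> uses) {
  std::sort(defs.begin(), defs.end());
  std::sort(uses.begin(), uses.end());
  std::vector<Segment> result;
  for (std::size_t i = 0; i < defs.size(); ++i) {
    // Reads at the next def still see the current value.
    unsigned limit = (i + 1 < defs.size()) ? defs[i + 1] : ~0u;
    unsigned end = defs[i];
    for (unsigned u : uses)
      if (u > defs[i] && u <= limit)
        end = u; // uses are sorted, so the last match wins
    result.push_back({defs[i], end});
  }
  return result;
}
```

With defs at 144 and 576 and no uses, both segments come back with start == end, which corresponds to the two dead defs in the "Shrunk:" line above.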

I'm a little concerned about the "0 at x 1 at x" entries in %vreg29's range. 
I have no idea whether they do any harm, or whether and how they should 
be removed.

The code after this merge is now

a0h 
[0B,16r:0)[480r,512r:1)[656r,656d:5)[752r,768r:2)[880r,896r:3)[896r,896d:4) 
  0 at 0B-phi 1 at 480r 2 at 752r 3 at 880r 4 at 896r 5 at 656r
%vreg0 [64r,144r:0)[176B,208r:0)  0 at 64r
%vreg1 [224r,336B:0)[720B,976B:0)  0 at 224r
%vreg2 [208r,336B:0)[720B,976B:0)  0 at 208r
%vreg3 [304r,336B:0)[720B,976B:0)  0 at 304r
%vreg4 [288r,336B:0)[720B,976B:0)  0 at 288r
%vreg6 [560r,576r:0)  0 at 560r
%vreg8 [16r,32r:0)  0 at 16r
%vreg9 [32r,128B:0)[176B,240r:0)  0 at 32r
%vreg10 [48r,64r:0)  0 at 48r
%vreg11 [80r,96r:0)  0 at 80r
%vreg12 [368r,400r:0)  0 at 368r
%vreg13 [512r,528r:0)  0 at 512r
%vreg14 [544r,560r:0)  0 at 544r
%vreg16 [192r,224r:0)  0 at 192r
%vreg17 [256r,288r:0)  0 at 256r
%vreg18 [272r,304r:0)  0 at 272r
%vreg19 [736r,944r:0)  0 at 736r
%vreg20 [768r,800r:0)  0 at 768r
%vreg21 [784r,816r:0)  0 at 784r
%vreg22 [800r,848r:0)  0 at 800r
%vreg23 [816r,864r:0)  0 at 816r
%vreg25 [928r,944r:0)  0 at 928r
%vreg26 [240r,256r:0)  0 at 240r
%vreg27 [528r,544r:0)  0 at 528r
%vreg29 [144r,144d:2)  0 at x 1 at x 2 at 144r
%vreg30 [576r,576d:0)  0 at 576r
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function f3 (#0): Properties: <Post SSA, tracking 
liveness, HasVRegs>
Function Live Ins: %a0h in %vreg8

0B	BB#0: derived from LLVM BB %0
	    Live Ins: %a0h
16B		%vreg8<def> = COPY %a0h; aNh_0_7:%vreg8
32B		%vreg9<def> = COPY %vreg8; aNh_0_7:%vreg9,%vreg8
48B		%vreg10<def> = mv32Imm_pseudo 32768, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg10
64B		%vreg0<def> = COPY %vreg10; aN32_0_7:%vreg0,%vreg10
80B		%vreg11<def> = mv_0_ar16_noLo 0, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aNh_0_7:%vreg11
96B		cmp_nimm16_a16 %vreg11, 0, pred:0, pred:%noreg, pred:0, 
%CCReg<imp-def>; aNh_0_7:%vreg11
112B		brr_cond <BB#1>, pred:2, pred:%CCReg<kill>, pred:0
	    Successors according to CFG: BB#1 BB#6

128B	BB#6:
	    Predecessors according to CFG: BB#0
144B		%vreg29<def,dead> = COPY %vreg0; aN32_rN_Pairs:%vreg29 aN32_0_7:%vreg0
160B		brr_uncond <BB#2>
	    Successors according to CFG: BB#2(?%)

176B	BB#1: derived from LLVM BB %bb24.preheader
	    Predecessors according to CFG: BB#0
192B		%vreg16<def> = shfts_a32_nimm7_a32 %vreg0, -31, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>, %cuc<imp-use>; 
aN32_0_7:%vreg16,%vreg0
208B		%vreg2<def> = COPY %vreg0; aN32_0_7:%vreg2,%vreg0
224B		%vreg1<def> = COPY %vreg16; aN32_0_7:%vreg1,%vreg16
240B		%vreg26<def> = mv_ar16_ar16_lo16In32 %vreg9, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg26 aNh_0_7:%vreg9
256B		%vreg17<def> = COPY %vreg26; aN32_0_7:%vreg17,%vreg26
272B		%vreg18<def> = shfts_a32_nimm7_a32 %vreg17, -31, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>, %cuc<imp-use>; 
aN32_0_7:%vreg18,%vreg17
288B		%vreg4<def> = COPY %vreg17; aN32_0_7:%vreg4,%vreg17
304B		%vreg3<def> = COPY %vreg18; aN32_0_7:%vreg3,%vreg18
320B		brr_uncond <BB#5>
	    Successors according to CFG: BB#5(?%)

336B	BB#2: derived from LLVM BB %bb2
	    Predecessors according to CFG: BB#4 BB#6
368B		%vreg12<def> = mv_0_ar16_noLo 0, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; aNh_0_7:%vreg12
400B		cmp_nimm16_a16 %vreg12, 0, pred:0, pred:%noreg, pred:0, 
%CCReg<imp-def>; aNh_0_7:%vreg12
416B		brr_cond <BB#4>, pred:3, pred:%CCReg<kill>, pred:0
432B		brr_uncond <BB#3>
	    Successors according to CFG: BB#4 BB#3

448B	BB#3: derived from LLVM BB %bb3
	    Predecessors according to CFG: BB#2
464B		ADJCALLSTACKDOWN 0, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
480B		callr <ga:@f1>, pred:0, pred:%noreg, pred:0, %a0_40<imp-def,dead>, 
%a1_40<imp-def,dead>, %sp<imp-def>, %CCReg<imp-def,dead>, 
%cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, %a0h<imp-def>, ...
496B		ADJCALLSTACKUP 0, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
512B		%vreg13<def> = COPY %a0h; aNh_0_7:%vreg13
528B		%vreg27<def> = mv_ar16_ar16_lo16In32 %vreg13, pred:0, pred:%noreg, 
pred:0, %ac0<imp-use>, %ac1<imp-use>; aN32_0_7:%vreg27 aNh_0_7:%vreg13
544B		%vreg14<def> = COPY %vreg27; aN32_0_7:%vreg14,%vreg27
560B		%vreg6<def> = COPY %vreg14; aN32_rN_Pairs:%vreg6 aN32_0_7:%vreg14
576B		%vreg30<def,dead> = COPY %vreg6; aN32_rN_Pairs:%vreg30,%vreg6
592B		brr_uncond <BB#4>
	    Successors according to CFG: BB#4(?%)

608B	BB#4: derived from LLVM BB %bb13
	    Predecessors according to CFG: BB#2 BB#3
640B		ADJCALLSTACKDOWN 0, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
656B		callr <ga:@f1>, pred:0, pred:%noreg, pred:0, %a0_40<imp-def,dead>, 
%a1_40<imp-def,dead>, %sp<imp-def>, %CCReg<imp-def,dead>, 
%cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, ...
672B		ADJCALLSTACKUP 0, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
704B		brr_uncond <BB#2>
	    Successors according to CFG: BB#2(?%)

720B	BB#5: derived from LLVM BB %bb24
	    Predecessors according to CFG: BB#1 BB#5
736B		%vreg19<def> = clearAcc32 pred:0, pred:%noreg, pred:0; 
aN32_0_7:%vreg19
752B		libcall_CRT_ll_div_r %vreg19, %vreg19, %vreg2, %vreg1, 
%a0_32<imp-def>, %a1_32<imp-def>, %CCReg<imp-def,dead>, ...; 
aN32_0_7:%vreg19,%vreg2,%vreg1
768B		%vreg20<def> = COPY %a0_32; aN32_0_7:%vreg20
784B		%vreg21<def> = COPY %a1_32<kill>; aN32_0_7:%vreg21
800B		%vreg22<def> = xor_a32_a32_a32 %vreg20, %vreg4, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>; aN32_0_7:%vreg22,%vreg20,%vreg4
816B		%vreg23<def> = xor_a32_a32_a32 %vreg21, %vreg3, pred:0, 
pred:%noreg, pred:0, %CCReg<imp-def,dead>; aN32_0_7:%vreg23,%vreg21,%vreg3
832B		ADJCALLSTACKDOWN 4, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>
848B		push_any32 %vreg22, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>; aN32_0_7:%vreg22
864B		push_any32 %vreg23, pred:0, pred:%noreg, pred:0, %sp<imp-def>, 
%sp<imp-use>; aN32_0_7:%vreg23
880B		%a0_32<def> = COPY %vreg19; aN32_0_7:%vreg19
896B		callr <ga:@f2>, pred:0, pred:%noreg, pred:0, %a0_32, 
%a0_40<imp-def,dead>, %a1_40<imp-def,dead>, %sp<imp-def>, 
%CCReg<imp-def,dead>, %cuc<imp-def,dead>, %sp<imp-use>, %cuc<imp-use>, ...
912B		ADJCALLSTACKUP 4, 0, pred:0, pred:%noreg, pred:0, 
%sp<imp-def,dead>, %sp<imp-use>
928B		%vreg25<def> = mv16Sym_noLo <ga:@g>, pred:0, pred:%noreg, pred:0, 
%ac0<imp-use>, %ac1<imp-use>; rN:%vreg25
944B		mv_a32_r16_rmod1 %vreg19, %vreg25, pred:0, pred:%noreg, pred:0, 
%dc0<imp-use>, %dc1<imp-use>, %ac0<imp-use>, %ac1<imp-use>; 
mem:ST2[@g](align=1) aN32_0_7:%vreg19 rN:%vreg25
960B		brr_uncond <BB#5>
	    Successors according to CFG: BB#5(?%)

# End machine code for function f3.

which looks reasonable to me. The whole program now compiles successfully 
even if I turn off the repair thing I added previously.

So, we just need to figure out in which cases we need to do the 
shrinking, i.e. when to set ShrinkMainRange to true. Currently it's only 
done in RegisterCoalescer::addUndefFlag called from 
RegisterCoalescer::updateRegDefsUses so something more is needed.

I suppose the reason for only doing the shrinking sometimes is to save 
compilation time? It should always be ok to run shrinkToUses (on virtual 
registers), but in most cases it's not necessary so that's why it's avoided?

So in my code now I do

   // Somewhat brute solution to #10204. Always shrink live ranges for
   // virtual registers. Ideally we could identify the exact cases where
   // it's needed but for now we shrink the range for all vregs.
   if (ShrinkMainRange ||
       TargetRegisterInfo::isVirtualRegister(CP.getDstReg())) {
     LiveInterval &LI = LIS->getInterval(CP.getDstReg());
     shrinkToUses(&LI);
   }

and then it seems to work.
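
The condition itself can be isolated in a stand-alone C++ sketch. This is not the LLVM code: isVirtualRegisterToy and shouldShrinkMainRange are hypothetical helpers, and the sketch assumes that virtual register numbers are the ones with the top bit set.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for TargetRegisterInfo::isVirtualRegister. Assumption: virtual
// register numbers have the top bit set, physical ones do not.
constexpr std::uint32_t kVirtualRegFlag = 1u << 31;

bool isVirtualRegisterToy(std::uint32_t reg) {
  return (reg & kVirtualRegFlag) != 0;
}

// Hypothetical helper mirroring the condition in the workaround: shrink
// when the coalescer already asked for it, or whenever the destination
// register is virtual.
bool shouldShrinkMainRange(bool shrinkMainRange, std::uint32_t dstReg) {
  return shrinkMainRange || isVirtualRegisterToy(dstReg);
}
```

Under that assumption, a destination like vreg 29 always gets shrunk, while a physical destination is only shrunk when ShrinkMainRange was already set.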

Thanks,
Mikael

>
> Thanks,
> Wei.
>