[PATCH] D15302: [Greedy regalloc] Replace analyzeSiblingValues with something new [Part1]

Wei Mi via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 29 09:47:29 PDT 2016


>>> So even if the compiler says GCN: NumVgprs is 256, there are three
>>> VGPRs that are never used.
>>
>>
>> NumVgprs is the number of VGPRs that need to be allocated for the program, so the fact that there are gaps doesn't matter (though this is strange).  If you use only register v255, you still need to allocate all 256 registers.
>>
>>
>
> Hi Tom,
>
> I found that with my patch here, the Spill num for the testcase
> increases from 68 to 152, and the Reload num increases from 72 to 188.
> I haven't thoroughly understood what is wrong here, but I can roughly
> describe how the problem happens; it looks like a problem with local
> splitting rather than with my patch.
>
> In the testcase, there are roughly 64 VReg_128 vars that overlap with
> each other, consuming all 256 VGPRs, plus some other scattered VGPR
> uses. Each VReg_128 var occupies 4 consecutive VGPRs, so the VGPRs
> are allocated in this way: vreg1: VGPR0_VGPR1_VGPR2_VGPR3;
> vreg2: VGPR4_VGPR5_VGPR6_VGPR7; ......
>
> Because of those other scattered VGPR uses, we cannot allocate all
> 64 VReg_128 vars to registers, so splitting is needed. Region
> splitting does not cause trouble because it only tries to fill holes,
> i.e., the vregs produced by the split usually do not evict other
> vregs. Local splitting, however, can make a mess of the allocation
> here. Suppose it finds a local gap inside a BB and splits vreg3
> (VReg_128 type). After the local split, vreg3 is split into vreg3-1
> and vreg3-2. Both have short live ranges, so both have relatively
> large weights. vreg3-1 may find a hole and get allocated to
> VGPR2_VGPR3_VGPR4_VGPR5; vreg3-2 then gets a hint of
> VGPR2_VGPR3_VGPR4_VGPR5 and evicts both vreg1
> (VGPR0_VGPR1_VGPR2_VGPR3) and vreg2 (VGPR4_VGPR5_VGPR6_VGPR7) above.
> To find consecutive VGPRs for vreg1 and vreg2, regalloc does more
> region splitting/local splitting and more evictions, which makes it
> harder and harder for the remaining vregs to find consecutive VGPRs.
>
> With my patch, one more VReg_128 interval is added during splitting
> because of hoisting (this is the separate problem I described in a
> TODO about improving hoistCopies in a previous reply). Allocating
> that VReg_128 var triggers more region splitting and local splitting,
> and causes more vars to be spilled.
>
> To show the problem, I experimentally turned off local splitting.
> For trunk without my patch, the Spill num for the testcase drops from
> 68 to 56, and the Reload num drops from 72 to 36. With local
> splitting turned off for trunk with my patch, the Spill num drops
> from 152 to 24, and the Reload num drops from 188 to 24.
>
> So this is probably a separate issue for architectures that use
> consecutive combined registers for large data types.
>
> Thanks,
> Wei.


Hi Tom,

Do you think this issue is a blocker for this patch or a separate one?
I want to get your confirmation so I can decide how to push the work
forward.
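
To make the local-splitting scenario quoted above more concrete, here
is a toy model in plain C++ (not LLVM code; the 256-VGPR pool, the
aligned-quad assignment, and the vreg numbering are just assumptions
for illustration):

#include <array>
#include <cstdio>
#include <vector>

int main() {
  std::array<int, 256> Owner{};   // Owner[R] = vreg id holding VGPR R; 0 = free.
  for (int V = 1; V <= 64; ++V)   // vreg1 -> VGPR0..3, vreg2 -> VGPR4..7, ...
    for (int R = 0; R < 4; ++R)
      Owner[(V - 1) * 4 + R] = V;

  // Suppose local splitting produces vreg3-2 and it gets assigned the
  // misaligned quad VGPR2..VGPR5.  Collect the quads it would evict.
  std::vector<int> Evicted;
  for (int R = 2; R <= 5; ++R)
    if (Owner[R] != 0 && (Evicted.empty() || Evicted.back() != Owner[R]))
      Evicted.push_back(Owner[R]);

  for (int V : Evicted)
    std::printf("vreg%d (VGPR%d..VGPR%d) is evicted\n",
                V, (V - 1) * 4, (V - 1) * 4 + 3);
  // One misaligned assignment displaces two aligned quads; each of them
  // now needs a fresh run of 4 consecutive VGPRs, which triggers further
  // splitting and eviction.
  return 0;
}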

As for using 254 VGPRs instead of 256 VGPRs, I think the allocator
simply cannot find 4 consecutive VGPRs for the VReg_128 data. The holes
at the end (v254, v255) are no different from holes in the middle. Is
that correct?
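
Here is a similar toy sketch of that check (again plain C++, not LLVM
code; the free-register patterns are made up):

#include <array>
#include <cstdio>

// Return the first VGPR index where a run of 4 consecutive free
// registers starts, or -1 if no such run exists.
static int findQuad(const std::array<bool, 256> &IsFree) {
  for (int R = 0; R + 3 < 256; ++R)
    if (IsFree[R] && IsFree[R + 1] && IsFree[R + 2] && IsFree[R + 3])
      return R;
  return -1;
}

int main() {
  std::array<bool, 256> HoleAtEnd{};      // only v254/v255 free
  HoleAtEnd[254] = HoleAtEnd[255] = true;

  std::array<bool, 256> HoleInMiddle{};   // only v100/v101 free
  HoleInMiddle[100] = HoleInMiddle[101] = true;

  std::printf("hole at end: %d, hole in middle: %d\n",
              findQuad(HoleAtEnd), findQuad(HoleInMiddle));
  // Both print -1: a 2-register hole cannot host a VReg_128 no matter
  // where it sits.
  return 0;
}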

Thanks,
Wei.

