[llvm] r204690 - Register Allocator: check other options before using a CSR for the first time.

Duncan P. N. Exon Smith dexonsmith at apple.com
Mon Apr 7 09:18:10 PDT 2014


On 2014 Apr 4, at 22:46, Duncan P. N. Exon Smith <dexonsmith at apple.com> wrote:

> On 2014 Mar 24, at 17:16, Manman Ren <manman.ren at gmail.com> wrote:
> 
>> Author: mren
>> Date: Mon Mar 24 19:16:25 2014
>> New Revision: 204690
>> 
>> URL: http://llvm.org/viewvc/llvm-project?rev=204690&view=rev
>> Log:
>> Register Allocator: check other options before using a CSR for the first time.
>> 
>> When the register allocator's stage is RS_Spill, we choose spilling over using
>> the CSR for the first time if the spill cost is lower than CSRCost.
>> When the register allocator's stage is < RS_Split, we choose pre-splitting over
>> using the CSR for the first time if the cost of splitting is lower than
>> CSRCost.
>> 
>> CSRCost is set with the command-line option "regalloc-csr-first-time-cost". The
>> default value is 0, which generates the same code as before this commit.
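
(For illustration, here is a minimal self-contained sketch of the decision
described above. The enum, struct, and function names are simplified
stand-ins chosen for this example, not the actual RegAllocGreedy.cpp code.)

  #include <cstdint>
  #include <iostream>

  // Simplified stand-ins for the allocator's live-range stages and costs.
  enum LiveRangeStage { RS_New, RS_Assign, RS_Split, RS_Spill };

  struct Costs {
    uint64_t SpillCost; // cost of spilling the virtual register
    uint64_t SplitCost; // cost of pre-splitting around the CSR use
    uint64_t CSRCost;   // cost charged for using a CSR for the first time
  };

  // Returns true if we should avoid the first use of a callee-saved register
  // and spill or pre-split instead, following the rules quoted above.
  bool avoidFirstCSRUse(LiveRangeStage Stage, const Costs &C) {
    if (Stage == RS_Spill)
      return C.SpillCost < C.CSRCost; // spill if it is cheaper than the CSR
    if (Stage < RS_Split)
      return C.SplitCost < C.CSRCost; // pre-split if it is cheaper than the CSR
    return false;                     // otherwise use the CSR as before
  }

  int main() {
    Costs C{/*SpillCost=*/10, /*SplitCost=*/8, /*CSRCost=*/15};
    std::cout << avoidFirstCSRUse(RS_Spill, C) << '\n'; // 1: spilling wins
    std::cout << avoidFirstCSRUse(RS_New, C) << '\n';   // 1: pre-splitting wins
  }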
>> 
>> With a value of 15 (where 1 << 14 is the entry frequency), I measured a
>> performance gain of 3% on 253.perlbmk and 1.7% on 197.parser, with
>> instrumented PGO, on an ARM device.
>> 
>> rdar://16162005
>> 
>> Added:
>> llvm/trunk/test/CodeGen/AArch64/ragreedy-csr.ll
>> Modified:
>> llvm/trunk/lib/CodeGen/RegAllocGreedy.cpp
> 
> Hi Manman,
> 
> This commit relies on the entry frequency being 1<<14, but the API
> of BlockFrequencyInfo does not guarantee anything about the entry
> frequency.  While the current implementation happens to use 1<<14,
> the patch I'm working on sets it based on how branchy a particular
> function is, which causes this test to fail.
> 
> The cost value (e.g., 15) needs to be compared to a ratio between
> the actual entry frequency and the block frequency in question.
> 
> I'm happy to fix this, but there are a couple of possible directions
> and I'm not sure which is best.
> 
> 1. Keep the meaning (and name) of CSRCost and
>    -regalloc-csr-first-time-cost unchanged.  This requires scaling
>    the cost by the ratio of entry frequency to 1<<14.

After looking at the code it interfaces with, option (1) seems superior.
I've attached a patch that implements it.
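
(For illustration, a rough self-contained sketch of the scaling in (1),
under the assumption that the cost flag keeps its 1<<14-based units;
scaleCSRCost is a hypothetical name, and this is not the attached patch.)

  #include <cstdint>
  #include <iostream>

  // Hypothetical helper: keep -regalloc-csr-first-time-cost in units where
  // the entry block has frequency 1 << 14, and rescale it to the entry
  // frequency that BlockFrequencyInfo actually reports for this function.
  uint64_t scaleCSRCost(uint64_t RawCSRCost, uint64_t ActualEntryFreq) {
    const uint64_t FixedEntryFreq = uint64_t(1) << 14;
    // Scale by ActualEntryFreq / FixedEntryFreq, rounding down; a production
    // version would need to guard against overflow of the multiplication.
    return RawCSRCost * ActualEntryFreq / FixedEntryFreq;
  }

  int main() {
    // With -regalloc-csr-first-time-cost=15 and an entry frequency of 1 << 16,
    // the effective cost becomes 60 in that function's frequency units.
    std::cout << scaleCSRCost(15, uint64_t(1) << 16) << '\n'; // prints 60
  }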

However, the test is still failing after the patch.  Digging deeper, the
BFI pass in trunk assigns block frequencies of 0 to the false branches
(e.g., cond.false.i.i) due to loss of precision.  Somehow this (or another
change?) changes the control flow in calcGlobalSplitCost().  In particular,
the loop through Cand.ActiveBlocks finds that upper.exit has no RegIn, so
its relatively large block frequency gets added to GlobalCost.
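
(For illustration, a heavily simplified, hypothetical model of the
accumulation described above; it is not the real calcGlobalSplitCost()
loop in RegAllocGreedy.cpp.)

  #include <cstdint>
  #include <iostream>
  #include <vector>

  // Hypothetical model: each block the split candidate is active in
  // contributes its frequency to the global split cost when the register
  // is live across only one of its boundaries (e.g., RegOut but no RegIn),
  // since that is where a copy or reload would have to be inserted.
  struct ActiveBlock {
    bool RegIn;         // candidate live into the block
    bool RegOut;        // candidate live out of the block
    uint64_t Frequency; // block frequency from BlockFrequencyInfo
  };

  uint64_t globalSplitCost(const std::vector<ActiveBlock> &Blocks) {
    uint64_t Cost = 0;
    for (const ActiveBlock &B : Blocks) {
      if (B.RegIn == B.RegOut)
        continue;          // dead through or live through: no cost added here
      Cost += B.Frequency; // one-sided liveness: the block's frequency counts
    }
    return Cost;
  }

  int main() {
    // A block like upper.exit with no RegIn but RegOut set adds its
    // (relatively large) frequency to the cost.
    std::vector<ActiveBlock> Blocks = {{/*RegIn=*/false, /*RegOut=*/true, 12000}};
    std::cout << globalSplitCost(Blocks) << '\n'; // prints 12000
  }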

I'm still looking into it, but let me know if you have any insight.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: account-for-variable-entry-freq.patch
Type: application/octet-stream
Size: 6734 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20140407/d4b4829f/attachment.obj>

