[llvm-commits] Tuning LLVM Greedy Register Allocator to optimize for code size when targeting ARM Thumb 2 instruction set

Tue Jan 24 11:24:09 PST 2012

Hi Jakob,

I hope this clarifies things; see my replies below.

From: Jakob Stoklund Olesen [mailto:stoklund at 2pi.dk] 
Sent: Tuesday, January 24, 2012 10:48 AM
To: Chad Rosier
Cc: Zino Benaissa; rajav at codeaurora.org; llvm-commits at cs.uiuc.edu
Subject: Re: [llvm-commits] Tuning LLVM Greedy Register Allocator to
optimize for code size when targeting ARM Thumb 2 instruction set

On Jan 24, 2012, at 10:25 AM, Chad Rosier wrote:

On Jan 23, 2012, at 9:46 PM, Zino Benaissa wrote:

From: Jakob Stoklund Olesen [mailto:stoklund at 2pi.dk]

If you don't mind, I would like you to run a couple of experiments to
better understand why this change improves some benchmarks.

->   Sure, please let me know what you find.

Just to be clear, I believe Jakob was suggesting *you* run the experiments.

Oh, sorry if that wasn't clear.

First of all, is the regHasSizeImpact() hook necessary? Do you get
significantly different results if you pretend this function always returns
2?

->   In my experiments, the precision of the estimate is quite important
for maximizing the code size gains.

The thing is, the function is using information that isn't yet available at
RA time. For example, you look at <kill> flags, but they will be changed by
the post-RA scheduler moving instructions around. You look at load/store
offsets, but they are not filled in until PEI runs. You can't really know
which instructions can be converted to 2-address form until after RA etc.

So basically, regHasSizeImpact() returns a guess, it has to.

Another guess that is much faster to compute is '2'.

Yes, this is a heuristic, so by definition it is a guess. The way to look at
it is the other way around:

1.       If the offset of a load/store is too large, then don't bother
assigning R0-R7.

2.       If neither operand of an ADD is marked <kill>, then don't bother
assigning R0-R7.

3.       If the immediate of an ADD is too large, then don't bother
assigning R0-R7.

The goal is to eliminate as many as possible of the candidates that compete
for R0-R7, so that the RA makes a better assignment of R0-R7 (which
ultimately increases the number of 16-bit encodings).

Returning 2 fails to do this. You may as well return 0 instead of 2 :)
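
To make the difference between the two guesses concrete, here is a minimal C++ sketch of the three filtering rules next to the constant guess. It is not the code from the patch: RegUse, UseKind, sizeImpactGuess, constantGuess and the two numeric limits are hypothetical names and values chosen for this illustration, and the real regHasSizeImpact() hook presumably inspects actual machine instructions rather than a plain struct.

```cpp
#include <cstdint>

// Hypothetical, simplified model of one use of a virtual register.
// These are NOT LLVM classes; they only mirror the three rules in the thread.
enum class UseKind { LoadStore, AddReg, AddImm };

struct RegUse {
  UseKind kind;
  int64_t offsetOrImm;   // load/store offset or ADD immediate (unused for AddReg)
  bool anySourceKilled;  // AddReg only: at least one source operand has <kill>
};

// Assumed limits, for illustration only; the patch may use different ones.
constexpr int64_t kMaxNarrowLoadStoreOffset = 124;
constexpr int64_t kMaxNarrowAddImm = 255;

// Guess how many bytes are saved if this use gets a register in R0-R7.
// Returning 0 removes the use from the competition for low registers.
int sizeImpactGuess(const RegUse &use) {
  switch (use.kind) {
  case UseKind::LoadStore:  // rule 1: offset too large, don't bother
    return (use.offsetOrImm >= 0 &&
            use.offsetOrImm <= kMaxNarrowLoadStoreOffset) ? 2 : 0;
  case UseKind::AddReg:     // rule 2: no operand killed, don't bother
    return use.anySourceKilled ? 2 : 0;
  case UseKind::AddImm:     // rule 3: immediate too large, don't bother
    return (use.offsetOrImm >= 0 &&
            use.offsetOrImm <= kMaxNarrowAddImm) ? 2 : 0;
  }
  return 0;
}

// The cheaper alternative from the thread: ignore the operands entirely.
int constantGuess(const RegUse &) { return 2; }
```

With the constant guess every use looks equally valuable, so nothing is filtered out of the competition for R0-R7; with the rule-based guess, a load at offset 4096 or an ADD with no killed operand reports 0 and drops out.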

1.       "you look at <kill> flags, but they will be changed by the post-RA
scheduler moving instructions around": In the majority of cases the
destination operand will reuse the register of the source 1 operand, leading
to a 16-bit encoding.

2.       "You look at load/store offsets, but they are not filled in until
PEI runs": Is it? What

I want to know which guess is better, because if there is only a small
difference, we can leave out a lot of code and save compile time.

/jakob
