[llvm-commits] X86 FastISel: Emit immediate call arguments locally to save stack size when compiling with -O0

Thu Aug 11 09:36:21 PDT 2011

On Wed, Aug 10, 2011 at 4:16 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Aug 10, 2011, at 4:01 PM, Ivan Krasin wrote:
>
>> Hi Jakob,
>>
>> Here is the new version of the patch. Now the local cache does not
>> cross function calls (with the exception of intrinsics, since they
>> tend to be inlined).
>> I have found it hard to emit the definition at the last use without
>> slowing down the -O0 build (which would be unacceptable). Please find
>> the patch attached.
>
> Eric, could you take a look at this patch when you have the time, please?
>
>> I have also run my small benchmark tool over the test-suite bitcode files.
>> The full log is attached to the e-mail; below are some excerpts.
>> The first toolchain is the toolchain with the patch;
>> the second toolchain is the unmodified LLVM toolchain.
>>
>> The command line I used to run llc:
>> Release/bin/llc -O0 -stats --time-passes -relocation-model=pic -O0
>> -asm-verbose=false
>
> What architecture did you test?
x86_64-unknown-linux-gnu

>
>> stdin/stdout/stderr were redirected to my tester. Every test was
>> run once, so Seconds and WallSeconds could differ slightly if I ran
>> the tool again.
>
> I'd rather not look through 700 K of raw data. Please provide a summary, including the worst regressions.
https://spreadsheets.google.com/a/google.com/spreadsheet/ccc?key=0Ao3TPgpIlZ9ddHVpeDVCSzZHYXlJOTc0N1VKX1BFNVE&hl=en_US#gid=0

There are a few major regressions, which I believe are a show stopper
for this patch (they are at the top of the spreadsheet above):

objinst.llvm.bc	272	344	0	0.0027	245	184	0	0.0026	11.02%	86.96%	0.10%
cast.linked.bc	1053	1224	0	0.0058	924	664	0	0.0051	13.96%	84.34%	0.67%
cast.llvm.bc	1053	1224	0	0.0057	926	680	0	0.005	13.71%	80.00%	0.67%
objinst.linked.bc	320	360	0	0.0038	301	232	0	0.0036	6.31%	55.17%	0.19%

The second column from the right is the stack size increase when the
patch is applied.
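
For anyone who wants to check that column, here is a rough sketch of how it can be reproduced from the rows above. It assumes that the 2nd and 6th numeric fields are the stack sizes with and without the patch; the columns are not labeled in this excerpt, so that reading is inferred from the arithmetic rather than stated anywhere. The helper name pct_increase is just for illustration.

```python
# Rows copied from the excerpt above: (file, 2nd field, 6th field).
# Treating these as "stack size with patch" and "stack size without patch"
# is an assumption; the excerpt does not label its columns.
rows = [
    ("objinst.llvm.bc", 344, 184),
    ("cast.linked.bc", 1224, 664),
    ("cast.llvm.bc", 1224, 680),
    ("objinst.linked.bc", 360, 232),
]


def pct_increase(patched, baseline):
    """Relative increase of patched over baseline, in percent."""
    return (patched - baseline) / baseline * 100.0


for name, patched, baseline in rows:
    print(f"{name}: {pct_increase(patched, baseline):.2f}%")
```

The printed values should line up with the second column from the right in the excerpt (for example, 344 against 184 gives 86.96%).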

All of these examples can be described as "one global constant is
used across many calls in one function".
The fix should probably be "don't apply the optimization if the block
uses global variables", but I'm not sure how much it would help.

What do you think?

Ivan
>
> /jakob
>
>