[llvm-commits] X86 FastISel: Emit immediate call arguments locally to save stack size when compiling with -O0

Ivan Krasin krasin at chromium.org
Tue Aug 2 17:43:48 PDT 2011


On Tue, Aug 2, 2011 at 2:01 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>
> On Aug 2, 2011, at 12:55 PM, Ivan Krasin wrote:
>
>> Hi Jakob,
>>
>> On Mon, Aug 1, 2011 at 10:35 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>>>
>>> On Aug 1, 2011, at 2:13 PM, Ivan Krasin wrote:
>>>
>>> this patch fixes the FastISel stack allocation strategy issue described
>>> here:
>>> http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-July/041452.html
>>> The solution is to emit immediate int arguments just before the call
>>> (instead of spilling them on the stack at the beginning of the function).
>>>
>>> Thanks for working on this, Ivan.
>>> I don't think there is any reason to special-case function arguments. It is
>>> just as bad when fast-isel hoists the immediate in 'x+1'.
>>> I would prefer an approach that handled all immediates.
>> Thanks, that's a good point. I have tried to use double arguments as
>> immediates as well, but FastISel fails on them.
>> I think that should also be fixed, but I would prefer to do it in the next CL.
>
> Floating point constants are probably not important. They will be folded as constant pool loads most of the time, I expect.
>
> What I meant was, don't treat function arguments as a special case. You should handle all integer constants, whether they are used as function arguments or 'add' operands.
>
> Your patch doesn't work for this code, right?
>
>  f(y + 5);
>  f(x + 5);
>
> That '5' gets hoisted to the top of the block and spilled because of the first call. It shouldn't.
>
> Here is how fast isel currently works: It emits instructions bottom-up. When it needs a constant, the code for the constant is emitted to the top of the block, and it makes a note that the constant is now in a virtual register. Further uses of that constant simply get the virtual register.
>
> Here is what you should do: When you need a constant, don't emit the code immediately, but do allocate a virtual register and make a note that the constant needs to be materialized. Whenever you are about to emit a call, materialize all the pending constants. Then insert the call instruction.
>
> That way, virtual registers holding constants will never cross a call instruction, so they won't get spilled (much).
OK, I've got your point and I like it.
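
To double-check my understanding, here is a rough, standalone C++ toy of
the idea (hypothetical names only, nothing from the actual FastISel
sources, and written in forward order rather than bottom-up). It has the
property you describe: a virtual register holding a constant never lives
across a call, because the map of already-materialized constants is
cleared at every call site.

#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Toy "block emitter": not the real FastISel interface, just enough to
// show that constant vregs never have to live across a call.
struct FakeBlockEmitter {
  std::vector<std::string> Insts;           // emitted pseudo-instructions
  std::map<long, unsigned> LocalConstRegs;  // imm -> vreg, valid since the last call
  unsigned NextVReg = 0;

  unsigned newVReg() { return NextVReg++; }

  // Materialize an immediate next to its use instead of hoisting it to
  // the top of the block; reuse the vreg only until the next call.
  unsigned getConst(long Imm) {
    std::map<long, unsigned>::iterator It = LocalConstRegs.find(Imm);
    if (It != LocalConstRegs.end())
      return It->second;
    unsigned VReg = newVReg();
    Insts.push_back("mov $" + std::to_string(Imm) + ", %v" + std::to_string(VReg));
    LocalConstRegs[Imm] = VReg;
    return VReg;
  }

  unsigned emitAdd(unsigned LHS, unsigned RHS) {
    unsigned Dst = newVReg();
    Insts.push_back("add %v" + std::to_string(LHS) + ", %v" + std::to_string(RHS) +
                    " -> %v" + std::to_string(Dst));
    return Dst;
  }

  void emitCall(const std::string &Callee, unsigned ArgVReg) {
    Insts.push_back("call " + Callee + "(%v" + std::to_string(ArgVReg) + ")");
    // Any cached constant would have to survive this call if it were
    // reused later, so forget it and re-materialize after the call.
    LocalConstRegs.clear();
  }
};

int main() {
  // Models  f(y + 5); f(x + 5);  -- the second '5' gets a fresh vreg
  // after the first call instead of being reused across it.
  FakeBlockEmitter B;
  unsigned Y = B.newVReg(), X = B.newVReg();

  unsigned T0 = B.emitAdd(Y, B.getConst(5));
  B.emitCall("f", T0);

  unsigned T1 = B.emitAdd(X, B.getConst(5));
  B.emitCall("f", T1);

  for (size_t i = 0; i != B.Insts.size(); ++i)
    std::printf("%s\n", B.Insts[i].c_str());
  return 0;
}

For f(y + 5); f(x + 5) this prints a separate 'mov $5' on each side of
the first call instead of a single hoisted one at the top of the block.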

>
> For patches like this, please provide measurements of code size and compile time as well.
I agree that metrics are key to optimizing the code. Here are my
metrics: compile time and stack space used.

/usr/bin/time -f '%U' Release/bin/llc jsinterp.bc -march=x86-64 -O0
0.42

I calculate the amount of stack used (thanks to Rafael for the
suggestion) with the following script:

krasin@krasin$ cat calc_stack.sh
#!/bin/bash

ASM_FILE="$1"

(
  # Sum the immediates of every "sub $N, %rsp" stack adjustment.
  grep "sub.*, %rsp" "$ASM_FILE" | sed 's/.*\$\([0-9]*\),.*/\1/' |
    awk '{s+=$1} END {print s}'
  # Count every push eight times: each pushq takes 8 bytes on x86-64.
  for i in {1..8}
  do
    grep -c push "$ASM_FILE"
  done
) | awk '{s+=$1} END {print s}'

Basically, it greps all the subtractions from %rsp and computes their sum
plus 8 * count(push), since each pushq takes 8 bytes on x86-64. For
instance, a function with 'subq $40, %rsp' and three pushes contributes
40 + 3*8 = 64 bytes.

Example:
./calc_stack.sh ./jsinterp.s
28272

Of course, I will run the test several times and take the median, and I
also need more bitcode files to test with than just Rafael's example.
(I would be glad to get suggestions here; by default, I will use bitcode
files from Chromium.)

Jakob, are you fine with the proposed metrics?

Ivan

>
> Thanks,
> /jakob
>
>



