[LLVMdev] Maybe OT -- what is this compiler optimization formally known as?

Bruce Hoult bruce at hoult.org
Mon Nov 17 22:54:04 PST 2014


Real CPUs can't directly evaluate an expression such as your nested set
of calls. Even if you write it that way, the internals of the compiler,
and the eventual machine code, will be much more like the first
version.

But (unless you use "volatile", or -O0) there are no loads or stores
there. On any reasonable modern machine the values will all be in
registers. Only var3 typically needs any special handling, to keep it
around for the later call, but even that stays in a register.
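
To make that concrete, here is a minimal C sketch (call1/call2 and the
expression bodies are hypothetical stand-ins, not from your language):

extern int call2(int, int);
extern int call1(int);

int flat(int a, int b) {
    int var1 = a + 1;               /* expression1 */
    int var2 = b * 2;               /* expression2 */
    int var3 = call2(var1, var2);
    int var4 = a - 3;               /* expression3 */
    int var5 = b ^ 4;               /* expression4 */
    int var6 = call2(var4, var5);
    int var7 = call1(var6);
    int var8 = call2(var3, var7);
    return var8;
}

int nested(int a, int b) {
    return call2(call2(a + 1, b * 2),
                 call1(call2(a - 3, b ^ 4)));
}

At -O1 and above (e.g. clang -O2 -S) both should compile to essentially
the same code, with none of the varN temporaries touching memory.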

e.g. on Thumb, assuming this is the entire function, that you want to
return var8 as the result, that the vars fit in integer registers, and
that the expressions are not too complex:

push {r4, lr}
mov r0, expression1   // more likely an add, sub, etc. with r0 as the destination
mov r1, expression2
bl call               // var3 = call(var1, var2), result in r0
mov r4, r0            // keep var3 in a callee-saved register
mov r0, expression3
mov r1, expression4
bl call               // var6 = call(var4, var5)
bl call               // var7 = call(var6); r0 is already the argument
mov r1, r0            // var7 is the second argument
mov r0, r4            // var3 is the first argument
bl call               // var8 = call(var3, var7)
pop {r4, pc}          // restore r4 and return with var8 in r0

The only instructions here which do loads or stores to memory are the push
and the pop. Even the calls don't touch memory (unless those functions are
not self-contained and need to call other functions).

Note that in Thumb and ARM code any function is free to do anything it
wants with r0, r1, r2, r3 ("caller save" or "volatile"). They do not
need to be saved, and you can't expect anything you leave there to
still be there after you call another function. You can assume r4, r5,
r6, r7 ("callee save") will still have their original values after you
call a function, and if you use them yourself then you have to save and
restore the values you found there.
This is typical of RISC and newer CPU conventions. PowerPC, MIPS, SPARC and
even x86_64 are essentially the same, other than some not using a "link
register" style of function call.


On Tue, Nov 18, 2014 at 5:35 PM, Hayden Livingston <halivingston at gmail.com>
wrote:

> This is actually not an LLVM issue, but more an issue of the
> higher-level language that I'm trying to target LLVM from. I'm seeing
> that if I do a naive translation of my language, it generates
> additional loads and stores.
>
> For example (in my high level language):
>
>   var1 = expression1;
>   var2 = expression2;
>   var3 = call(var1, var2);
>   var4 = expression3;
>   var5 = expression4;
>   var6 = call(var4, var5);
>   var7 = call(var6);
>   var8 = call(var3, var7);
>
> This can be represented compactly in my high-level language as
>
> var8 = call(call(expression1, expression2),
>             call(call(expression3, expression4)));
>
> When I think about it naively, I would write code that starts from
> var8, traverses the definitions, and tries to substitute each "var"
> with its expression.
>
> I can imagine not always wanting to do this: for example, if the
> expression result is used more than once, maybe I do want to pay for
> the load/store. And maybe sometimes the semantics are such that I need
> to reuse it.
>
> Is there a formal algorithm or classical technique that describes this?
>
> Hayden
>
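
For what it's worth, a minimal C sketch of the traversal described
above, assuming a toy expression-tree IR (the types and names here are
hypothetical, nothing from LLVM): substitute a temporary into its use
only when it has exactly one use, which matches the "duplicated
expression" caveat.

#include <stddef.h>

enum kind { EXPR_OP, EXPR_VAR };

struct expr {
    enum kind kind;
    struct expr *def;       /* EXPR_VAR: its defining expression */
    int use_count;          /* EXPR_VAR: how many times it is used */
    struct expr *args[2];   /* EXPR_OP: operands, may be NULL */
};

/* Starting from the final value (var8), replace every single-use
   variable by its defining expression, recursively. */
static struct expr *substitute(struct expr *e)
{
    if (e == NULL)
        return NULL;
    if (e->kind == EXPR_VAR && e->use_count == 1)
        return substitute(e->def);
    if (e->kind == EXPR_OP) {
        e->args[0] = substitute(e->args[0]);
        e->args[1] = substitute(e->args[1]);
    }
    return e;
}

Variables with more than one use are left alone, so their values are
still computed once; a real implementation would also have to check
that pulling a call into another expression does not reorder it past
other side effects.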