[LLVMdev] Passing and returning aggregates (who is responsible for the ABI?)
sabre at nondot.org
Tue Nov 6 09:10:05 PST 2007
On Tue, 6 Nov 2007, Christophe de Dinechin wrote:
>> Probably in good part because, in LLVM, aggregates (or derived
>> types) exist only in memory, not in registers.
> Thanks, that's precisely where I see a problem. On many recent
> architectures (Itanium being the extreme case), small enough
> aggregates are passed and held in registers. Thinking or designing
> "aggregates == memory" is an obsolete approach ;-) I like the "call"
> instruction because, at least, it got rid of the "arguments == push
> to stack" approach you find in the Java or MSIL bytecodes...
Sure. However, an IR is an abstraction layer; it doesn't
necessarily specify how it gets mapped onto the hardware. Also, a variety
of optimizations kick in to improve the code in various ways. For
example, LLVM contains a "scalar replacement of aggregates" pass, which
breaks up aggregates in memory into registers when possible. This is
particularly important for C++ code, which uses lots of small aggregates.
If you have large aggregates, it is almost always better to put them in
memory than in registers.
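As a rough sketch of the kind of code that pass targets (the names Complex
and add_then_norm are made up for illustration; they are not from LLVM or
XL), consider a small local aggregate whose address never escapes:

```cpp
// Hypothetical example: Complex and add_then_norm are illustrative names.
struct Complex {
    double re;
    double im;
};

// 'sum' is a local aggregate that is only accessed field by field and
// whose address is never taken. A pass such as scalar replacement of
// aggregates can treat sum.re and sum.im as two independent scalars,
// which can then be kept in registers instead of a stack slot.
double add_then_norm(double ar, double ai, double br, double bi) {
    Complex sum;
    sum.re = ar + br;
    sum.im = ai + bi;
    return sum.re * sum.re + sum.im * sum.im;
}
```

Whether the fields actually end up in registers depends on the optimization
level and the target; the point is only that nothing in this function
forces 'sum' to exist in memory.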
> As an aside, why do I care? I wanted XL to be efficient on modern
> architectures, so I got rid of "implicit memory accesses" as much as
> I could, e.g. no "this pointer". At one point, I compiled a simple
> program manipulating complex numbers to draw a Julia set. At the
> lowest level of optimization, the XL version was at least 70% faster
> than the C++ version.
Have you tried compiling the C++ version with llvm-gcc? :) The complex
number should certainly be promoted to live in FP registers.
> Why? Because the user-defined complex operations in XL were all done
> in registers, whereas at that level of optimization, the C++ compiler
> was not doing the memory aliasing analysis required to perform
> "register field promotion", eliminate the "this pointer", and turn
> the C++ complex class into registers. In other words, a complex
> addition was 4 loads, two fp adds, and 2 stores for C++, as opposed
> to only the fp adds for XL. Obviously, an IR assuming that aggregates
> are in memory does not help here.
LLVM is designed to do these sorts of things, and it is very good at it.
The only significant current problem is when you have aggregates that are
passed or returned through function calls. In this case (assuming the
call is not inlined) the optimizer is not able to promote the value from
memory into registers. This is why we want to extend LLVM to support this
in a first-class way.
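To make the problem case concrete, here is a minimal sketch (again with
made-up names, Complex, add and real_of_sum, and assuming the call to add
is not inlined) of an aggregate crossing a call boundary:

```cpp
// Hypothetical example: all names here are illustrative.
struct Complex {
    double re;
    double im;
};

// Takes and returns aggregates by value. Inside this function the
// fields can be handled as scalars, but the arguments and the result
// still have to cross the call boundary as aggregates.
Complex add(Complex a, Complex b) {
    Complex r;
    r.re = a.re + b.re;
    r.im = a.im + b.im;
    return r;
}

// If add() is not inlined here, the aggregates passed to and returned
// from the call are the values the optimizer cannot promote from
// memory into registers, as described above.
double real_of_sum(double ar, double ai, double br, double bi) {
    Complex a = {ar, ai};
    Complex b = {br, bi};
    Complex s = add(a, b);
    return s.re;
}
```

If add() does get inlined, this collapses to the same situation as a purely
local aggregate and the usual promotion applies.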