[llvm-commits] some thoughts on lowering for calling conventions

Bob Wilson bob.wilson at apple.com
Fri Sep 17 09:58:14 PDT 2010


On Sep 14, 2010, at 6:13 PM, Rafael Espíndola wrote:

> On 14 September 2010 12:49, Bob Wilson <bob.wilson at apple.com> wrote:
>> This patch and an unrelated comment from Eric got me thinking some more about llvm's handling of calling conventions.  I'm not very happy with our current approach.  The specific issue I'm thinking about now is that we lower either too early or too late.
> 
> Nice to have you on board! This has been one of my pet peeves for some time ...

It (obviously) bothers me, too, but I hesitate to say that I'm "on board".  After reflecting on this for a few days, my current thought is that calling convention issues should be handled later in the compiler.  It is generally a good thing for the higher-level IR to be as simple and uncluttered as possible.  I'd like to move away from having frontends generate code with:

* Pad arguments
* Aggregates split into scalars
* Large scalars (e.g., f64) split into smaller scalars

These are unnecessary complications for interprocedural passes and they make the IR cluttered and harder to deal with in general.

In light of that, I'm less and less enthusiastic about your recent patch that splits up 64-bit arguments for AAPCS.  Can you provide some more information about the motivation?  Do you have any data to show whether it makes a difference?  I can imagine scenarios where it might actually be harmful (e.g., if the additional extends and shifts are not removed after inlining a function).

Some more comments below.

> 
>> Lowering in the front-end should be minimized.  It's too early.  Interprocedural analyses and optimizations will suffer. (E.g., when an f64 argument is lowered to a pair of i32 values, it's hard for an analysis to track how that argument is used.) Since we currently support 2 front-ends, it also means that we need to do the front-end lowering in 2 places and keep both of them up to date.  The front-end has to do some lowering in cases where the back-end doesn't have the language-specific information to do the job, but otherwise, I'd like to see the front-end avoid lowering for calling conventions.
>> 
>> Aside from the front-ends, the rest of our calling convention support is handled with selection DAGs.  That is too late.  We'd really like the optimizer to see all the code for splitting up and recombining arguments.  The DAG combiner and machine instruction optimizations clean up some of the expanded code, but it's not ideal.  I assume that is the motivation for Rafael's patch here.  Eric is working on fast isel for ARM, and he mentioned to me recently that since fast isel doesn't build selection DAGs, it has to duplicate all the support for lowering calling conventions.
> 
> I agree with this in general. Since these were the only two places I
> could see the lowering being done, it looked better to have it done in the FE.
> 
> The current implementation has some really annoying problems:
> 
> *) Has to be reimplemented for fast isel or any other instruction
> selection solution we want to create.

Yes, this is a nuisance.  I'd still like to see a calling convention lowering pass that could be shared for fast-isel and selection DAGs, but that would be tough because we currently have no way to represent some of the necessary constructs (e.g., physical machine registers) in llvm IR.

> 
> *) Fewer things are explicit in the IL, so more arch-specific knowledge
> is needed in any pass that wants to take advantage of it.

This cuts both ways.  If you expose the low-level details too early, it will make higher-level analyses more difficult.  Making _everything_ explicit in the front-end is not good.

The argument splitting you've done is closely related to type legalization.  Since we're currently doing type legalization on selection DAGs, we have pretty good support for optimizing (e.g., in the DAG combiner) the kind of code that results from splitting up 64-bit values.  If you're seeing cases where you get better code by splitting the arguments in the front-end, that suggests there may be weaknesses in the DAG optimization that we should investigate.  If you have examples of that, I'd like to take a look at them.

> 
> *) I think the IL has multiple ways to represent the "same"
> function. Two examples:
> 
>    *) By extending the use of pad arguments it is probably possible
>     to remove the alignment of byval attributes.
> 
>    *) By having the FE lower floating point argument to integers when
>    the ABI mandates that FP values be passed on integer registers we
>    might be able to drop most of the extra logic for handing the aapcs_vfp
>    calling convention.

I really dislike both of these ideas.  Sorry.  They might simplify some things, but they cause other problems.

> 
> *) Doing it all in the DAG produces code that is hard to read. Things
> are a *lot* better these days, but moving the C-specific
> knowledge out would help.

I don't think I understand this point.  What is it that is hard to read?

> 
> *) There is a mismatch on what the IL means and what it needs. For example,
> on x86-64 clang will compile
> 
> ---------------------------------
> struct foo {
>        long a;
>        long b;
>        char c;
> };
> void f(struct foo x);
> void g(void) {
>        struct foo y = {1, 2, 3};
>        f(y);
> }
> ----------------------------------
> 
> into
> 
> -----------------------------------------
> %struct.foo = type { i64, i64, i8 }
> define void @g() nounwind optsize {
> entry:
>  %agg.tmp = alloca %struct.foo, align 8
>  %0 = bitcast %struct.foo* %agg.tmp to i192*
>  store i192 1020847100762815390427017310442723737601, i192* %0, align 8
>  call void @f(%struct.foo* byval %agg.tmp) nounwind optsize
>  ret void
> }
> declare void @f(%struct.foo* byval)
> ----------------------------------------
> 
> This is bad, because there is no requirement for the caller to have this
> data in memory. If the struct were passed as an FCA or with a similar solution,
> then the callee would suffer from not knowing that the data was already in
> memory, and something like
> 
> void h(struct foo *x);
> void f(struct foo x){
>        h(&x);
> }
> 
> would produce low-quality code. Some solution where the call instruction
> implicitly does the copy is probably what is needed. That way f, in the above
> example, can be declared to take a byval argument and know it is in memory,
> and the IL for g can use an FCA and not have to allocate stack for it.

OK.

> 
> *) Functions with a variable number of arguments have another set of problems.
> Hopefully a move to using va_arg will let us avoid some dead stores and maybe
> even make it possible to inline some basic va_arg functions :-)

Yes.  I haven't looked closely at the va_arg features in llvm, but from what I've seen, it definitely looks like a step in the right direction.

> 
> Having two FE is an issue. If we get really serious about lowering in
> the FE it might be possible to factor out some of the ABI bits from
> clang into a mini library and use it in llvm-gcc.

I don't at all want to go in that direction.  The front-ends should only have to do the lowering that cannot be done later (due to language-specific information not available in the IR).

> 
> I am not familiar enough with IPO to know how hard it is for those passes to
> follow argument splitting, but it cannot be as easy as without splitting.

It's not.

> 
>> Can we do better?
>> 
>> If we had a target-specific lowering for calling conventions at the llvm IR level, instead of for selection DAGs, then we could run that lowering pass after any high-level interprocedural analyses and optimizations but before things like instcombine.  That would avoid the need for front-ends to "pre-lower" things like this patch, avoid duplicating effort across front-ends and across selection DAGs and fast-isel, preserve information for high-level passes, and hopefully give us better optimization of the code resulting from lowering.
>> 
>> Thoughts?  (I fully expect there to be many obstacles to such a change, but I'm curious to hear if there's any consensus about the right solution for the long term.  I'm not necessarily advocating that we change anything right now.)
> 
> I like it, with the only note that when doing something in the FE would
> simplify the IL definition by removing the need for an attribute or a new
> calling convention, then we should probably do it there.

I much prefer adding the attributes or other information in the IR to allow the calling convention lowering to be done later.  This will be important as llvm gains more advanced interprocedural passes.
