[llvm-commits] some thoughts on lowering for calling conventions

Rafael Espíndola rafael.espindola at gmail.com
Tue Sep 14 18:13:57 PDT 2010


On 14 September 2010 12:49, Bob Wilson <bob.wilson at apple.com> wrote:
> This patch and an unrelated comment from Eric got me thinking some more about llvm's handling of calling conventions.  I'm not very happy with our current approach.  The specific issue I'm thinking about now is that we lower either too early or too late.

Nice to have you on board! This has been one of my pet peeves for some time ...


> Lowering in the front-end should be minimized.  It's too early.  Interprocedural analyses and optimizations will suffer. (E.g., when an f64 argument is lowered to a pair of i32 values, it's hard for an analysis to track how that argument is used.) Since we currently support 2 front-ends, it also means that we need to do the front-end lowering in 2 places and keep both of them up to date.  The front-end has to do some lowering in cases where the back-end doesn't have the language-specific information to do the job, but otherwise, I'd like to see the front-end avoid lowering for calling conventions.
>
> Aside from the front-ends, the rest of our calling convention support is handled with selection DAGs.  That is too late.  We'd really like the optimizer to see all the code for splitting up and recombining arguments.  The DAG combiner and machine instruction optimizations clean up some of the expanded code, but it's not ideal.  I assume that is the motivation for Rafael's patch here.  Eric is working on fast isel for ARM, and he mentioned to me recently that since fast isel doesn't build selection DAGs, it has to duplicate all the support for lowering calling conventions.

I agree with this in general. Since these were the only two places I
could see the lowering being done, it looked better to have it done in the FE.

The current implementation has some really annoying problems:

*) It has to be reimplemented for fast isel or any other instruction
 selection approach we want to create.

*) Fewer things are explicit in the IL, so more arch-specific knowledge
 is needed in any pass that wants to take advantage of it.

*) I think the IL has multiple ways to represent the "same"
 function. Two examples:

    *) By extending the use of pad arguments it is probably possible
     to remove the alignment field of the byval attribute.

    *) By having the FE lower floating point arguments to integers when
    the ABI mandates that FP values be passed in integer registers, we
    might be able to drop most of the extra logic for handling the aapcs_vfp
    calling convention.
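At the source level, that FE lowering amounts to reinterpreting the bits of a double as a pair of 32-bit integers. The following C sketch is only an illustration of the idea; the names and struct are made up, not anything clang or llvm-gcc actually emits:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: on a soft-float ABI the FE would pass a
   double as two 32-bit integer "register" values instead of as an FP
   value, so the backend never needs to know the argument was FP. */
typedef struct {
    uint32_t lo;  /* low 32 bits of the IEEE-754 encoding */
    uint32_t hi;  /* high 32 bits */
} lowered_f64;

static lowered_f64 lower_f64(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* bit-preserving, no FP conversion */
    lowered_f64 r = { (uint32_t)bits, (uint32_t)(bits >> 32) };
    return r;
}

static double raise_f64(lowered_f64 v) {
    uint64_t bits = ((uint64_t)v.hi << 32) | v.lo;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

The round trip is exact because memcpy preserves the bit pattern; an FP-to-int *conversion* would not.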

*) Doing it all in the DAG produces code that is hard to read. Things
 are a *lot* better these days, but moving the C-specific
 knowledge out would help.

*) There is a mismatch between what the IL says and what is actually
required. For example, on x86-64 clang will compile

---------------------------------
struct foo {
        long a;
        long b;
        char c;
};
void f(struct foo x);
void g(void) {
        struct foo y = {1, 2, 3};
        f(y);
}
----------------------------------

into

-----------------------------------------
%struct.foo = type { i64, i64, i8 }
define void @g() nounwind optsize {
entry:
  %agg.tmp = alloca %struct.foo, align 8
  %0 = bitcast %struct.foo* %agg.tmp to i192*
  store i192 1020847100762815390427017310442723737601, i192* %0, align 8
  call void @f(%struct.foo* byval %agg.tmp) nounwind optsize
  ret void
}
declare void @f(%struct.foo* byval)
----------------------------------------

This is bad, because there is no requirement for the caller to have this
data in memory. On the other hand, if the struct were passed as a first-class
aggregate (FCA) or with a similar solution, the callee would suffer from not
knowing that the data is already in memory, and something like

void h(struct foo *x);
void f(struct foo x){
        h(&x);
}

would produce low-quality code. Some solution where the call instruction
implicitly does the copy is probably what is needed. That way f, in the above
example, can be declared to take a byval argument and know it is in memory,
while the IL for g can use an FCA and does not have to allocate stack for it.
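To make that concrete, here is a sketch of what the IL for g might look like under such a scheme. The syntax is invented for illustration, not a concrete proposal:

```llvm
; Hypothetical syntax: the aggregate is passed as a first-class value,
; and the call instruction itself performs the copy into the stack slot
; that the byval convention requires. g allocates nothing, and f can
; still rely on its argument being in memory.
define void @g() nounwind optsize {
entry:
  call void @f(%struct.foo byval %struct.foo { i64 1, i64 2, i8 3 }) nounwind optsize
  ret void
}
```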

*) Functions with a variable number of arguments have another set of problems.
Hopefully a move to using va_arg will let us avoid some dead stores and maybe
even make it possible to inline some basic va_arg functions :-)
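For reference, this is the kind of trivial varargs function that inlining could help with. It is a plain standard-C example, nothing llvm-specific:

```c
#include <stdarg.h>

/* Sum n ints passed through "...". Simple enough that, if the va_arg
   expansion were inlined at a call site with known arguments, the
   whole loop could constant-fold away. */
static int sum_ints(int n, ...) {
    va_list ap;
    int total = 0;
    va_start(ap, n);
    for (int i = 0; i < n; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

A call like sum_ints(3, 1, 2, 3) currently forces the arguments through the varargs save area; with the function inlined, the stores and loads become visible to the optimizer.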

Having two FEs is an issue. If we get really serious about lowering in
the FE, it might be possible to factor out some of the ABI bits from
clang into a mini library and use it in llvm-gcc.

I'm not familiar enough with IPO to know how hard it is for those passes to
follow argument splitting, but it cannot be as easy as with no splitting at all.

> Can we do better?
>
> If we had a target-specific lowering for calling conventions at the llvm IR level, instead of for selection DAGs, then we could run that lowering pass after any high-level interprocedural analyses and optimizations but before things like instcombine.  That would avoid the need for front-ends to "pre-lower" things like this patch, avoid duplicating effort across front-ends and across selection DAGs and fast-isel, preserve information for high-level passes, and hopefully give us better optimization of the code resulting from lowering.
>
> Thoughts?  (I fully expect there to be many obstacles to such a change, but I'm curious to hear if there's any consensus about the right solution for the long term.  I'm not necessarily advocating that we change anything right now.)

I like it, with the one caveat that when doing something in the FE would
simplify the IL definition by removing the need for an attribute or a new
calling convention, we should probably do it there.

Cheers,
Rafael



