[flang-dev] [llvm-dev] [cfe-dev] Clang/LLVM function ABI lowering (was: Re: [RFC] Refactor Clang: move frontend/driver/diagnostics code to LLVM)

Thu Jun 4 14:45:15 PDT 2020

On 4 Jun 2020, at 0:54, James Y Knight via llvm-dev wrote:
> While MLIR may be one part of the solution, I think it's also the case
> that the function-ABI interface between Clang and LLVM is just wrong and
> should be fixed -- independently of whether Clang might use MLIR in the
> future.
>
> I've mentioned this idea before, I think, but never got around to
> writing up a real proposal. And I still haven't. Maybe this email could
> inspire someone else to work on that.
>
> Essentially, I'd like to see the code in Clang responsible for function
> parameter-type mangling as part of its ABI lowering deleted. Currently,
> there is a secret "LLVM IR" ABI used between Clang and LLVM, which
> involves expanding some arguments into multiple arguments, adding a
> smattering of "inreg" or "byval" attributes, and converting some types
> into other types. All in a completely target-dependent, complex, and
> undocumented manner.
>
> So, while the IR function syntax appears at first glance to be generic
> and target-independent, that's not at all true. Sadly, in some cases,
> clang must even know how many registers different calling conventions
> use, and count numbers of available registers left, in order to choose
> the right set of those "generic" attributes to put on a parameter.
>
> So: not only does a frontend need to understand the C ABI rules, they
> also need to understand that complex dance for how to convert that into
> LLVM IR -- and that's both completely undocumented, and a huge mess.
>
> Instead, I believe clang should always pass function parameters in a
> "naive" fashion. E.g. if a parameter type is "struct X", the llvm
> function should be lowered to LLVM IR with a function parameter of type
> %struct.X. The decision on whether to then pass that in a register (or
> multiple registers), on the stack, padded and then passed on the stack,
> etc, should be the responsibility of LLVM. Only in the case of C++ types
> which *must* be passed indirectly for correctness, independent of
> calling convention ABI, should clang be explicitly making the decision
> to pass indirectly.
>
> Of course, the tricky part is that LLVM doesn't -- and shouldn't -- have
> the full C type system available to it, and the full C type system
> typically is required to evaluate the ABI rules (e.g., distinguishing a
> "_Complex float" from a struct containing two floats).
>
> Therefore, in order to communicate the correct ABI information to LLVM,
> I'd like clang to also emit *explicitly-ABI-specific* data (metadata?),
> reflecting the extra information that the ABI rules require the backend
> to know about the type. E.g., for X86_64, clang needs to inform LLVM of
> the classification for each parameter's type into MEMORY, INTEGER, SSE,
> SSEUP, X87, X87UP, COMPLEX_X87. Or, for PPC64 elfv2, Clang needs to
> inform LLVM when a structure should be treated as a "homogeneous
> aggregate" of floating-point or vector type. (In both cases, that
> information cannot correctly be extracted from the LLVM IR struct type,
> only from the C type system.)
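
To make the quoted "_Complex float" example concrete, here is a rough C
sketch. The names two_floats, same_size, and same_bytes are made up for
this illustration. It checks that, on a typical target, a _Complex float
and a struct of two floats have the same size and the same bytes in
memory, even though C treats them as distinct types. Once both are
flattened to a pair of floats, that distinction is gone, which is why an
ABI that treats the two differently needs extra information from the
frontend.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical struct for illustration: same members as a complex float. */
struct two_floats { float re; float im; };

/* Returns 1 if both types occupy the same number of bytes. */
int same_size(void)
{
    return sizeof(float _Complex) == sizeof(struct two_floats);
}

/* Returns 1 if a complex value and the struct hold identical bytes.
 * Assumes the struct has no padding, which holds on common targets. */
int same_bytes(float re, float im)
{
    float parts[2] = { re, im };
    float _Complex c;
    struct two_floats s = { re, im };

    assert(sizeof c == sizeof parts);
    /* C specifies that a complex type is laid out like an array of two
     * elements of the corresponding real type: real part, then imaginary. */
    memcpy(&c, parts, sizeof c);
    return memcmp(&c, &s, sizeof c) == 0;
}
```

Both checks are expected to succeed on mainstream targets; whether the
two types are then passed the same way is decided by each ABI.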

These attributes would have to spell out the exact expected treatment by 
the backend in essentially every aggregate case, and the frontend would 
have to carefully select that treatment, and for many ABIs that would 
still require counting registers and so on.  I do actually like this 
approach in many ways, because it provides a path to a world where the 
backend stops permissively compiling everything the frontend throws at it 
and instead emits an error if the frontend asks for something that 
can’t be done, but it’s not going to make things more abstract.
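
As a minimal sketch of the register counting mentioned above, assume the
x86-64 System V convention of six integer and eight SSE argument
registers, and ignore everything else (SSEUP, X87, varargs, return
values). All names below are hypothetical; this is not Clang's or LLVM's
code, only an illustration of why someone has to keep a running count.

```c
#include <assert.h>

/* Simplified subset of the x86-64 System V argument classes. */
enum arg_class { ARG_INTEGER, ARG_SSE, ARG_MEMORY };

/* Argument registers that are still unassigned. */
struct reg_state { int gpr_left; int sse_left; };

/* x86-64 System V starts with 6 integer and 8 SSE argument registers. */
struct reg_state reg_state_init(void)
{
    struct reg_state st = { 6, 8 };
    return st;
}

/* Assign one argument made of n eightbytes (n is 1 or 2 here).
 * Returns 1 if it is passed in registers, 0 if it goes on the stack.
 * If any eightbyte cannot get a register, the whole argument goes to
 * memory and no registers are consumed. */
int assign_arg(struct reg_state *st, const enum arg_class *classes, int n)
{
    int need_gpr = 0, need_sse = 0;

    assert(n >= 1 && n <= 2);
    for (int i = 0; i < n; i++) {
        if (classes[i] == ARG_MEMORY)
            return 0;
        if (classes[i] == ARG_INTEGER)
            need_gpr++;
        else
            need_sse++;
    }
    if (need_gpr > st->gpr_left || need_sse > st->sse_left)
        return 0;
    st->gpr_left -= need_gpr;
    st->sse_left -= need_sse;
    return 1;
}
```

With five integer arguments already assigned, a 16-byte struct classified
as two INTEGER eightbytes no longer fits and goes to the stack, while a
later single integer argument still gets the last register. The result
for any one parameter therefore depends on the parameters before it.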

Having worked in this space for years, I am convinced that there are two 
meaningful points for ABI lowering: (1) the high-level source-language 
information and (2) the low-level register and stack conventions.  (1), 
for C interop, is always going to be duplicative of Clang.  You can 
introduce an intermediate library and make Clang copy all relevant 
information out of its AST into that library’s type system, but 
fundamentally “all relevant information” is going to just keep 
expanding and expanding, and Clang is still going to have a ton of 
target-specific ABI lowering code to do that propagation.

John.