[cfe-dev] Clang/LLVM function ABI lowering

Tue Jun 9 02:49:05 PDT 2020

On 04/06/2020 05:54, James Y Knight via cfe-dev wrote:
> Essentially, I'd like to see the code in Clang responsible for function 
> parameter-type mangling as part of its ABI lowering deleted. Currently, 
> there is a secret "LLVM IR" ABI used between Clang and LLVM, which 
> involves expanding some arguments into multiple arguments, adding a 
> smattering of "inreg" or "byval" attributes, and converting some types 
> into other types. All in a completely target-dependent, complex, and 
> undocumented manner.

This has been my biggest complaint about LLVM for almost 15 years. 
We've had long discussions about it and I don't think anyone is actually 
happy with this situation.  Unfortunately, the path forwards isn't quite 
so clear cut.

> So, while the IR function syntax appears at first glance to be generic 
> and target-independent, that's not at all true. Sadly, in some cases, 
> clang must even know how many registers different calling conventions 
> use, and count numbers of available registers left, in order to choose 
> the right set of those "generic" attributes to put on a parameter.
> 
> So: not only does a frontend need to understand the C ABI rules, they 
> also need to understand that complex dance for how to convert that into 
> LLVM IR -- and that's both completely undocumented, and a huge mess.

Completely agreed.  It also leads to some subtle gotchas for other front 
ends.  For example, what is the most efficient way of returning a pair 
of i32s?  Typically, it's packed in an i64, because most 32-bit back 
ends will use both of their ABI's return registers for an i64.
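
To make the packing concrete, here is a minimal C sketch of what a front 
end effectively does by hand.  The names pair32, pack_pair and 
unpack_pair are hypothetical, and putting 'first' in the low half is an 
assumption for illustration; which half is right depends on the target 
ABI.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair type that a front end wants to return by value. */
struct pair32 {
    int32_t first;
    int32_t second;
};

/* The layout used here is fixed at compile time. */
static_assert(sizeof(uint64_t) == 2 * sizeof(int32_t),
              "two i32 values fit exactly in one i64");

/* Pack both halves into one 64-bit integer.
 * Assumption: 'first' goes in the low 32 bits. */
static uint64_t pack_pair(struct pair32 p) {
    return (uint64_t)(uint32_t)p.first |
           ((uint64_t)(uint32_t)p.second << 32);
}

/* Recover the two halves on the caller's side.  The conversion back to
 * int32_t is implementation-defined for values above INT32_MAX, but
 * wraps as expected on two's-complement compilers. */
static struct pair32 unpack_pair(uint64_t bits) {
    struct pair32 p;
    p.first = (int32_t)(uint32_t)(bits & 0xffffffffu);
    p.second = (int32_t)(uint32_t)(bits >> 32);
    return p;
}
```

Every front end that wants the two-register return has to know to emit 
the equivalent of this, which is exactly the kind of implicit contract 
being discussed.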

This also causes problems for optimisations.  They have to understand 
the special semantics of sret.  They also have to understand that a pair 
of pointers may be packed into an i64 for return (e.g. on 
i386-unknown-freebsd), and that this packing shouldn't require their 
alias analysis to treat the pointers as having escaped from the type 
system, and so on.

> Instead, I believe clang should always pass function parameters in a 
> "naive" fashion. E.g. if a parameter type is "struct X", the llvm 
> function should be lowered to LLVM IR with a function parameter of type 
> %struct.X. The decision on whether to then pass that in a register (or 
> multiple registers), on the stack, padded and then passed on the stack, 
> etc, should be the responsibility of LLVM. Only in the case of C++ types 
> which /must/ be passed indirectly for correctness, independent of 
> calling convention ABI, should clang be explicitly making the decision 
> to pass indirectly.

C++ is not the only place where this causes problems.  A few off the top 
of my head:

  - Bitfield layout is target-ABI dependent.
  - Explicitly aligned fields introduce padding.
  - _Atomic-qualified types may be differently aligned than their 
non-_Atomic variants.
  - Unions need lowering to something else before they can be expressed 
in LLVM IR at all.
  - In some situations, C semantics make structure padding important 
(e.g. for structure equality comparison), so the decision on whether it 
needs copying is nontrivial.
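
Two of the items above (explicit alignment and padding) can be shown in 
a few lines of C.  This is only a sketch: the struct and function names 
are hypothetical, and the asserted numbers assume a compiler that 
honours alignas(16).

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <string.h>

/* An explicitly aligned field forces padding after 'tag'. */
struct aligned_field {
    char tag;
    alignas(16) int value;
};

static_assert(offsetof(struct aligned_field, value) == 16,
              "15 bytes of padding sit between tag and value");
static_assert(sizeof(struct aligned_field) == 32,
              "size is rounded up to the 16-byte alignment");

/* An ordinary struct that has padding on most targets. */
struct padded {
    char tag;
    int value;
};

/* Field-wise equality ignores whatever is in the padding bytes. */
static int fields_equal(const struct padded *a, const struct padded *b) {
    return a->tag == b->tag && a->value == b->value;
}

/* Byte-wise equality also compares padding, so the result depends on
 * whether the padding was copied or initialised consistently. */
static int bytes_equal(const struct padded *a, const struct padded *b) {
    return memcmp(a, b, sizeof *a) == 0;
}
```

If code relies on bytes_equal, then whoever lowers a by-value struct 
parameter has to preserve the padding as well as the fields, which is 
why the copying decision is not trivial.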

One of the proposed solutions was to factor this logic out of Clang and 
expose it as a set of builders, with enough of the type system to be 
able to handle these cases.

> Of course, the tricky part is that LLVM doesn't -- and shouldn't -- have 
> the full C type system available to it, and the full C type system 
> typically is required to evaluate the ABI rules (e.g., distinguishing a 
> "_Complex float" from a struct containing two floats).

I think I'd phrase that somewhat differently.  C/C++ ABIs are defined in 
terms of C types.  Whatever does the lowering *must* have access to the 
full C type system.  That doesn't mean that the LLVM type system must 
natively and unambiguously represent the entire C type system, only that 
it must be able to carry that semantic information to the back end 
somehow.
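
The "_Complex float" example from the quoted text shows why the byte 
layout alone is not enough.  In the sketch below (two_floats and 
same_layout are hypothetical names) the two types have identical size 
and bytes, so nothing in a plain { float, float } distinguishes them, 
yet an ABI is free to give them different rules.

```c
#include <assert.h>
#include <string.h>

/* A plain struct with the same members as a complex float. */
struct two_floats {
    float re;
    float im;
};

/* C11 guarantees that a complex type has the representation of an
 * array of two elements of the corresponding real type. */
static_assert(sizeof(float _Complex) == 2 * sizeof(float),
              "complex float is laid out like float[2]");
/* Assumption: no padding in the struct, true on common targets. */
static_assert(sizeof(struct two_floats) == sizeof(float _Complex),
              "struct and complex have the same size");

/* Returns 1 if both types hold the same bytes for the same values. */
static int same_layout(float re, float im) {
    float parts[2] = { re, im };
    struct two_floats s = { re, im };
    float _Complex z;
    memcpy(&z, parts, sizeof z);   /* relies on the float[2] guarantee */
    return memcmp(&z, &s, sizeof z) == 0;
}
```

Since same_layout holds, any distinction an ABI makes between the two 
has to come from information that is only present in the C type.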

One proposal to remove the implicit contracts with the back ends was to 
expose explicit register and stack controls in the IR: functions would 
be tagged with attributes, and the front end would handle the lowering 
to specific registers.

This approach is an improvement for front-end developers (they need to 
read the ABI docs for targets that they want to support, but they don't 
need to understand the implicit and undocumented mapping of that into 
LLVM IR) and reduces work for back-end developers (they don't need to 
know much about calling conventions, because the front end explicitly 
tells them where to put parameters and where to find return values).  It 
does, however, impose extra work on optimisations, which must either 
preserve this lowering or understand when and how they can change it: 
setting the calling convention to fastcc is a lot easier than manually 
tweaking the set of registers that are used for parameters.

> Therefore, in order to communicate the correct ABI information to LLVM, 
> I'd like clang to also emit /explicitly-ABI-specific/ data (metadata?), 
> reflecting the extra information that the ABI rules require the backend 
> to know about the type. E.g., for X86_64, clang needs to inform LLVM of 
> the classification for each parameter's type into MEMORY, INTEGER, SSE, 
> SSEUP, X87, X87UP, COMPLEX_X87. Or, for PPC64 elfv2, Clang needs to 
> inform LLVM when a structure should be treated as a "homogenous 
> aggregate" of floating-point or vector type. (In both cases, that 
> information cannot correctly be extracted from the LLVM IR struct type, 
> only from the C type system.)

Metadata can be lost at arbitrary points, so it isn't sufficient for this. 
Function attributes would be adequate, if we only care about calling 
conventions and not things like structure layout being handled here.
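
For a sense of what the x86-64 classification mentioned in the quoted 
text involves, here is a heavily simplified C sketch.  Everything in it 
(arg_class, field, merge, classify) is hypothetical and is not Clang's 
or LLVM's code.  It assumes a flat struct of naturally aligned scalar 
fields of at most 8 bytes, and ignores x87, vector and nested types.

```c
#include <assert.h>
#include <stddef.h>

enum arg_class { CLASS_NONE, CLASS_INTEGER, CLASS_SSE, CLASS_MEMORY };

/* One scalar field of a flat struct (simplified description). */
struct field {
    size_t offset;    /* byte offset within the struct */
    size_t size;      /* size in bytes, assumed <= 8 */
    int is_float;     /* nonzero for float/double */
};

/* Merge two classes that share an eightbyte: INTEGER wins over SSE. */
static enum arg_class merge(enum arg_class a, enum arg_class b) {
    if (a == b) return a;
    if (a == CLASS_NONE) return b;
    if (b == CLASS_NONE) return a;
    if (a == CLASS_MEMORY || b == CLASS_MEMORY) return CLASS_MEMORY;
    if (a == CLASS_INTEGER || b == CLASS_INTEGER) return CLASS_INTEGER;
    return CLASS_SSE;
}

/* Classify each of the (up to two) eightbytes of the struct. */
static void classify(const struct field *fields, size_t n,
                     size_t total_size, enum arg_class out[2]) {
    out[0] = out[1] = CLASS_NONE;
    if (total_size > 16) {            /* simplified: no vector case */
        out[0] = out[1] = CLASS_MEMORY;
        return;
    }
    for (size_t i = 0; i < n; i++) {
        size_t eightbyte = fields[i].offset / 8;
        if (eightbyte > 1) continue;  /* guard against bad input */
        out[eightbyte] = merge(out[eightbyte],
                               fields[i].is_float ? CLASS_SSE
                                                  : CLASS_INTEGER);
    }
}
```

The output of something like classify is small and fixed in shape, so 
it is the sort of data that could be attached to a function as 
attributes rather than carried as metadata.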

> We should document what data is needed, for each architecture/abi. This 
> required data should be as straightforward an application of the ABI 
> document's rules as possible -- and be only the minimum data necessary.

I think the minimum first step would be documenting what those 
conventions are currently.

> If this is done, frontends (either a new one, or Clang itself) who want 
> to use the C ABI have a significantly simpler task. It remains 
> non-trivial -- you do still need to understand ABI-specific rules, and 
> write ABI-specific code to generate ABI-specific metadata. But, at least 
> the interface boundary has become something which is 
> readily-understandable and implementable based on the ABI documents.

This does seem like it will simplify parts of the clang / LLVM 
interface.  How will it affect other ABIs?  For example, the Haskell, 
HiPE, and Swift calling conventions are not defined in terms of C types. 
Will they also need to define that same set of lowering instructions? 
What would those look like?

> All that said, an MLIR encoding of the C type system can still be useful 
> -- it could contain the code which distills the C types into the 
> ABI-specific metadata. But, I  see that as less important than getting 
> the fundamentals in LLVM-IR into a better shape. Even frontends without 
> a C type system representation should still be able to generate LLVM IR 
> which conforms in their own manner to the documented ABIs -- without it 
> being super painful. Also, the code in Clang now is really confusing, 
> and nearly unmaintainable; it would be a clear improvement to be able to 
> eliminate the majority of it, not just move it into an MLIR dialect.

I am less convinced that the code could be eliminated (equivalent logic 
would be needed, at least).  I am, however, hugely in favour of moving 
it closer to the back ends so that someone maintaining a Target doesn't 
need to also maintain code in Clang's CodeGen layer to do part of the 
lowering.

David