<!DOCTYPE html>

<html>

<head>

<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">

</head>

<body>

<div style="font-family:sans-serif"><div style="white-space:normal">

<p dir="auto">On 4 Jun 2020, at 22:45, James Y Knight wrote:</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">On Thu, Jun 4, 2020 at 5:45 PM John McCall &lt;rjmccall@apple.com&gt; wrote:</p>

<blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">These attributes would have to spell out the exact expected treatment by<br>

the backend in essentially every aggregate case, and the frontend would<br>

have to carefully select that treatment, and for many ABIs that would<br>

still require counting registers and so on.</p>

</blockquote><p dir="auto">I don't have all the ABIs memorized, but I don't think it would be the case<br>

that the frontend would need to count registers for any of the ABIs I know<br>

of.</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">Well, worst case, I suppose that either such targets would have to do<br>

something special in the frontend like they do now, or they’d need to<br>

use more fine-grained attributes than maybe the ABI suggests.  We’d need<br>

some of the latter anyway — I think there’s some weird situation on<br>

x86-64 where Clang passes some aggregates in both integer and FP registers<br>

due to an early bug (or possibly an ambiguity in the ABI?).</p>


<p dir="auto">I agree that ABIs in practice lower types in a position-invariant way<br>

and then check if they’ve run out of registers.</p>


<p dir="auto">Obviously the frontend would need to continue handling mandatory-indirect<br>

cases like non-trivial C++ types.  Would the frontend handle other<br>

indirect cases on targets like ARM64 that use indirect parameters instead<br>

of the stack argument area for large aggregates, or would the frontend<br>

just mark the argument as <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">abi("indirect")</code> and let it be handled by<br>

the backend?</p>


<p dir="auto">What would this actually look like in IR?  Something like this?</p>


<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">  %tmp = alloca %MyType, align 8

  call void @MakeMyType(sret %MyType* %tmp)

  %arg = load %MyType, %MyType* %tmp, align 8

  call void @UseMyType(abi("sse") align 8 %MyType %arg)

</code></pre>


<p dir="auto">Or would we stop using <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">sret</code> as well, and this would just be:</p>


<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">  %tmp = alloca %MyType, align 8

  %ret = call abi("sse") align 8 %MyType @MakeMyType()

  store %MyType %ret, %MyType* %tmp, align 8

  %arg = load %MyType, %MyType* %tmp, align 8

  call void @UseMyType(abi("sse") align 8 %MyType %arg)

</code></pre>


<p dir="auto">John.</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">I see this as consisting of two independent pieces:<br>

1. Examining the parameter types, and distilling the important information<br>

about each type, *for a given ABI*, into a blob of ABI-specific data.<br>

2. Actually choosing whether to pass a given parameter in a register, or on<br>

the stack, or split up the parameter into multiple registers, etc.<br>

<br>

Step 1 should be done within Clang. The amount of data generated from this<br>

step, for the ABIs I'm familiar with, is small, and can be derived based<br>

only on the frontend type (not location in parameter list, etc).<br>

Step 2 should be done within LLVM, based on the data passed down in the IR.<br>

This of course does need to count registers, among other things.<br>

<br>

So, taking an example from the RISC-V ABI. Given an argument of type:<br>

  struct X { short s; double d; };<br>

Or, similarly,<br>

  struct X __attribute__((packed)) { struct { short i; } s[1]; double d; };<br>

  struct X { short s; double __attribute__((aligned(256))) d; };<br>

<br>

Clang would need to encode metadata saying that this type may be able to be<br>

passed via "INT+FLOAT" register-passing, having the INT of size 2 at offset<br>

0, and FLOAT of size 8 at offset 8/2/256 respectively, for the 3 types<br>

above. (Or maybe the metadata should store a GEP path, rather than<br>

size+offset?)<br>

<br>

Then, LLVM, seeing an argument with the INT+FLOAT ABI rule, would allocate<br>

it to registers/stack as follows:<br>

1. If you're using hardware float, and FLEN >= 8, and XLEN >= 2, and if<br>

there is at least one floating point and one integer register available,<br>

then: Copy the data at the provided offsets into one floating point<br>

register and one integer register (with bits beyond the integer size<br>

undefined).<br>

2. Otherwise, fall back to common aggregate handling rules:<br>

  a. If size is &lt;= XLEN,<br>

    i. and if there's 1 integer register available: Pass the struct (as<br>

laid out in memory) in an integer register.<br>

    ii. otherwise: Pass on stack, with alignment min(stack_alignment,<br>

max(type_alignment, XLEN))<br>

  b. If size is &lt;= XLEN*2,<br>

    i. and there are 2 registers available: Pass the struct (as laid out in<br>

memory) in two integer registers.<br>

    ii. and there is 1 integer register available: Pass one XLEN-sized half of<br>

the struct in a register, and the other XLEN-sized half on the stack.<br>

    iii. otherwise: Pass the aggregate on stack, with alignment as before.<br>

  c. Otherwise, "pass by reference" -- make a copy on the stack outside the<br>

parameter-passing area, aligned appropriately for its type and then pass a<br>

pointer to that memory in the usual way for passing a scalar.<br>

(leaving out the varargs rules for simplicity).<br>

<br>

There are a lot of rules there, but the frontend shouldn't need to know about<br>

almost all of it -- the frontend only needs to evaluate whether the struct<br>

type matches the specification for INT+FLOAT (and so on, for the other<br>

categories of special handling), and encode that categorization into the<br>

IR.<br>

<br>

Unfortunately, today, Clang *does* know all those rules I listed above --<br>

and LLVM *also* has to know most of them! This is not a good situation.<br>

<br>

</p>

<blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">I do actually like this<br>

approach in many ways, because it provides a path to a world where the<br>

backend stops permissively compiling everything the frontend throws at it<br>

and instead emits an error if the frontend asks for something that<br>

can’t be done, but it’s not going to make things more abstract.</p>

</blockquote><p dir="auto">It doesn't make things more abstract, no. There's still going to be<br>

ABI-specific code in the frontend. But, it separates the concerns better,<br>

and can make the IR required from a frontend more clearly derived from the<br>

ABI.<br>

<br>

</p>
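
<p dir="auto">As a rough illustration of step 2 in the RISC-V example above, here is a minimal Python sketch of the backend-side allocation for a single INT+FLOAT argument. The names <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">RegState</code> and <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">allocate_int_float</code> are hypothetical and are not LLVM APIs. Sizes are in bytes, the size bounds are treated as inclusive (the RISC-V psABI wording is "no more than XLEN bits"), and varargs and stack alignment are left out.</p>

<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">
```python
from dataclasses import dataclass


@dataclass
class RegState:
    """Hypothetical running count of unused argument registers."""
    free_gprs: int  # integer argument registers still unused
    free_fprs: int  # floating-point argument registers still unused


def allocate_int_float(int_size, float_size, struct_size, state,
                       xlen=8, flen=8, hard_float=True):
    """Pick a location for one INT+FLOAT argument (all sizes in bytes)."""
    # Rule 1: one FPR plus one GPR, if both fields fit and both are free.
    if (hard_float and flen >= float_size and xlen >= int_size
            and state.free_fprs >= 1 and state.free_gprs >= 1):
        state.free_fprs -= 1
        state.free_gprs -= 1
        return "fpr+gpr"
    # Rule 2a: the whole struct fits in one integer register.
    if xlen >= struct_size:
        if state.free_gprs >= 1:
            state.free_gprs -= 1
            return "gpr"
        return "stack"
    # Rule 2b: the whole struct fits in two integer registers.
    if 2 * xlen >= struct_size:
        if state.free_gprs >= 2:
            state.free_gprs -= 2
            return "gpr+gpr"
        if state.free_gprs == 1:
            state.free_gprs = 0
            return "gpr+stack"
        return "stack"
    # Rule 2c: pass by reference; the pointer is then passed like a scalar.
    if state.free_gprs >= 1:
        state.free_gprs -= 1
        return "indirect(gpr)"
    return "indirect(stack)"


# RV64 with the 8 integer and 8 FP argument registers of the standard ABI.
state = RegState(free_gprs=8, free_fprs=8)
first = allocate_int_float(2, 8, 16, state)                # "fpr+gpr"
no_fprs = allocate_int_float(2, 8, 16, RegState(2, 0))     # "gpr+gpr"
one_gpr = allocate_int_float(2, 8, 16, RegState(1, 0))     # "gpr+stack"
```
</code></pre>

<p dir="auto">The only inputs that come from the frontend are the two field sizes and the struct size; everything about register availability stays in the backend, which is the separation of concerns argued for here.</p>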

<blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">Having worked in this space for years, I am convinced that there are two<br>

meaningful points for ABI lowering: (1) the high-level source-language<br>

information and (2) the low-level register and stack conventions.  (1),<br>

for C interop, is always going to be duplicative of Clang.  You can<br>

introduce an intermediate library and make Clang copy all relevant<br>

information out of its AST into that library’s type system, but<br>

fundamentally “all relevant information” is going to just keep<br>

expanding and expanding, and Clang is still going to have a ton of<br>

target-specific ABI lowering code to do that propagation.</p>

</blockquote><p dir="auto">I definitely think it's infeasible to provide all possibly-relevant<br>

information about the frontend language type to LLVM in an ABI-independent<br>

manner. But, providing ABI-specific metadata makes the problem<br>

feasible, because for any particular ABI, the set of parameters derived<br>

from the frontend type system will be small.</p>
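
<p dir="auto">To give a sense of how small that per-ABI blob can be, the Python sketch below derives an INT+FLOAT descriptor for the three <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">struct X</code> variants from the RISC-V example. It is a simplified model and not Clang's record layout: <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">Field</code>, <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">field_offsets</code> and <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">int_float_descriptor</code> are hypothetical names, the nested one-element array is flattened to a single short, and packing is modeled as alignment 1 for every field.</p>

<pre style="background-color:#F7F7F7; border-radius:5px 5px 5px 5px; margin-left:15px; margin-right:15px; max-width:90vw; overflow-x:auto; padding:5px" bgcolor="#F7F7F7"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0" bgcolor="#F7F7F7">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    kind: str   # "int" or "float"
    size: int   # size in bytes
    align: int  # required alignment in bytes


def field_offsets(fields, packed=False):
    """Return the byte offset of each field under plain C layout rules."""
    offsets = []
    cursor = 0
    for f in fields:
        align = 1 if packed else f.align
        # Round the cursor up to the next multiple of the alignment.
        cursor = (cursor + align - 1) // align * align
        offsets.append(cursor)
        cursor += f.size
    return offsets


def int_float_descriptor(fields, packed=False):
    """Build the (kind, size, offset) triples the backend would need."""
    offsets = field_offsets(fields, packed)
    return [(f.kind, f.size, off) for f, off in zip(fields, offsets)]


SHORT = Field("int", 2, 2)
DOUBLE = Field("float", 8, 8)
DOUBLE_ALIGNED_256 = Field("float", 8, 256)

plain_desc = int_float_descriptor([SHORT, DOUBLE])
packed_desc = int_float_descriptor([SHORT, DOUBLE], packed=True)
overaligned_desc = int_float_descriptor([SHORT, DOUBLE_ALIGNED_256])

print(plain_desc)        # [('int', 2, 0), ('float', 8, 8)]
print(packed_desc)       # [('int', 2, 0), ('float', 8, 2)]
print(overaligned_desc)  # [('int', 2, 0), ('float', 8, 256)]
```
</code></pre>

<p dir="auto">The three descriptors differ only in the offset of the float field (8, 2 and 256), which is the only layout fact the backend would need from the frontend in this case.</p>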

</blockquote></div>


</div>

</body>

</html>