[llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated
Eli Friedman via llvm-dev
llvm-dev at lists.llvm.org
Mon Jan 27 18:47:51 PST 2020
Reply inline. (Sorry about the formatting; I can't figure out how to avoid destroying it in Outlook.)
From: Reid Kleckner <rnk at google.com>
Sent: Monday, January 27, 2020 4:58 PM
To: Eli Friedman <efriedma at quicinc.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: [EXT] Re: [llvm-dev] [RFC] Replacing inalloca with llvm.call.setup and preallocated
>> “llvm.call.setup must have exactly one corresponding call site”: Normal IR rules would allow cloning the call site (in jump threading), or erasing the call site (if there’s a noreturn call in an argument). What’s the benefit of enforcing this rule, as opposed to just saying all the call sites must have the same signature?
> I think we could cope with unreachable code elimination deleting a paired call site (zero or one), but code duplication creating a second call site could be problematic. The call setup doesn't describe the prototype of the main call site, so if there were multiple call sites, the backend would have to pick one call site arbitrarily or compare the call sites when setting up the call. If there are zero call sites, the backend can create static allocas of the appropriate type to satisfy the allocations. Of course, an IR pass (instcombine?) should do this transform first if it sees it. Maybe we could have CGP take care of it, too.
Multiple call sites don’t seem like they should be a problem if they’re sufficiently similar. If the argument layout for each call site is the same, it doesn’t matter which call site the backend uses to compute the layout.
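To make "sufficiently similar" concrete, here is a minimal sketch in Python. It assumes a deliberately simplified model that is not LLVM's: each argument is just a (size, alignment) pair, and arguments are laid out in order at increasing offsets. The names `arg_layout` and `call_sites_compatible` are hypothetical, and real calling conventions have more rules than this.

```python
def arg_layout(args):
    """Return (offsets, total_size) for a list of (size, align) arguments.

    Assumed model: arguments are placed in order, each one rounded up to
    its own alignment. This is an illustration, not a real ABI.
    """
    offsets = []
    offset = 0
    for size, align in args:
        offset = (offset + align - 1) // align * align  # round up to alignment
        offsets.append(offset)
        offset += size
    return offsets, offset


def call_sites_compatible(call_sites):
    """True if every call site sharing one setup token has the same layout."""
    layouts = [arg_layout(args) for args in call_sites]
    return all(layout == layouts[0] for layout in layouts)


# Two clones of one call site (for example after jump threading).
original = [(4, 4), (12, 4)]
clone = [(4, 4), (12, 4)]
# A call site with a different prototype.
other = [(8, 8), (4, 4)]
```

Under this model the backend could pick either `original` or `clone` to size the argument area, while a set that includes `other` would be rejected.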
> Nested setup is OK, but the verifier rule that there must be a paired call site should make it impossible to do in a loop. I guess we should have some rule to reject the following:
> %cs1 = llvm.call.setup()
> %cs2 = llvm.call.setup()
> call void @cs1() [ "callsetup"(token %cs1) ]
> call void @cs2() [ "callsetup"(token %cs2) ]
I think in general, there can be arbitrary control flow between a token and its uses, as long as the definition dominates the use. So you could call llvm.call.setup repeatedly in a loop, then call some function using the callsetup token in a different loop, unless some rule specific to callsetup forbids it.
It would be nice to make the rules strong enough to ensure we can statically compute the size of the stack frame at any point (assuming no dynamic allocas). I think code generated by Clang would be statically well-nested; I’m not sure how hard it would be to ensure that optimizations maintain that invariant.
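A rule of that strength could look roughly like the following Python sketch. It is only an illustration under assumptions: the function is modeled as a dictionary of basic blocks, each op is a ("setup", token) or ("call", token) pair, and `check_call_setup_nesting` is a hypothetical name, not an existing verifier check. The two rules it enforces are that a call must close the innermost open setup, and that every path into a block must arrive with the same open setups.

```python
def check_call_setup_nesting(blocks, entry="entry"):
    """Compute the stack of open setup tokens at the entry of each block.

    `blocks` maps a block name to (ops, successors). Raises ValueError if
    a call does not close the innermost open setup, or if two paths reach
    a block with different open setups.
    """
    entry_state = {entry: ()}
    worklist = [entry]
    while worklist:
        name = worklist.pop()
        ops, successors = blocks[name]
        stack = list(entry_state[name])
        for kind, token in ops:
            if kind == "setup":
                stack.append(token)
            elif kind == "call":
                if not stack or stack[-1] != token:
                    raise ValueError(f"call for {token} is not innermost in {name}")
                stack.pop()
            else:
                raise ValueError(f"unknown op {kind!r} in {name}")
        state = tuple(stack)
        for succ in successors:
            if succ not in entry_state:
                entry_state[succ] = state
                worklist.append(succ)  # each block is visited once
            elif entry_state[succ] != state:
                raise ValueError(f"inconsistent open setups on entry to {succ}")
    return entry_state


# Properly nested: cs2 is closed before cs1.
nested = {"entry": ([("setup", "cs1"), ("setup", "cs2"),
                     ("call", "cs2"), ("call", "cs1")], [])}
# The interleaved example quoted above: cs1 is closed while cs2 is open.
interleaved = {"entry": ([("setup", "cs1"), ("setup", "cs2"),
                          ("call", "cs1"), ("call", "cs2")], [])}
# Setup repeated in one loop, call in a later block.
setup_in_loop = {"entry": ([], ["loop"]),
                 "loop": ([("setup", "cs")], ["loop", "exit"]),
                 "exit": ([("call", "cs")], [])}
```

In this model `nested` is accepted, while `interleaved` and `setup_in_loop` both raise `ValueError`. When the check passes, the number of open setups at any block entry is known statically, which is what a static frame-size computation would need.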
Connecting nested llvm.call.setups using tokens might make it easier for passes to reason about the nesting, since the region nest would be explicitly encoded.
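As a rough illustration of what explicit encoding would buy, the Python sketch below assumes each setup records the token of its enclosing setup (or `None` at the top level). That parent link is a hypothetical extension for the sake of the example, not part of the proposal as written, and `nesting_depth` is an illustrative name.

```python
def nesting_depth(parents, token):
    """Number of setups enclosing `token`, found by following parent links."""
    depth = 0
    while parents[token] is not None:
        token = parents[token]
        depth += 1
    return depth


# cs1 is outermost, cs2 is nested in cs1, cs3 is nested in cs2.
parents = {"cs1": None, "cs2": "cs1", "cs3": "cs2"}
```

With the links in place, a pass can recover the nesting of any setup from the token chain alone, without scanning the instructions between a setup and its call.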
>> How does this interact with other dynamic stack allocations? Should we switch VLAs to use a similar mechanism? (The problems with dynamic alloca in general aren’t as terrible, but it might still benefit: for example, it’s much easier to transform a dynamic allocation into a static allocation.)
> VLAs could use something like this, but they are generally of unknown size while call sites have a known fixed size. I think that makes them pretty different.
I don’t think we need to implement the VLA change at the same time, but the two systems would interact, so it might be worth planning that out.