[PATCH] D74094: Reapply: [IRGen] Emit lifetime intrinsics around temporary aggregate argument allocas

Wed Sep 20 09:38:33 PDT 2023

nickdesaulniers added a comment.

In D74094#4648319 <https://reviews.llvm.org/D74094#4648319>, @rjmccall wrote:

>> That's better than the status quo (no lifetime markers at all) but still suboptimal, and I don't think it will solve the particular case I care about (a particular Linux kernel driver written in C which is passing many aggregates by value).
>
> Ah, all in the same full-expression?  Then yeah, my idea wouldn't be good enough to optimize your situation.

Yeah, here's an example:
https://github.com/torvalds/linux/blob/2cf0f715623872823a72e451243bbf555d10d032/drivers/gpu/drm/amd/display/dc/dml/calcs/dce_calcs.c#L564-L570
These nested function calls take a struct by value and return the same type. The struct contains a single `unsigned long long`.  When targeting 32b ARM, we're kind of hamstrung (I think) by https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#result-return; clang will "unwrap" this aggregate for other targets, but not 32b ARM, I think because of that final bullet point about composite types.  Though, reading the next section https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#parameter-passing, I suspect clang may also be missing out the logic referring to "NCRN."  Perhaps LLVM performs that later, but by not unwrapping the parameters, the omission of lifetime markers is painful.

I'm certain I could rewrite this code to not use a 1-element aggregate, but I think this is a common case of discrepancies where GCC does much better than clang in terms of stack usage. D74094 <https://reviews.llvm.org/D74094> causes a 5x reduction in stack usage for this function in particular for 32b arm targets, so I'd rather target the root issue here which I think is clang.

>> Do I want to create a flag for that? Not particularly; it does mean supporting BOTH codegen patterns; I'd rather have a flag to control the optimization or not do it at all rather than support 2 different optimizations.
>
> The received wisdom here is that it's very helpful to have a flag because it lets users test the effect of different changes independently.  Users mostly don't live on trunk, and LLVM releases are roughly every six months, so when users get a new compiler it's going to have at least six months of accumulated changes in it.  If a project breaks with a new compiler, it's really great to be able to isolate which change is causing the problem without having to rebuild the compiler.
>
> The fact that we have dynamic checking of this from ASan does help a lot, though.

Ok, I will add that, and a release note about it and that ASan can help detect this as well.

>> One thing I noticed, simply printing the expression when we were about to add the lifetime cleanup; I noticed some examples would trigger on CXXConstructExpr, CXXTemporaryObjectExpr, and CXXOperatorCallExpr. Those feel very C++ specific; I do wonder if we could do something differently for C, but I think we should not since as @rjmccall said "we'd have a consistent rule for all types and targets" which IMO is nicer+simpler.
>
> Well, if the language rules are different, they're different.  (More on that in a second.)  As far as I know, the lifetime of a parameter variable in C is never specified as outliving the call, and the entire concept of a full-expression is a C++ism.  We could do something more aggressive in C.

I could only find language in the latest draft about lifetime of a parameter, nothing about lifetime of arguments. See footnote 206 in https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf pdf page 178.

>> Converting the above widget example to C, I noticed that we're also missing the lifetime markers on compound literals!
>
> Yeah, so this is probably because the lifetime of a compound literal is somewhat surprising in C: it's actually the containing block scope.  We could emit lifetime markers, but we'd have to make sure we emitted them in the right place.  In pure C, I think this is only detectable if the user actually takes the address of the literal; unlike C++, C doesn't implicitly expose the address of temporaries or have destructors that allow their lifetime to be directly tested.  But in ObjC, you could have a compound literal with weak/strong references in it, which arguably are required to be destroyed at the close of the enclosing block.

Ack; will track that as a separate TODO.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74094/new/

https://reviews.llvm.org/D74094