[cfe-dev] Smart Pointer Lifetime Optimizations

Mon Jun 8 19:31:24 PDT 2020

Richard and John, thanks for those thoughts and comments.

> Yes, the destructor for the source (the local variable, as opposed to the
temporary) is the one that you can’t eliminate without proving that a
reference to it isn’t escaped by the first call and then mutated during the
second.

and

> Here, both destructors run after the call to 'owner' returns, and they
destroy two different objects. You can't eliminate either.

I get it now. The link really helped, thanks, Richard.

> Perhaps we should expose LLVM's nocapture attribute to the source level?

This would be very interesting and probably not too difficult to do. I can
add it to my backlog of things to investigate.

> trivial_abi is an existing, ABI-stable attribute, so changing the ABI of
std::unique_ptr for types that are already trivial_abi is just as much of
an ABI break as changing it in general would be.

We could make it unstable-ABI only. Or add another config flag to enable
it. Looks like Richard's patch takes this a step further anyway, though.

On Mon, Jun 8, 2020 at 6:14 PM Richard Smith <richard at metafoo.co.uk> wrote:

> On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> On 7 Jun 2020, at 18:52, Zoe Carver wrote:
>>
>> John, sorry for my delayed response.
>>
>>
>> *> Can you explain in more detail which destructor you think you can
>> eliminate?*
>>
>>
>> In the Godbolt link in my original post, there are two unique_ptrs, %3
>> and %
>> 4. They are both passed to the move constructor (as the source and
>> destination). In the move constructor, the source's held pointer is nulled
>> out. If the next use of the pointer is the destructor (as is the case in
>> the Godbolt example) then, the destructor will be a no-op. In the
>> optimized
>> code, it's a bit more difficult to see. But, the dead store is still
>> there.
>> Two instructions before the call to owner, null is stored into %8. Then,
>> in
>> the 3rd block (%17) that same pointer (%8) is loaded and compared against
>> null (which is always true).
>>
>> Yes, the destructor for the source (the local variable, as opposed to the
>> temporary) is the one that you can’t eliminate without proving that a
>> reference to it isn’t escaped by the first call and then mutated during
>> the
>> second.
>>
> Concretely: https://godbolt.org/z/4guY_m
>
> Here, both destructors run after the call to 'owner' returns, and they
> destroy two different objects. You can't eliminate either.
>
>> You wouldn’t be the first person to be surprised by the result of this
>> sort
>> of analysis, but I’m afraid it’s what we’re working with.
>>
>> Unfortunately, there’s really no way to eliminate this one without either
>> interprocedural information or language changes. trivial_abi eliminates
>> the other one because it changes the convention for passing by value, but
>> to
>> pass an “immutably borrowed” value in C++ we have to pass by reference,
>> which
>> allows the reference to be escaped and accessed (and even mutated, if the
>> original object wasn’t declared const) as long as those accesses happen
>> before destruction.
>>
> Perhaps we should expose LLVM's nocapture attribute to the source level?
>
>> *> Probably more importantly, though, we could allow unstable-ness to be
>> detected with a type trait, and that would allow the standard library to
>> take advantage of it. *
>>
>>
>> We could actually do this for trivial_abi types too. If we added a builtin
>> type trait to check if a type has the trivial_abi attribute, libc++ could
>> conditionally give unique_ptr the trivial_abi attribute if its base type
>> also had the attribute. Additionally, we could add a config macro that
>> would do this globally when libc++ is in unstable ABI mode.
>>
>> Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
>> is an existing, ABI-stable attribute, so changing the ABI of
>> std::unique_ptr
>> for types that are already trivial_abi is just as much of an ABI break
>> as changing it in general would be. You could try to justify it by saying
>> that there just aren’t very many trivial_abi types yet, or that
>> trivial_abi
>> is a vendor-specific attribute that’s unlikely to be used on a type with a
>> stable ABI because non-Clang implementations wouldn’t be able to compile
>> it compatibly, but those aren’t terribly convincing arguments to me.
>>
> I guess I should finish https://reviews.llvm.org/D63748 at some point.
> (Though I think we probably shouldn't enable it in libc++ unstable ABI
> configurations by default, since it also changes observable program
> semantics due to altering destruction order, and is arguably non-conforming
> for the same reason.)
>
>> John.
>>
>> Best,
>>
>> Zoe
>>
>> On Sat, Jun 6, 2020 at 2:07 PM John McCall <rjmccall at apple.com> wrote:
>>
>> On 6 Jun 2020, at 13:47, Zoe Carver wrote:
>>
>> John,
>>
>> Thanks, those are good points. I think we can still remove one of the
>> destructors (which could also be done by a more powerful DSE+load
>> propagation) but, you're right; one needs to stay.
>>
>> Can you explain in more detail which destructor you think you can
>> eliminate?
>>
>> This can only be optimized with a more global, interprocedural
>>
>> optimization that shifts responsibility to owner to destroy its argument.
>>
>> I'll think about implementing something like this, but I suspect any
>> possible optimizations will already happen with inlining and analysis.
>>
>> Yeah. For the narrow case of std::unique_ptr, since its operations
>> are easily inlined and can be easily optimized after copy propagation,
>> there’s not much more that can be done at a high level.
>>
>> Note that trivial_abi (if it could be adopted on std::unique_ptr)
>> also changes the ABI to make the callee responsible for destruction.
>> So as part of getting a more efficient low-level ABI, you also get a
>> more optimizable high-level one.
>>
>> One idea I’ve personally been kicking around is some way to mark
>> declarations as having an “unstable ABI”: basically, a guarantee that
>> all the code that uses them will be compiled with a single toolchain,
>> and therefore a license for the implementation to alter the ABI however
>> it likes with any code that uses any of those declarations.
>>
>> A type would be unstable if it was composed even partially from a
>> declaration marked unstable. So class Unstable would be unstable,
>> but so would const Unstable * — and, crucially, so would
>> std::unique_ptr<Unstable>. But for soundness reasons, this would
>> need to ignore type sugar (so no marking typedefs), and it wouldn’t
>> be able to automatically descend into fields.
>>
>> There are a few ways that we could use that directly in the compiler.
>> The big restriction is that you’re not breaking ABI globally and so
>> you always need an unstable “contaminant” that permits using the
>> unstable ABI. For example, we can’t just change function ABIs
>> for all unstable functions because function pointers have to remain
>> compatible. On the other hand, programs aren’t allowed to call
>> function pointers under the wrong type, so if the function type is
>> unstable, we can change anything we want about its ABI.
>>
>> (For functions specifically, there’s another option: we could emit
>> the functions with an unstable ABI and then introduce thunks that
>> adapt the calling convention when the address is taken. But that’s
>> a non-trivial code-size hit that we might have to do unconditionally.
>> It also can’t adapt a callee-destroy ABI into a caller-destroy one
>> without introducing an extra move, which isn’t necessarily semantically
>> allowed.)
>>
>> Probably more importantly, though, we could allow unstable-ness to
>> be detected with a type trait, and that would allow the standard
>> library to take advantage of it. So std::unique_ptr<int> would
>> be stuck with the stable ABI, but std::unique_ptr<Unstable> could
>> switch to be trivial_abi.
>>
>> That does leave the problem of actually doing the annotation.
>> Adding an attribute to every class is probably beyond what people
>> would accept. There are several ways to do mass annotation. Pragmas
>> are problematic because you don’t want to accidentally leave the
>> pragma on when you exit a file and then have it cover a system
>> include. We do have some pragmas that prevent file changes while
>> the pragma is active, which is a decent solution for that problem.
>> An alternative is to mark namespaces. That probably needs to be
>> lexical: that is, you wouldn’t be able to mark the entire clang
>> namespace, you would mark a specific namespace clang declaration
>> in a single header. But that’s still much more manageable, and
>> after all, the cost to missing an annotation is just a missed
>> optimization.
>>
>> We could also implicitly make all anonymous-namespace declarations
>> unstable.
>>
>> John.
>>
>> Thanks for the response,
>> Zoe
>>
>> On Fri, Jun 5, 2020 at 1:09 PM John McCall <rjmccall at apple.com> wrote:
>>
>> On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:
>>
>> Hello all,
>>
>>
>> I'm planning to do some work to add lifetime optimization passes for smart
>> pointers and reference-counted objects. I'll use this email as a sort of
>> proposal for what I hope to do.
>>
>>
>> *Scope*
>>
>>
>> As I'm developing the pass, I'm trying to keep it general and create
>> utilities that could work across multiple smart pointers. But, right now,
>> I'm focussing on unique_ptr and applying specific ownership optimizations
>> to
>> unique_ptr only.
>>
>>
>> *unique_ptr Optimzations*
>>
>>
>> The pass I'm currently developing adds a single, simple, optimization:
>> constant fold the destructor based on ownership information. unique_ptr
>> has
>> a lot of ownership information communicated with reference semantics. When
>> a
>> unique_ptr is moved into another function, that function takes over
>> ownership of the unique_ptr, and subsequent destructors can be eliminated
>> (because they will be no-ops). Otherwise, branchless functions are often
>> complicated after inlining unique_ptr's destructor so, this optimization
>> should be fairly beneficial.
>>
>>
>> unique_ptr's reset and release methods both complicate this optimization a
>> bit. Because they are also able to transfer and remove ownership, all
>> unknown instructions must be ignored. However, in the future, knowledge of
>> those methods might be able to make the pass more robust.
>>
>>
>> With unique_ptr, it's difficult to prove liveness. So, it is hard to
>> constant fold the destructor call to always be there. Maybe in the future,
>> this would be possible, though (with sufficient analysis).
>>
>>
>> Last, an optimization that I hope to do is lowering the unique_ptr to a
>> raw
>> pointer if all lifetime paths are known. I think removing this layer of
>> abstraction would make it easier for other optimization passes to be
>> successful. Eventually, we may even be able to specialize functions that
>> used to take a unique_ptr to now take a raw pointer, if the argument's
>> lifetime was also able to be fully analyzed.
>>
>>
>> *Lifetime Annotations*
>>
>>
>> Right now, the pass relies on (pre-inlined) function calls to generate
>> ownership information. Another approach would be to add ownership
>> annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).
>>
>>
>> *ARC Optimizations*
>>
>>
>> There are a huge number of large and small ARC optimizations already in
>> LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
>> because unique_ptr doesn't actually do any reference counting. But, later
>> on, when I start working on generalizing this pass to support more smart
>> pointers (specifically shared_ptr) I think the ARC optimization pass, and
>> especially the utilities it contains, could be very beneficial. If anyone
>> has experience with ARC optimizations, I'd love to hear your thoughts on
>> extending them to other reference counted objects.
>>
>>
>> *trivial_abi and Hidden References*
>>
>>
>> Arthur O'Dwyer made a good point, which is that a lot of these
>> optimizations can be applied when with the trivial_abi attribute. However,
>> given that's not a standard attribute and these optimizations only
>> *happen*
>> to work with trivial_abi (i.e., in a more complicated program, they may
>> not
>> continue to work). I think lifetime utilities and specific lifetime
>> optimization passes are still beneficial (especially if they can be
>> applied
>> to other smart pointers in the future).
>>
>>
>> Because all smart pointers have non-trivial destructors, they are always
>> passed by hidden references. With unique_ptr, this is as simple as
>> bit-casting the pointer member to unique_ptr, which would allow for it to
>> be lowered to a single raw pointer instead of a stack-allocated object.
>> Even without the trival_abi attribute, I think this is an optimization
>> that
>> could be done.
>>
>>
>> *Results*
>>
>>
>> Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
>> pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>
>>
>> For reference, here are the before and after results:
>>
>> Clang trunk (four branches): Compiler Explorer
>> <https://godbolt.org/z/bsJFty>
>>
>> With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru
>>
>> Unfortunately, these are not legal optimizations for your test case:
>>
>> -
>>
>> guaranteed is permitted to escape a reference (or pointer) to the
>> object it was passed. Tat references and pointers remain valid
>> until the object goes out of scope.
>> -
>>
>> The object can be mutated through that reference because the underlying
>> object is not const. Being passed a const reference is not a
>> semantic contract in C++.
>> -
>>
>> Through a combination of the above, the call to owner may change
>> the value of p, and so the caller may not rely on it still being
>> in a trivially-destructible state after that call.
>> -
>>
>> owner may leave the value of its parameter object in a
>> non-trivially-destructible state, and under the Itanium C++ ABI,
>> cleaning
>> up that object is the caller’s responsibility. I agree that this is a
>> bad rule for optimization purposes, but it’s the rule. This can only be
>> optimized with a more global, interprocedural optimization that shifts
>> responsibility to owner to destroy its argument.
>>
>> John.
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200608/659c00f3/attachment-0001.html>