[cfe-dev] Smart Pointer Lifetime Optimizations

Mon Jun 8 18:13:58 PDT 2020

On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev <cfe-dev at lists.llvm.org>
wrote:

> On 7 Jun 2020, at 18:52, Zoe Carver wrote:
>
> John, sorry for my delayed response.
>
>
> *> Can you explain in more detail which destructor you think you can
> eliminate?*
>
>
> In the Godbolt link in my original post, there are two unique_ptrs, %3 and
> %
> 4. They are both passed to the move constructor (as the source and
> destination). In the move constructor, the source's held pointer is nulled
> out. If the next use of the pointer is the destructor (as is the case in
> the Godbolt example) then, the destructor will be a no-op. In the optimized
> code, it's a bit more difficult to see. But, the dead store is still there.
> Two instructions before the call to owner, null is stored into %8. Then, in
> the 3rd block (%17) that same pointer (%8) is loaded and compared against
> null (which is always true).
>
> Yes, the destructor for the source (the local variable, as opposed to the
> temporary) is the one that you can’t eliminate without proving that a
> reference to it isn’t escaped by the first call and then mutated during the
> second.
>
Concretely: https://godbolt.org/z/4guY_m

Here, both destructors run after the call to 'owner' returns, and they
destroy two different objects. You can't eliminate either.

> You wouldn’t be the first person to be surprised by the result of this sort
> of analysis, but I’m afraid it’s what we’re working with.
>
> Unfortunately, there’s really no way to eliminate this one without either
> interprocedural information or language changes. trivial_abi eliminates
> the other one because it changes the convention for passing by value, but
> to
> pass an “immutably borrowed” value in C++ we have to pass by reference,
> which
> allows the reference to be escaped and accessed (and even mutated, if the
> original object wasn’t declared const) as long as those accesses happen
> before destruction.
>
Perhaps we should expose LLVM's nocapture attribute to the source level?

> *> Probably more importantly, though, we could allow unstable-ness to be
> detected with a type trait, and that would allow the standard library to
> take advantage of it. *
>
>
> We could actually do this for trivial_abi types too. If we added a builtin
> type trait to check if a type has the trivial_abi attribute, libc++ could
> conditionally give unique_ptr the trivial_abi attribute if its base type
> also had the attribute. Additionally, we could add a config macro that
> would do this globally when libc++ is in unstable ABI mode.
>
> Hmm. That doesn’t just fall out from any analysis I see. trivial_abi
> is an existing, ABI-stable attribute, so changing the ABI of
> std::unique_ptr
> for types that are already trivial_abi is just as much of an ABI break
> as changing it in general would be. You could try to justify it by saying
> that there just aren’t very many trivial_abi types yet, or that
> trivial_abi
> is a vendor-specific attribute that’s unlikely to be used on a type with a
> stable ABI because non-Clang implementations wouldn’t be able to compile
> it compatibly, but those aren’t terribly convincing arguments to me.
>
I guess I should finish https://reviews.llvm.org/D63748 at some point.
(Though I think we probably shouldn't enable it in libc++ unstable ABI
configurations by default, since it also changes observable program
semantics due to altering destruction order, and is arguably non-conforming
for the same reason.)

> John.
>
> Best,
>
> Zoe
>
> On Sat, Jun 6, 2020 at 2:07 PM John McCall <rjmccall at apple.com> wrote:
>
> On 6 Jun 2020, at 13:47, Zoe Carver wrote:
>
> John,
>
> Thanks, those are good points. I think we can still remove one of the
> destructors (which could also be done by a more powerful DSE+load
> propagation) but, you're right; one needs to stay.
>
> Can you explain in more detail which destructor you think you can
> eliminate?
>
> This can only be optimized with a more global, interprocedural
>
> optimization that shifts responsibility to owner to destroy its argument.
>
> I'll think about implementing something like this, but I suspect any
> possible optimizations will already happen with inlining and analysis.
>
> Yeah. For the narrow case of std::unique_ptr, since its operations
> are easily inlined and can be easily optimized after copy propagation,
> there’s not much more that can be done at a high level.
>
> Note that trivial_abi (if it could be adopted on std::unique_ptr)
> also changes the ABI to make the callee responsible for destruction.
> So as part of getting a more efficient low-level ABI, you also get a
> more optimizable high-level one.
>
> One idea I’ve personally been kicking around is some way to mark
> declarations as having an “unstable ABI”: basically, a guarantee that
> all the code that uses them will be compiled with a single toolchain,
> and therefore a license for the implementation to alter the ABI however
> it likes with any code that uses any of those declarations.
>
> A type would be unstable if it was composed even partially from a
> declaration marked unstable. So class Unstable would be unstable,
> but so would const Unstable * — and, crucially, so would
> std::unique_ptr<Unstable>. But for soundness reasons, this would
> need to ignore type sugar (so no marking typedefs), and it wouldn’t
> be able to automatically descend into fields.
>
> There are a few ways that we could use that directly in the compiler.
> The big restriction is that you’re not breaking ABI globally and so
> you always need an unstable “contaminant” that permits using the
> unstable ABI. For example, we can’t just change function ABIs
> for all unstable functions because function pointers have to remain
> compatible. On the other hand, programs aren’t allowed to call
> function pointers under the wrong type, so if the function type is
> unstable, we can change anything we want about its ABI.
>
> (For functions specifically, there’s another option: we could emit
> the functions with an unstable ABI and then introduce thunks that
> adapt the calling convention when the address is taken. But that’s
> a non-trivial code-size hit that we might have to do unconditionally.
> It also can’t adapt a callee-destroy ABI into a caller-destroy one
> without introducing an extra move, which isn’t necessarily semantically
> allowed.)
>
> Probably more importantly, though, we could allow unstable-ness to
> be detected with a type trait, and that would allow the standard
> library to take advantage of it. So std::unique_ptr<int> would
> be stuck with the stable ABI, but std::unique_ptr<Unstable> could
> switch to be trivial_abi.
>
> That does leave the problem of actually doing the annotation.
> Adding an attribute to every class is probably beyond what people
> would accept. There are several ways to do mass annotation. Pragmas
> are problematic because you don’t want to accidentally leave the
> pragma on when you exit a file and then have it cover a system
> include. We do have some pragmas that prevent file changes while
> the pragma is active, which is a decent solution for that problem.
> An alternative is to mark namespaces. That probably needs to be
> lexical: that is, you wouldn’t be able to mark the entire clang
> namespace, you would mark a specific namespace clang declaration
> in a single header. But that’s still much more manageable, and
> after all, the cost to missing an annotation is just a missed
> optimization.
>
> We could also implicitly make all anonymous-namespace declarations
> unstable.
>
> John.
>
> Thanks for the response,
> Zoe
>
> On Fri, Jun 5, 2020 at 1:09 PM John McCall <rjmccall at apple.com> wrote:
>
> On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:
>
> Hello all,
>
>
> I'm planning to do some work to add lifetime optimization passes for smart
> pointers and reference-counted objects. I'll use this email as a sort of
> proposal for what I hope to do.
>
>
> *Scope*
>
>
> As I'm developing the pass, I'm trying to keep it general and create
> utilities that could work across multiple smart pointers. But, right now,
> I'm focussing on unique_ptr and applying specific ownership optimizations
> to
> unique_ptr only.
>
>
> *unique_ptr Optimzations*
>
>
> The pass I'm currently developing adds a single, simple, optimization:
> constant fold the destructor based on ownership information. unique_ptr has
> a lot of ownership information communicated with reference semantics. When
> a
> unique_ptr is moved into another function, that function takes over
> ownership of the unique_ptr, and subsequent destructors can be eliminated
> (because they will be no-ops). Otherwise, branchless functions are often
> complicated after inlining unique_ptr's destructor so, this optimization
> should be fairly beneficial.
>
>
> unique_ptr's reset and release methods both complicate this optimization a
> bit. Because they are also able to transfer and remove ownership, all
> unknown instructions must be ignored. However, in the future, knowledge of
> those methods might be able to make the pass more robust.
>
>
> With unique_ptr, it's difficult to prove liveness. So, it is hard to
> constant fold the destructor call to always be there. Maybe in the future,
> this would be possible, though (with sufficient analysis).
>
>
> Last, an optimization that I hope to do is lowering the unique_ptr to a raw
> pointer if all lifetime paths are known. I think removing this layer of
> abstraction would make it easier for other optimization passes to be
> successful. Eventually, we may even be able to specialize functions that
> used to take a unique_ptr to now take a raw pointer, if the argument's
> lifetime was also able to be fully analyzed.
>
>
> *Lifetime Annotations*
>
>
> Right now, the pass relies on (pre-inlined) function calls to generate
> ownership information. Another approach would be to add ownership
> annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).
>
>
> *ARC Optimizations*
>
>
> There are a huge number of large and small ARC optimizations already in
> LLVM. For unique_ptr specifically, I'm not sure these are of any benefit
> because unique_ptr doesn't actually do any reference counting. But, later
> on, when I start working on generalizing this pass to support more smart
> pointers (specifically shared_ptr) I think the ARC optimization pass, and
> especially the utilities it contains, could be very beneficial. If anyone
> has experience with ARC optimizations, I'd love to hear your thoughts on
> extending them to other reference counted objects.
>
>
> *trivial_abi and Hidden References*
>
>
> Arthur O'Dwyer made a good point, which is that a lot of these
> optimizations can be applied when with the trivial_abi attribute. However,
> given that's not a standard attribute and these optimizations only *happen*
> to work with trivial_abi (i.e., in a more complicated program, they may not
> continue to work). I think lifetime utilities and specific lifetime
> optimization passes are still beneficial (especially if they can be applied
> to other smart pointers in the future).
>
>
> Because all smart pointers have non-trivial destructors, they are always
> passed by hidden references. With unique_ptr, this is as simple as
> bit-casting the pointer member to unique_ptr, which would allow for it to
> be lowered to a single raw pointer instead of a stack-allocated object.
> Even without the trival_abi attribute, I think this is an optimization that
> could be done.
>
>
> *Results*
>
>
> Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart
> pointer lifetime optimizations pass. <https://reviews.llvm.org/D81288>
>
> For reference, here are the before and after results:
>
> Clang trunk (four branches): Compiler Explorer
> <https://godbolt.org/z/bsJFty>
>
> With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru
>
> Unfortunately, these are not legal optimizations for your test case:
>
> -
>
> guaranteed is permitted to escape a reference (or pointer) to the
> object it was passed. Tat references and pointers remain valid
> until the object goes out of scope.
> -
>
> The object can be mutated through that reference because the underlying
> object is not const. Being passed a const reference is not a
> semantic contract in C++.
> -
>
> Through a combination of the above, the call to owner may change
> the value of p, and so the caller may not rely on it still being
> in a trivially-destructible state after that call.
> -
>
> owner may leave the value of its parameter object in a
> non-trivially-destructible state, and under the Itanium C++ ABI,
> cleaning
> up that object is the caller’s responsibility. I agree that this is a
> bad rule for optimization purposes, but it’s the rule. This can only be
> optimized with a more global, interprocedural optimization that shifts
> responsibility to owner to destroy its argument.
>
> John.
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20200608/2c46cf84/attachment-0001.html>