[cfe-dev] Smart Pointer Lifetime Optimizations
    John McCall via cfe-dev 
    cfe-dev at lists.llvm.org
       
    Mon Jun  8 20:09:21 PDT 2020
    
    
  
On 8 Jun 2020, at 22:52, John McCall wrote:
> On 8 Jun 2020, at 21:13, Richard Smith wrote:
>> On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev 
>> <cfe-dev at lists.llvm.org>
>> wrote:
>>> *> Probably more importantly, though, we could allow unstable-ness 
>>> to be
>>> detected with a type trait, and that would allow the standard 
>>> library to
>>> take advantage of it. *
>>>
>>>
>>> We could actually do this for trivial_abi types too. If we added a 
>>> builtin
>>> type trait to check if a type has the trivial_abi attribute, libc++ 
>>> could
>>> conditionally give unique_ptr the trivial_abi attribute if its base 
>>> type
>>> also had the attribute. Additionally, we could add a config macro 
>>> that
>>> would do this globally when libc++ is in unstable ABI mode.
>>>
>>> Hmm. That doesn’t just fall out from any analysis I see. 
>>> trivial_abi
>>> is an existing, ABI-stable attribute, so changing the ABI of
>>> std::unique_ptr
>>> for types that are already trivial_abi is just as much of an ABI 
>>> break
>>> as changing it in general would be. You could try to justify it by 
>>> saying
>>> that there just aren’t very many trivial_abi types yet, or that
>>> trivial_abi
>>> is a vendor-specific attribute that’s unlikely to be used on a 
>>> type with a
>>> stable ABI because non-Clang implementations wouldn’t be able to 
>>> compile
>>> it compatibly, but those aren’t terribly convincing arguments to 
>>> me.
>>>
>> I guess I should finish https://reviews.llvm.org/D63748 at some 
>> point.
>> (Though I think we probably shouldn't enable it in libc++ unstable 
>> ABI
>> configurations by default, since it also changes observable program
>> semantics due to altering destruction order, and is arguably 
>> non-conforming
>> for the same reason.)
>
> It definitely changes observable semantics, but it’s not *obviously*
> non-conforming; [expr.call]p7 gives us a lot of flexibility here:
>
>   It is implementation-defined whether the lifetime of a parameter
>   ends when the function in which it is defined returns or at the
>   end of the enclosing full-expression.
>
> And note that MSVC has traditionally destroyed parameters in the 
> callee.
> IIRC the standard actually originally specified that parameters were
> always destroyed at the end of the call and only changed it due to
> Itanium doing otherwise.
>
> Now, it’s possible that the copy-elision rules have an unfortunate
> impact here.  IIRC an object initialized with an elided copy is 
> supposed
> to take on the longer of the two natural lifetimes.  Does that mean 
> that
> if you have a parameter initialized by an elided copy from a 
> temporary,
> the parameter needs to live until the end of the calling 
> full-expression
> like the temporary would have?  If so, you either wouldn’t be able 
> to
> use a callee-destroy ABI or you wouldn’t be allowed to elide copies
> into parameters, and the latter seems unacceptable.
>
> Even if it’s conforming, I’m sure there are bugs about e.g. the 
> proper
> ordering with things like function-try-blocks and exception 
> specifications.
If you were just thinking about [class.temporary]’s strict order
rules for non-extended temporaries, though, I don’t think parameters
are actually temporaries for this purpose.  There’s discussion of
temporaries for parameters and return values in [class.temporary],
but only for (a particular sense of) trivially-copyable types, and
it’s only a formalism for allowing the objects to be passed directly.
John.
>
> John.
>
>>
>>> John.
>>>
>>> Best,
>>>
>>> Zoe
>>>
>>> On Sat, Jun 6, 2020 at 2:07 PM John McCall <rjmccall at apple.com> 
>>> wrote:
>>>
>>> On 6 Jun 2020, at 13:47, Zoe Carver wrote:
>>>
>>> John,
>>>
>>> Thanks, those are good points. I think we can still remove one of 
>>> the
>>> destructors (which could also be done by a more powerful DSE+load
>>> propagation) but, you're right; one needs to stay.
>>>
>>> Can you explain in more detail which destructor you think you can
>>> eliminate?
>>>
>>> This can only be optimized with a more global, interprocedural
>>>
>>> optimization that shifts responsibility to owner to destroy its 
>>> argument.
>>>
>>> I'll think about implementing something like this, but I suspect any
>>> possible optimizations will already happen with inlining and 
>>> analysis.
>>>
>>> Yeah. For the narrow case of std::unique_ptr, since its operations
>>> are easily inlined and can be easily optimized after copy 
>>> propagation,
>>> there’s not much more that can be done at a high level.
>>>
>>> Note that trivial_abi (if it could be adopted on std::unique_ptr)
>>> also changes the ABI to make the callee responsible for destruction.
>>> So as part of getting a more efficient low-level ABI, you also get a
>>> more optimizable high-level one.
>>>
>>> One idea I’ve personally been kicking around is some way to mark
>>> declarations as having an “unstable ABI”: basically, a guarantee 
>>> that
>>> all the code that uses them will be compiled with a single 
>>> toolchain,
>>> and therefore a license for the implementation to alter the ABI 
>>> however
>>> it likes with any code that uses any of those declarations.
>>>
>>> A type would be unstable if it was composed even partially from a
>>> declaration marked unstable. So class Unstable would be unstable,
>>> but so would const Unstable * — and, crucially, so would
>>> std::unique_ptr<Unstable>. But for soundness reasons, this would
>>> need to ignore type sugar (so no marking typedefs), and it 
>>> wouldn’t
>>> be able to automatically descend into fields.
>>>
>>> There are a few ways that we could use that directly in the 
>>> compiler.
>>> The big restriction is that you’re not breaking ABI globally and 
>>> so
>>> you always need an unstable “contaminant” that permits using the
>>> unstable ABI. For example, we can’t just change function ABIs
>>> for all unstable functions because function pointers have to remain
>>> compatible. On the other hand, programs aren’t allowed to call
>>> function pointers under the wrong type, so if the function type is
>>> unstable, we can change anything we want about its ABI.
>>>
>>> (For functions specifically, there’s another option: we could emit
>>> the functions with an unstable ABI and then introduce thunks that
>>> adapt the calling convention when the address is taken. But that’s
>>> a non-trivial code-size hit that we might have to do 
>>> unconditionally.
>>> It also can’t adapt a callee-destroy ABI into a caller-destroy one
>>> without introducing an extra move, which isn’t necessarily 
>>> semantically
>>> allowed.)
>>>
>>> Probably more importantly, though, we could allow unstable-ness to
>>> be detected with a type trait, and that would allow the standard
>>> library to take advantage of it. So std::unique_ptr<int> would
>>> be stuck with the stable ABI, but std::unique_ptr<Unstable> could
>>> switch to be trivial_abi.
>>>
>>> That does leave the problem of actually doing the annotation.
>>> Adding an attribute to every class is probably beyond what people
>>> would accept. There are several ways to do mass annotation. Pragmas
>>> are problematic because you don’t want to accidentally leave the
>>> pragma on when you exit a file and then have it cover a system
>>> include. We do have some pragmas that prevent file changes while
>>> the pragma is active, which is a decent solution for that problem.
>>> An alternative is to mark namespaces. That probably needs to be
>>> lexical: that is, you wouldn’t be able to mark the entire clang
>>> namespace, you would mark a specific namespace clang declaration
>>> in a single header. But that’s still much more manageable, and
>>> after all, the cost to missing an annotation is just a missed
>>> optimization.
>>>
>>> We could also implicitly make all anonymous-namespace declarations
>>> unstable.
>>>
>>> John.
>>>
>>> Thanks for the response,
>>> Zoe
>>>
>>> On Fri, Jun 5, 2020 at 1:09 PM John McCall <rjmccall at apple.com> 
>>> wrote:
>>>
>>> On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:
>>>
>>> Hello all,
>>>
>>>
>>> I'm planning to do some work to add lifetime optimization passes for 
>>> smart
>>> pointers and reference-counted objects. I'll use this email as a 
>>> sort of
>>> proposal for what I hope to do.
>>>
>>>
>>> *Scope*
>>>
>>>
>>> As I'm developing the pass, I'm trying to keep it general and create
>>> utilities that could work across multiple smart pointers. But, right 
>>> now,
>>> I'm focussing on unique_ptr and applying specific ownership 
>>> optimizations
>>> to
>>> unique_ptr only.
>>>
>>>
>>> *unique_ptr Optimzations*
>>>
>>>
>>> The pass I'm currently developing adds a single, simple, 
>>> optimization:
>>> constant fold the destructor based on ownership information. 
>>> unique_ptr has
>>> a lot of ownership information communicated with reference 
>>> semantics. When
>>> a
>>> unique_ptr is moved into another function, that function takes over
>>> ownership of the unique_ptr, and subsequent destructors can be 
>>> eliminated
>>> (because they will be no-ops). Otherwise, branchless functions are 
>>> often
>>> complicated after inlining unique_ptr's destructor so, this 
>>> optimization
>>> should be fairly beneficial.
>>>
>>>
>>> unique_ptr's reset and release methods both complicate this 
>>> optimization a
>>> bit. Because they are also able to transfer and remove ownership, 
>>> all
>>> unknown instructions must be ignored. However, in the future, 
>>> knowledge of
>>> those methods might be able to make the pass more robust.
>>>
>>>
>>> With unique_ptr, it's difficult to prove liveness. So, it is hard to
>>> constant fold the destructor call to always be there. Maybe in the 
>>> future,
>>> this would be possible, though (with sufficient analysis).
>>>
>>>
>>> Last, an optimization that I hope to do is lowering the unique_ptr 
>>> to a raw
>>> pointer if all lifetime paths are known. I think removing this layer 
>>> of
>>> abstraction would make it easier for other optimization passes to be
>>> successful. Eventually, we may even be able to specialize functions 
>>> that
>>> used to take a unique_ptr to now take a raw pointer, if the 
>>> argument's
>>> lifetime was also able to be fully analyzed.
>>>
>>>
>>> *Lifetime Annotations*
>>>
>>>
>>> Right now, the pass relies on (pre-inlined) function calls to 
>>> generate
>>> ownership information. Another approach would be to add ownership
>>> annotations, such as the lifetime intrinsics (i.e. 
>>> llvm.lifetime.start).
>>>
>>>
>>> *ARC Optimizations*
>>>
>>>
>>> There are a huge number of large and small ARC optimizations already 
>>> in
>>> LLVM. For unique_ptr specifically, I'm not sure these are of any 
>>> benefit
>>> because unique_ptr doesn't actually do any reference counting. But, 
>>> later
>>> on, when I start working on generalizing this pass to support more 
>>> smart
>>> pointers (specifically shared_ptr) I think the ARC optimization 
>>> pass, and
>>> especially the utilities it contains, could be very beneficial. If 
>>> anyone
>>> has experience with ARC optimizations, I'd love to hear your 
>>> thoughts on
>>> extending them to other reference counted objects.
>>>
>>>
>>> *trivial_abi and Hidden References*
>>>
>>>
>>> Arthur O'Dwyer made a good point, which is that a lot of these
>>> optimizations can be applied when with the trivial_abi attribute. 
>>> However,
>>> given that's not a standard attribute and these optimizations only 
>>> *happen*
>>> to work with trivial_abi (i.e., in a more complicated program, they 
>>> may not
>>> continue to work). I think lifetime utilities and specific lifetime
>>> optimization passes are still beneficial (especially if they can be 
>>> applied
>>> to other smart pointers in the future).
>>>
>>>
>>> Because all smart pointers have non-trivial destructors, they are 
>>> always
>>> passed by hidden references. With unique_ptr, this is as simple as
>>> bit-casting the pointer member to unique_ptr, which would allow for 
>>> it to
>>> be lowered to a single raw pointer instead of a stack-allocated 
>>> object.
>>> Even without the trival_abi attribute, I think this is an 
>>> optimization that
>>> could be done.
>>>
>>>
>>> *Results*
>>>
>>>
>>> Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt 
>>> Smart
>>> pointer lifetime optimizations pass. 
>>> <https://reviews.llvm.org/D81288>
>>>
>>> For reference, here are the before and after results:
>>>
>>> Clang trunk (four branches): Compiler Explorer
>>> <https://godbolt.org/z/bsJFty>
>>>
>>> With optimizations (branchless): https://pastebin.com/raw/mQ2r6pru
>>>
>>> Unfortunately, these are not legal optimizations for your test case:
>>>
>>> -
>>>
>>> guaranteed is permitted to escape a reference (or pointer) to the
>>> object it was passed. Tat references and pointers remain valid
>>> until the object goes out of scope.
>>> -
>>>
>>> The object can be mutated through that reference because the 
>>> underlying
>>> object is not const. Being passed a const reference is not a
>>> semantic contract in C++.
>>> -
>>>
>>> Through a combination of the above, the call to owner may change
>>> the value of p, and so the caller may not rely on it still being
>>> in a trivially-destructible state after that call.
>>> -
>>>
>>> owner may leave the value of its parameter object in a
>>> non-trivially-destructible state, and under the Itanium C++ ABI,
>>> cleaning
>>> up that object is the caller’s responsibility. I agree that this 
>>> is a
>>> bad rule for optimization purposes, but it’s the rule. This can 
>>> only be
>>> optimized with a more global, interprocedural optimization that 
>>> shifts
>>> responsibility to owner to destroy its argument.
>>>
>>> John.
>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
    
    
More information about the cfe-dev
mailing list