<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">
</head>
<body>
<div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">Hello all,<br>
<br>
<br>
I'm planning to do some work to add lifetime optimization passes for smart<br>
pointers and reference-counted objects. I'll use this email as a sort of<br>
proposal for what I hope to do.<br>
<br>
<br>
*Scope*<br>
<br>
<br>
As I'm developing the pass, I'm trying to keep it general and create<br>
utilities that could work across multiple smart pointers. But, right now,<br>
I'm focussing on unique_ptr and applying specific ownership optimizations to<br>
unique_ptr only.<br>
<br>
<br>
*unique_ptr Optimizations*<br>
<br>
<br>
The pass I'm currently developing adds a single, simple, optimization:<br>
constant fold the destructor based on ownership information. unique_ptr has<br>
a lot of ownership information communicated with reference semantics. When a<br>
unique_ptr is moved into another function, that function takes over<br>
ownership of the unique_ptr, and subsequent destructors can be eliminated<br>
(because they will be no-ops). Otherwise-branchless functions are often<br>
complicated after inlining unique_ptr's destructor, so this optimization<br>
should be fairly beneficial.<br>
<br>
<br>
unique_ptr's reset and release methods both complicate this optimization a<br>
bit. Because they are also able to transfer and remove ownership, all<br>
unknown instructions must be ignored. However, in the future, knowledge of<br>
those methods might be able to make the pass more robust.<br>
<br>
<br>
With unique_ptr, it's difficult to prove liveness. So, it is hard to<br>
constant fold the destructor call to always be there. Maybe in the future,<br>
this would be possible, though (with sufficient analysis).<br>
<br>
<br>
Last, an optimization that I hope to do is lowering the unique_ptr to a raw<br>
pointer if all lifetime paths are known. I think removing this layer of<br>
abstraction would make it easier for other optimization passes to be<br>
successful. Eventually, we may even be able to specialize functions that<br>
used to take a unique_ptr to now take a raw pointer, if the argument's<br>
lifetime was also able to be fully analyzed.<br>
<br>
<br>
*Lifetime Annotations*<br>
<br>
<br>
Right now, the pass relies on (pre-inlined) function calls to generate<br>
ownership information. Another approach would be to add ownership<br>
annotations, such as the lifetime intrinsics (e.g. llvm.lifetime.start).<br>
<br>
<br>
*ARC Optimizations*<br>
<br>
<br>
There are a huge number of large and small ARC optimizations already in<br>
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit<br>
because unique_ptr doesn't actually do any reference counting. But, later<br>
on, when I start working on generalizing this pass to support more smart<br>
pointers (specifically shared_ptr) I think the ARC optimization pass, and<br>
especially the utilities it contains, could be very beneficial. If anyone<br>
has experience with ARC optimizations, I'd love to hear your thoughts on<br>
extending them to other reference counted objects.<br>
<br>
<br>
*trivial_abi and Hidden References*<br>
<br>
<br>
Arthur O'Dwyer made a good point, which is that a lot of these<br>
optimizations can be applied with the trivial_abi attribute. However,<br>
that's not a standard attribute, and these optimizations only *happen*<br>
to work with trivial_abi (i.e., in a more complicated program, they may not<br>
continue to work), so I think lifetime utilities and specific lifetime<br>
optimization passes are still beneficial (especially if they can be applied<br>
to other smart pointers in the future).<br>
<br>
<br>
Because all smart pointers have non-trivial destructors, they are always<br>
passed by hidden references. With unique_ptr, this is as simple as<br>
bit-casting the pointer member to unique_ptr, which would allow for it to<br>
be lowered to a single raw pointer instead of a stack-allocated object.<br>
Even without the trivial_abi attribute, I think this is an optimization that<br>
could be done.<br>
<br>
<br>
*Results*<br>
<br>
<br>
Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart<br>
pointer lifetime optimizations pass. &lt;<a href="https://reviews.llvm.org/D81288" style="color:#777">https://reviews.llvm.org/D81288</a>&gt;<br>
<br>
For reference, here are the before and after results:<br>
<br>
Clang trunk (four branches): Compiler Explorer<br>
&lt;<a href="https://godbolt.org/z/bsJFty" style="color:#777">https://godbolt.org/z/bsJFty</a>&gt;<br>
<br>
With optimizations (branchless): <a href="https://pastebin.com/raw/mQ2r6pru" style="color:#777">https://pastebin.com/raw/mQ2r6pru</a></p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Unfortunately, these are not legal optimizations for your test case:</p>
<ul>
<li><p dir="auto"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">guaranteed</code> is permitted to escape a reference (or pointer) to the<br>
object it was passed. Those references and pointers remain valid<br>
until the object goes out of scope.</p></li>
<li><p dir="auto">The object can be mutated through that reference because the underlying<br>
object is not <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">const</code>. Being passed a <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">const</code> reference is not a<br>
semantic contract in C++.</p></li>
<li><p dir="auto">Through a combination of the above, the call to <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">owner</code> may change<br>
the value of <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">p</code>, and so the caller may not rely on it still being<br>
in a trivially-destructible state after that call.</p></li>
<li><p dir="auto"><code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">owner</code> may leave the value of its parameter object in a<br>
non-trivially-destructible state, and under the Itanium C++ ABI, cleaning<br>
up that object is the caller’s responsibility. I agree that this is a<br>
bad rule for optimization purposes, but it’s the rule. This can only be<br>
optimized with a more global, interprocedural optimization that shifts<br>
responsibility to <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">owner</code> to destroy its argument.</p></li>
</ul>
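<p dir="auto">To make the points above concrete, here is a minimal sketch of the hazard. The linked test case is not reproduced here, so the exact signatures of guaranteed and owner are assumptions, and Tracked, Ptr, escaped, and demo are hypothetical names introduced only for illustration. It shows one conforming program in which p is non-null again after the call to owner, so its destructor cannot be folded away:</p>

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type that counts how many instances are alive.
struct Tracked {
  static int live;
  Tracked() { ++live; }
  ~Tracked() { --live; }
};
int Tracked::live = 0;

using Ptr = std::unique_ptr<Tracked>;

// Pointer to the caller's object, escaped by guaranteed().
static Ptr *escaped = nullptr;

// Assumed signature: takes a const reference, yet may still escape a
// pointer to the object. The underlying object is not const, so casting
// away const and mutating it later is well-defined.
void guaranteed(const Ptr &p) { escaped = const_cast<Ptr *>(&p); }

// Assumed signature: takes ownership by value. The caller's p is already
// moved-from (null) when this body runs, but the escaped pointer lets us
// give it a new value.
void owner(Ptr param) {
  assert(escaped != nullptr);
  *escaped = std::make_unique<Tracked>();
  // param is left non-null here; it still has to be destroyed.
}

// Returns true if p owns an object after the call to owner().
bool demo() {
  Ptr p = std::make_unique<Tracked>();
  guaranteed(p);
  owner(std::move(p));
  bool nonNullAfterCall = (p != nullptr);
  escaped = nullptr;  // avoid leaving a dangling pointer behind
  return nonNullAfterCall;
  // p's destructor runs here and must free the second Tracked.
}
```

<p dir="auto">In this sketch, calling demo() returns true, and Tracked::live is back to zero afterwards only because both destructors actually run. Eliminating the destructor of p after the move would leak the second object.</p>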
<p dir="auto">John.</p>
</div>
</div>
</body>
</html>