<div dir="ltr"><div dir="ltr"><div>Richard and John, thanks for those thoughts and comments.</div><div><br></div><div><div>> Yes, the destructor for the source (the local variable, as opposed to the temporary) is the one that you can’t eliminate without proving that a reference to it isn’t escaped by the first call and then mutated during the second.</div><div><br></div><div>and</div><div><br></div><div>> Here, both destructors run after the call to 'owner' returns, and they destroy two different objects. You can't eliminate either.</div><div><br></div><div>I get it now. The link really helped, thanks, Richard. </div><div><br></div><div>> Perhaps we should expose LLVM's nocapture attribute to the source level?</div><div><br></div><div>This would be very interesting and probably not too difficult to do. I can add it to my backlog of things to investigate. </div><div><br></div><div>> trivial_abi is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr for types that are already trivial_abi is just as much of an ABI break as changing it in general would be.</div><div><br></div><div>We could make it unstable-ABI only. Or add another config flag to enable it. Looks like Richard's patch takes this a step further anyway, though.</div></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Jun 8, 2020 at 6:14 PM Richard Smith <<a href="mailto:richard@metafoo.co.uk">richard@metafoo.co.uk</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr">On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev <<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a>> wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><u></u>
<div>
<div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">On 7 Jun 2020, at 18:52, Zoe Carver wrote:</p>
</div>
<div style="white-space:normal"><blockquote style="border-left-width:2px;border-left-style:solid;border-left-color:rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">John, sorry for my delayed response.<br>
<br>
<br>
*> Can you explain in more detail which destructor you think you can<br>
eliminate?*<br>
<br>
<br>
In the Godbolt link in my original post, there are two unique_ptrs, %3 and %<br>
4. They are both passed to the move constructor (as the source and<br>
destination). In the move constructor, the source's held pointer is nulled<br>
out. If the next use of the pointer is the destructor (as is the case in<br>
the Godbolt example) then, the destructor will be a no-op. In the optimized<br>
code, it's a bit more difficult to see. But, the dead store is still there.<br>
Two instructions before the call to owner, null is stored into %8. Then, in<br>
the 3rd block (%17) that same pointer (%8) is loaded and compared against<br>
null (which is always true).</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Yes, the destructor for the source (the local variable, as opposed to the<br>
temporary) is the one that you can’t eliminate without proving that a<br>
reference to it isn’t escaped by the first call and then mutated during the<br>
second.</p></div></div></div></blockquote><div>Concretely: <a href="https://godbolt.org/z/4guY_m" target="_blank">https://godbolt.org/z/4guY_m</a></div><div><br></div><div>Here, both destructors run after the call to 'owner' returns, and they destroy two different objects. You can't eliminate either.</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">You wouldn’t be the first person to be surprised by the result of this sort<br>
of analysis, but I’m afraid it’s what we’re working with.</p>
<p dir="auto">Unfortunately, there’s really no way to eliminate this one without either<br>
interprocedural information or language changes. <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">trivial_abi</code> eliminates<br>
the other one because it changes the convention for passing by value, but to<br>
pass an “immutably borrowed” value in C++ we have to pass by reference, which<br>
allows the reference to be escaped and accessed (and even mutated, if the<br>
original object wasn’t declared <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">const</code>) as long as those accesses happen<br>
before destruction.</p></div></div></div></blockquote><div>Perhaps we should expose LLVM's nocapture attribute to the source level? </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div style="font-family:sans-serif"><div style="white-space:normal">
</div>
<div style="white-space:normal"><blockquote style="border-left-width:2px;border-left-style:solid;border-left-color:rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">*> Probably more importantly, though, we could allow unstable-ness to be<br>
detected with a type trait, and that would allow the standard library to<br>
take advantage of it. *<br>
<br>
<br>
We could actually do this for trivial_abi types too. If we added a builtin<br>
type trait to check if a type has the trivial_abi attribute, libc++ could<br>
conditionally give unique_ptr the trivial_abi attribute if its base type<br>
also had the attribute. Additionally, we could add a config macro that<br>
would do this globally when libc++ is in unstable ABI mode.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">Hmm. That doesn’t just fall out from any analysis I see. <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">trivial_abi</code><br>
is an existing, ABI-stable attribute, so changing the ABI of <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">std::unique_ptr</code><br>
for types that are already <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">trivial_abi</code> is just as much of an ABI break<br>
as changing it in general would be. You could try to justify it by saying<br>
that there just aren’t very many <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">trivial_abi</code> types yet, or that <code style="background-color:rgb(247,247,247);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;margin:0px;padding:0px 0.4em" bgcolor="#F7F7F7">trivial_abi</code><br>
is a vendor-specific attribute that’s unlikely to be used on a type with a<br>
stable ABI because non-Clang implementations wouldn’t be able to compile<br>
it compatibly, but those aren’t terribly convincing arguments to me.</p></div></div></div></blockquote><div>I guess I should finish <a href="https://reviews.llvm.org/D63748" target="_blank">https://reviews.llvm.org/D63748</a> at some point. (Though I think we probably shouldn't enable it in libc++ unstable ABI configurations by default, since it also changes observable program semantics due to altering destruction order, and is arguably non-conforming for the same reason.)</div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div><div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">John.</p>
</div>
<div style="white-space:normal"><blockquote style="border-left-width:2px;border-left-style:solid;border-left-color:rgb(119,119,119);color:rgb(119,119,119);margin:0px 0px 5px;padding-left:5px"><p dir="auto">Best,<br>
<br>
Zoe<br>
<br>
On Sat, Jun 6, 2020 at 2:07 PM John McCall <<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>> wrote:<br>
</p>
<blockquote style="border-left-width:2px;border-left-style:solid;border-left-color:rgb(153,153,153);color:rgb(153,153,153);margin:0px 0px 5px;padding-left:5px"><p dir="auto">On 6 Jun 2020, at 13:47, Zoe Carver wrote:<br>
<br>
John,<br>
<br>
Thanks, those are good points. I think we can still remove one of the<br>
destructors (which could also be done by a more powerful DSE+load<br>
propagation) but, you're right; one needs to stay.<br>
<br>
Can you explain in more detail which destructor you think you can<br>
eliminate?<br>
<br>
This can only be optimized with a more global, interprocedural<br>
<br>
optimization that shifts responsibility to owner to destroy its argument.<br>
<br>
I'll think about implementing something like this, but I suspect any<br>
possible optimizations will already happen with inlining and analysis.<br>
<br>
Yeah. For the narrow case of std::unique_ptr, since its operations<br>
are easily inlined and can be easily optimized after copy propagation,<br>
there’s not much more that can be done at a high level.<br>
<br>
Note that trivial_abi (if it could be adopted on std::unique_ptr)<br>
also changes the ABI to make the callee responsible for destruction.<br>
So as part of getting a more efficient low-level ABI, you also get a<br>
more optimizable high-level one.<br>
<br>
One idea I’ve personally been kicking around is some way to mark<br>
declarations as having an “unstable ABI”: basically, a guarantee that<br>
all the code that uses them will be compiled with a single toolchain,<br>
and therefore a license for the implementation to alter the ABI however<br>
it likes with any code that uses any of those declarations.<br>
<br>
A type would be unstable if it was composed even partially from a<br>
declaration marked unstable. So class Unstable would be unstable,<br>
but so would const Unstable * — and, crucially, so would<br>
std::unique_ptr<Unstable>. But for soundness reasons, this would<br>
need to ignore type sugar (so no marking typedefs), and it wouldn’t<br>
be able to automatically descend into fields.<br>
<br>
There are a few ways that we could use that directly in the compiler.<br>
The big restriction is that you’re not breaking ABI globally and so<br>
you always need an unstable “contaminant” that permits using the<br>
unstable ABI. For example, we can’t just change function ABIs<br>
for all unstable functions because function pointers have to remain<br>
compatible. On the other hand, programs aren’t allowed to call<br>
function pointers under the wrong type, so if the function type is<br>
unstable, we can change anything we want about its ABI.<br>
<br>
(For functions specifically, there’s another option: we could emit<br>
the functions with an unstable ABI and then introduce thunks that<br>
adapt the calling convention when the address is taken. But that’s<br>
a non-trivial code-size hit that we might have to do unconditionally.<br>
It also can’t adapt a callee-destroy ABI into a caller-destroy one<br>
without introducing an extra move, which isn’t necessarily semantically<br>
allowed.)<br>
<br>
Probably more importantly, though, we could allow unstable-ness to<br>
be detected with a type trait, and that would allow the standard<br>
library to take advantage of it. So std::unique_ptr<int> would<br>
be stuck with the stable ABI, but std::unique_ptr<Unstable> could<br>
switch to be trivial_abi.<br>
<br>
That does leave the problem of actually doing the annotation.<br>
Adding an attribute to every class is probably beyond what people<br>
would accept. There are several ways to do mass annotation. Pragmas<br>
are problematic because you don’t want to accidentally leave the<br>
pragma on when you exit a file and then have it cover a system<br>
include. We do have some pragmas that prevent file changes while<br>
the pragma is active, which is a decent solution for that problem.<br>
An alternative is to mark namespaces. That probably needs to be<br>
lexical: that is, you wouldn’t be able to mark the entire clang<br>
namespace, you would mark a specific namespace clang declaration<br>
in a single header. But that’s still much more manageable, and<br>
after all, the cost to missing an annotation is just a missed<br>
optimization.<br>
<br>
We could also implicitly make all anonymous-namespace declarations<br>
unstable.<br>
<br>
John.<br>
<br>
Thanks for the response,<br>
Zoe<br>
<br>
On Fri, Jun 5, 2020 at 1:09 PM John McCall <<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>> wrote:<br>
<br>
On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:<br>
<br>
Hello all,<br>
<br>
<br>
I'm planning to do some work to add lifetime optimization passes for smart<br>
pointers and reference-counted objects. I'll use this email as a sort of<br>
proposal for what I hope to do.<br>
<br>
<br>
*Scope*<br>
<br>
<br>
As I'm developing the pass, I'm trying to keep it general and create<br>
utilities that could work across multiple smart pointers. But, right now,<br>
I'm focussing on unique_ptr and applying specific ownership optimizations<br>
to<br>
unique_ptr only.<br>
<br>
<br>
*unique_ptr Optimzations*<br>
<br>
<br>
The pass I'm currently developing adds a single, simple, optimization:<br>
constant fold the destructor based on ownership information. unique_ptr has<br>
a lot of ownership information communicated with reference semantics. When<br>
a<br>
unique_ptr is moved into another function, that function takes over<br>
ownership of the unique_ptr, and subsequent destructors can be eliminated<br>
(because they will be no-ops). Otherwise, branchless functions are often<br>
complicated after inlining unique_ptr's destructor so, this optimization<br>
should be fairly beneficial.<br>
<br>
<br>
unique_ptr's reset and release methods both complicate this optimization a<br>
bit. Because they are also able to transfer and remove ownership, all<br>
unknown instructions must be ignored. However, in the future, knowledge of<br>
those methods might be able to make the pass more robust.<br>
<br>
<br>
With unique_ptr, it's difficult to prove liveness. So, it is hard to<br>
constant fold the destructor call to always be there. Maybe in the future,<br>
this would be possible, though (with sufficient analysis).<br>
<br>
<br>
Last, an optimization that I hope to do is lowering the unique_ptr to a raw<br>
pointer if all lifetime paths are known. I think removing this layer of<br>
abstraction would make it easier for other optimization passes to be<br>
successful. Eventually, we may even be able to specialize functions that<br>
used to take a unique_ptr to now take a raw pointer, if the argument's<br>
lifetime was also able to be fully analyzed.<br>
<br>
<br>
*Lifetime Annotations*<br>
<br>
<br>
Right now, the pass relies on (pre-inlined) function calls to generate<br>
ownership information. Another approach would be to add ownership<br>
annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).<br>
<br>
<br>
*ARC Optimizations*<br>
<br>
<br>
There are a huge number of large and small ARC optimizations already in<br>
LLVM. For unique_ptr specifically, I'm not sure these are of any benefit<br>
because unique_ptr doesn't actually do any reference counting. But, later<br>
on, when I start working on generalizing this pass to support more smart<br>
pointers (specifically shared_ptr) I think the ARC optimization pass, and<br>
especially the utilities it contains, could be very beneficial. If anyone<br>
has experience with ARC optimizations, I'd love to hear your thoughts on<br>
extending them to other reference counted objects.<br>
<br>
<br>
*trivial_abi and Hidden References*<br>
<br>
<br>
Arthur O'Dwyer made a good point, which is that a lot of these<br>
optimizations can be applied when with the trivial_abi attribute. However,<br>
given that's not a standard attribute and these optimizations only *happen*<br>
to work with trivial_abi (i.e., in a more complicated program, they may not<br>
continue to work). I think lifetime utilities and specific lifetime<br>
optimization passes are still beneficial (especially if they can be applied<br>
to other smart pointers in the future).<br>
<br>
<br>
Because all smart pointers have non-trivial destructors, they are always<br>
passed by hidden references. With unique_ptr, this is as simple as<br>
bit-casting the pointer member to unique_ptr, which would allow for it to<br>
be lowered to a single raw pointer instead of a stack-allocated object.<br>
Even without the trival_abi attribute, I think this is an optimization that<br>
could be done.<br>
<br>
<br>
*Results*<br>
<br>
<br>
Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart<br>
pointer lifetime optimizations pass. <<a href="https://reviews.llvm.org/D81288" style="color:rgb(153,153,153)" target="_blank">https://reviews.llvm.org/D81288</a>><br>
<br>
For reference, here are the before and after results:<br>
<br>
Clang trunk (four branches): Compiler Explorer<br>
<<a href="https://godbolt.org/z/bsJFty" style="color:rgb(153,153,153)" target="_blank">https://godbolt.org/z/bsJFty</a>><br>
<br>
With optimizations (branchless): <a href="https://pastebin.com/raw/mQ2r6pru" style="color:rgb(153,153,153)" target="_blank">https://pastebin.com/raw/mQ2r6pru</a><br>
<br>
Unfortunately, these are not legal optimizations for your test case:<br>
<br>
-<br>
<br>
guaranteed is permitted to escape a reference (or pointer) to the<br>
object it was passed. Tat references and pointers remain valid<br>
until the object goes out of scope.<br>
-<br>
<br>
The object can be mutated through that reference because the underlying<br>
object is not const. Being passed a const reference is not a<br>
semantic contract in C++.<br>
-<br>
<br>
Through a combination of the above, the call to owner may change<br>
the value of p, and so the caller may not rely on it still being<br>
in a trivially-destructible state after that call.<br>
-<br>
<br>
owner may leave the value of its parameter object in a<br>
non-trivially-destructible state, and under the Itanium C++ ABI,<br>
cleaning<br>
up that object is the caller’s responsibility. I agree that this is a<br>
bad rule for optimization purposes, but it’s the rule. This can only be<br>
optimized with a more global, interprocedural optimization that shifts<br>
responsibility to owner to destroy its argument.<br>
<br>
John.<br>
<br>
</p>
</blockquote></blockquote></div>
<div style="white-space:normal">
</div>
</div>
</div>
_______________________________________________<br>
cfe-dev mailing list<br>
<a href="mailto:cfe-dev@lists.llvm.org" target="_blank">cfe-dev@lists.llvm.org</a><br>
<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" rel="noreferrer" target="_blank">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>
</blockquote></div></div>
</blockquote></div>