<!DOCTYPE html>

<html>

<head>

<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">

</head>

<body>

<div style="font-family:sans-serif"><div style="white-space:normal">

<p dir="auto">On 8 Jun 2020, at 21:13, Richard Smith wrote:</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">On Mon, 8 Jun 2020 at 00:22, John McCall via cfe-dev &lt;cfe-dev@lists.llvm.org&gt;<br>

wrote:</p>

<blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">You wouldn’t be the first person to be surprised by the result of this sort<br>

of analysis, but I’m afraid it’s what we’re working with.<br>

<br>

Unfortunately, there’s really no way to eliminate this one without either<br>

interprocedural information or language changes. trivial_abi eliminates<br>

the other one because it changes the convention for passing by value, but to<br>

pass an “immutably borrowed” value in C++ we have to pass by reference, which<br>

allows the reference to be escaped and accessed (and even mutated, if the<br>

original object wasn’t declared const) as long as those accesses happen<br>

before destruction.<br>

</p>

</blockquote><p dir="auto">Perhaps we should expose LLVM's nocapture attribute to the source level?</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">I think we have with <code style="background-color:#F7F7F7; border-radius:3px; margin:0; padding:0 0.4em" bgcolor="#F7F7F7">__attribute__((noescape))</code>.  Of course, adopting it<br>

systematically would be hugely invasive.</p>
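<p dir="auto">For reference, a minimal sketch of what annotating a single parameter looks like. The names <code>NOESCAPE</code>, <code>Counter</code>, <code>peek</code> and <code>caller</code> are hypothetical and exist only for illustration; the attribute is Clang-specific, so the macro expands to nothing elsewhere. As far as I know the annotation is a promise made by the programmer rather than something the compiler proves.</p>

```cpp
// Clang-only attribute; the macro expands to nothing on compilers that lack it.
#if defined(__has_attribute)
#  if __has_attribute(noescape)
#    define NOESCAPE __attribute__((noescape))
#  endif
#endif
#ifndef NOESCAPE
#  define NOESCAPE
#endif

struct Counter { int value; };   // hypothetical type, for illustration only

// Hypothetical callee: promises not to retain `c` beyond the call.
int peek(NOESCAPE const Counter *c) { return c->value; }

// Hypothetical caller: given the promise above, nothing reachable from peek
// should still hold a pointer to `local` once the call has returned.
int caller() {
    Counter local = {41};
    int seen = peek(&local);
    local.value = seen + 1;
    return local.value;
}
```

<p dir="auto">Every function that receives a borrowed object would need the same annotation on its parameter, which is where the invasiveness comes from.</p>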


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">*Probably more importantly, though, we could allow unstable-ness to be<br>

detected with a type trait, and that would allow the standard library to<br>

take advantage of it.*<br>

<br>

<br>

We could actually do this for trivial_abi types too. If we added a builtin<br>

type trait to check if a type has the trivial_abi attribute, libc++ could<br>

conditionally give unique_ptr the trivial_abi attribute if its base type<br>

also had the attribute. Additionally, we could add a config macro that<br>

would do this globally when libc++ is in unstable ABI mode.<br>

<br>

Hmm. That doesn’t just fall out from any analysis I see. trivial_abi<br>

is an existing, ABI-stable attribute, so changing the ABI of std::unique_ptr<br>

for types that are already trivial_abi is just as much of an ABI break<br>

as changing it in general would be. You could try to justify it by saying<br>

that there just aren’t very many trivial_abi types yet, or that trivial_abi<br>

is a vendor-specific attribute that’s unlikely to be used on a type with a<br>

stable ABI because non-Clang implementations wouldn’t be able to compile<br>

it compatibly, but those aren’t terribly convincing arguments to me.<br>

</p>

</blockquote><p dir="auto">I guess I should finish <a href="https://reviews.llvm.org/D63748" style="color:#777">https://reviews.llvm.org/D63748</a> at some point.<br>

(Though I think we probably shouldn't enable it in libc++ unstable ABI<br>

configurations by default, since it also changes observable program<br>

semantics due to altering destruction order, and is arguably non-conforming<br>

for the same reason.)</p>

</blockquote></div>

<div style="white-space:normal">


<p dir="auto">It definitely changes observable semantics, but it’s not <em>obviously</em><br>

non-conforming; [expr.call]p7 gives us a lot of flexibility here:</p>


<p dir="auto">It is implementation-defined whether the lifetime of a parameter<br>

  ends when the function in which it is defined returns or at the<br>

  end of the enclosing full-expression.</p>


<p dir="auto">And note that MSVC has traditionally destroyed parameters in the callee.<br>

IIRC the standard actually originally specified that parameters were<br>

always destroyed at the end of the call and only changed it due to<br>

Itanium doing otherwise.</p>
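<p dir="auto">A rough sketch of how the difference can be observed from inside a program. Everything here (<code>TRIVIAL_ABI</code>, <code>Plain</code>, <code>Relocatable</code>, <code>record</code>, and the <code>observe_*</code> helpers) is a hypothetical name invented for the example. The log is a decimal number with one digit per event, so no library support is needed.</p>

```cpp
// Clang-only attribute; the macro expands to nothing on compilers that lack it.
#if defined(__has_cpp_attribute)
#  if __has_cpp_attribute(clang::trivial_abi)
#    define TRIVIAL_ABI [[clang::trivial_abi]]
#  endif
#endif
#ifndef TRIVIAL_ABI
#  define TRIVIAL_ABI
#endif

// Event log kept as a decimal number, one digit per event, oldest first:
//   1 = parameter object constructed
//   2 = callee body ran
//   3 = code after the call, but still inside the same full-expression, ran
//   4 = parameter object destroyed
static unsigned g_log = 0;
static void record(unsigned digit) { g_log = g_log * 10 + digit; }

// Non-trivial copy constructor and destructor, no attribute.
struct Plain {
    Plain() { record(1); }
    Plain(const Plain &) { record(1); }
    ~Plain() { record(4); }
};

// Same shape, but marked trivial_abi where the compiler supports it.
struct TRIVIAL_ABI Relocatable {
    Relocatable() { record(1); }
    Relocatable(const Relocatable &) { record(1); }
    ~Relocatable() { record(4); }
};

int take_plain(Plain) { record(2); return 0; }
int take_relocatable(Relocatable) { record(2); return 0; }
int after_call(int) { record(3); return 0; }

// Each helper runs one full-expression and returns the resulting log.
unsigned observe_plain() {
    g_log = 0;
    after_call(take_plain(Plain()));
    return g_log;
}

unsigned observe_relocatable() {
    g_log = 0;
    after_call(take_relocatable(Relocatable()));
    return g_log;
}
```

<p dir="auto">With caller-destroy at the end of the full-expression I would expect the digit 4 to come last (1234); with callee-destroy I would expect it before the 3 (1243). Which one you see for each type depends on the target ABI and on whether the attribute is honored.</p>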


<p dir="auto">Now, it’s possible that the copy-elision rules have an unfortunate<br>

impact here.  IIRC an object initialized with an elided copy is supposed<br>

to take on the longer of the two natural lifetimes.  Does that mean that<br>

if you have a parameter initialized by an elided copy from a temporary,<br>

the parameter needs to live until the end of the calling full-expression<br>

like the temporary would have?  If so, you either wouldn’t be able to<br>

use a callee-destroy ABI or you wouldn’t be allowed to elide copies<br>

into parameters, and the latter seems unacceptable.</p>
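<p dir="auto">The case in question, as a minimal sketch: a by-value parameter initialized directly from a prvalue temporary. <code>Probe</code>, <code>sink</code> and <code>copies_made_by_call</code> are hypothetical names used only here. The counter shows only whether a copy constructor ran; it says nothing about when the parameter is destroyed.</p>

```cpp
// Hypothetical type that counts how often its copy constructor runs.
struct Probe {
    static int copies;
    Probe() {}
    Probe(const Probe &) { ++copies; }
    ~Probe() {}
};
int Probe::copies = 0;

// Takes its argument by value, so the parameter is a distinct object
// unless the copy from the temporary is elided.
void sink(Probe) {}

// Passes a temporary straight into the parameter and reports the number
// of copy-constructor calls the compiler emitted for that call.
int copies_made_by_call() {
    Probe::copies = 0;
    sink(Probe());
    return Probe::copies;
}
```

<p dir="auto">Under C++17 the parameter is initialized directly from the prvalue, so I would expect no copy-constructor call at all; earlier standards merely permit the elision.</p>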


<p dir="auto">Even if it’s conforming, I’m sure there are bugs about e.g. the proper<br>

ordering with things like function-try-blocks and exception specifications.</p>


<p dir="auto">John.</p>


</div>

<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><blockquote style="border-left:2px solid #777; color:#999; margin:0 0 5px; padding-left:5px; border-left-color:#999"><p dir="auto">John.<br>

<br>

Best,<br>

<br>

Zoe<br>

<br>

On Sat, Jun 6, 2020 at 2:07 PM John McCall &lt;rjmccall@apple.com&gt; wrote:<br>

<br>

On 6 Jun 2020, at 13:47, Zoe Carver wrote:<br>

<br>

John,<br>

<br>

Thanks, those are good points. I think we can still remove one of the<br>

destructors (which could also be done by a more powerful DSE+load<br>

propagation) but, you're right; one needs to stay.<br>

<br>

Can you explain in more detail which destructor you think you can<br>

eliminate?<br>

<br>

This can only be optimized with a more global, interprocedural<br>

<br>

optimization that shifts responsibility to owner to destroy its argument.<br>

<br>

I'll think about implementing something like this, but I suspect any<br>

possible optimizations will already happen with inlining and analysis.<br>

<br>

Yeah. For the narrow case of std::unique_ptr, since its operations<br>

are easily inlined and can be easily optimized after copy propagation,<br>

there’s not much more that can be done at a high level.<br>

<br>

Note that trivial_abi (if it could be adopted on std::unique_ptr)<br>

also changes the ABI to make the callee responsible for destruction.<br>

So as part of getting a more efficient low-level ABI, you also get a<br>

more optimizable high-level one.<br>

<br>

One idea I’ve personally been kicking around is some way to mark<br>

declarations as having an “unstable ABI”: basically, a guarantee that<br>

all the code that uses them will be compiled with a single toolchain,<br>

and therefore a license for the implementation to alter the ABI however<br>

it likes with any code that uses any of those declarations.<br>

<br>

A type would be unstable if it was composed even partially from a<br>

declaration marked unstable. So class Unstable would be unstable,<br>

but so would const Unstable * — and, crucially, so would<br>

std::unique_ptr&lt;Unstable&gt;. But for soundness reasons, this would<br>

need to ignore type sugar (so no marking typedefs), and it wouldn’t<br>

be able to automatically descend into fields.<br>

<br>

There are a few ways that we could use that directly in the compiler.<br>

The big restriction is that you’re not breaking ABI globally and so<br>

you always need an unstable “contaminant” that permits using the<br>

unstable ABI. For example, we can’t just change function ABIs<br>

for all unstable functions because function pointers have to remain<br>

compatible. On the other hand, programs aren’t allowed to call<br>

function pointers under the wrong type, so if the function type is<br>

unstable, we can change anything we want about its ABI.<br>

<br>

(For functions specifically, there’s another option: we could emit<br>

the functions with an unstable ABI and then introduce thunks that<br>

adapt the calling convention when the address is taken. But that’s<br>

a non-trivial code-size hit that we might have to do unconditionally.<br>

It also can’t adapt a callee-destroy ABI into a caller-destroy one<br>

without introducing an extra move, which isn’t necessarily semantically<br>

allowed.)<br>

<br>

Probably more importantly, though, we could allow unstable-ness to<br>

be detected with a type trait, and that would allow the standard<br>

library to take advantage of it. So std::unique_ptr&lt;int&gt; would<br>

be stuck with the stable ABI, but std::unique_ptr&lt;Unstable&gt; could<br>

switch to be trivial_abi.<br>

<br>

That does leave the problem of actually doing the annotation.<br>

Adding an attribute to every class is probably beyond what people<br>

would accept. There are several ways to do mass annotation. Pragmas<br>

are problematic because you don’t want to accidentally leave the<br>

pragma on when you exit a file and then have it cover a system<br>

include. We do have some pragmas that prevent file changes while<br>

the pragma is active, which is a decent solution for that problem.<br>

An alternative is to mark namespaces. That probably needs to be<br>

lexical: that is, you wouldn’t be able to mark the entire clang<br>

namespace, you would mark a specific namespace clang declaration<br>

in a single header. But that’s still much more manageable, and<br>

after all, the cost to missing an annotation is just a missed<br>

optimization.<br>

<br>

We could also implicitly make all anonymous-namespace declarations<br>

unstable.<br>

<br>

John.<br>

<br>

Thanks for the response,<br>

Zoe<br>

<br>

On Fri, Jun 5, 2020 at 1:09 PM John McCall &lt;rjmccall@apple.com&gt; wrote:<br>

<br>

On 5 Jun 2020, at 14:45, Zoe Carver via cfe-dev wrote:<br>

<br>

Hello all,<br>

<br>

<br>

I'm planning to do some work to add lifetime optimization passes for smart<br>

pointers and reference-counted objects. I'll use this email as a sort of<br>

proposal for what I hope to do.<br>

<br>

<br>

*Scope*<br>

<br>

<br>

As I'm developing the pass, I'm trying to keep it general and create<br>

utilities that could work across multiple smart pointers. But, right now,<br>

I'm focussing on unique_ptr and applying specific ownership optimizations to<br>

unique_ptr only.<br>

<br>

<br>

*unique_ptr Optimizations*<br>

<br>

<br>

The pass I'm currently developing adds a single, simple, optimization:<br>

constant fold the destructor based on ownership information. unique_ptr has<br>

a lot of ownership information communicated with reference semantics. When a<br>

unique_ptr is moved into another function, that function takes over<br>

ownership of the unique_ptr, and subsequent destructors can be eliminated<br>

(because they will be no-ops). Otherwise-branchless functions often become<br>

complicated after inlining unique_ptr's destructor, so this optimization<br>

should be fairly beneficial.<br>

<br>

<br>

unique_ptr's reset and release methods both complicate this optimization a<br>

bit. Because they are also able to transfer and remove ownership, all<br>

unknown instructions must be ignored. However, in the future, knowledge of<br>

those methods might be able to make the pass more robust.<br>

<br>

<br>

With unique_ptr, it's difficult to prove liveness. So, it is hard to<br>

constant fold the destructor call to always be there. Maybe in the future,<br>

this would be possible, though (with sufficient analysis).<br>

<br>

<br>

Last, an optimization that I hope to do is lowering the unique_ptr to a raw<br>

pointer if all lifetime paths are known. I think removing this layer of<br>

abstraction would make it easier for other optimization passes to be<br>

successful. Eventually, we may even be able to specialize functions that<br>

used to take a unique_ptr to now take a raw pointer, if the argument's<br>

lifetime was also able to be fully analyzed.<br>

<br>

<br>

*Lifetime Annotations*<br>

<br>

<br>

Right now, the pass relies on (pre-inlined) function calls to generate<br>

ownership information. Another approach would be to add ownership<br>

annotations, such as the lifetime intrinsics (i.e. llvm.lifetime.start).<br>

<br>

<br>

*ARC Optimizations*<br>

<br>

<br>

There are a huge number of large and small ARC optimizations already in<br>

LLVM. For unique_ptr specifically, I'm not sure these are of any benefit<br>

because unique_ptr doesn't actually do any reference counting. But, later<br>

on, when I start working on generalizing this pass to support more smart<br>

pointers (specifically shared_ptr) I think the ARC optimization pass, and<br>

especially the utilities it contains, could be very beneficial. If anyone<br>

has experience with ARC optimizations, I'd love to hear your thoughts on<br>

extending them to other reference counted objects.<br>

<br>

<br>

*trivial_abi and Hidden References*<br>

<br>

<br>

Arthur O'Dwyer made a good point, which is that a lot of these<br>

optimizations can be applied with the trivial_abi attribute. However,<br>

given that's not a standard attribute and these optimizations only *happen*<br>

to work with trivial_abi (i.e., in a more complicated program, they may not<br>

continue to work), I think lifetime utilities and specific lifetime<br>

optimization passes are still beneficial (especially if they can be applied<br>

to other smart pointers in the future).<br>

<br>

<br>

Because all smart pointers have non-trivial destructors, they are always<br>

passed by hidden references. With unique_ptr, this is as simple as<br>

bit-casting the pointer member to unique_ptr, which would allow for it to<br>

be lowered to a single raw pointer instead of a stack-allocated object.<br>

Even without the trivial_abi attribute, I think this is an optimization that<br>

could be done.<br>

<br>

<br>

*Results*<br>

<br>

<br>

Here's the unique_ptr pass I've been talking about: ⚙ D81288 Opt Smart<br>

pointer lifetime optimizations pass. <<a href="https://reviews.llvm.org/D81288" style="color:#999">https://reviews.llvm.org/D81288</a>><br>

<br>

For reference, here are the before and after results:<br>

<br>

Clang trunk (four branches): Compiler Explorer<br>

<<a href="https://godbolt.org/z/bsJFty" style="color:#999">https://godbolt.org/z/bsJFty</a>><br>

<br>

With optimizations (branchless): <a href="https://pastebin.com/raw/mQ2r6pru" style="color:#999">https://pastebin.com/raw/mQ2r6pru</a><br>

<br>

Unfortunately, these are not legal optimizations for your test case:<br>

<br>

- guaranteed is permitted to escape a reference (or pointer) to the<br>

object it was passed. Those references and pointers remain valid<br>

until the object goes out of scope.<br>

<br>

- The object can be mutated through that reference because the underlying<br>

object is not const. Being passed a const reference is not a<br>

semantic contract in C++.<br>

<br>

- Through a combination of the above, the call to owner may change<br>

the value of p, and so the caller may not rely on it still being<br>

in a trivially-destructible state after that call.<br>

<br>

- owner may leave the value of its parameter object in a<br>

non-trivially-destructible state, and under the Itanium C++ ABI, cleaning<br>

up that object is the caller’s responsibility. I agree that this is a<br>

bad rule for optimization purposes, but it’s the rule. This can only be<br>

optimized with a more global, interprocedural optimization that shifts<br>

responsibility to owner to destroy its argument.<br>

<br>

John.<br>

<br>

_______________________________________________<br>

cfe-dev mailing list<br>

cfe-dev@lists.llvm.org<br>

<a href="https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev" style="color:#999">https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev</a><br>

</p>

</blockquote></blockquote></div>

<div style="white-space:normal">

</div>

</div>

</body>

</html>