<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Jul 20, 2015 at 3:50 PM, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div class="h5"><blockquote type="cite"><div>On Jul 17, 2015, at 4:56 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:</div><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jul 17, 2015 at 3:23 PM, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div><div><blockquote type="cite"><div>On Jul 17, 2015, at 2:49 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>> wrote:</div><div><div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Fri, Jul 17, 2015 at 2:05 PM, Philip Reames <span dir="ltr"><<a href="mailto:listmail@philipreames.com" target="_blank">listmail@philipreames.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div bgcolor="#FFFFFF" text="#000000"><div><div>
<br>
<br>
<div>On 07/16/2015 02:38 PM, Richard Smith
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Thu, Jul 16, 2015 at 2:03 PM, John
McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word">
<div>
<div>
<div>
<blockquote type="cite">
<div>On Jul 16, 2015, at 11:46 AM, Richard Smith
<<a href="mailto:richard@metafoo.co.uk" target="_blank">richard@metafoo.co.uk</a>>
wrote:</div>
<div>
<div dir="ltr">
<div class="gmail_extra">
<div class="gmail_quote">On Thu, Jul 16,
2015 at 11:29 AM, John McCall <span dir="ltr"><<a href="mailto:rjmccall@apple.com" target="_blank">rjmccall@apple.com</a>></span>
wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span>> On
Jul 15, 2015, at 10:11 PM, Hal
Finkel <<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>>
wrote:<br>
><br>
> Hi everyone,<br>
><br>
> C++11 added features that allow
for certain parts of the class
hierarchy to be closed, specifically
the 'final' keyword and the
semantics of anonymous namespaces,
and I think we can take advantage of
these to enhance our ability to
perform devirtualization. For
example, given this situation:<br>
><br>
> struct Base {<br>
> virtual void foo() = 0;<br>
> };<br>
><br>
> void external();<br>
> struct Final final : Base {<br>
> void foo() {<br>
> external();<br>
> }<br>
> };<br>
><br>
> void dispatch(Base *B) {<br>
> B->foo();<br>
> }<br>
><br>
> void opportunity(Final *F) {<br>
> dispatch(F);<br>
> }<br>
><br>
> When we optimize this code, we
do the expected thing and inline
'dispatch' into 'opportunity' but we
don't devirtualize the call to
foo(). The fact that we know what
the vtable of F is at that callsite
is not exploited. To a lesser
extent, we can do similar things for
final virtual methods, and derived
classes in anonymous namespaces
(because Clang could determine
whether or not a class (or method)
there is effectively final).<br>
><br>
> One possibility might be to
@llvm.assume to say something about
what the vtable ptr of F might
be/contain should it be needed later
when we emit the initial IR for
'opportunity' (and then teach the
optimizer to use that information),
but I'm not at all sure that's the
best solution. Thoughts?<br>
<br>
</span>The problem with any sort of
@llvm.assume-encoded information about
memory contents is that C++ does
actually allow you to replace objects
in memory, up to and including stuff
like:<br>
<br>
{<br>
MyClass c;<br>
<br>
// Reuse the storage temporarily.
UB to access the object through ‘c’
now.<br>
c.~MyClass();<br>
auto c2 = new (&c)
MyOtherClass();<br>
<br>
// The storage has to contain a
‘MyClass’ when it goes out of scope.<br>
c2->~MyOtherClass();<br>
new (&c) MyClass();<br>
}<br>
<br>
The standard frontend devirtualization
optimizations are permitted under a
couple of different language rules,
specifically that:<br>
1. If you access an object through an
l-value of a type, it has to
dynamically be an object of that type
(potentially a subobject).<br>
2. Object replacement as above only
“forwards” existing formal references
under specific conditions, e.g. the
dynamic type has to be the same,
‘const’ members have to have the same
value, etc. Using an unforwarded
reference (like the name of the local
variable ‘c’ above) doesn’t formally
refer to a valid object and thus has
undefined behavior.<br>
<br>
You can apply those rules much more
broadly than the frontend does, of
course; but those are the language
tools you get.</blockquote>
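<div>The storage-reuse pattern quoted above can be written out as a compilable sketch. The virtual <code>id()</code> members, the <code>reuse_storage</code> wrapper and the size/alignment checks are assumptions added for illustration; only the destroy / placement-new / restore sequence comes from the quoted example.</div>

```cpp
#include <new>  // placement new

// Two unrelated polymorphic types; id() is a hypothetical member added
// so that the dynamic type of the storage can be observed.
struct MyClass {
    virtual ~MyClass() {}
    virtual int id() const { return 1; }
};

struct MyOtherClass {
    virtual ~MyOtherClass() {}
    virtual int id() const { return 2; }
};

// Assumption of this sketch: the replacement object fits in the storage.
static_assert(sizeof(MyOtherClass) <= sizeof(MyClass), "storage too small");
static_assert(alignof(MyOtherClass) <= alignof(MyClass), "storage misaligned");

int reuse_storage() {
    MyClass c;

    // Reuse the storage temporarily. Accessing the object through 'c'
    // is undefined behavior from here until a MyClass is recreated.
    c.~MyClass();
    MyOtherClass *c2 = new (&c) MyOtherClass();
    int during = c2->id();  // goes through the pointer returned by new

    // The storage has to contain a MyClass again when 'c' goes out of scope.
    c2->~MyOtherClass();
    new (&c) MyClass();
    int after = c.id();     // same type as before, so 'c' is usable again

    return during * 10 + after;
}
```

<div>Calling <code>reuse_storage()</code> should yield 21: the vtable pointer stored at the address of <code>c</code> changes twice inside one scope, which is why an assumption about the memory contents at that address cannot simply be attached to the address itself.</div>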
<div><br>
</div>
<div>Right. Our current plan for
modelling this is:</div>
<div><br>
</div>
<div>1) Change the meaning of the
existing !invariant.load metadata (or
add another parallel metadata kind) so
that it allows load-load forwarding
(even if the memory is not known to be
unmodified between the loads) if:</div>
</div>
</div>
</div>
</div>
</blockquote>
<div><br>
</div>
</div>
</div>
invariant.load currently allows the load to be
reordered pretty aggressively, so I think you need a
new metadata.</div>
</div>
</blockquote>
<div><br>
</div>
<div>Our thoughts were:</div>
<div>1) The existing !invariant.load is redundant because
it's exactly equivalent to a call to @llvm.invariant.start
and a load.</div>
<div>2) The new semantics are a more strict form of the old
semantics, so no special action is required to upgrade old
IR.</div>
<div>... so changing the meaning of the existing metadata
seemed preferable to adding a new,
similar-but-not-quite-identical, form of the metadata. But
either way seems fine.</div>
</div>
</div>
</div>
</blockquote></div></div>
I'm going to argue pretty strongly in favour of the new form of
metadata. We've spent a lot of time getting !invariant.load working
well for use cases like the "length" field in a Java array and I'd
really hate to give that up.<br>
<br>
(One way of framing this is that the current !invariant.load gives a
guarantee that there can't be a @llvm.invariant.end call anywhere in
the program and that any @llvm.invariant.start occurs outside the
visible scope of the compilation unit (Module, LTO, what have you)
and must have executed before any code contained in said module
which can describe the memory location can execute. FYI, that last
bit of strange wording is to allow initialization inside a malloc
like function which returns a noalias pointer.)<br></div></blockquote><div><br></div><div>I had overlooked that !invariant.load also applies for loads /before/ the invariant load. I agree that this is different both from what we're proposing and from what you can achieve with @llvm.invariant.start. I would expect that you can use our metadata for the length in a Java array -- it seems like it'd be straightforward for you to arrange that all loads of the array field have the metadata (and that you use the same operand on all of them) -- but there's no real motivation behind reusing the existing metadata besides simplicity and cleanliness.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div bgcolor="#FFFFFF" text="#000000">
I'm definitely open to working together on a revised version of a
more general invariant mechanism. In particular, we don't have a
good way of modelling Java's "final" fields* in the IR today since
the initialization logic may be visible to the compiler. Coming up
with something which supports both use cases would be really
useful.<br></div></blockquote><div><br></div><div>This seems like something that our proposed mechanism may be able to support; we intend to use it for const and reference data members in C++, though the semantics of those are not quite the same.</div></div></div></div></div></blockquote><div><br></div></div></div>ObjC (and Swift, and probably a number of other languages) has an optimization opportunity where there’s a global variable that’s known to be constant after its initialization. (For the initiated, I’m talking here primarily about ivar offset variables.) However, that initialization is run lazily, and it’s only at specific points within the program that we can guarantee that it’s already been performed. (Namely, before ivar accesses or after message sends to the class (but not to instances, because of nil).) Those points usually guarantee the initialization of more than one variable, and contrariwise, there are often several such points that would each individually suffice to establish the guarantee for a particular load, allowing it to be hoisted/reordered/combined at will.</div><div><br></div><div>So e.g.</div><div><br></div><div> if (cond) {</div><div> // Here there’s an operation that proves to us that A, B, and C are initialized.</div><div> } else {</div><div> // Here there’s an operation that proves it for just A and B.</div><div> }</div><div><br></div><div> for (;;) {</div><div> // Here we load A. 
This should be hoistable out of this loop, independently of whatever else happens in this loop.</div><div> }</div><div><br></div><div>This is actually the situation where ObjC currently uses !invariant.load, except that we can only safely use it in specific functions (ObjC method implementations) that guarantee initialization before entry and which can never be inlined.</div><div><br></div><div>Now, I think something like invariant.start would help with this, except that I’m concerned that we’d have to eagerly emit what might be dozens of invariant.starts at every point that established the guarantee, which would be pretty wasteful even for optimized builds. If we’re designing new metadata anyway, or generalizing existing metadata, can we try to make this more scalable, so that e.g. I can use a single intrinsic with a list of the invariants it establishes, ideally in a way that’s sharable between calls?</div></div></blockquote><div><br></div><div>It seems we have three different use cases:</div><div><br></div><div>1) This invariant applies to this load and all future loads of this pointer (ObjC / Swift constants, Java final members)</div><div>2) This invariant applies to this load and all past and future loads of this pointer (Java array length)</div></div></div></div></div></blockquote><div><br></div></div></div>Hmm. I’m not really seeing what you’re saying about past and future. The difference is that the invariant holds, but only after a certain point; reordering the load earlier across side-effects etc. is fine (great, even), it just can’t cross a particular line.</div><div><br></div><div>I assume the only reason that bounding the invariant isn’t important for Philip's array-length is that the initialization is done opaquely, so that the optimizer can’t see the pointer until it’s been properly initialized. 
That is, the bound is enforced by SSA.</div><div><br></div><div>I think the real difference here is whether SSA value identity can give us sufficient information or not.</div><div><br></div><div>For C++ vtables and const/reference members, it does, because the semantics are dependent on “blessed” object references (the result of the new-expression, the name of the local variable, etc.) which mostly have undefined behavior if re-written. Maybe you need to make some effort to not have GEPs down to base classes break the optimization, but that’s probably easy.</div><div><br></div><div>For the Java cases, it still does, because the object becomes immutable past a certain point (the return from the new operation), which also narrows most/all of the subsequent accesses to a single value. So you can bless the object at that point; sure, you theoretically give up some optimization opportunities if ‘this’ was stored aside during construction, but it’s not a big deal.</div><div><br></div><div>For my ObjC/Swift cases, it’s not good enough, because the addresses that become immutable are global. Each load is going to re-derive the address from the global, so no amount of SSA value propagation is going to help; thus the barrier has to be more like a memory barrier than just an SSA barrier. (The biggest distinguishing feature from actual atomic memory barriers is that dominating barriers trivialize later barriers, regardless of what happens in between.)</div><div><br></div><div>(Of course, Swift has situations more like the C++ and Java opportunities, too.)</div><div><br></div><div>So maybe these are really different problems and there’s no useful shared generalization.</div></div></blockquote><div><br></div><div>I'm increasingly thinking that's the case; we now intend to propose adding new metadata rather than extending / generalizing !invariant.load.</div></div></div></div>
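<div>For reference, the devirtualization opportunity that opened this thread can be sketched as compilable C++. This is an illustration under assumptions, not the proposed implementation: <code>foo()</code> returns <code>int</code> instead of <code>void</code> so the result is observable, <code>external()</code> is given a stand-in body, and <code>opportunity_devirtualized</code> is a hypothetical hand-written version of what the optimizer would ideally produce after inlining <code>dispatch</code>.</div>

```cpp
struct Base {
    virtual int foo() = 0;
    virtual ~Base() {}
};

// Stand-in body (assumption); in the original example this is opaque.
int external() { return 42; }

// 'final' closes the hierarchy: no class can override Final::foo.
struct Final final : Base {
    int foo() override { return external(); }
};

// Virtual call through the base pointer.
int dispatch(Base *B) { return B->foo(); }

// After inlining dispatch, the static type of F is known to be Final,
// yet the call to foo() remains an indirect call through the vtable.
int opportunity(Final *F) { return dispatch(F); }

// Hypothetical result of devirtualization: the qualified call names
// Final::foo directly and bypasses the vtable lookup.
int opportunity_devirtualized(Final *F) { return F->Final::foo(); }
```

<div>Both functions return the same value for any <code>Final</code> object; the difference is only whether the call is resolved through the vtable at run time or directly at compile time.</div>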