[LLVMdev] Optimization hints for "constant" loads

Daniel Berlin dberlin at dberlin.org
Thu Dec 4 13:30:50 PST 2014


On Tue Dec 02 2014 at 10:03:23 AM Andrew Trick <atrick at apple.com> wrote:

> On Dec 1, 2014, at 3:22 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>
> On 12/01/2014 02:42 PM, Andrew Trick wrote:
>
>
>  On Dec 1, 2014, at 2:21 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
>
> On 12/01/2014 11:14 AM, Andrew Trick wrote:
>
>
>  On Oct 21, 2014, at 4:03 PM, Philip Reames <listmail at philipreames.com>
> wrote:
>
> Sanjoy made a good point.  We don't actually need a new variant of
> "invariant.start".  Simply using an invariant.start with no uses gives us a
> notion of an invariant region with no end.  (Since the result doesn't
> escape, there can be no end hidden inside a function call.)   This seems
> like a simple notion to exploit and is strict step forward from where we
> are, without a new intrinsic.
>
>
>  I'm coming back to this because it is a sticking point for me in all of
> the proposals that were floated informally on the commits list. I would
> really like this to work, but can't prove that it's safe.
>
> As I said in the other thread, !invariant.load and llvm.invariant.* are
> not the same.  This thread is discussing the later, not the former.  The
> other thread is discussing the former, but not the later.
>
>
>  Given:
>
>  %a = newArray()
> %pLen = gep(%a, <length offset>)
> invariant.start(<size>, %pLen)
> %len = load %pLen
>
>  %o = foo() // may deallocate %a
> store %something
> load %o
>
>  I want to claim that the LLVM optimizer cannot do the following (but
> don't have a complete line of reasoning yet):
>
>  %a = newArray()
> %pLen = gep(%a, <length offset>)
> invariant.start(<size>, %pLen)
> %len = load %pLen
>
>  %o = foo() // may deallocate %a
> if ptrtoint(%o) == ptrtoint(%pLen)
>   load %pLen
>   store %something // Oops... this might alias
> else
>   store %something
>   load %o
>
> Just to make sure we're on the same page, it *would* be legal for the
> optimizer to construct:
> %a = newArray()
> %pLen = gep(%a, <length offset>)
> invariant.start(<size>, %pLen)
> %len = load %pLen
>
>  %o = foo() // may deallocate %a
> if ptrtoint(%o) == ptrtoint(%pLen)
>   store %something // Oops... this might alias
>   load %pLen <--- This is now after the store
>  else
>   store %something
>   load %o
>
> Are we in agreement here?
>
>
>  It’s fine in the sense that memory access has not been reordered… yet.
>
> I think I've actually convinced myself my example is not okay, but for a
> completely different reason.  :)  See my last comment below.
>
>
>  But is there an implicit invariant.end(%pLen) at the call to foo(), at
> which point the array can be freed and another object allocated in its
> place? I don’t think so.
>
> Given the current semantics, if the result of the invariant.start is not
> passed to foo, we can assume foo does not contain an invariant.end.
> However, for foo to allocate a new object in the same memory, it must write
> to that memory.  Doing so without ending the previous invariant region is
> ill defined.  As such, your example isn't a valid test.
>
> We have two cases:
> a) The invariant region hasn't ended.  As a result, the input IR is ill
> defined and the problematic reordering can happen.
> b) The invariant region ended (and we either encountered a
> llvm.invariant.end, or saw the result of invariant.start passed to foo). In
> this case, the location is no longer invariant and the reordering isn't
> legal and wouldn't happen.
>
> As I think came up before, this would imply that an invariant.end can't be
> dropped without dropping it's corresponding invariant.start.  That's
> problematic.
>
>  The way I interpreted the conclusion of this email thread is that
> invariant.start can be used without invariant.end.
>
> We had a proposal for this, but I don't believe it's actually been
> implemented yet.  I believe this proposal is still consistent with
> everything I said above though.
>
> Since the token returned by the intrinsic never escapes, you can just
> assume the memory is invariant at all uses. The problem here is that the
> optimizer could (in theory) introduce new uses of the pointer after the
> object lifetime. This is the same problem that you yourself have raised. I
> hoped we could sidestep the problem because it crops up with any
> representation that attaches invariance to a pointer value. If we have an
> answer for this, then we can easily debate other representations like
> broadening !invariant.load metadata semantics and introducing new
> intrinsics.
>
> I agree there is a general issue here.  However, I don't actually think
> it's an issue of invariance.  Instead, I see this as being a problem of
> *derefenceability*.  Regardless of the 'invariant-ness' of the memory in
> question, it's problematic for the optimizer to insert uses after a
> deallocation because the memory transitions from dereferenceable to
> non-dereferenceable.  (i.e. Consider a malloc routine allocates one page
> per object, and a free which protects the page freed.)  Any transformation
> which inserts such a load is dangerous.
>
> What your example really shows is that we can have two names for the same
> *memory address*.  One of those names can be dereferenceable while the
> other is not.
>
> The dynamic check of the pointer values essentially allows the optimizer
> to chose which of the two names for the same physical address it wants to
> use.  I think that is valid, but it's odd to say the least.  The particular
> example you've shown isn't legal since the optimizer is choosing to insert
> a load to a non-dereferenceable location and thus could introduce a fault.
>
>
> Sorry, I didn’t follow your response. I was showing that the optimizer can
> theoretically substitute the use of one pointer value (%o) with another
> pointer value (%pLen) after checking value equivalence. Doing that creates
> multiple uses of the same pointer value where not all uses dereference the
> same memory object. To me, that means we cannot associate a pointer
> value/address with an object-specific attribute, like invariance, without
> tying it to a closed lifetime.
>


Agreed.


This is actually one of the nice things memory ssa provides/provided GCC.
You got versioning of the global memory state for free, and lifetime info
embedded directly into the IR in the normal SSA way.

For a given instruction, the state of the virtual uses for the instruction
was sufficient to identify whether the memory state was the same, so value
substitution was easy.

You could also do the inverse, and prove a better result than memory ssa
already had given you - that the virtual uses were really the same due to
non-aliasing.

This made value numbering loads/stores and substitution very easy.

I wish we could prevent the optimizer from doing this.
>
> -Andy
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20141204/e7d86a36/attachment.html>


More information about the llvm-dev mailing list