[llvm-dev] invariant.load metadata semantics

Mon Aug 29 13:21:40 PDT 2016

Hi Justin,

Justin Lebar wrote:
 > Sanjoy, can you clarify this bit about JVM array lengths?
 >
 > Presumably the same pointer can point to two arrays of different lengths
 > during a program's execution.

Yes, if you define "same pointer" as "pointers with the same bitwise
representation".  However, we do not have placement new or free, and
we do not have to worry about pointers that are visible to the program
being re-used.  That is, a pointer pointing to an array in the heap
will not be re-used for any other allocation while the program can
still reach it.

 > Does this mean that you're relying on
 > invariant.load having function scope?  That is, correctness depends on
 > the pointer not being reused for an array of a different length between
 > the first invariant load of that array length in a function and the last
 > (possibly not invariant) load in the function?

Are you talking about cases like these:

void f(array_t* a) {
   // Not sure if this is well-defined C++, but you get the picture
   int l = a->length;  // invariant
   a->~array_t();      // end the old array's lifetime, keep its storage
   new(a) array_t(other_length);
   int l2 = a->length; // also invariant
}

A case like the one above cannot happen for us since we don't have
placement new.  An allocation always returns a logically new reference,
which goes away once the application can provably no longer look at
that reference (not just load from it, but also compare it to other
references, etc.).  An allocation can return a "logically new" location
that happens to be bitwise equal to something it returned before (in
fact, it has to, otherwise we'd need infinite memory :) ), but the GC
(this includes the stuff we've added to LLVM) makes sure that we
re-use only locations that are no longer visible to the application.
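
A minimal sketch of the "logically new but possibly bitwise equal" idea,
assuming a toy heap model.  The names `Ref`, `ToyHeap`, and
`address_reused_but_identity_fresh` are hypothetical and are not part of
LLVM or of any JVM; `release` merely stands in for the collector proving
that a reference is no longer visible to the application.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical model: a reference is a slot address plus a logical identity.
struct Ref {
    std::size_t slot;   // "bitwise" address; may repeat over time
    std::uint64_t id;   // logical identity; never repeats
};

class ToyHeap {
public:
    explicit ToyHeap(std::size_t slots) : live_(slots, false) {}

    // Returns a logically new reference.  Only slots that no live
    // reference occupies are ever handed out.
    Ref allocate() {
        for (std::size_t s = 0; s < live_.size(); ++s) {
            if (!live_[s]) {
                live_[s] = true;
                return Ref{s, next_id_++};
            }
        }
        throw std::runtime_error("toy heap exhausted");
    }

    // Stands in for the GC proving the reference is no longer visible.
    void release(const Ref& r) {
        assert(live_[r.slot]);  // releasing a dead slot is a model bug
        live_[r.slot] = false;
    }

private:
    std::vector<bool> live_;
    std::uint64_t next_id_ = 1;
};

// Allocates, releases, and allocates again from a one-slot heap.
// Returns true when the address repeats but the identity does not.
bool address_reused_but_identity_fresh() {
    ToyHeap heap(1);
    Ref first = heap.allocate();
    heap.release(first);           // the program can no longer reach `first`
    Ref second = heap.allocate();  // same slot, new logical identity
    return first.slot == second.slot && first.id != second.id;
}
```

In this model two references that are live at the same time never share
a slot, which is the property the invariant array-length load relies on.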

Another way to look at this is that the GC provides an illusion of an
infinitely large heap, where every "new" returns a new distinct
reference.  For instance, in cases like:

void do_something(array_t* a);

void f(array_t* a) {
   do_something(a);
   array_t* b = gc_new_array(...);
   bool c = (a == b);
}

`c` is *always* false.  Since we have a use of `a` when computing `c`,
the object `a` points to _cannot_ have gone away at the point of the
comparison[0].
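
To make the comparison argument concrete, here is a rough sketch under
the assumption that the collector tracks a set of reachable slots.
`RootedHeap`, `drop`, and `compare_live_with_fresh` are made-up names for
illustration only; a real collector computes reachability itself rather
than being told about it.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>

// Hypothetical collector model: roots_ holds the slots the program can
// still reach.
class RootedHeap {
public:
    explicit RootedHeap(std::size_t slots) : slots_(slots) {}

    // Hands out the lowest slot that no root refers to, then roots it.
    std::size_t allocate() {
        for (std::size_t s = 0; s < slots_; ++s) {
            if (roots_.count(s) == 0) {
                roots_.insert(s);
                return s;
            }
        }
        throw std::runtime_error("toy heap exhausted");
    }

    // The program gives up its last use of `slot`.
    void drop(std::size_t slot) {
        assert(roots_.count(slot) == 1);  // dropping an unrooted slot is a model bug
        roots_.erase(slot);
    }

private:
    std::size_t slots_;
    std::set<std::size_t> roots_;
};

// Mirrors f() from this message: `a` is still used by the comparison,
// so it stays rooted across the allocation of `b`.
bool compare_live_with_fresh(RootedHeap& heap, std::size_t a) {
    std::size_t b = heap.allocate();
    bool c = (a == b);
    heap.drop(b);
    return c;
}
```

As long as the caller keeps `a` rooted, `compare_live_with_fresh` returns
false in this model; the slot for `a` becomes eligible for re-use only
after `drop(a)`.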

 > This seems like a very weak form of invariance.  For example, when you
 > inline, you would have to transform invariant loads to the scoped
 > invariant thing.

I don't think we have to, but if you have some specific examples, I'll
be more than happy to comment on those.  And thanks for bringing this
up!

-- Sanjoy

[0]: Actually it can, since we can first fold `c` to `false`, after
   which the GC can re-use the space for `a` when allocating `b`.  But
   that falls into "as-if" type optimization.