[LLVMdev] Optimization hints for "constant" loads

Tue Sep 9 22:11:28 PDT 2014

I'm looking at how to optimize IR which contains reads from a field 
which is known to be initialized exactly once.  I've got an idea on how 
to approach this, but wanted to see if others have alternate ideas or 
similar problems which might spark discussion.  It feels like there's a 
potentially generally useful optimization hint here if we can generalize 
it sufficiently without loosing optimization potential.

The problem:
struct array {
private:
   // once initialized 'len' never changes
   int len;
   // data can be modified at will
   char data[0];
public:
   static array* make(int len) {
     array* a = ... allocate uninitialized space
     a->len = len;
     return a;
   }
};
void access(array* a, int idx) {
   if( idx >= 0 && idx <- a->len ) {
     a->data[idx] = 5;
   }
}
void foo(array* a) {
   for(int i = 0; i < a->len; i++) {
     access(a, i);
   }
}
// assume 'access' is inlined into 'foo' and the loop is unrolled a time 
or two

To phrase that again in english, I've got a field which is initialized 
once, with naive code which reads from it many times.  I know at IR 
generation time that a load from array::len is special, but I loose this 
information as soon as I generate IR.  In particular, I run into 
aliasing problems where the location a->len is considered 'mayalias' 
with unrelated stores thus preventing value forwarding, LICM, and other 
desirable optimizations.

Existing Approaches:
1) Use TBAA -  By tagging loads and stores to the two fields of the 
array struct as disjoint branches in the TBAA tree, I can inform LLVM 
that a load of 'len' never aliases with a store through 'data'.  This 
mostly works, and enables many loads to be forwarded by GVN, but (due to 
the intervening stores) is a complete loss in EarlyCSE and (due to 
intervening calls) LICM.
    a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could 
improve the situation in EarlyCSE.
2) Use "invariant.load" metadata - This metadata indicates that the 
field loaded from is initialized before the execution of the code being 
compiled.  In particular, "invariant.load" implies that the load is not 
control dependent on any branch, is safe to speculate, and that no write 
aliases the location read from.  This mostly works, but only if 
'array::make' is never visible in the IR.   As soon as 'array::make' 
gets inlined, all bets are off and mis-compiles may result.
    a) Also, in practice, only LICM really knows about 
"invariant.load".  It would be pretty straight forward to teach both 
EarlyCSE and GVN about them though. 
http://llvm.org/bugs/show_bug.cgi?id=20806

New Approaches:
(This isn't so much "being proposed" as "being put forward for discussion".)

1) Introduce a new metadata type "initialized-before-load" which implies 
that the value of any two loads tagged with the metadata along any 
execution path must yield the same result.

This doesn't give much freedom to the 'first' load; it's still control 
dependent, can't be reordered with preceding stores, can't be lifted out 
of loops, etc...  It can however be reordered with following stores or 
sunk out of a loop provided the loop body is known to execute at least 
once.

The second load has a lot more freedom.  Provided that there is always 
another load to the same location (with the metadata) provable preceding 
it on all paths, it can be reordered freely, lifted over control 
branches, lifted out of loops, etc... Importantly, it is also legal to 
forward the value of a preceding load to a later load provided that a) 
both have the metadata and b) that one load executes strictly before the 
other.

A store marked "initialized-before-load" is undefined if there is a load 
with the same metadata on the same location preceding it. There may be 
*multiple* stores to the location along a path, provided that the first 
load is strictly after *all* of them.

This seems staight forward to implement in EarlyCSE and LICM.  I haven't 
looked closely at GVN, but expect it's probably not hard either.

2) Introduce a slightly different metadata "initialized-once". Semantics 
are very similar to the preceding except that there can only be a single 
store to the location along any path.

Value forwarding from the store to a following load (with metadata) is 
allowed regardless of potentially aliasing intervening stores.

This was actually my original idea, but it has a couple of problems.  
First, it breaks on surprisingly common initialization patterns such as 
default initialization followed by real initialization.  Secondly, I'm 
not sure the optimizer would always preserve the "write once" property.  
In particular, the optimizer is free to divide a large write into 
several smaller ones (assuming the write is not atomic.)

Thoughts?  Suggestions?  Similar sounding problems this might be able to 
solve?

Philip