[LLVMdev] Optimization hints for "constant" loads

Philip Reames listmail at philipreames.com
Mon Sep 29 15:33:34 PDT 2014


On 09/28/2014 01:22 AM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Philip Reames" <listmail at philipreames.com>
>> To: llvmdev at cs.uiuc.edu
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, nicholas at mxc.ca
>> Sent: Wednesday, September 10, 2014 12:11:28 AM
>> Subject: Optimization hints for "constant" loads
>>
>> I'm looking at how to optimize IR which contains reads from a field
>> which is known to be initialized exactly once.  I've got an idea on
>> how
>> to approach this, but wanted to see if others have alternate ideas or
>> similar problems which might spark discussion.  It feels like there's
>> a
>> potentially generally useful optimization hint here if we can
>> generalize
>> it sufficiently without loosing optimization potential.
>>
>> The problem:
>> struct array {
>> private:
>>     // once initialized 'len' never changes
>>     int len;
>>     // data can be modified at will
>>     char data[0];
>> public:
>>     static array* make(int len) {
>>       array* a = ... allocate uninitialized space
>>       a->len = len;
>>       return a;
>>     }
>> };
>> void access(array* a, int idx) {
>>     if( idx >= 0 && idx <- a->len ) {
>>       a->data[idx] = 5;
>>     }
>> }
>> void foo(array* a) {
>>     for(int i = 0; i < a->len; i++) {
>>       access(a, i);
>>     }
>> }
>> // assume 'access' is inlined into 'foo' and the loop is unrolled a
>> time
>> or two
>>
>> To phrase that again in english, I've got a field which is
>> initialized
>> once, with naive code which reads from it many times.  I know at IR
>> generation time that a load from array::len is special, but I loose
>> this
>> information as soon as I generate IR.  In particular, I run into
>> aliasing problems where the location a->len is considered 'mayalias'
>> with unrelated stores thus preventing value forwarding, LICM, and
>> other
>> desirable optimizations.
>>
>> Existing Approaches:
> I think that, at least in theory, we already have a solution to this problem. If, after the initialization is complete, you insert a call to the llvm.invariant.start intrinsic (and perhaps also do so at the start of any routine you know can be called only after initialization is complete), that should convey the information you want.
>
> I've never worked with this intrinsic before, but if this does not work, I'd be really curious to know why.
Huh.  You're right, this should work.  I'm going to run some experiments 
and see what happens in practice.

Thanks for the pointer on this!  Not sure how I managed to miss this 
approach.
>
> Thanks again,
> Hal
>
>
>> 1) Use TBAA -  By tagging loads and stores to the two fields of the
>> array struct as disjoint branches in the TBAA tree, I can inform LLVM
>> that a load of 'len' never aliases with a store through 'data'.  This
>> mostly works, and enables many loads to be forwarded by GVN, but (due
>> to
>> the intervening stores) is a complete loss in EarlyCSE and (due to
>> intervening calls) LICM.
>>      a) Things like http://llvm.org/bugs/show_bug.cgi?id=20805 could
>> improve the situation in EarlyCSE.
>> 2) Use "invariant.load" metadata - This metadata indicates that the
>> field loaded from is initialized before the execution of the code
>> being
>> compiled.  In particular, "invariant.load" implies that the load is
>> not
>> control dependent on any branch, is safe to speculate, and that no
>> write
>> aliases the location read from.  This mostly works, but only if
>> 'array::make' is never visible in the IR.   As soon as 'array::make'
>> gets inlined, all bets are off and mis-compiles may result.
>>      a) Also, in practice, only LICM really knows about
>> "invariant.load".  It would be pretty straight forward to teach both
>> EarlyCSE and GVN about them though.
>> http://llvm.org/bugs/show_bug.cgi?id=20806
>>
>> New Approaches:
>> (This isn't so much "being proposed" as "being put forward for
>> discussion".)
>>
>> 1) Introduce a new metadata type "initialized-before-load" which
>> implies
>> that the value of any two loads tagged with the metadata along any
>> execution path must yield the same result.
>>
>> This doesn't give much freedom to the 'first' load; it's still
>> control
>> dependent, can't be reordered with preceding stores, can't be lifted
>> out
>> of loops, etc...  It can however be reordered with following stores
>> or
>> sunk out of a loop provided the loop body is known to execute at
>> least
>> once.
>>
>> The second load has a lot more freedom.  Provided that there is
>> always
>> another load to the same location (with the metadata) provable
>> preceding
>> it on all paths, it can be reordered freely, lifted over control
>> branches, lifted out of loops, etc... Importantly, it is also legal
>> to
>> forward the value of a preceding load to a later load provided that
>> a)
>> both have the metadata and b) that one load executes strictly before
>> the
>> other.
>>
>> A store marked "initialized-before-load" is undefined if there is a
>> load
>> with the same metadata on the same location preceding it. There may
>> be
>> *multiple* stores to the location along a path, provided that the
>> first
>> load is strictly after *all* of them.
>>
>> This seems staight forward to implement in EarlyCSE and LICM.  I
>> haven't
>> looked closely at GVN, but expect it's probably not hard either.
>>
>> 2) Introduce a slightly different metadata "initialized-once".
>> Semantics
>> are very similar to the preceding except that there can only be a
>> single
>> store to the location along any path.
>>
>> Value forwarding from the store to a following load (with metadata)
>> is
>> allowed regardless of potentially aliasing intervening stores.
>>
>> This was actually my original idea, but it has a couple of problems.
>> First, it breaks on surprisingly common initialization patterns such
>> as
>> default initialization followed by real initialization.  Secondly,
>> I'm
>> not sure the optimizer would always preserve the "write once"
>> property.
>> In particular, the optimizer is free to divide a large write into
>> several smaller ones (assuming the write is not atomic.)
>>
>>
>>
>> Thoughts?  Suggestions?  Similar sounding problems this might be able
>> to
>> solve?
>>
>> Philip
>>
>>




More information about the llvm-dev mailing list