[LLVMdev] Optimization hints for "constant" loads
Philip Reames
listmail at philipreames.com
Mon Oct 13 14:59:08 PDT 2014
On 10/11/2014 06:01 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Philip Reames" <listmail at philipreames.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: nicholas at mxc.ca, llvmdev at cs.uiuc.edu
>> Sent: Friday, October 10, 2014 8:07:55 PM
>> Subject: Re: Optimization hints for "constant" loads
>>
>>
>> On 09/28/2014 01:22 AM, Hal Finkel wrote:
>>> ----- Original Message -----
>>>> From: "Philip Reames" <listmail at philipreames.com>
>>>> To: llvmdev at cs.uiuc.edu
>>>> Cc: "Hal Finkel" <hfinkel at anl.gov>, nicholas at mxc.ca
>>>> Sent: Wednesday, September 10, 2014 12:11:28 AM
>>>> Subject: Optimization hints for "constant" loads
>>>>
>>>> I'm looking at how to optimize IR which contains reads from a
>>>> field
>>>> which is known to be initialized exactly once. I've got an idea
>>>> on
>>>> how
>>>> to approach this, but wanted to see if others have alternate ideas
>>>> or
>>>> similar problems which might spark discussion. It feels like
>>>> there's
>>>> a
>>>> potentially generally useful optimization hint here if we can
>>>> generalize
>>>> it sufficiently without loosing optimization potential.
>>>>
>>>> The problem:
>>>> struct array {
>>>> private:
>>>> // once initialized 'len' never changes
>>>> int len;
>>>> // data can be modified at will
>>>> char data[0];
>>>> public:
>>>> static array* make(int len) {
>>>> array* a = ... allocate uninitialized space
>>>> a->len = len;
>>>> return a;
>>>> }
>>>> };
>>>> void access(array* a, int idx) {
>>>> if( idx >= 0 && idx <- a->len ) {
>>>> a->data[idx] = 5;
>>>> }
>>>> }
>>>> void foo(array* a) {
>>>> for(int i = 0; i < a->len; i++) {
>>>> access(a, i);
>>>> }
>>>> }
>>>> // assume 'access' is inlined into 'foo' and the loop is unrolled
>>>> a
>>>> time
>>>> or two
>>>>
>>>> To phrase that again in english, I've got a field which is
>>>> initialized
>>>> once, with naive code which reads from it many times. I know at
>>>> IR
>>>> generation time that a load from array::len is special, but I
>>>> loose
>>>> this
>>>> information as soon as I generate IR. In particular, I run into
>>>> aliasing problems where the location a->len is considered
>>>> 'mayalias'
>>>> with unrelated stores thus preventing value forwarding, LICM, and
>>>> other
>>>> desirable optimizations.
>>>>
>>>> Existing Approaches:
>>> I think that, at least in theory, we already have a solution to
>>> this problem. If, after the initialization is complete, you insert
>>> a call to the llvm.invariant.start intrinsic (and perhaps also do
>>> so at the start of any routine you know can be called only after
>>> initialization is complete), that should convey the information
>>> you want.
>>>
>>> I've never worked with this intrinsic before, but if this does not
>>> work, I'd be really curious to know why.
>> From some experiments, the existing invariant intrinsics appear
>> nearly
>> useless for my purposes. The problem is that any call can hide a
>> invariant.end intrinsic. As a result, the optimizer must
>> conservatively
>> assume that any call clobbers the "invariant" location. This makes
>> the
>> intrinsic a non-starter in it's current form.
>>
>> ; Function Attrs: uwtable
>> define zeroext i1 @_Z4testv() #0 {
>> %1 = tail call noalias i8* @_Znwm(i64 4) #3
>> %2 = bitcast i8* %1 to i32*
>> store i32 5, i32* %2, align 4, !tbaa !1
>> %discard = call {}* @llvm.invariant.start(i64 4, i8* %1)
>> tail call void @_Z6escapePi(i32* %2)
>> %3 = load i32* %2, align 4, !tbaa !1
>> %4 = icmp eq i32 %3, 5 <-- this conditional should be folded away
>> and is not
>> ret i1 %4
>> }
>>
>> declare {}* @llvm.invariant.start(i64, i8*)
>>
>> ; Function Attrs: nobuiltin
>> declare noalias i8* @_Znwm(i64) #1
>>
>> declare void @_Z6escapePi(i32*) #2
>>
>> It also appears that the intrinsic has limited implementation in the
>> optimizer. Even surprisingly simple cases don't appear to kick in.
>> Consider:
>> define zeroext i1 @_Z4testv() #0 {
>> %1 = tail call noalias i8* @_Znwm(i64 4) #4
>> %2 = bitcast i8* %1 to i32*
>> store i32 5, i32* %2, align 4, !tbaa !1
>> %discard = tail call {}* @llvm.invariant.start(i64 4, i8* %1)
>> %3 = load i32* %2, align 4, !tbaa !1
>> %4 = icmp eq i32 %3, 5 <-- This conditional should be folded
>> tail call void @_Z6escapePi(i32* %2, i1 %4)
>> %5 = load i32* %2, align 4, !tbaa !1
>> %6 = icmp eq i32 %5, 5
>> ret i1 %6
>> }
>>
>>
>> We could extend the existing intrinsic with a notion of
>> invariant.start
>> that has no end. This could be as simple as adding a boolean
>> parameter
>> to the intrinsic. I think this could be made to work. There'd be
>> other
>> implementation work needed to make it actually useful, the validity
>> based on dominance model used by assumes seems like an obvious
>> candidate.
>>
>> Thinking through this, I see a couple of places that we'd change:
>> - isLocationKnownInvariant(Value* Loc, Instruction* Cntx) and related
>> analysis pass (analogous to assumptions)
>> - The new "no end" version is easy (dominance)
>> - The older version is a graph walk looking paths which include
>> either an end, or call.
>> - eagerly propagate known invariant location store values in EarlyCSE
>> (strictly by dominance for the new intrinsic form)
>> - Add an InstCombine rule to fold a store to a known invariant
>> location
>> to undef
>> - Teach GVN about it (gets the non dominance cases), could also do
>> fun
>> things which phis of invariant locations, but not sure that's
>> actually
>> useful
>> - Teach LICM to lift loads of known invariant locations out of loops
>>
>> Does this seem like a workable approach?
> Yes, I think this sounds quite reasonable.
I'll start throwing together some patches. It'll take a couple of weeks
due to current distractions, but I'll get something up for review when I
can.
Philip
More information about the llvm-dev
mailing list