[LLVMdev] Optimization hints for "constant" loads

Andrew Trick atrick at apple.com
Tue Oct 21 10:58:04 PDT 2014

> On Oct 21, 2014, at 12:29 AM, Sanjoy Das <sanjoy at playingwithpointers.com> wrote:
>> I've never realy understood how the llvm.invariant intrinsics could be
>> put into practice. There is the problem that "end" can occur anywhere
>> as you suggested fixing with a flag.
> I was under this impression too, but after re-reading the language
> reference I'm not so sure -- it says about invariant.start: "This
> intrinsic indicates that
> until an llvm.invariant.end that uses the return value, the referenced
> memory location is constant and unchanging.".  I think this implies
> that if the result of an `llvm.invariant.start` doesn't escape, we can
> safely conclude that there can be no matching `llvm.invariant.end`.
> I'm still a little hazy on all of the use cases you want to support,
> but one problem with llvm.safe_cast is, as you note, stores to the
> pointer passed to safe_cast will not be forwarded to loads from the
> pointer it returns.  I think this issue can be solved by modeling
> `llvm.safe_cast` not as a transformation on a pointer, but as a check.
> Specifically, I imagine a readonly intrinsic `llvm.assert_immutable`
> that "asserts" that the pointer passed to it is immutable, and unwinds
> if that fact isn't true (these are only imaginary semantics -- we know
> the intrinsic will never actually unwind).  This means
> `llvm.assert_immutable` can't be marked nounwind (otherwise it would
> just get optimized away); but since it doesn't need to unwind to the
> same frame, a CallInst is sufficient (an InvokeInst would've increased
> the CFG's complexity).
> So
>  %p = unaliased address
>  store 42, %p
>  %a = llvm.safe_cast %p ; just made this up
>  %val = load %a !invariant
> becomes
>  %p = unaliased address
>  store 42, %p
>  call void @llvm.assert_immutable(%p)
>  %val = load %p !invariant
> AFAICT, "load %p" can not, in general, be re-ordered to before the
> call to @llvm.assert_immutable because we don't know what condition it
> uses to decide if it should unwind.  There are cases where I think
> such a re-ordering would be legal (an unused %val is one example,
> another example is where the only use of %val is a comparison with
> undef) but if I understand correctly, re-ordering a load with the call
> to @llvm.assert_immutable can only be a performance loss -- it will
> change dominance in a way that we'll "forget" that a load was
> immutable.  And I think in most of the cases that are interesting to
> optimize, the re-ordering will not be legal.
> It is still legal to value forward the store though:
>  %p = unaliased address
>  store 42, %p
>  call void @llvm.assert_immutable(%p)
>  %val = 42
> because *if* assert_immutable does not unwind, %val will see the
> dominating store, because assert_immutable is readonly.
> Again,
>  %p1 = ...
>  %p2 = llvm.safe_cast %p1
>  %a = select i1 %is_immutable, %p1, %p2
>  load %a !invariant
> could become
>  %p1 = ...
>  if (not %is_immutable) {
>    call llvm.assert_immutable(%p1)
>  }
>  load %p1
> or, if we wish to reduce control flow, we could define
> `llvm.assert_immutable` to be a no-op on null:
>  %p1 = ...
>  %a = select i1 %is_immutable, null, %p1,
>  call llvm.assert_immutable(%a)
>  load %p1 !invariant

Just so you know, I am not a big fan of adding cast nodes into the use-def chain. It has to be a last resort. I proposed it this time because I don't see another way to fully express the semantics, and in many cases the casts are limited to the initialization code. I think the optimizer could be taught to store->load forward across them with some effort.

AFAICT The only difference between assert_immutable and invariant.start is that assert_immutable "maythrow", and we don't need to find all uses of the pointer.

assert_immutable will handle your use case safely as long as you don't mark it "readonly" (loads can move above readonly calls).

The problems that we will face with assert_immutable are:

- Because it is maythrow and is not readonly, there is a dependence between assert_immutable and *all* subsequent loads. This could seriously impact optimization.

- Since the invariant load is not actually marked invariant, it can't be reordered with other stores and nonreadonly calls.

- It does not allow us in the future to speculatively hoist certain invariant loads above maythrow calls and branches.

So, I'm ok with assert_immutable as a solution to some subset of problems. But it is incomplete and suboptimal. I suspect that introducing non-readonly calls will do more harm than the benefit you get from marking some loads invariant. 

Also note that this could get interesting when we introduce TBAA on calls:

We would like to hoist loads above calls that have non-interfering TBAA tags, but also may throw (patchpoints and stackmaps are marked maythrow). 

Say we declare that loads can be hoisted above maythrow calls with TBAA. If TBAA is dropped it becomes conservative. Then you could give your assert_immutable calls TBAA that only interferes with the array length load. That would solve some of your optimization issues, but only really works well if most of your loads and calls have precise TBAA tags.

The more aggressive solution I was proposing was to mark loads that are guarded with explicit safety checks as "maySpeculate". If we do that we definitely need a way to create a data dependence on a cast - this approach is incompatible with assert_immutable.

> `llvm.assert_immutable` does with control dependencies what
> `llvm.safe_cast` does with data dependencies.
>> llvm.invariant is inherently control dependent, which doesn't really
>> allow the optimizer to hoist the invariant loads the way we want it
>> to.
> I'm curious about why you think so -- do you have examples that
> demonstrate this?  And is this a flaw in how llvm implements llvm.invariant
> or something fundamental about control dependencies?

I am saying exactly what you said just above the quote: llvm.invariant and llvm.assert_immutable guarantee safety by relying on implicit control dependence. That doesn't really give us full freedom to optimize the guarded invariant loads. They still can't be optimized across other non-readonly calls or hoisted above branches or out of loops. This is fundamental to the representation - it's not an implementation detail.


> -- Sanjoy

More information about the llvm-dev mailing list