[LLVMdev] Proposal for safe-to-execute meta-data for heap accesses

Thu Nov 7 21:39:11 PST 2013

Hi!

Previously in the "Add a 'notrap' function attribute?” thread we had discussed a way to use meta-data to specify that a load does not trap under certain conditions.  Andy and Hal and I talked about this more in private and I just wanted to summarize what I think we arrived at.  First I’ll summarize the original !notrap meta-data, then I’ll just mention why it’s hard to get it right, and finally I’ll show the new safe-to-execute proposal.

PROBLEM

The reason for this is that without such meta-data, it’s difficult to hoist loads around branches.  Here’s a great example:

	for (…)
		if (%c)
			%v = load %p

Let’s say that %c is *not* loop invariant, but %p is loop invariant and the loop never clobbers the location pointed-to by %p.  Let’s assume that we already know that %c is usually try and so we’d like to hoist the load out of the loop.  But in this scenario, any phase that wants to hoist the load must conservatively assume that %c might guard whether the load is safe-to-execute.  There is no other way for an LLVM phase to determine what makes any load safe-to-execute, so every branch on any path to that load is conservatively assumed to be a safety guard.  Even if %c is known to be unrelated to whether or not %p is a valid pointer - something that arises trivially in many non-C languages - LLVM will still fail to hoist in this case.

PREVIOUS PROPOSAL

Previously we had debated a meta-data format like:

	%v = load %p !notrap !{ %c }

The idea being that if there exists a program point P that is dominated by %c and where %c is provably true, then it is safe to emit that load at P.  Ideally this would allow an LLVM frontend to express what constraints must be satisfied for the load to be safe-to-execute.  For example, a Java frontend might use !{ %c } to describe a null check.

The downside of such meta-data is that !{ %c } is not an ordinary SSA use.  You can’t reason about it the way you would ordinarily reason about uses.  For example:

	if (%c)
		%v = load %p !notrap !{ %c }

If !{ %c } were an ordinary use then you could transform this to:

	if (%c)
		%v = load %p !notrap !{ 1 }

But then the world breaks: it would seem that the load is hoistable to anywhere.  So, we would have to come up with some special new rules for what you can do to a !{ %c } use; likely this would involve conservatively erasing the meta-data if you did *anything* to %c.

NEW PROPOSAL

The solution is to introduce meta-data that is explicit about how the safe-to-execute condition ought to be evaluated.  Instead of an SSA use, we can have meta-data that says:

	%v = load %p !notrap !{ @f, <args> }

where @f is a function in the current module and this function returns i1, and <args> is zero or more arguments to pass to @f.  As with any meta-data, this doesn’t imply anything different if you wanted to just execute the code: executing the load doesn’t imply calling @f; indeed if you dropped the meta-data the execution of this would still be the same.

What the meta-data is telling you is two-fold:

	1) If you could prove that at some program point P, a call to @f with arguments <args> would return true, then it is safe to execute the load at P and doing so will definitely not trap.

	2) Wherever you see a load with !notrap !{ @f, <args> }, it means that someone has already proved that @f(<args>) is true at that point and execution is undefined (i.e. your computer can do improv) if @f(<args>) was false at that point.

Here are two examples of how you can use this to optimize things.

- Hoisting loads out of a loop for IR generated from a type-safe language that has NULL.  Consider a loop like:

	if (%p == 0)
		call <something that throws a Null Pointer Exception>
	for (…) {
		if (%c) {
			%p2 = gep %p, ...
			%v = load %p2 !notrap !{ @isNotNull, %p }
		}
	}

And let’s assume that @isNotNull just does a null comparison on its pointer argument.  In this code, it would now be possible to hoist the load out of the loop because we can indeed prove that at the loop pre-header, @isNotNull must return true because of the condition on %p.

- Control flow simplification from already-proven things:

	%p2 = gep %p, ...
	%v = load %p2 !notrap !{ @isNotNull, %p }
	if (%p == 0)
		some dead code

It is safe for a phase to conclude that the if statement cannot be taken because %p cannot be null after the load.

Thoughts?

-Filip