[LLVMdev] How best to represent assume statements in LLVM IR?

Tue Oct 2 14:43:51 PDT 2012

On 10/02/2012 12:52 AM, Duncan Sands wrote:
> Hi Philip,
>
>> Does anyone have any suggestions on how to best represent an assumption
>> statement(*) in IR?
>
> good question!  There have been various attempts, for example Nick tried
> teaching the optimizers to not prune the branch to unreachable in
>
>    br %cond, label %assumption_holds, %assumption_doesnt_hold
> assumption_doesnt_hold:
>    unreachable
>
> This then leads to %cond being replaced with true everywhere downstream,
> which
> is good.  Unfortunately it also causes a bunch of other optimizations to
> not
> occur, and the overall result was not a win.
>
> Rafael added "range" metadata to the IR, however it is only for loads.
> If it
> could also be attached to the definition of %cond, saying that the value
> is 1,
> and various places are taught to understand that, then that might be
> another
> possibility.
It really seems to me that the core problem is that the assumption 
branch is removed before the information it contained could be leveraged 
to remove downstream code.  If the "unreachable" is replaced with a call 
to an external function marked noreturn (i.e. a trap like function), the 
optimizer does correctly optimize downstream code.  This seems to be a 
classic ordering problem.

Note: I haven't studied how the downstream code is optimized.  It seems 
odd to me that the information is lost when the branch is lost.  I may 
dig into what's causing the analysis information to be dropped - there 
might simply be a bug somewhere along the way.

Nick's approach seems to tackle the ordering problem by preventing the 
optimization of the unreachable branch.  As you pointed out, this is 
undesirable since we don't want the code to calculate the assumption 
condition to be retained.

I'm not quiet clear on how Rafael's solution addresses the problem at 
hand, so I won't address that.

I've gone ahead and implemented my hack solution - use an external 
function to force downstream optimization, replace it with unreachable 
and then re-optimize.  At least for the trivial examples I'm testing 
with, this seems to get the desired behavior.  The downside of course is 
that it requires the all of the other passes to be run twice.

If anyone is interested, I can throw the (nearly trivial) code up on 
github for anyone else who wants it.

 > However I think people would rather only have such
> metadata be
> attached to "inputs" to a function: function parameters, loads from
> memory etc.
> In your case, is the assumption usually about a function parameter?
For the initial examples I'm considering - specifically object 
invariants and function preconditions - all assumptions are about 
function parameters.  However, I can think of some examples where having 
the ability to note an assumption about an arbitrary intermediate value 
might be useful.  If I can, I'd like to not exclude that case.

One such example (for a language with array bounds checks):
int n = external_func_compute_loop_bound();
assume n < array.size();
for(int i = 0; i < n; i++) {
   process(arrray[i]);
}

Philip

*removed extra text for readability*