[LLVMdev] How best to represent assume statements in LLVM IR?
Philip Reames
listmail at philipreames.com
Tue Oct 2 14:43:51 PDT 2012
On 10/02/2012 12:52 AM, Duncan Sands wrote:
> Hi Philip,
>
>> Does anyone have any suggestions on how to best represent an assumption
>> statement(*) in IR?
>
> good question! There have been various attempts, for example Nick tried
> teaching the optimizers to not prune the branch to unreachable in
>
> br %cond, label %assumption_holds, %assumption_doesnt_hold
> assumption_doesnt_hold:
> unreachable
>
> This then leads to %cond being replaced with true everywhere downstream,
> which
> is good. Unfortunately it also causes a bunch of other optimizations to
> not
> occur, and the overall result was not a win.
>
> Rafael added "range" metadata to the IR, however it is only for loads.
> If it
> could also be attached to the definition of %cond, saying that the value
> is 1,
> and various places are taught to understand that, then that might be
> another
> possibility.
It really seems to me that the core problem is that the assumption
branch is removed before the information it contained could be leveraged
to remove downstream code. If the "unreachable" is replaced with a call
to an external function marked noreturn (i.e. a trap like function), the
optimizer does correctly optimize downstream code. This seems to be a
classic ordering problem.
Note: I haven't studied how the downstream code is optimized. It seems
odd to me that the information is lost when the branch is lost. I may
dig into what's causing the analysis information to be dropped - there
might simply be a bug somewhere along the way.
Nick's approach seems to tackle the ordering problem by preventing the
optimization of the unreachable branch. As you pointed out, this is
undesirable since we don't want the code to calculate the assumption
condition to be retained.
I'm not quiet clear on how Rafael's solution addresses the problem at
hand, so I won't address that.
I've gone ahead and implemented my hack solution - use an external
function to force downstream optimization, replace it with unreachable
and then re-optimize. At least for the trivial examples I'm testing
with, this seems to get the desired behavior. The downside of course is
that it requires the all of the other passes to be run twice.
If anyone is interested, I can throw the (nearly trivial) code up on
github for anyone else who wants it.
> However I think people would rather only have such
> metadata be
> attached to "inputs" to a function: function parameters, loads from
> memory etc.
> In your case, is the assumption usually about a function parameter?
For the initial examples I'm considering - specifically object
invariants and function preconditions - all assumptions are about
function parameters. However, I can think of some examples where having
the ability to note an assumption about an arbitrary intermediate value
might be useful. If I can, I'd like to not exclude that case.
One such example (for a language with array bounds checks):
int n = external_func_compute_loop_bound();
assume n < array.size();
for(int i = 0; i < n; i++) {
process(arrray[i]);
}
Philip
*removed extra text for readability*
More information about the llvm-dev
mailing list