<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/xhtml; charset=utf-8">
</head>
<body>
<div style="font-family:sans-serif"><div style="white-space:normal">
<p dir="auto">On 16 Dec 2019, at 18:16, Doerfert, Johannes via llvm-dev wrote:</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">Abstract:<br>
<br>
It is often hard or impossible to encode complex, e.g., non-boolean,<br>
information in an `llvm.assume(i1)`. This RFC describes various problems<br>
we have right now and provides alternative design ideas.<br>
<br>
<br>
<br>
Some Existing Problems:<br>
<br>
A) The boolean requirement.<br>
The current `llvm.assume(i1)` expects a boolean that is known to hold<br>
true at runtime (once the `llvm.assume` call is reached). However,<br>
forming this boolean for "arbitrary" information is hard or even<br>
impossible. Alignment, which is an easy case, already requires 3 extra<br>
instructions, one of which is a `ptrtoint` and therefore not really<br>
optimizer friendly. Dereferenceability, is even scarier. Thus, we are<br>
currently limited to (almost) boolean information when we want to<br>
manifest information in the IR (which happens due to inlining or code<br>
motion, see <a href="https://reviews.llvm.org/D69477" style="color:#777">https://reviews.llvm.org/D69477</a> for an example).<br>
<br>
B) The one-use checks.<br>
Various pattern matching passes check the number of uses of a value.<br>
However, while `llvm.assume` is eventually eliminated by the backend<br>
it will still increase the use count of the operand. I doubt we are<br>
able to not increase the use count at all (and keep everything else<br>
working), but what we can do is make sure the uses in "assumptions"<br>
are easy to spot, thus filter. This is not the case right now because<br>
of the additional instructions we need to make the values boolean.<br>
Even if you have `__builtin_assume(ptr);` the `ptr` use will not be<br>
the `llvm.assume` call but a `icmp`.<br>
<br>
C) The side-effect avoidance.<br>
`__assume`, `__builtin_assume`, `__builtin_assume_aligned`, and OpenMP<br>
`omp assume` are all defined to not evaluate their argument, thus to<br>
not cause the side effects that the evaluation of the argument would<br>
otherwise imply. The way we implement this restriction is by not<br>
emitting the argument expression in IR if it might cause a side<br>
effect. We warn the user if that happens. While this is generally<br>
speaking fine, it would be interesting to lift the *implementation*<br>
restriction. One benefit would be that we could implement `assert`<br>
through `__builtin_assume` properly.<br>
<br>
D) The singleton ranges.<br>
An `llvm.assume` will only provide information for a single program<br>
point not a range. Even if the beginning and the end of a range have<br>
an `llvm.assume`, there are cases where the information will not be<br>
as good as a proper range assumption. OpenMP 5.1 introduces such<br>
range assumptions but other situations would benefit from them as<br>
well. Take for example function attributes and inlining. Since we know<br>
they hold for the entire function and not only when it is entered we<br>
could encode the information over the entire range of the inlined<br>
code.<br>
<br>
<br>
Some Site Notes:<br>
<br>
- It seems of little use that we interleave the code for the assumed<br>
expression with the user code. Having the `llvm.assume` allows us to<br>
tie information to a program point, beyond that we just clutter the<br>
function with instructions that we later remove anyway.<br>
<br>
- Reconstructing information from the pattern of instructions that feed<br>
into the `llvm.assume` is also not optimal, especially since we do<br>
not need to "optimize" these instructions anyway.<br>
<br>
- The current (=arbitrary) side-effects of `llvm.assume` are too strong.<br>
We have `inaccessiblemem_or_argmemonly` and we might be able to come<br>
up with something even more specialized for this, e.g.,<br>
`control_dependences_only` to indicate that there are only control<br>
dependences.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">This is all well put; I think you’ve covered the major weaknesses.</p>
</div>
<div style="white-space:normal"><blockquote style="border-left:2px solid #777; color:#777; margin:0 0 5px; padding-left:5px"><p dir="auto">Some Design Ideas:<br>
<br>
1) Use named operand bundles to encode information.<br>
If we want to encode property XYZ for a value %V holds at a certain<br>
program point and the property is dependent on %N we could encode<br>
that as:<br>
`llvm.assume() ["XYZ"(%V, %N)]`<br>
There are various benefits, including:<br>
- It is freely extensible.<br>
- The property is directly tied to the value. Thus, no need for<br>
encoding patterns that introduce extra instructions and uses and<br>
which we need to preserve and decode later.<br>
- Allows dynamic properties even when corresponding attributes do<br>
not, e.g., `llvm.assume() ["align"(%arg_ptr, %N)]` is fine and<br>
once `%N` becomes a constant, or we determine a lower bound, we<br>
can introduce the `align(C)` attribute for `%arg_ptr`.<br>
<br>
2) Outline assumption expression code (with side-effects).<br>
If we potentially have side-effects, or we simply have a non-trivial<br>
expression that requires to be lowered into instructions, we can<br>
outline the assumption expression code and tie it to the<br>
`llvm.assume` via another operand bundle property. It could look<br>
something like this:<br>
`__builtin_assume(foo(i) == bar(j));`<br>
will cause us to generate<br>
```<br>
/// Must return true!<br>
static bool llvm.assume.expression_#27(int i, int j) {<br>
return foo(i) == bar(j);<br>
}<br>
```<br>
and a `llvm.assume` call like this:<br>
`llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))]<br>
So we generate the expression in a new function which we (only) tie to<br>
the `llvm.assume()` through the "assume_fn" operand bundle. This will<br>
make sure we do not accidentally evaluate the code, or assume it is<br>
evaluated and produced side-effects. We can still optimize the code<br>
and use the information that we learn from it at the `llvm.assume`<br>
site though.</p>
</blockquote></div>
<div style="white-space:normal">
<p dir="auto">I think outlining is abstractly very promising, but I’m worried about<br>
it impacting existing optimizations:</p>
<ul>
<li><p dir="auto">It’s won’t be obvious at all from the IR that the predicate function<br>
is dead code. It would be a shame if we ended up emitting the predicate<br>
function, or some global only it references, because we didn’t delete<br>
all the llvm.assume calls early enough to recognize that it was dead.</p></li>
<li><p dir="auto">Anything passed to the predicate function will by default look like it<br>
escapes. This is particularly true if the predicate takes local<br>
variables by references, which is the easiest and most straightforwardly<br>
correct way for frontends to emit these predicates. So this will block<br>
basic memory analyses (including mem2reg!) unless they’re taught to<br>
remove or rewrite assumptions.</p></li>
</ul>
<p dir="auto">Unfortunately, I don’t have a great counter-proposal that isn’t a<br>
major project.</p>
<p dir="auto">(The major project is to make the predicates sub-functions within<br>
the caller. This doesn’t fix all the problems, and sub-functions<br>
introduce a host of new issues, but they do have the benefit of<br>
making the analysis much more obviously intra-procedural.)</p>
<p dir="auto">John.</p>
</div>
</div>
</body>
</html>