[llvm-dev] [RFC] How to manifest information in LLVM-IR, or, revisiting llvm.assume

Mon Dec 16 15:16:44 PST 2019

Abstract:

It is often hard or impossible to encode complex, e.g., non-boolean,
information in an `llvm.assume(i1)`. This RFC describes various problems
we have right now and provides alternative design ideas.

Some Existing Problems:

A) The boolean requirement.
  The current `llvm.assume(i1)` expects a boolean that is known to hold
  true at runtime (once the `llvm.assume` call is reached). However,
  forming this boolean for "arbitrary" information is hard or even
  impossible. Alignment, which is an easy case, already requires 3 extra
  instructions, one of which is a `ptrtoint` and therefore not really
  optimizer friendly. Dereferenceability, is even scarier. Thus, we are
  currently limited to (almost) boolean information when we want to
  manifest information in the IR (which happens due to inlining or code
  motion, see https://reviews.llvm.org/D69477 for an example).

B) The one-use checks.
  Various pattern matching passes check the number of uses of a value.
  However, while `llvm.assume` is eventually eliminated by the backend
  it will still increase the use count of the operand. I doubt we are
  able to not increase the use count at all (and keep everything else
  working), but what we can do is make sure the uses in "assumptions"
  are easy to spot, thus filter. This is not the case right now because
  of the additional instructions we need to make the values boolean.
  Even if you have `__builtin_assume(ptr);` the `ptr` use will not be
  the `llvm.assume` call but a `icmp`.

C) The side-effect avoidance.
  `__assume`, `__builtin_assume`, `__builtin_assume_aligned`, and OpenMP
  `omp assume` are all defined to not evaluate their argument, thus to
  not cause the side effects that the evaluation of the argument would
  otherwise imply. The way we implement this restriction is by not
  emitting the argument expression in IR if it might cause a side
  effect. We warn the user if that happens. While this is generally
  speaking fine, it would be interesting to lift the *implementation*
  restriction. One benefit would be that we could implement `assert`
  through `__builtin_assume` properly.

D) The singleton ranges.
  An `llvm.assume` will only provide information for a single program
  point not a range. Even if the beginning and the end of a range have
  an `llvm.assume`, there are cases where the information will not be
  as good as a proper range assumption. OpenMP 5.1 introduces such
  range assumptions but other situations would benefit from them as
  well. Take for example function attributes and inlining. Since we know
  they hold for the entire function and not only when it is entered we
  could encode the information over the entire range of the inlined
  code.

Some Site Notes:

- It seems of little use that we interleave the code for the assumed
  expression with the user code. Having the `llvm.assume` allows us to
  tie information to a program point, beyond that we just clutter the
  function with instructions that we later remove anyway.

- Reconstructing information from the pattern of instructions that feed
  into the `llvm.assume` is also not optimal, especially since we do
  not need to "optimize" these instructions anyway.

- The current (=arbitrary) side-effects of `llvm.assume` are too strong.
  We have `inaccessiblemem_or_argmemonly` and we might be able to come
  up with something even more specialized for this, e.g.,
  `control_dependences_only` to indicate that there are only control
  dependences.
- 

Some Design Ideas:

1) Use named operand bundles to encode information.
   If we want to encode property XYZ for a value %V holds at a certain
   program point and the property is dependent on %N we could encode
   that as:
     `llvm.assume() ["XYZ"(%V, %N)]`
   There are various benefits, including:
     - It is freely extensible.
     - The property is directly tied to the value. Thus, no need for
       encoding patterns that introduce extra instructions and uses and
       which we need to preserve and decode later.
     - Allows dynamic properties even when corresponding attributes do
       not, e.g., `llvm.assume() ["align"(%arg_ptr, %N)]` is fine and
       once `%N` becomes a constant, or we determine a lower bound, we
       can introduce the `align(C)` attribute for `%arg_ptr`.

2) Outline assumption expression code (with side-effects).
  If we potentially have side-effects, or we simply have a non-trivial
  expression that requires to be lowered into instructions, we can
  outline the assumption expression code and tie it to the
  `llvm.assume` via another operand bundle property. It could look
  something like this:
    `__builtin_assume(foo(i) == bar(j));`
  will cause us to generate
    ```
    /// Must return true!
    static bool llvm.assume.expression_#27(int i, int j) {
      return foo(i) == bar(j);
    }
    ```
  and a `llvm.assume` call like this:
    `llvm.assume() ["assume_fn"(@llvm.assume.expression_#27, %i, %j))]
  So we generate the expression in a new function which we (only) tie to
  the `llvm.assume()` through the "assume_fn" operand bundle. This will
  make sure we do not accidentally evaluate the code, or assume it is
  evaluated and produced side-effects. We can still optimize the code
  and use the information that we learn from it at the `llvm.assume`
  site though.

 3) Use tokens to mark ranges.
   We have tokens which can be used to tie two instructions together,
   basically forming a range (with some conditions on the initial CFG).
   If we tie two `llvm.assume` calls together we can say that the
   information provided by the first holds for any point dominated by it
   and post-dominated by the second.

I tried to be brief so I hope I could still get some ideas across
without too much confusion. In any case, please let me know what you
think!

Thanks,
  Johannes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 228 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20191216/1e43f565/attachment.sig>