[llvm-dev] [RFC] Adding range metadata to array subscripts.

Tue Mar 30 10:51:34 PDT 2021

On 3/30/21 12:25 PM, Florian Hahn wrote:
>
>> On Mar 27, 2021, at 20:37, Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
>> On 3/27/21 1:30 PM, Florian Hahn wrote:
>>>> On Mar 24, 2021, at 19:32, Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
>>>> On 3/24/21 12:47 PM, Florian Hahn wrote:
>>>>>> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>>>>>> We need to get rid of that assertion. There are other non-attributes
>>>>>> to be used in assume operand bundles in the (near) future, so this
>>>>>> work has to be done anyway.
>>>>> +1 on trying to use assume, rather than adding another way.
>>>>>
>>>>> But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users?
>>>> It would make users explicit and we will have non-attribute bundles anyway.
>>>> I find it also "conceptually nicer", would you prefer explicit instructions?
>>> One disadvantage of using a bundle (or !range metadata) is that we treat ranges for certain values in a special way and differently to how we treat range information expressed by the user e.g. via conditions (or builtin assume).
>> I don't think this is necessarily accurate. We can, and already do (https://godbolt.org/z/MaMEb1Koo),
>> generate bundles from conditions. If we can interpret a condition, why could we not rewrite it into a bundle?
>> I'm not sure why this is any different from other normalization we do. (And bundles have benefits over implicit
>> instruction encodings, for example use tracking and #instructions.)
>>
> I’m not arguing that it is not possible to do everything with assume bundles. I am saying that we end up with at least 2 ways to encode the same information, so we need to handle 2 parallel encodings (i.e. we always have to handle conditions from the program control flow, which is represented via instructions)
>
> I think the !nonnull example you shared illustrates the extra work passes will have to do. For example, a couple of passes know how to generically handle information from assumes, exactly because the conditions used for the assumes are not special and they already have to handle the same conditions for branches. If we instead convert the condition to a special bundle, all those passes will need updating to properly interpret !nonnull (and future bundles). Examples include SCCP, NewGVN, parts of SCEV.
>
> FTR I think assume bundles are great to express interesting properties!
>
> I am just trying to highlight some potential drawbacks when it comes to ranges or other properties we can express directly in LLVM IR already. I am sure it would be possible to add some extra abstraction to make it easier to update the relevant passes, it’s just a cost to consider.

You are right, sometimes we will need more code to "also" handle the
bundles, especially when the same conditions can occur in regular code
as well.
My point was that we probably want a canonical assumption representation
and bundles generally have more benefits than explicit encodings. This
might require us to teach passes a new encoding, but we will then use the
new encoding for all assumptions of a certain kind and start generating
those right away wherever possible.
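
To make the two encodings concrete, here is a minimal sketch. It is
illustrative only: the function names are made up, and it uses the
typed-pointer syntax current at the time of this thread (newer LLVM
releases spell `i32*` as `ptr`). The first function states the facts as
ordinary compares feeding llvm.assume, the same form passes already see
on branch conditions; the second states the non-null fact with the
existing "nonnull" assume operand bundle.

```llvm
declare void @llvm.assume(i1)

; Instruction encoding: each fact is an icmp whose result is assumed.
; @cond_encoding is a hypothetical name used only for this sketch.
define i32 @cond_encoding(i32* %p, i64 %i) {
  %nonnull = icmp ne i32* %p, null
  call void @llvm.assume(i1 %nonnull)
  %inrange = icmp ult i64 %i, 10        ; assumed range for %i: [0, 10)
  call void @llvm.assume(i1 %inrange)
  %gep = getelementptr inbounds i32, i32* %p, i64 %i
  %v = load i32, i32* %gep
  ret i32 %v
}

; Bundle encoding: the fact is attached to the assume call itself, so no
; separate icmp is needed; the only extra use of %p is the bundle operand.
; @bundle_encoding is likewise a hypothetical name.
define i32 @bundle_encoding(i32* %p) {
  call void @llvm.assume(i1 true) ["nonnull"(i32* %p)]
  %v = load i32, i32* %p
  ret i32 %v
}
```

There is no bundle counterpart for the `%inrange` fact in the sketch
because a range bundle is exactly what is under discussion here; it does
not exist yet.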

>
>>>   This means we have to handle multiple variants across the codebase, which can lead to situations where only one or the other is handled, which in turn can lead to surprising results (of the form: why does a transformation apply if information is provided in a certain way, but does not apply if the equivalent info is provided in a different way).
>>> Using instructions potentially also allows us to specify more complex ranges, in relation to other values.
>> I don't see how bundles would restrict us in any way. I mean, if we want to express property XYZ for %v and %q,
>> `llvm.assume(i1 true) ["XYZ"(%v, %q)]`, makes it really easy and it is arguably as generic as you want it to be.
>>
> I agree that it is possible to encode more interesting properties with assume bundles, but wouldn’t we end up duplicating all existing compare predicates for example? And for something like %x + %y < %x we would either fall back to instructions again or come up with a way to encode that in bundles as well. If we still use IR instructions for more complex expressions, we’d still need a way to exclude the ‘assume-only’ uses.

For "common assumptions" I would strongly suggest "known bundles", e.g.,
maybe for frequent kinds of inequalities.
For "complex assumptions" I would prefer we do outlined assumptions to
deal with the uses problem while also *gaining* expressiveness. What I
mean was described in the email [0] under design idea 2) and has come up
a few times since.
In addition to complex instruction-based encodings of assumptions, it
allows us to deal with calls and side effects properly.
I would use such an encoding and teach the Attributor to transfer
knowledge about the arguments from the assumption to the outlined
assumption function and back as new "known bundles". So we normalize and
specialize in the outlined function and transfer back what we know other
passes can actually digest.

~ Johannes

[0] https://lists.llvm.org/pipermail/llvm-dev/2019-December/137632.html

>
>>> But I realize that there are some practical considerations that make the instruction approach less appealing and I am all in favor of the more pragmatic & practical solution to start with.
>> I think bundles, and more generic assumptions, are what we need in the future. I still believe we should use them
>> to encode information in assertions [0], among other things, without running into the risk of having side-effects
>> that influence the compilation.
> Again, I am not saying we shouldn’t, just that there are some potential drawbacks in some cases.
>
> Cheers,
> Florian