[llvm-dev] [RFC] Adding range metadata to array subscripts.

Thu Mar 25 09:30:32 PDT 2021

Hi,

there might be some overlap with this proposal:
https://lists.llvm.org/pipermail/llvm-dev/2019-July/134063.html

Michael

Am Do., 25. März 2021 um 08:53 Uhr schrieb Clement Courbet via
llvm-dev <llvm-dev at lists.llvm.org>:
>
> To summarize here is how each proposal would look like, along with a list of pros and cons for each.
>
> %struct.S = type { i32, [2 x i32], i32 }
>
> inrange
> define void @with_inrange(%struct.S* %s, i32 %in_array) {
>   %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, inrange i32 %in_array
>   %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
>   ret void
> }
>
> (Note that this is not currently valid IR, we need to make `inrange` work on standalone GEP instructions)
> pros:
>  - compact, no extra instructions
> cons:
>  - The range information is local to the GEP instruction, and cannot be easily used for other purposes, e.g. SCEV (the ValueTracker is not aware of it)
>
> annotation with range metadata
> define void @with_annotation(%struct.S* %s, i32 %index) {
>   %in_array = call i32 @llvm.annotation.i32(%index), !range !0
>   %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i32 %in_array
>   %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
>   ret void
> }
>
> pros:
>  - a single added instruction
>  - the ValueTracker already knows about range metadata
>  - deriving the range in ValueTracker involves a single metadata lookup.
>  - The range information will be available for other purposes (e.g. SCEV).
> cons:
>  - one extra instruction
>  - llvm.annotation is not widely used
>
>
> assume
> define void @with_assume(%struct.S* %s, i32 %in_array) {
>   %cmp1 = icmp sge i32 %in_array, 0
>   %cmp2 = icmp slt i32 %in_array, 2
>   %cmp = and i1 %cmp1, %cmp2
>   call void @llvm.assume(i1 %cmp)
>   %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i32 %in_array
>   %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
>   ret void
> }
>
> pros:
>  - The range information will be available for other purposes (e.g. SCEV).
> cons:
>  - several extra instructions
>  - the ValueTracker cannot currently derive a range from this
>  - Deriving the range can be costly
>
> assume with bundle
> define void @with_assume_bundle(%struct.S* %s, i32 %in_array) {
>   call void @llvm.assume(i1 true) ["range"(i32 %in_array, i32 0, i32 2)]
>   %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i32 %in_array
>   %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0
>   ret void
> }
>
> (Note that this is not currently valid IR, we need to make bundle tags work for non-attributes)
> pros:
>  - The range information will be available for other purposes (e.g. SCEV).
>  - ? (see below)
> cons:
>  - one extra instruction
>  - ? (see below)
>
> @johannes For the last option, I'm also not sure what becomes of the "range" tag in the assume bundle once we remove the assertion ? Does it become attached to the value (%in_array), in which case we have the advantages of range metadata (option 2), or is this still attached to the assume, with the disadvantages of option 3 ?
>
> On Wed, Mar 24, 2021 at 8:32 PM Johannes Doerfert <johannesdoerfert at gmail.com> wrote:
>>
>>
>> On 3/24/21 12:47 PM, Florian Hahn wrote:
>> >
>> >> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> >>
>> >>
>> >> On 3/24/21 9:06 AM, Clement Courbet wrote:
>> >>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <
>> >>> johannesdoerfert at gmail.com> wrote:
>> >>>
>> >>>> I really like encoding more (range) information in the IR,
>> >>>> more thoughts inlined.
>> >>>>
>> >>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:
>> >>>>> Hi everyone,
>> >>>>>
>> >>>>> tl;dr: I would like to teach clang to output range metadata so that LLVM
>> >>>>> can do better alias analysis. I have a proposal as D99248
>> >>>>> <https://reviews.llvm.org/D99248> (clang part) and D99247
>> >>>>> <https://reviews.llvm.org/D99247> (llvm part). But there are other
>> >>>> possible
>> >>>>> options that I'm detailing below.
>> >>>>>
>> >>>>> Consider the following code, adapted from brotli
>> >>>>> <https://en.wikipedia.org/wiki/Brotli>:
>> >>>>>
>> >>>>> ```
>> >>>>>
>> >>>>> struct Histogram {
>> >>>>>
>> >>>>>     int values[256];
>> >>>>>
>> >>>>>     int total;
>> >>>>>
>> >>>>> };
>> >>>>>
>> >>>>> Histogram DoIt(const int* image, int size) {
>> >>>>>
>> >>>>>     Histogram histogram;
>> >>>>>
>> >>>>>     for (int i = 0; i < size; ++i) {
>> >>>>>
>> >>>>>       ++histogram.values[image[i]];  // (A)
>> >>>>>
>> >>>>>       ++histogram.total;             // (B)
>> >>>>>
>> >>>>>     }
>> >>>>>
>> >>>>>     return histogram;
>> >>>>>
>> >>>>> }
>> >>>>> ```
>> >>>>>
>> >>>>> In this code, the compiler does not know anything about the values of
>> >>>>> images[i], so it assumes that 256 is a possible value for it. In that
>> >>>> case,
>> >>>>> (A) would change the value of histogram.total, so (B) has to load, add
>> >>>> one
>> >>>>> and store [godbolt <https://godbolt.org/z/KxE343>].
>> >>>>>
>> >>>>> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use
>> >>>>> values to form a pointer to total and dereference it. What valid C/C++
>> >>>> code
>> >>>>> is allowed to do with values is:
>> >>>>>    - Form any pointer in [values, values + 256].
>> >>>>>    - Form and dereference any pointer in [values, values + 256)
>> >>>>>
>> >>>>> Note that the LLVM memory model is much laxer than that of C/C++. It has
>> >>>> no
>> >>>>> notion of types. In particular, given an LLVM aggregate definition:
>> >>>>>
>> >>>>> ```
>> >>>>> %struct.S = type { [42 x i32], i32, i32 }
>> >>>>> ```
>> >>>>>
>> >>>>> It is perfectly valid to use an address derived from a GEP(0,0,%i)  [gep
>> >>>>> reference] representing indexing into the [42 x i32] array to load the
>> >>>> i32
>> >>>>> member at index 2. It is also valid for %i to be 43 (though not 44 if an
>> >>>>> inbound GEP is used).
>> >>>>> So clang has to give LLVM more information about the C/C++ rules.
>> >>>>>
>> >>>>> *IR representation:*
>> >>>>> LLVM has several ways of representing ranges of values:
>> >>>>>    - *!range* metadata can be attached to integer call and load
>> >>>> instructions
>> >>>>> to indicate the allowed range of values of the result. LLVM's
>> >>>> ValueTracking
>> >>>>> provides a function for querying the range for any llvm::Variable.
>> >>>>>    - The *llvm.assume* intrinsic takes a boolean condition that can also
>> >>>> be
>> >>>>> used by ValueTracking to infer range of values.
>> >>>>>    - The *inrange* attribute of GEP can be used to indicate C-like
>> >>>> semantics
>> >>>>> for the structure field marked with the inrange attribute. It can only be
>> >>>>> used for GEP constantexprs (ie.e. GEPs defined inline), but not for
>> >>>>> standalone GEPs defining instructions.  relevant discussion
>> >>>>> <https://reviews.llvm.org/D22793?id=65626#inline-194653>.
>> >>>>>
>> >>>>> Alternatives:
>> >>>>> *(1) *Annotate each array subscript index value with a range, e.g.:
>> >>>>> ```
>> >>>>> %i = i64 …
>> >>>>> %ri =  call i64 @llvm.annotation.i64(%index), !range !0
>> >>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,
>> >>>> i32
>> >>>>> %ri
>> >>>>> ...
>> >>>>> !0 = !{i64 0, i64 42}
>> >>>>> ```
>> >>>>> *(2) *(variant of 1) relax the constraint that !range metadata can only
>> >>>> be
>> >>>>> set on call and load instructions, and set the !range metadata on the
>> >>>> index
>> >>>>> expression. We still need annotations for function parameters though:
>> >>>>> ```
>> >>>>> %i = i64 … , !range !0
>> >>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,
>> >>>> i32
>> >>>>> %i
>> >>>>> ...
>> >>>>> !0 = !{i64 0, i64 42}
>> >>>>> ```
>> >>>>> This is slightly more compact.
>> >>>>>
>> >>>>> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations.
>> >>>>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this
>> >>>> will
>> >>>>> interfere with optimizations.
>> >>>> I would very much like not to introduce another way to encode
>> >>>> assumptions other than `llvm.assume`. If you want to avoid the extra
>> >>>> instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`,
>> >>>> which is in line with our move towards operand bundle use.
>> >>>>
>> >>> Thanks, I did not know about that. I've just tried it but it appears that
>> >>> tags have to be attribute names, and `!range` is not a valid attribute,
>> >>> it's a metadata node. Is there a way to encode this ?
>> >> We need to get rid of that assertion. There are other non-attributes
>> >> to be used in assume operand bundles in the (near) future, so the this
>> >> work has to be done anyway.
>> >
>> > +1 on trying to use assume, rather than adding another way.
>> >
>> > But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users?
>>
>> It would make users explicit and we will have non-attribute bundles anyway.
>> I find it also "conceptually nicer", would you prefer explicit instructions?
>>
>> ~ Johannes
>>
>>
>> > Cheers,
>> > Florian
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev