<div dir="ltr"><div>To summarize here is how each proposal would look like, along with a list of pros and cons for each.</div><div><br></div><div>%struct.S = type { i32, [2 x i32], i32 }<br></div><div><div><br></div><div><b>inrange</b></div>define void @with_inrange(%struct.S* %s, i32 %in_array) {<br>  %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, inrange i32 %in_array<br>  %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0<br>  ret void<br>}<br></div><div><br></div><div>(Note that this is not currently valid IR, we need to make `inrange` work on standalone GEP instructions)</div><div>pros:</div><div> - compact, no extra instructions</div><div>cons:</div><div> - The range information is local to the GEP instruction, and cannot be easily used for other purposes, e.g. SCEV (the ValueTracker is not aware of it)</div><div><br></div><div><b>annotation with range metadata</b></div><div>define void @with_annotation(%struct.S* %s, i32 %index) {<br>  %in_array = call i32 @llvm.annotation.i32(%index), !range !0<br>  %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i32 %in_array<br>  %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0<br>  ret void<br>}</div><div><br></div><div>pros:</div><div> - a single added instruction</div><div> - the ValueTracker already knows about range metadata</div><div> - deriving the range in ValueTracker involves a single metadata lookup.</div><div> - The range information will be available for other purposes (e.g. SCEV).</div><div>cons:</div><div> - one extra instruction</div><div> - llvm.annotation is not widely used</div><div><br><br><b>assume</b><br>define void @with_assume(%struct.S* %s, i32 %in_array) {<br>  %cmp1 = icmp sge i32 %in_array, 0<br>  %cmp2 = icmp slt i32 %in_array, 2<br>  %cmp = and i1 %cmp1, %cmp2<br>  call void @llvm.assume(i1 %cmp)<br>  %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i32 %in_array<br>  %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0<br>  ret void<br>}</div><div><br></div><div><div>pros: </div><div> - The range information will be available for other purposes (e.g. SCEV).</div><div></div><div></div><div>cons:</div><div> - several extra instructions<br></div><div> - the ValueTracker cannot currently derive a range from this</div><div> - Deriving the range <a href="https://github.com/llvm/llvm-project/blob/dd388ba3e0b0a5f06565d0bcb6e1aebb5daac065/llvm/lib/Analysis/ValueTracking.cpp#L638">can be costly</a><br></div><div></div><div></div><div><br></div><div><b>assume with bundle</b><br>define void @with_assume_bundle(%struct.S* %s, i32 %in_array) {<br>  call void @llvm.assume(i1 true) ["range"(i32 %in_array, i32 0, i32 2)]<br>  %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 1, i32 %in_array<br>  %gep2 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0<br>  ret void<br>}</div></div><div><div><div><br></div><div>(Note that this is not currently valid IR, we need to make bundle tags work for non-attributes)</div></div><div><div>pros: </div><div> - The range information will be available for other purposes (e.g. SCEV).</div><div> - ? (see below)</div><div></div><div></div><div>cons:</div><div> - one extra instruction</div></div><div> - ? (see below)</div><div><br></div><div>@johannes For the last option, I'm also not sure what becomes of the "range" tag in the assume bundle once we remove the assertion ? Does it become attached to the value (%in_array), in which case we have the advantages of range metadata (option 2), or is this still attached to the assume, with the disadvantages of option 3 ?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 24, 2021 at 8:32 PM Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com">johannesdoerfert@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><br>

On 3/24/21 12:47 PM, Florian Hahn wrote:<br>

><br>

>> On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>> wrote:<br>

>><br>

>><br>

>> On 3/24/21 9:06 AM, Clement Courbet wrote:<br>

>>> On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <<br>

>>> <a href="mailto:johannesdoerfert@gmail.com" target="_blank">johannesdoerfert@gmail.com</a>> wrote:<br>

>>><br>

>>>> I really like encoding more (range) information in the IR,<br>

>>>> more thoughts inlined.<br>

>>>><br>

>>>> On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:<br>

>>>>> Hi everyone,<br>

>>>>><br>

>>>>> tl;dr: I would like to teach clang to output range metadata so that LLVM<br>

>>>>> can do better alias analysis. I have a proposal as D99248<br>

>>>>> <<a href="https://reviews.llvm.org/D99248" rel="noreferrer" target="_blank">https://reviews.llvm.org/D99248</a>> (clang part) and D99247<br>

>>>>> <<a href="https://reviews.llvm.org/D99247" rel="noreferrer" target="_blank">https://reviews.llvm.org/D99247</a>> (llvm part). But there are other<br>

>>>> possible<br>

>>>>> options that I'm detailing below.<br>

>>>>><br>

>>>>> Consider the following code, adapted from brotli<br>

>>>>> <<a href="https://en.wikipedia.org/wiki/Brotli" rel="noreferrer" target="_blank">https://en.wikipedia.org/wiki/Brotli</a>>:<br>

>>>>><br>

>>>>> ```<br>

>>>>><br>

>>>>> struct Histogram {<br>

>>>>><br>

>>>>>     int values[256];<br>

>>>>><br>

>>>>>     int total;<br>

>>>>><br>

>>>>> };<br>

>>>>><br>

>>>>> Histogram DoIt(const int* image, int size) {<br>

>>>>><br>

>>>>>     Histogram histogram;<br>

>>>>><br>

>>>>>     for (int i = 0; i < size; ++i) {<br>

>>>>><br>

>>>>>       ++histogram.values[image[i]];  // (A)<br>

>>>>><br>

>>>>>       ++histogram.total;             // (B)<br>

>>>>><br>

>>>>>     }<br>

>>>>><br>

>>>>>     return histogram;<br>

>>>>><br>

>>>>> }<br>

>>>>> ```<br>

>>>>><br>

>>>>> In this code, the compiler does not know anything about the values of<br>

>>>>> images[i], so it assumes that 256 is a possible value for it. In that<br>

>>>> case,<br>

>>>>> (A) would change the value of histogram.total, so (B) has to load, add<br>

>>>> one<br>

>>>>> and store [godbolt <<a href="https://godbolt.org/z/KxE343" rel="noreferrer" target="_blank">https://godbolt.org/z/KxE343</a>>].<br>

>>>>><br>

>>>>> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use<br>

>>>>> values to form a pointer to total and dereference it. What valid C/C++<br>

>>>> code<br>

>>>>> is allowed to do with values is:<br>

>>>>>    - Form any pointer in [values, values + 256].<br>

>>>>>    - Form and dereference any pointer in [values, values + 256)<br>

>>>>><br>

>>>>> Note that the LLVM memory model is much laxer than that of C/C++. It has<br>

>>>> no<br>

>>>>> notion of types. In particular, given an LLVM aggregate definition:<br>

>>>>><br>

>>>>> ```<br>

>>>>> %struct.S = type { [42 x i32], i32, i32 }<br>

>>>>> ```<br>

>>>>><br>

>>>>> It is perfectly valid to use an address derived from a GEP(0,0,%i)  [gep<br>

>>>>> reference] representing indexing into the [42 x i32] array to load the<br>

>>>> i32<br>

>>>>> member at index 2. It is also valid for %i to be 43 (though not 44 if an<br>

>>>>> inbound GEP is used).<br>

>>>>> So clang has to give LLVM more information about the C/C++ rules.<br>

>>>>><br>

>>>>> *IR representation:*<br>

>>>>> LLVM has several ways of representing ranges of values:<br>

>>>>>    - *!range* metadata can be attached to integer call and load<br>

>>>> instructions<br>

>>>>> to indicate the allowed range of values of the result. LLVM's<br>

>>>> ValueTracking<br>

>>>>> provides a function for querying the range for any llvm::Variable.<br>

>>>>>    - The *llvm.assume* intrinsic takes a boolean condition that can also<br>

>>>> be<br>

>>>>> used by ValueTracking to infer range of values.<br>

>>>>>    - The *inrange* attribute of GEP can be used to indicate C-like<br>

>>>> semantics<br>

>>>>> for the structure field marked with the inrange attribute. It can only be<br>

>>>>> used for GEP constantexprs (ie.e. GEPs defined inline), but not for<br>

>>>>> standalone GEPs defining instructions.  relevant discussion<br>

>>>>> <<a href="https://reviews.llvm.org/D22793?id=65626#inline-194653" rel="noreferrer" target="_blank">https://reviews.llvm.org/D22793?id=65626#inline-194653</a>>.<br>

>>>>><br>

>>>>> Alternatives:<br>

>>>>> *(1) *Annotate each array subscript index value with a range, e.g.:<br>

>>>>> ```<br>

>>>>> %i = i64 …<br>

>>>>> %ri =  call i64 @llvm.annotation.i64(%index), !range !0<br>

>>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,<br>

>>>> i32<br>

>>>>> %ri<br>

>>>>> ...<br>

>>>>> !0 = !{i64 0, i64 42}<br>

>>>>> ```<br>

>>>>> *(2) *(variant of 1) relax the constraint that !range metadata can only<br>

>>>> be<br>

>>>>> set on call and load instructions, and set the !range metadata on the<br>

>>>> index<br>

>>>>> expression. We still need annotations for function parameters though:<br>

>>>>> ```<br>

>>>>> %i = i64 … , !range !0<br>

>>>>> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,<br>

>>>> i32<br>

>>>>> %i<br>

>>>>> ...<br>

>>>>> !0 = !{i64 0, i64 42}<br>

>>>>> ```<br>

>>>>> This is slightly more compact.<br>

>>>>><br>

>>>>> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations.<br>

>>>>> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this<br>

>>>> will<br>

>>>>> interfere with optimizations.<br>

>>>> I would very much like not to introduce another way to encode<br>

>>>> assumptions other than `llvm.assume`. If you want to avoid the extra<br>

>>>> instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`,<br>

>>>> which is in line with our move towards operand bundle use.<br>

>>>><br>

>>> Thanks, I did not know about that. I've just tried it but it appears that<br>

>>> tags have to be attribute names, and `!range` is not a valid attribute,<br>

>>> it's a metadata node. Is there a way to encode this ?<br>

>> We need to get rid of that assertion. There are other non-attributes<br>

>> to be used in assume operand bundles in the (near) future, so the this<br>

>> work has to be done anyway.<br>

><br>

> +1 on trying to use assume, rather than adding another way.<br>

><br>

> But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can easier skip ‘artificial’ assume users?<br>

<br>

It would make users explicit and we will have non-attribute bundles anyway.<br>

I find it also "conceptually nicer", would you prefer explicit instructions?<br>

<br>

~ Johannes<br>

<br>

<br>

> Cheers,<br>

> Florian<br>

</blockquote></div></div>