<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><br class=""><div><br class=""><blockquote type="cite" class=""><div class="">On Mar 24, 2021, at 19:32, Johannes Doerfert <<a href="mailto:johannesdoerfert@gmail.com" class="">johannesdoerfert@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><meta charset="UTF-8" class=""><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">On 3/24/21 12:47 PM, Florian Hahn wrote:</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><blockquote type="cite" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br 
class=""><blockquote type="cite" class="">On Mar 24, 2021, at 15:16, Johannes Doerfert via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org" class="">llvm-dev@lists.llvm.org</a>> wrote:<br class=""><br class=""><br class="">On 3/24/21 9:06 AM, Clement Courbet wrote:<br class=""><blockquote type="cite" class="">On Wed, Mar 24, 2021 at 2:20 PM Johannes Doerfert <<br class=""><a href="mailto:johannesdoerfert@gmail.com" class="">johannesdoerfert@gmail.com</a>> wrote:<br class=""><br class=""><blockquote type="cite" class="">I really like encoding more (range) information in the IR,<br class="">more thoughts inlined.<br class=""><br class="">On 3/24/21 4:14 AM, Clement Courbet via llvm-dev wrote:<br class=""><blockquote type="cite" class="">Hi everyone,<br class=""><br class="">tl;dr: I would like to teach clang to output range metadata so that LLVM<br class="">can do better alias analysis. I have a proposal as D99248<br class=""><<a href="https://reviews.llvm.org/D99248" class="">https://reviews.llvm.org/D99248</a>> (clang part) and D99247<br class=""><<a href="https://reviews.llvm.org/D99247" class="">https://reviews.llvm.org/D99247</a>> (llvm part). 
But there are other<br class=""></blockquote>possible<br class=""><blockquote type="cite" class="">options that I'm detailing below.<br class=""><br class="">Consider the following code, adapted from brotli<br class=""><<a href="https://en.wikipedia.org/wiki/Brotli" class="">https://en.wikipedia.org/wiki/Brotli</a>>:<br class=""><br class="">```<br class=""><br class="">struct Histogram {<br class=""><br class="">   int values[256];<br class=""><br class="">   int total;<br class=""><br class="">};<br class=""><br class="">Histogram DoIt(const int* image, int size) {<br class=""><br class="">   Histogram histogram;<br class=""><br class="">   for (int i = 0; i < size; ++i) {<br class=""><br class="">     ++histogram.values[image[i]];  // (A)<br class=""><br class="">     ++histogram.total;             // (B)<br class=""><br class="">   }<br class=""><br class="">   return histogram;<br class=""><br class="">}<br class="">```<br class=""><br class="">In this code, the compiler does not know anything about the values of<br class="">image[i], so it assumes that 256 is a possible value for it. In that<br class=""></blockquote>case,<br class=""><blockquote type="cite" class="">(A) would change the value of histogram.total, so (B) has to load, add<br class=""></blockquote>one<br class=""><blockquote type="cite" class="">and store [godbolt <<a href="https://godbolt.org/z/KxE343" class="">https://godbolt.org/z/KxE343</a>>].<br class=""><br class="">Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use<br class="">values to form a pointer to total and dereference it. What valid C/C++<br class=""></blockquote>code<br class=""><blockquote type="cite" class="">is allowed to do with values is:<br class="">  - Form any pointer in [values, values + 256].<br class="">  - Form and dereference any pointer in [values, values + 256)<br class=""><br class="">Note that the LLVM memory model is much laxer than that of C/C++. 
It has<br class=""></blockquote>no<br class=""><blockquote type="cite" class="">notion of types. In particular, given an LLVM aggregate definition:<br class=""><br class="">```<br class="">%struct.S = type { [42 x i32], i32, i32 }<br class="">```<br class=""><br class="">It is perfectly valid to use an address derived from a GEP(0,0,%i)  [gep<br class="">reference] representing indexing into the [42 x i32] array to load the<br class=""></blockquote>i32<br class=""><blockquote type="cite" class="">member at index 2. It is also valid for %i to be 43 (though not 44 if an<br class="">inbounds GEP is used).<br class="">So clang has to give LLVM more information about the C/C++ rules.<br class=""><br class="">*IR representation:*<br class="">LLVM has several ways of representing ranges of values:<br class="">  - *!range* metadata can be attached to integer call and load<br class=""></blockquote>instructions<br class=""><blockquote type="cite" class="">to indicate the allowed range of values of the result. LLVM's<br class=""></blockquote>ValueTracking<br class=""><blockquote type="cite" class="">provides a function for querying the range for any llvm::Value.<br class="">  - The *llvm.assume* intrinsic takes a boolean condition that can also<br class=""></blockquote>be<br class=""><blockquote type="cite" class="">used by ValueTracking to infer ranges of values.<br class="">  - The *inrange* attribute of GEP can be used to indicate C-like<br class=""></blockquote>semantics<br class=""><blockquote type="cite" class="">for the structure field marked with the inrange attribute. It can only be<br class="">used for GEP constantexprs (i.e. GEPs defined inline), but not for<br class="">standalone GEPs defining instructions.  
relevant discussion<br class=""><<a href="https://reviews.llvm.org/D22793?id=65626#inline-194653" class="">https://reviews.llvm.org/D22793?id=65626#inline-194653</a>>.<br class=""><br class="">Alternatives:<br class="">*(1) *Annotate each array subscript index value with a range, e.g.:<br class="">```<br class="">%i = i64 …<br class="">%ri =  call i64 @llvm.annotation.i64(%index), !range !0<br class="">%gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,<br class=""></blockquote>i32<br class=""><blockquote type="cite" class="">%ri<br class="">...<br class="">!0 = !{i64 0, i64 42}<br class="">```<br class="">*(2) *(variant of 1) relax the constraint that !range metadata can only<br class=""></blockquote>be<br class=""><blockquote type="cite" class="">set on call and load instructions, and set the !range metadata on the<br class=""></blockquote>index<br class=""><blockquote type="cite" class="">expression. We still need annotations for function parameters though:<br class="">```<br class="">%i = i64 … , !range !0<br class="">%gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,<br class=""></blockquote>i32<br class=""><blockquote type="cite" class="">%i<br class="">...<br class="">!0 = !{i64 0, i64 42}<br class="">```<br class="">This is slightly more compact.<br class=""><br class="">*(3)* Same as (1), with llvm.assume. This feels inferior to annotations.<br class="">*(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this<br class=""></blockquote>will<br class=""><blockquote type="cite" class="">interfere with optimizations.<br class=""></blockquote>I would very much like not to introduce another way to encode<br class="">assumptions other than `llvm.assume`. If you want to avoid the extra<br class="">instructions, use `llvm.assume(i1 true) ["range"(%val, %lb, %ub)]`,<br class="">which is in line with our move towards operand bundle use.<br class=""><br class=""></blockquote>Thanks, I did not know about that. 
I've just tried it but it appears that<br class="">tags have to be attribute names, and `!range` is not a valid attribute,<br class="">it's a metadata node. Is there a way to encode this?<br class=""></blockquote>We need to get rid of that assertion. There are other non-attributes<br class="">to be used in assume operand bundles in the (near) future, so this<br class="">work has to be done anyway.<br class=""></blockquote><br class="">+1 on trying to use assume, rather than adding another way.<br class=""><br class="">But are value ranges special for assumes, so that we need to handle them in a bundle? Is that just so we can more easily skip ‘artificial’ assume users?<br class=""></blockquote><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">It would make users explicit and we will have non-attribute bundles anyway.</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><span style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: 
normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; float: none; display: inline !important;" class="">I find it also "conceptually nicer", would you prefer explicit instructions?</span><br style="caret-color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""></div></blockquote></div><br class=""><div class="">One disadvantage of using a bundle (or !range metadata) is that we treat ranges for certain values in a special way, differently from how we treat range information expressed by the user, e.g. via conditions (or builtin assume). This means we have to handle multiple variants across the codebase, which can lead to situations where only one or the other is handled. That in turn can lead to surprising results (of the form: why does a transformation apply if the information is provided in a certain way, but not if the equivalent info is provided in a different way?).</div><div class="">Using instructions potentially also allows us to specify more complex ranges, in relation to other values.</div><div class=""><br class=""></div><div class="">But I realize that there are some practical considerations that make the instruction approach less appealing, and I am all in favor of starting with the more pragmatic & practical solution.</div><div class=""><br class=""></div><div class="">Cheers,</div><div class="">Florian</div></body></html>