[llvm-dev] [RFC] Adding range metadata to array subscripts.

Clement Courbet via llvm-dev llvm-dev at lists.llvm.org
Thu Mar 25 07:42:14 PDT 2021


On Wed, Mar 24, 2021 at 12:00 PM Nuno Lopes <nunoplopes at sapo.pt> wrote:

> In theory, inrange should be perfect for this job. If inrange was extended
> to non-const geps, I think it would work for all cases you care about,
> right? It’s also the most compact representation, so I would prefer that
> solution. Better than adding a new call instruction for every single
> pointer arithmetic instruction.
>
> In terms of BasicAA, supporting inrange shouldn’t be any different from
> !range metadata I think.
>
>
>

So I tried implementing this on top of `inrange`, and I found out that you
can only have one `inrange` index, which doesn't allow us to represent
access into multidimensional arrays : https://reviews.llvm.org/D99341


> Just my 2c.
>
>
>
> Nuno
>
>
>
>
>
> *From:* Clement Courbet
> *Sent:* 24 March 2021 09:15
> *To:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* [llvm-dev] [RFC] Adding range metadata to array subscripts.
>
>
>
> Hi everyone,
>
>
>
> tl;dr: I would like to teach clang to output range metadata so that LLVM
> can do better alias analysis. I have a proposal as D99248
> <https://reviews.llvm.org/D99248> (clang part) and D99247
> <https://reviews.llvm.org/D99247> (llvm part). But there are other
> possible options that I'm detailing below.
>
>
>
> Consider the following code, adapted from brotli
> <https://en.wikipedia.org/wiki/Brotli>:
>
>
>
> ```
>
> struct Histogram {
>
>   int values[256];
>
>   int total;
>
> };
>
>
>
> Histogram DoIt(const int* image, int size) {
>
>   Histogram histogram;
>
>   for (int i = 0; i < size; ++i) {
>
>     ++histogram.values[image[i]];  // (A)
>
>     ++histogram.total;             // (B)
>
>   }
>
>   return histogram;
>
> }
>
> ```
>
>
>
> In this code, the compiler does not know anything about the values of
> images[i], so it assumes that 256 is a possible value for it. In that case,
> (A) would change the value of histogram.total, so (B) has to load, add one
> and store [godbolt <https://godbolt.org/z/KxE343>].
>
>
>
> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use
> values to form a pointer to total and dereference it. What valid C/C++ code
> is allowed to do with values is:
>
>  - Form any pointer in [values, values + 256].
>  - Form and dereference any pointer in [values, values + 256)
>
>
>
> Note that the LLVM memory model is much laxer than that of C/C++. It has
> no notion of types. In particular, given an LLVM aggregate definition:
>
> ```
> %struct.S = type { [42 x i32], i32, i32 }
> ```
>
> It is perfectly valid to use an address derived from a GEP(0,0,%i)  [gep
> reference] representing indexing into the [42 x i32] array to load the i32
> member at index 2. It is also valid for %i to be 43 (though not 44 if an
> inbound GEP is used).
>
> So clang has to give LLVM more information about the C/C++ rules.
>
>
>
> *IR representation:*
> LLVM has several ways of representing ranges of values:
>  - *!range* metadata can be attached to integer call and load
> instructions to indicate the allowed range of values of the result. LLVM's
> ValueTracking provides a function for querying the range for any
> llvm::Variable.
>  - The *llvm.assume* intrinsic takes a boolean condition that can also be
> used by ValueTracking to infer range of values.
>  - The *inrange* attribute of GEP can be used to indicate C-like
> semantics for the structure field marked with the inrange attribute. It can
> only be used for GEP constantexprs (ie.e. GEPs defined inline), but not for
> standalone GEPs defining instructions.  relevant discussion
> <https://reviews.llvm.org/D22793?id=65626#inline-194653>.
>
>
>
> Alternatives:
> *(1) *Annotate each array subscript index value with a range, e.g.:
>
> ```
> %i = i64 …
> %ri =  call i64 @llvm.annotation.i64(%index), !range !0
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, i32
> %ri
> ...
> !0 = !{i64 0, i64 42}
>
> ```
> *(2) *(variant of 1) relax the constraint that !range metadata can only
> be set on call and load instructions, and set the !range metadata on the
> index expression. We still need annotations for function parameters though:
>
> ```
> %i = i64 … , !range !0
> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0, i32
> %i
> ...
> !0 = !{i64 0, i64 42}
>
> ```
> This is slightly more compact.
>
> *(3)* Same as (1), with llvm.assume. This feels inferior to annotations.
> *(4)* Extend inrange to non-constantexprs GEPs. It is unclear how this
> will interfere with optimizations.
>
>
>
>
>
> *On the clang side*:
>
> The clang part is quite trivial as the infrastructure is already in place
> to emit dynamic ubsan guards: D99248 <https://reviews.llvm.org/D99248>
>
>
>
> *On the LLVM Side:*
>
> Alternatives:
> *(A)* - (annotation or assume options) Simply enable scev-aa which knows
> how to handle value ranges in general. IIUC it's not enabled in clang
> because it has issues with invalidation when code changes, and is therefore
> not cacheable. This makes it too slow to be practical.
> *(B) *- (annotation or assume options) Teach BasicAA to honor !range
> metadata (D99247 <https://reviews.llvm.org/D99247>)
> *(C)* - (inrange option) Teach BasicAA to honor inrange attributes of GEP.
>
>
>
> I was leaning towards (1) and (B) because:
>
>  - BasicAA already has basic support for value range analysis
> (zero/nonzero), this is a small and natural extension.
>
>  - The BasicAA improvement already benefits some existing code (as
> evidenced by the test changes in D99247 <https://reviews.llvm.org/D99247>)
>
>  - Using range metadata rather than the `inrange` attribute means that
> BasicAA will automatically benefit from improvements in value tracking in
> the future.
>
>
>
> Opinions ?
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210325/75c657c3/attachment.html>


More information about the llvm-dev mailing list