[llvm-dev] [RFC] Adding range metadata to array subscripts.
Clement Courbet via llvm-dev
llvm-dev at lists.llvm.org
Thu Mar 25 09:23:05 PDT 2021
On Thu, Mar 25, 2021 at 3:46 PM Roman Lebedev <lebedev.ri at gmail.com> wrote:
> On Thu, Mar 25, 2021 at 5:42 PM Clement Courbet via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
> >
> >
> >
> > On Wed, Mar 24, 2021 at 12:00 PM Nuno Lopes <nunoplopes at sapo.pt> wrote:
> >>
> >> In theory, inrange should be perfect for this job. If inrange was
> extended to non-const geps, I think it would work for all cases you care
> about, right? It’s also the most compact representation, so I would prefer
> that solution. Better than adding a new call instruction for every single
> pointer arithmetic instruction.
> >>
> >> In terms of BasicAA, supporting inrange shouldn’t be any different from
> !range metadata I think.
> >>
> >>
> >
> >
> > So I tried implementing this on top of `inrange`, and I found out that
> you can only have one `inrange` index, which doesn't allow us to represent
> access into multidimensional arrays : https://reviews.llvm.org/D99341
> Are you aware of upcoming Opaque Pointers?
> There won't be any multi-dimensional GEP's after that.
>
I'm not familiar with that ? Does that mean that any N-dimensional GEP will
become a chain of N 1-dimensional GEPs ?
>
> >> Just my 2c.
> >>
> >>
> >>
> >> Nuno
> >>
> >>
> >>
> >>
> >>
> >> From: Clement Courbet
> >> Sent: 24 March 2021 09:15
> >> To: llvm-dev <llvm-dev at lists.llvm.org>
> >> Subject: [llvm-dev] [RFC] Adding range metadata to array subscripts.
> >>
> >>
> >>
> >> Hi everyone,
> >>
> >>
> >>
> >> tl;dr: I would like to teach clang to output range metadata so that
> LLVM can do better alias analysis. I have a proposal as D99248 (clang part)
> and D99247 (llvm part). But there are other possible options that I'm
> detailing below.
> >>
> >>
> >>
> >> Consider the following code, adapted from brotli:
> >>
> >>
> >>
> >> ```
> >>
> >> struct Histogram {
> >>
> >> int values[256];
> >>
> >> int total;
> >>
> >> };
> >>
> >>
> >>
> >> Histogram DoIt(const int* image, int size) {
> >>
> >> Histogram histogram;
> >>
> >> for (int i = 0; i < size; ++i) {
> >>
> >> ++histogram.values[image[i]]; // (A)
> >>
> >> ++histogram.total; // (B)
> >>
> >> }
> >>
> >> return histogram;
> >>
> >> }
> >>
> >> ```
> >>
> >>
> >>
> >> In this code, the compiler does not know anything about the values of
> images[i], so it assumes that 256 is a possible value for it. In that case,
> (A) would change the value of histogram.total, so (B) has to load, add one
> and store [godbolt].
> >>
> >>
> >>
> >> Fortunately, C/C++ has a rule that it is invalid (actually, UB) to use
> values to form a pointer to total and dereference it. What valid C/C++ code
> is allowed to do with values is:
> >>
> >> - Form any pointer in [values, values + 256].
> >> - Form and dereference any pointer in [values, values + 256)
> >>
> >>
> >>
> >> Note that the LLVM memory model is much laxer than that of C/C++. It
> has no notion of types. In particular, given an LLVM aggregate definition:
> >>
> >> ```
> >> %struct.S = type { [42 x i32], i32, i32 }
> >> ```
> >>
> >> It is perfectly valid to use an address derived from a GEP(0,0,%i)
> [gep reference] representing indexing into the [42 x i32] array to load the
> i32 member at index 2. It is also valid for %i to be 43 (though not 44 if
> an inbound GEP is used).
> >>
> >> So clang has to give LLVM more information about the C/C++ rules.
> >>
> >>
> >>
> >> IR representation:
> >> LLVM has several ways of representing ranges of values:
> >> - !range metadata can be attached to integer call and load
> instructions to indicate the allowed range of values of the result. LLVM's
> ValueTracking provides a function for querying the range for any
> llvm::Variable.
> >> - The llvm.assume intrinsic takes a boolean condition that can also be
> used by ValueTracking to infer range of values.
> >> - The inrange attribute of GEP can be used to indicate C-like
> semantics for the structure field marked with the inrange attribute. It can
> only be used for GEP constantexprs (ie.e. GEPs defined inline), but not for
> standalone GEPs defining instructions. relevant discussion.
> >>
> >>
> >>
> >> Alternatives:
> >> (1) Annotate each array subscript index value with a range, e.g.:
> >>
> >> ```
> >> %i = i64 …
> >> %ri = call i64 @llvm.annotation.i64(%index), !range !0
> >> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,
> i32 %ri
> >> ...
> >> !0 = !{i64 0, i64 42}
> >>
> >> ```
> >> (2) (variant of 1) relax the constraint that !range metadata can only
> be set on call and load instructions, and set the !range metadata on the
> index expression. We still need annotations for function parameters though:
> >>
> >> ```
> >> %i = i64 … , !range !0
> >> %gep1 = getelementptr inbounds %struct.S, %struct.S* %s, i64 0, i32 0,
> i32 %i
> >> ...
> >> !0 = !{i64 0, i64 42}
> >>
> >> ```
> >> This is slightly more compact.
> >>
> >> (3) Same as (1), with llvm.assume. This feels inferior to annotations.
> >> (4) Extend inrange to non-constantexprs GEPs. It is unclear how this
> will interfere with optimizations.
> >>
> >>
> >>
> >>
> >>
> >> On the clang side:
> >>
> >> The clang part is quite trivial as the infrastructure is already in
> place to emit dynamic ubsan guards: D99248
> >>
> >>
> >>
> >> On the LLVM Side:
> >>
> >> Alternatives:
> >> (A) - (annotation or assume options) Simply enable scev-aa which knows
> how to handle value ranges in general. IIUC it's not enabled in clang
> because it has issues with invalidation when code changes, and is therefore
> not cacheable. This makes it too slow to be practical.
> >> (B) - (annotation or assume options) Teach BasicAA to honor !range
> metadata (D99247)
> >> (C) - (inrange option) Teach BasicAA to honor inrange attributes of GEP.
> >>
> >>
> >>
> >> I was leaning towards (1) and (B) because:
> >>
> >> - BasicAA already has basic support for value range analysis
> (zero/nonzero), this is a small and natural extension.
> >>
> >> - The BasicAA improvement already benefits some existing code (as
> evidenced by the test changes in D99247)
> >>
> >> - Using range metadata rather than the `inrange` attribute means that
> BasicAA will automatically benefit from improvements in value tracking in
> the future.
> >>
> >>
> >>
> >> Opinions ?
> Roman
>
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20210325/6d93801d/attachment.html>
More information about the llvm-dev
mailing list