[llvm] [DA] Add initial support for monotonicity check (PR #162280)
via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 20 10:53:12 PDT 2025
amehsan wrote:
I explain the main idea with an example here. If this is correct and can be made to work, it will be indepndent of this patch and probably something that can be implemented earlier in pipeline. Consider the following loop. Also this part of langref about `inbounds` attribute is important here:
> The successive addition of the current address, truncated to the pointer index type and interpreted as an unsigned number, and each offset, interpreted as a signed number, does not wrap the pointer index type
```
void foo2 (char *A, uint64_t n) {
// for.body: ; preds = %entry, %for.body
// %j.04 = phi i64 [ 0, %entry ], [ %inc, %for.body ]
// %conv = trunc i64 %j.04 to i8
// %arrayidx = getelementptr inbounds nuw i8, ptr %A, i64 %j.04
// store i8 %conv, ptr %arrayidx, align 1, !tbaa !6
// %inc = add nuw i64 %j.04, 1
// %exitcond = icmp eq i64 %inc, %umax
// br i1 %exitcond, label %for.cond.cleanup, label %for.body, !llvm.loop !9
for (uint64_t j = 0; j <= n ; j ++) {
A[j] = (char) j;
}
}
```
The value of `%j.04` can be any 64 bit integer. In the context of GEP instruction it should be intepreted as a signed integer that can take values in the range `[-2^63, 2^63-1]`. I assume we have a 64 bit address space starting from 0 and ending at 2^64 - 1. There are a couple of possibilities for address of `A`
1. If address of `A` is `2^63+X` (`X` is a positive value less than `2^63`): Consider `j = 2^63 - X`. `A[j - 1]` will address last memory byte, `2^64-1`, and `A[j]` will wrap `i64` type, in violation of the `inbounds` rule above. So to avoid UB, there must be an upper bound of at most `2^63 - 1` for j. (when j is interreted as a signed integer).
2. We can also make a similar argument and prove a bound for `j` if address of `A` is `2^63+X` (here `X` is any negative 64 bit number)
3. The only remaining case is that address of `A` is exactly at `2^63`. In this case values of `A[j]` can address any memory location in the entire 64-bit address space. To ensure we have provenance to access any memory location, there must have been a single allocated object of size `2^64`. While this may be OK for the abstract machine, this is certainly not the case for any 64 bit target that I am aware of (*). It is reasonable to add a new parameter to e.g. DataLayout (or somewhere else) that is an upper bound on the size of a single allocated object (**). If we have a bound smaller than `2^64` then this last case is also impossible.
Essentially we prove that `j` cannot wrap. Even though loop conditions allow that, there will UB somewhere else in the loop if `j` really takes all possible values. Either we violate conditions of `inbound` or the allocated object should be impossibly large.
---------------------------------------
(*) AArch64 and x86 machines use less than 64 bits for virtual address as discussed before. A machine that uses e.g. 56 bits for addressing cannot allocate an object of size larger than `2^56`
(**) I am not sure DataLayout is the best place. The bound could be `2^64` bit by default, but set to a smaller value depending on `-mcpu` option. Realistic bounds are definitely smaller, but this is provably correct. It is also possible that the user sets this bound for a program being compiled.
```
https://github.com/llvm/llvm-project/pull/162280
More information about the llvm-commits
mailing list