[llvm] [DA] Fix the check between Subscript and Size after delinearization (PR #151326)

Tue Aug 5 07:53:49 PDT 2025

================
@@ -594,14 +594,15 @@ for.end12:                                        ; preds = %for.inc10, %entry
 }
 
 
+; FIXME? It seems that we cannot prove that %N is non-negative...
 define void @nonnegative(ptr nocapture %A, i32 %N) {
 ; CHECK-LABEL: 'nonnegative'
 ; CHECK-NEXT:  Src: store i32 1, ptr %arrayidx, align 4 --> Dst: store i32 1, ptr %arrayidx, align 4
-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - output [* *]!
 ; CHECK-NEXT:  Src: store i32 1, ptr %arrayidx, align 4 --> Dst: store i32 2, ptr %arrayidx, align 4
-; CHECK-NEXT:    da analyze - consistent output [0 0|<]!
+; CHECK-NEXT:    da analyze - output [* *|<]!
 ; CHECK-NEXT:  Src: store i32 2, ptr %arrayidx, align 4 --> Dst: store i32 2, ptr %arrayidx, align 4
-; CHECK-NEXT:    da analyze - none!
+; CHECK-NEXT:    da analyze - output [* *]!
----------------
Meinersbur wrote:

My concern to the assumption that values are always signed. The only indicator for wrapping behavior are `%add16`, `%add19` which are no-UNSIGNED-wrap. Moving from `0b01111111` to `0b10000000` is valid and must be considered. We cannot implicitly assume the result of it is poison/cannot happen because it is valid.

Every single `SE->isKnownNegative` assumes a signed interpretation, and it seems not even
https://github.com/kasuga-fj/llvm-project/blob/b06f10d96c6d0fb89253c75c7f1d75c4cf519339/llvm/lib/Analysis/DependenceAnalysis.cpp#L1144
considers that it could be unsigned. But also `getBackedgeTakenCount()` logically cannot be negative. That is, we always have a mix of unsigned and signed representations. `getMinusSCEV()` uses two-complement wrapping behavior. 

Assuming and i8 SCEV:
```
getMinusSCEV(129, 0) == 129 == 0b10000001 == -127
isKnownNegative(-127)
```
so `isKnownNegative` is happy to interpret an (obviously positive) unsigned number as negtive. Just start the iteration space with the 'negative' number and `isKnownNegative` returns exactly the opposite of what you would expect
```
for (uint8_t i = 128; i < 255; ++i)
  A[i];
```
If we wanted to exploit the fact that memory blocks larger than half the address space might be undefined behavior (due to `ptrdiff_t` being signed), we would need to sign-extend the SCEV to pointer size. Also, the logic would only be valid for getelementptr argument, not in general. Also, by adding/multiplying a sufficient larger number, the actuall array index may again be within the sweet sport of below half the address space.

I seems doubious to me that `<` is implemented in terms of `getMinusSCEV < 0`. The more I think about it the more worried I become. The issue can easily be shown when using constants.

This PR at least adds a check for the AddRec having signed semantics. But `AddRec->evaluateAtIteration(BECount)` could still be larger than INTn_MAX as above. Then `isKnownNegative` returns true on `Diff1` it even if Size is zero. `AddRec->hasNoSignedWrap()` just means the AddRec itself does not wrap, but maybe the index that is computed from that AddRec.

I am considering just LGTM since it clearly fixes a bug. Cannot fix all the pre-existing corner cases. Some idea for rigerous/systematic testing would be to use loops with a single iteration only (in two different loops; so AddRec resolves to a constant), or with two iterations, one before the (unsigned and/or signed) wrap and one below. With so just a single iteration it is easier to reason about and SCEV less often has to fall back to SCEVCouldNotCompute. Instead of comparing just the first/last value of the array index SCEV, one might possibly (also) need to recursively visit the SCEV tyo analyze whether each of its constituents is monotonic.

https://github.com/llvm/llvm-project/pull/151326