[llvm] [LangRef] Specify icmp on pointers to only compare address (PR #163936)

Nikita Popov via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 27 06:00:52 PDT 2025


nikic wrote:

> For ordered comparisons, the address is the only thing that makes sense, but equality typically implies substitutability. We need to be careful that blocks reachable from a branch on `icmp eq %a, %b` don't assume that `%a` can substitute `%b`.

You keep repeating this, even though you have already been told multiple times that this is incorrect. Pointer equality does not imply substitutability, and LLVM does not assume it does for cases where it cannot prove that the provenance is the same (in which case the non-address bits are known to also be the same, and substitutability holds regardless of whether icmp compares those bits or not).

> This is currently the behaviour of CHERI C/C++. As we've had more people trying to write these dialects, we've found that this is the number one source of friction. In CHERIoT, our C++ wrapper type around a CHERI capability uses exact (all bits including tag) and we have found from user feedback that this is a more understandable behaviour.

I think that this behavior would have to use an intrinsic in any case (as even without the proposed LangRef change, icmp would not compare the tag).

> I would like to change that behaviour, as it's also strictly not a compliant implementation of C++ as of C++20, which explicitly requires equality to respect the substitution principle.

I assume that you're referring to [17.12.2.1p4](https://eel.is/c++draft/cmp.categories#pre-4) here:

> For the purposes of [cmp.categories], [substitutability](https://eel.is/c++draft/cmp.categories#def:substitutability) is the property that f(a) == f(b) is true whenever a == b is true, where f denotes a function that reads only comparison-salient state that is accessible via the argument's public const members.

`strong_ordering` substitutability in the C++ standard is a deliberately vague property, which hinges on the function only inspecting "comparison-salient" state. Complying with this is a matter of defining which part of the pointer is "comparison-salient". The interpretation of current CHERI implementations is that this is only the address part of the pointer. For non-CHERI architectures, I don't think it's possible to have "pointer provenance" be part of the "comparison-salient" state while complying with other requirements of the standard.

(Though I doubt that the C++ committee gave any deep consideration to how `<=>` on pointers returning `strong_ordering` interacts with pointer provenance, given that in C++ pointer provenance is currently an emergent property, rather than an explicitly specified one. I'd expect the precise wording in these areas to change once C++ specifies a complete provenance model.)
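
To make the "comparison-salient" point concrete, here is a minimal C++ sketch. Everything in it is hypothetical and for illustration only: `FatPtr`, its `bounds` field, and `f_salient` are not CHERI or LLVM APIs. It models a pointer-like value with an address plus one piece of non-address metadata, and defines only the address as comparison-salient, so that equality and substitutability agree by construction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pointer-like value: an address plus non-address metadata.
// This is an illustration, not how any real capability type is laid out.
class FatPtr {
public:
    FatPtr(std::uintptr_t address, std::uint32_t bounds)
        : address_(address), bounds_(bounds) {}

    std::uintptr_t address() const { return address_; }
    std::uint32_t bounds() const { return bounds_; }

    // Equality looks at the address only; the metadata is ignored.
    friend bool operator==(const FatPtr &a, const FatPtr &b) {
        return a.address_ == b.address_;
    }
    friend bool operator!=(const FatPtr &a, const FatPtr &b) {
        return !(a == b);
    }

private:
    std::uintptr_t address_;
    std::uint32_t bounds_;
};

// A function that reads only the state we chose to call comparison-salient
// (the address). For such a function, a == b implies f(a) == f(b).
std::uintptr_t f_salient(const FatPtr &p) { return p.address() >> 4; }
```

Under this assumed definition, two values with the same address but different `bounds` compare equal, and any function restricted to the address cannot tell them apart. A function that reads `bounds()` can, but by the chosen definition it is not reading comparison-salient state.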

> > LLVM can only reason about the address bits. These semantics allow pointers with non-address bits to receive essentially the same comparison optimization support as ordinary pointers.
>
> Such optimisations are almost certainly unsound. Comparing only the address permits substitution with address-only pointers (modulo provenance-based alias analysis). It does not on CHERI.

LLVM already handles substitution based on pointer equality correctly. The optimizations I'm referring to here are more basic. For example, if you have a loop with an `iv == end` exit condition (with a pointer IV), we can only easily determine the loop trip count if the comparison is on the address only. If it includes non-address bits, then it's likely possible to recover this in some cases, but only through more sophisticated analysis that proves that a difference in non-address bits will result in UB, e.g. by arguing that if they differ, the loop would violate the forward progress guarantee (which only works for C/C++), or that we would eventually branch on poison by violating a nowrap requirement.
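
As a rough source-level illustration of the trip count argument, consider the C++ sketch below. The function names are hypothetical, and the closed form assumes a conventional flat address space in which converting a pointer to `std::uintptr_t` yields its address. It shows the kind of loop in question, and the trip count that follows directly from the two addresses when the exit comparison looks at nothing else.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Pointer-IV loop with an equality exit condition (`iv != end` continues,
// i.e. the loop exits when `iv == end`).
std::size_t count_iterations(const int *begin, const int *end) {
    std::size_t n = 0;
    for (const int *iv = begin; iv != end; ++iv)
        ++n;
    return n;
}

// Closed-form trip count, valid if the exit comparison depends only on
// the address: the loop runs until the address of `iv` reaches the
// address of `end`, stepping by sizeof(int) each iteration.
// Assumes `end` is reachable from `begin` in whole steps.
std::size_t trip_count_from_addresses(const int *begin, const int *end) {
    auto b = reinterpret_cast<std::uintptr_t>(begin);
    auto e = reinterpret_cast<std::uintptr_t>(end);
    return (e - b) / sizeof(int);
}
```

If the comparison also took non-address bits into account, the closed form would need an additional argument that those bits match at the exit (or that a mismatch is UB), which is the more sophisticated analysis described above.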

https://github.com/llvm/llvm-project/pull/163936

