[llvm] [IR] Introduce captures attribute (PR #116990)

Mon Nov 25 08:04:34 PST 2024

nikic wrote:

> I'm not quite sure about the statement that icmp doesn't capture provenance... I mean, take something like the following:
> 
> ```
> void f(int*x) {
>   if ((size_t)x == 0x11111111)
>     *(int*)0x11111111 = 10;
> }
> ```
> 
> Your phrasing implies this code is UB. Which... I'm not sure whether it's what you want, but it's a change to the underlying semantics. (LLVM optimizations currently convert icmp of a ptrtoint to an icmp of the underlying pointer.)
> 
> In the interest of making sure there aren't any surprises here, maybe we should say that this is a provenance capture? (If you're planning to introduce a non-provenance-capturing ptrtoint, it can be used to express a non-capturing icmp.)

I think everyone would agree that this variant is UB, right?

```
void f(int*x) {
  if (x == (int*)0x11111111)
    *(int*)0x11111111 = 10;
}
```

The problem in your example isn't so much that we convert icmp of ptrtoint into icmp of pointers, but that we then DCE the ptrtoints, losing the provenance exposure side effect. This is the same class of problem as https://github.com/llvm/llvm-project/issues/33896, and all the related issues.
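
To make the sequence above concrete, here is a rough C-level sketch (not actual LLVM IR, and not taken from the PR). The function names `cmp_via_int` and `cmp_via_ptr` are hypothetical, and the example assumes a mainstream platform where integer-to-pointer casts preserve the address:

```c
#include <stdint.h>

/* Source form: the pointer is cast to an integer before the comparison.
   That cast is the operation intended to expose the provenance of x. */
int cmp_via_int(int *x) {
    return (uintptr_t)x == (uintptr_t)0x11111111;
}

/* Form after the fold: the comparison is done on the pointers directly.
   The cast of x is now unused, so dead code elimination can drop it,
   and with it the only record that x's provenance was exposed. */
int cmp_via_ptr(int *x) {
    return x == (int *)(uintptr_t)0x11111111;
}
```

Both functions compute the same result for every address, which is why the fold looks harmless on its own; the difference is only in whether an exposing cast of `x` survives.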

I really wouldn't want to specify icmp as a provenance-capturing operation. I think nowadays we're pretty clear on icmp being an address-only, provenance-unaware operation (thus also changes like https://github.com/llvm/llvm-project/pull/82458), and going back on that would muddy the waters quite a bit, not to mention all the consequences of making icmp a non-pure operation. Of course, we'd just ignore all those consequences anyway, but I'd rather move us in the correct direction...

I think if we find evidence that this is a problem in *practice* (rather than just in adversarially constructed inttoptr examples, where we already have ten other ways in which LLVM can miscompile them...), I'd consider temporarily working around this on the implementation side only, while still specifying the correct semantics in LangRef.

> I think it makes sense to consider a null-comparison involving dereferenceable_or_null to be non-capturing, but I think we need to tighten the phrasing on dereferenceable_or_null to actually make that true. There isn't really anything stopping someone from just doing an out-of-bounds gep where the result happens to be null... at which point, you've captured the address. I think we need some statement indicating the "null" in dereferenceable_or_null has to be a true null, not the result of an out-of-bounds gep.

Good point. I think the relevant distinction here would be that the "null" needs to not have provenance. But then we'd have a case where adding provenance to a pointer would introduce additional UB, losing provenance monotonicity, which is likely a bad idea. I'm not sure how to specify this cleanly while retaining the existing special case.

A possible alternative here would be to instead split off an extra `address_is_null` subset from `address`. That way we'd know that the address is only "captured" via null comparisons and could ignore this in transforms where the null comparison doesn't matter. Do you think that would make sense?
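
As a small illustration of the kind of function this would describe (a sketch only: `address_is_null` is just the name proposed in this comment, not an existing attribute component, and `load_or_default` is a hypothetical example function):

```c
#include <stddef.h>

/* The only thing this function learns about the address of p is whether
   it is null. It never compares p against another pointer, converts it
   to an integer, or stores it anywhere. */
int load_or_default(const int *p, int fallback) {
    if (p == NULL)
        return fallback;
    return *p;
}
```

Under the proposed split, a transform that doesn't care about the null check could treat `p` here as not having its address captured at all.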

https://github.com/llvm/llvm-project/pull/116990