[PATCH] D93820: [InstSimplify] Don't fold gep p, -p to null

Sat Dec 26 08:15:59 PST 2020

nlopes added a comment.

In D93820#2471778 <https://reviews.llvm.org/D93820#2471778>, @nikic wrote:

> In D93820#2471737 <https://reviews.llvm.org/D93820#2471737>, @nlopes wrote:
>
>> I only looked at the tests and they were correct before, see here: https://alive2.llvm.org/ce/z/UzW3pv
>> The tests are weird because they have 'gep inbounds'. The reason they are correct (and weird) is that the only way `p - (int)p/sizeof(*p)` is inbounds is p being null. Anything else will overflow.
>
> This doesn't look right to me, at least not given current LangRef wording. Lets say we have `gep inbounds p, -p`, where `p = ptr(base_addr = 1, offset = -1)`. This means that the address value of `p` is `0`, but it has provenance of the object at `base_addr = 1`. As such, the `inbounds` is not violated (both `p` and the gep results are inbounds of the zero address), but we still change provenance.

There's an extra catch: `gep inbounds` requires both the input and output pointers to be in bounds. This part is explicit in LangRef, at least.
Some examples:

  p = malloc()
  q = gep inbounds p, -1  // poison
  r = gep p, -1           // ok
  s = gep inbounds r, 1   // poison: r is not inbounds
  t = gep r, 1            // ok, offset = 0
  u = gep inbounds t, 1   // ok, offset = 1 (assuming malloc size > 0)

> For the transform to be correct, the GEP inbounds specification would have to require not only that the base address is in bounds of **an** allocated object, but that it is in bounds of **the** allocated object corresponding to the provenance of the base value. Is that supposed to be part of the GEP inbounds semantics?
>
>> The code is buggy because it doesn't require inbounds (not caught because there's no test without inbounds): https://gcc.godbolt.org/z/GYxc78
>
> I did consider making this transform require inbounds instead. The requirement that at least the address values belong to the same object would make practical miscompiles less likely. However, as argued above, I don't think the inbounds requirements would actually make the transform correct. And alive does agree that requiring inbounds is not sufficient for the more general case (https://alive2.llvm.org/ce/z/Fykn-3).

Right, only the null case is correct because it only works when the input pointer is null; all other cases were poison. So transforming poison into null is fine.
The general case changes provenance. It's very wrong. And LangRef is explicit about this case as well. Well, and BasicAA will break programs if you do this general transformation as it uses provenance to prove no alias.

>> So in the end the transformation doesn't seem very useful in practice?
>
> At least, the null pointer case I'm touching here doesn't fire on test-suite `-O3` at all...

The transformation does look weird. When is the code trying to compute nullptr in some convoluted way? I have some inclination to remove it altogether.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D93820/new/

https://reviews.llvm.org/D93820