[llvm] [llvm] Fix __builtin_object_size interaction between Negative Offset … (PR #111827)

Fri Oct 25 16:00:02 PDT 2024

hvdijk wrote:

Nice work! Using a new class rather than extending `SizeOffsetAPInt` is probably good, it helps maintain compatibility, which is a plus if it means we can also backport this to LLVM 19 to fix the wrong UBSAN errors there.

The change to `llvm/test/Transforms/InstCombine/builtin-object-size-offset.ll` should not be needed. With `-passes=instcombine`, it is supposed to check for the size to be known exactly. Here, the size was previously known exactly and it is a regression that we no longer pick that up. The reason this is failing is because in `ObjectSizeOffsetVisitor::compute`, you have `if (!Span.bothKnown()) return {};`, but if `EvalMode != ObjectSizeOpts::Mode::ExactUnderlyingSizeAndOffset`, that is not the best way of handling it. For `ExactSizeFromOffset` specifically, the idea is that `Before` does not matter to the caller, you can construct the result from `After` alone. By requiring both `Before` and `After` to be known, you are effectively making it so that `ExactSizeFromOffset` and `ExactUnderlyingSizeAndOffset` do the same thing, which they should not.

However, if `ExactSizeFromOffset` is restored to handle that case, it then becomes important to ensure that nothing calls `ObjectSizeOffsetVisitor::compute` with `Options.EvalMode = ObjectSizeOpts::Mode::ExactSizeFromOffset` if actually it does care about the offset. Yet `ObjectSizeOffsetEvaluator::compute_` does exactly that. Fixing that is possible by making sure `ObjectSizeOffsetVisitor` has some public `compute` version that returns `OffsetSpan`, and making `ObjectSizeOffsetEvaluator::compute_` use that as well. Alternatively, it may be possible to get `ObjectSizeOffsetEvaluator::compute_` to use a different `EvalMode`, depending on how exactly `ObjectSizeOffsetVisitor::compute` gets changed.

---

P.S.: The reason why I suggested keeping `TotalSize` is that this case can give an upper bound when we know we are indexing into a specific object, but do not know the index (e.g. `__builtin_object_size(array + i, 0)` where `i` is only known at runtime can still be bound by `sizeof array`). I thought we already handled that, but we don't, so it is fine that your PR leaves that unhandled.

I will do more extensive testing later.

https://github.com/llvm/llvm-project/pull/111827