[PATCH] D156658: [clang][dataflow] When checking `ExprToLoc` convergence, only consider children of block terminator.

Wed Aug 2 00:43:53 PDT 2023

mboehme added a comment.

In D156658#4546955 <https://reviews.llvm.org/D156658#4546955>, @xazax.hun wrote:

>> It looks as if, instead, what we should be doing to improve convergence in a sound manner is to implement widening for ExprToLoc. I'll submit a corresponding patch shortly.
>
> +1, I believe we want `ExprToLoc` to converge. That being said, if we can get away with not checking some parts, it could potentially be implemented as an optimization. In that case, I'd expect to still have full checking in debug builds and a strong argument why is it OK to not compare some parts.

I've investigated this in more detail. Unfortunately, it turns out that it's not quite as simple as just implementing widening on `ExprToLoc`.

One of the reasons for this is that we only apply widening at loop heads, but the expressions that are "blocking" convergence may be contained in a block that is not a loop head.

We could solve this by applying widening everywhere, but AFAIU, that's really not desirable because you lose a lot of precision that way.

But really, I think `ExprToLoc` is just a red herring here. The real issue is:

- We lack widening on `PointerValue`s
- Instead, for the purposes of convergence, we simply ignore differences in `PointerValue`s
- However, different `PointerValue`s can lead to different locations for expressions that dereference these `PointerValue`s, and we do consider the difference in these locations to be relevant for convergence

In other words, we have an inconsistency between when we consider a `PointerValue` to be converged and when we consider the storage location for an expression to be converged.

I think the real solution to this would be to introduce widening for `PointerValue`s. Essentially, what we want is a "top" `PointerValue` that does not have an associated `StorageLocation`. However, we don't want to eliminate the `PointerValue` entirely; we still want to be able to attach properties to it, so that, for example, an analysis can record that the `PointerValue` is non-null, even though we don't know what its exact location is. This is important for us to be able to handle cases like this one correctly.

The most obvious way to implement a "top" `PointerValue` would be to make `PointerValue::getPointeeLoc()` return a nullable pointer instead of a reference. When dereferencing a `PointerValue` without a storage location, we would then not associate the corresponding `Expr` with a storage location at all, thereby solving the convergence issue. However, this approach would require some effort, as it would involve changing callers of `PointerValue::getPointeeLoc()` so that they can deal with the case where no pointee location is available.

I therefore think that we should consider a shorter-term solution: In `Environment::equivalentTo()`, ignoring glvalue entries in `ExprToLoc` for certain expressions where it's unlikely that any analysis will ever want to retrieve their storage location. See https://reviews.llvm.org/D156856, which I've just submitted for review. I hope that, in practice, this will cover a majority of the cases that are causing non-convergence.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156658/new/

https://reviews.llvm.org/D156658