[llvm-dev] [RFC] Introducing a byte type to LLVM

Tue Jun 22 10:21:21 PDT 2021

Hi,

> Your proposal generally relies on certain optimizations not applying
> to pointers because they mess up provenance as represented in
> transitive use-dependencies. If those optimizations can be applied
> to integers, you can lose use-dependencies in exactly the same way as
> you can with pointers. Not doing the inttoptr(ptrtoint p) -> p
> reduction doesn’t matter at all, because in either case that value
> has a use-dependency on p, whereas the actual problem is that there
> is no longer a use-dependency on some other value.

Note that "provenance" as we use it in this discussion is an *explicit 
operational artifact* -- it exists as a concrete piece of state in the Abstract 
Machine. That is very different from something that might just be used 
internally in some kind of analysis.

There is no problem with "resetting" that provenance on an "inttoptr", and 
basically forgetting about where the int comes from. Note that this is a 
statement about an operation in the Abstract Machine, not merely a statement 
about some analysis: this is not "forgetting" as in "safely overapproximating 
the real set of possible behaviors", it is "forgetting" by *forcing* the 
provenance to be some kind of wildcard/default provenance. All analyses then 
have to correctly account for that.
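
To illustrate what "provenance as Abstract Machine state" could look like,
here is a minimal toy sketch in Python. It is not LLVM's semantics: the
Pointer class, the WILDCARD marker, and the may_access check are all
hypothetical names made up for this illustration. The only point is that
ptrtoint drops the provenance and inttoptr *forces* it to a wildcard value.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical marker for the "wildcard/default" provenance (an assumption
# of this sketch, not an LLVM definition).
WILDCARD = None


@dataclass(frozen=True)
class Pointer:
    addr: int             # the integer address
    prov: Optional[str]   # id of the allocation it derives from, or WILDCARD


def ptrtoint(p: Pointer) -> int:
    # The resulting integer carries only the address; provenance is dropped.
    return p.addr


def inttoptr(i: int) -> Pointer:
    # Provenance is reset to the wildcard, regardless of where `i` came from.
    return Pointer(i, WILDCARD)


def may_access(p: Pointer, alloc_id: str, base: int, size: int) -> bool:
    # Toy rule: the address must be in bounds, and the provenance must be
    # either the wildcard or exactly the allocation being accessed.
    in_bounds = base <= p.addr < base + size
    return in_bounds and (p.prov is WILDCARD or p.prov == alloc_id)


p = Pointer(0x1000, "A")
q = inttoptr(ptrtoint(p))  # same address, but wildcard provenance
```

In this toy model, q is a different value from p (the round trip is not the
identity), yet q may still access allocation A, because the access check
itself accounts for the wildcard.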

> For example, you have compellingly argued that it’s problematic to
> do the reduction |a == b ? a : b| to |b| for pointer types. Suppose
> I instead do this optimization on integers where |a = ptrtoint A|.
> The result is now simply |b|. If I |inttoptr| that result and access
> the memory, there will be no record that that access may validly
> be to |A|. It does not help that the access may be represented
> as |inttoptr (ptrtoint B)| for some |B| rather than just directly
> to |B|, because there is no use-dependence on |A|. All there is
> is an apparently unrelated and unused |ptrtoint A|.

So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being 
replaced by "ptrtoint B"? I don't see any problem with that. Do you have a 
concrete example?
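
For what it is worth, here is how I would check this in a toy model (a
sketch in Python, not LLVM's semantics; Pointer, the None-as-wildcard
convention, and the select_* helpers are hypothetical names introduced only
for this illustration). It assumes A and B are pointers with equal
addresses but different provenance, for example one past the end of one
allocation and the start of another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pointer:
    addr: int
    prov: Optional[str]  # allocation id; None stands for wildcard provenance


def ptrtoint(p: Pointer) -> int:
    return p.addr            # integers carry no provenance in this model


def inttoptr(i: int) -> Pointer:
    return Pointer(i, None)  # provenance is reset to the wildcard


# Integer version: "a == b ? a : b" and its replacement "b".
def int_select_original(a: int, b: int) -> int:
    return a if a == b else b


def int_select_optimized(a: int, b: int) -> int:
    return b


# Pointer version, where "==" compares addresses only.
def ptr_select_original(a: Pointer, b: Pointer) -> Pointer:
    return a if a.addr == b.addr else b


def ptr_select_optimized(a: Pointer, b: Pointer) -> Pointer:
    return b


A = Pointer(0x2000, "A")
B = Pointer(0x2000, "B")  # same address as A, different provenance

before = inttoptr(int_select_original(ptrtoint(A), ptrtoint(B)))
after = inttoptr(int_select_optimized(ptrtoint(A), ptrtoint(B)))
```

Under these assumptions, before and after are the same value (same address,
wildcard provenance), so the integer rewrite is unobservable. The pointer
versions, by contrast, return A and B respectively, which differ in
provenance.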

> Obviously we can avoid doing this optimization locally when we
> see that the inputs result from |ptrtoint|, but that’s no more
> than best-effort: we can do this optimization in a function which
> we later inline in a caller that performs all the |ptrtoint| and
> |inttoptr| casts.

I agree "avoiding something locally" is way too fragile to be considered a 
solution. I do not think it is needed, though.

Kind regards,
Ralf