[llvm-dev] [RFC] Introducing a byte type to LLVM

Wed Jun 23 11:55:20 PDT 2021

Hi John,

>     So, at least in the paper Juneyoung and me and a few other people wrote (the
>     one referenced a couple times in this thread already:
>     https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf
>     <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>), it is a wildcard
>     provenance.
>     "exposed" is not a thing in the operational semantics, it is a theorem we
>     can prove.
> 
> Okay, fine, I won’t say “exposure”. :) You intend to still be able to
> prove that the reconstructed provenance cannot be that of certain
> memory locations based on a comprehensive use analysis of those
> locations.

Not really. You prove that the *address* can never be guessed by anything else. 
For pointers with the wrong address, their provenance simply does not matter.

>     I don't think that's right, since "a" is still used in the "if", so a bit of
>     information about the address is leaked and you can't assume the address was
>     not guessed.
> 
> Okay, so you have a baseline expectation that pointer comparisons will be 
> treated as escapes.

Yes. This expectation is based on "otherwise the analysis is wrong and I think I 
can prove that". ;)  You can't just ignore control dependencies and expect 
things to work out.
Does LLVM currently not treat pointer comparisons as escapes? I naively assumed 
it would because how could one even justify not doing so?^^

> The reason I added a second comparison in my example is just 
> because otherwise in the |a != b| case the access is out of bounds. So I guess 
> you’re suggesting that it’s impossible to construct an example that avoids that 
> problem that wouldn’t ultimately do something uneliminatable that would count as 
> an escape?

Yes, I think the theorems we proved for that paper establish this.
Of course, there is a chance that *some other* optimization LLVM does (not 
covered by our theorems) is in conflict with that story.

>     That said, things get more tricky once you want to account for C 'restrict'
>     / LLVM 'noalias', and then it might be necessary to have explicit
>     'exposed'-like operations. I haven't seen a properly worked-out model for
>     this, but the candidate models I saw would need something like this. So
>     maybe it'd not be a bad idea to have such an operation anyway... I just
>     don't think it should have any effect for the kind of programs we have been
>     discussing here so far.
> 
> Honestly, part of why I like this approach is that I think the basic idea — 
> recognizing when we’re doing a dependency-breaking value replacement and using a 
> different replacement sequence — also moves us towards a solution for |consume| 
> dependencies.

What I don't like about it is that it is not operational (or I did not 
understand it properly; I am not quite sure what the exact semantics of that 
marker you want to insert is). I think we to get to a state where we have a 
precise operational semantics / Abstract Machine for LLVM IR (along the lines of 
what we have in that paper -- not the same semantics, of course, but the same 
basic style of defining things). Ideally this would even be implemented in an 
interpreter. Then we can attack the question of "is this analysis/optimization 
correct" in a principled way.
(We have such an interpreter for Rust: https://github.com/rust-lang/miri/ . It's 
been tremendously useful, not just for finding bugs.)

The operational version of this that I know is basically the 'exposed' flag of 
the C provenance TS. Then 'ptrtoint' has a side-effect of setting that flag, and 
can (in general) not be removed, even if its result is unused. For 
'restrict'/'noalias' we need something more flexible, but I think it can be 
done. (I've done something similar for a Rust memory model in 
http://plv.mpi-sws.org/rustbelt/stacked-borrows/ .)
If you are okay with 'ptrtoint' having a side-effect, many things become a lot 
simpler.
But you seem to imagine something more targeted, something that only affects 
"dependency-breaking value replacements"... I don't have a good enough intuition 
for what exactly that is, to really evaluate this option. Could you give an example?

Kind regards,
Ralf