[llvm-dev] [RFC] Introducing a byte type to LLVM

Wed Jun 23 12:09:10 PDT 2021

Hi Jeroen,

>> To add to what Juneyoung said:
>> I don't think that experiment has been made. From what I can see, the
>> alternative you propose leads to an internally consistent model -- one "just"
>> has to account for the fact that a "load i64" might do some transformation on
>> the data to actually obtain an integer result (namely, it might do ptrtoint).
>>
>> However, I am a bit worried about what happens when we eventually add proper
>> support for 'restrict'/'noalias': the only models I know for that one actually
>> make 'ptrtoint' have side-effects on the memory state (similar to setting the
>> 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that
> 
> For the 'c standard', it is undefined behavior to convert a restrict pointer to
> an integer and back to a pointer type.
> 
> (At least, that is my interpretation of n2573 6.7.3.1 para 3:
>     Note that "based" is defined only for expressions with pointer types.
> )
> 
> For the full restrict patches, we do not track restrict provenance across a
> ptr2int, except for the 'int2ptr(ptr2int %P)' (which we do, as llvm sometimes
> introduced these pairs; not sure if this is still valid).

Interesting. I assumed that doing ptr2int, then doing whatever you want with 
that value (say, AES encrypt and then decrypt it), and then turning the same 
value back into a pointer, must always produce a pointer that is "at least as 
usable" as the one that we started with. I would interpret the parts of the 
standard that talk about integer-pointer casts that way.
(That's the problem with axiomatic standards: it is very easy to have mutually 
contradicting axioms...)

FWIW, Rust's use of LLVM 'noalias' pretty much relies on this. It would be 
rather disastrous for Rust if 'noalias' pointers cannot be cast to integers, 
cast back (potentially in a different function), and used.
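
To make the pattern concrete, here is a minimal sketch of the kind of code I
have in mind. The function names 'write_through' and 'roundtrip' are made up
for illustration; the only assumption about the compiler is that rustc
typically marks '&mut' arguments as 'noalias' in the generated LLVM IR.

```rust
// Hypothetical helper: receives an address as a plain integer, turns it
// back into a pointer, and writes through it.
fn write_through(addr: usize, value: i32) {
    let p = addr as *mut i32; // int -> ptr, in a different function
    // SAFETY (assumed by this sketch): `addr` was obtained from a live,
    // properly aligned `i32` that nobody else accesses during this call.
    unsafe {
        *p = value;
    }
}

// `x` is a `&mut i32`, i.e. the kind of argument that becomes `noalias`.
fn roundtrip(x: &mut i32) -> i32 {
    // ptr -> int: the pointer leaves the function only as an integer.
    let addr = (&mut *x) as *mut i32 as usize;
    write_through(addr, 42);
    // The write made through the round-tripped pointer must be visible here.
    *x
}
```

Calling 'roundtrip' on a variable is expected to store 42 into it and return
42. If 'noalias' pointers could not survive the integer round trip, the final
read of '*x' could be optimized as if 'write_through' never touched it.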

The C standard definition of 'restrict' is based on hypothetical alternative 
executions of the program with different inputs. I can't even imagine any 
reasonable way to interpret that unambiguously, so honestly I don't see how that 
is even a starting point for a precise formal definition that one could prove 
theorems about.^^
What my colleagues and I discussed for this revolved more around the idea of 
having more than one "provenance" for an allocation (so when a pointer is passed 
to a function as 'restrict' argument, it gets a fresh "ID" into its provenance), 
and then ensuring that the different provenances on one allocation are used 
consistently.  But then when you cast a ptr to an int you basically have to mark 
that particular provenance as 'exposed' (losing all 'restrict' advantages) to 
have any chance of handling the case of casting the int back to a ptr.  That 
seems fair to me honestly, if you cast a ptr to an int you cannot reasonably 
expect alias analysis to make heads or tails of what you are doing.  But then 
'ptrtoint' has a side-effect and cannot be removed even if the result is unused.
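
As a rough illustration of that last point, here is a toy model of the scheme
sketched above. This is not an actual proposed semantics: every name in it
('Allocation', 'Ptr', 'fresh_restrict', 'may_assume_noalias',
'dead_cast_matters') is hypothetical, it tracks a single allocation only, and
'inttoptr' simply picks the smallest exposed ID where a real model would need
a more careful choice.

```rust
use std::collections::BTreeSet;

/// Toy memory state for one allocation (all names are hypothetical).
#[derive(Default)]
struct Allocation {
    next_id: u32,
    exposed: BTreeSet<u32>,
}

/// A pointer is an address plus the provenance ID it carries.
#[derive(Clone, Copy)]
struct Ptr {
    addr: usize,
    prov: u32,
}

impl Allocation {
    /// Passing a pointer as a 'restrict' argument gives it a fresh ID.
    fn fresh_restrict(&mut self, addr: usize) -> Ptr {
        let prov = self.next_id;
        self.next_id += 1;
        Ptr { addr, prov }
    }

    /// ptrtoint returns the address AND mutates the memory state by
    /// marking the pointer's provenance as exposed.
    fn ptrtoint(&mut self, p: Ptr) -> usize {
        self.exposed.insert(p.prov);
        p.addr
    }

    /// inttoptr can only pick up a provenance that was exposed earlier;
    /// `None` means the resulting pointer would not be usable.
    fn inttoptr(&self, addr: usize) -> Option<Ptr> {
        self.exposed.iter().next().map(|&prov| Ptr { addr, prov })
    }

    /// Alias analysis keeps its 'restrict' advantages only while the
    /// provenance is not exposed.
    fn may_assume_noalias(&self, p: Ptr) -> bool {
        !self.exposed.contains(&p.prov)
    }
}

/// Runs the same program with and without a ptrtoint whose result is
/// unused, and reports whether a later inttoptr yields a usable pointer.
fn dead_cast_matters(keep_cast: bool) -> bool {
    let mut alloc = Allocation::default();
    let p = alloc.fresh_restrict(0x1000);
    if keep_cast {
        let _unused = alloc.ptrtoint(p); // result is dead, side effect is not
    }
    alloc.inttoptr(0x1000).is_some()
}
```

In this model 'dead_cast_matters(true)' and 'dead_cast_matters(false)' give
different answers, which is exactly why the unused 'ptrtoint' cannot simply be
deleted.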

Kind regards,
Ralf

> 
> Greetings,
> 
> Jeroen Dobbelaere
> 
>> this is *required*, but I also don't know an alternative. So if this remains
>> the case, and if we say "load i64" performs a ptrtoint when needed, then that
>> would mean we could not do dead load elimination any more as that would remove
>> the ptrtoint side-effect.
>>
>> There also is the somewhat conceptual concern that LLVM ought to have a type
>> that can losslessly hold all kinds of data that exist in LLVM. Currently, that
>> is not the case -- 'iN' cannot hold data with provenance.
>>
>> Kind regards,
>> Ralf
> 

-- 
Website: https://people.mpi-sws.org/~jung/