[cfe-dev] [RFC] Introducing a byte type to LLVM

Mon Jun 14 09:08:09 PDT 2021

On 14 Jun 2021, at 7:04, Ralf Jung wrote:

> Hi,
>
>>>> I don't dispute that but I am still not understanding the need for 
>>>> bytes. None of the examples I have seen so far
>>>> clearly made the point that it is the byte types that provide a 
>>>> substantial benefit. The AA example below does neither.
>>>
>>> I hope 
>>> <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html> 
>>> makes a convincing case that under the current semantics, when one 
>>> does an "i64" load of a value that was stored at pointer type, we 
>>> have to say that this load returns poison. In particular, saying 
>>> that this implicitly performs a "ptrtoint" is inconsistent with 
>>> optimizations that are probably too important to be changed to 
>>> accommodate this implicit "ptrtoint".
>>
>> I think it is in fact rather obvious that, if this optimization as 
>> currently written is indeed in conflict with the current semantics, 
>> it is the optimization that will have to give.  If the optimization 
>> is too important for performance to give up entirely, we will simply 
>> have to find some more restricted pattern that we can still soundly 
>> optimize.
>
> That is certainly a reasonable approach.
> However, judging from how reluctant LLVM is to remove optimizations 
> that are much more convincingly wrong [1], my impression was that it 
> is easier to complicate the semantics than to remove an optimization 
> that LLVM already performs.
>
> [1]: https://bugs.llvm.org/show_bug.cgi?id=34548,
>      https://bugs.llvm.org/show_bug.cgi?id=35229;
>      see https://www.ralfj.de/blog/2020/12/14/provenance.html for a
>      more detailed explanation
>
>> Perhaps the clearest reason is that, if we did declare that integer 
>> types cannot carry pointers and so introduced byte types that could, 
>> C frontends would have to switch to byte types for their integer 
>> types, and so we would immediately lose this supposedly important 
>> optimization for C-like languages, and so, since optimizing C is very 
>> important, we would immediately need to find some restricted pattern 
>> under which we could soundly apply this optimization to byte types.  
>> That’s assuming that this optimization is actually significant, of 
>> course.
>
> At least C with strict aliasing enabled (i.e., standard C) only needs 
> to use the byte type for "(un)signed char". The other integer types 
> remain unaffected. There is no arithmetic on these types ("char + 
> char" is subject to integer promotion), so the IR overhead would 
> consist in a few "bytecast" instructions next to / replacing the 
> existing sign extensions that convert "char" to "int" before 
> performing the arithmetic.

The semantics you seem to want are that LLVM’s integer types cannot 
carry information from pointers.  But I can cast a pointer to an integer 
in C and vice-versa, and compilers have de facto defined the behavior of 
subsequent operations like breaking the integer up (and then putting it 
back together), adding numbers to it, and so on.  So no, as a C compiler 
writer, I do not have a choice; I will have to use a type that can 
validly carry pointer information for integers in C.
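
As a minimal sketch of the kind of code meant here (the function names 
`roundtrip_through_halves` and `roundtrip_demo` are made up for 
illustration, and `uintptr_t` is assumed to be available), a pointer is 
cast to an integer, broken into two halves, put back together, and used 
as a pointer again. The final dereference relies on the de facto behavior 
described above rather than on anything the C standard spells out:

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: split a pointer's integer value into a low and a
 * high half, reassemble the halves, and cast the result back. */
void *roundtrip_through_halves(void *p)
{
    uintptr_t bits = (uintptr_t)p;
    const unsigned half = sizeof(uintptr_t) * CHAR_BIT / 2;

    uintptr_t lo = bits & (((uintptr_t)1 << half) - 1); /* low half  */
    uintptr_t hi = bits >> half;                        /* high half */

    uintptr_t rebuilt = (hi << half) | lo;              /* same value as bits */
    return (void *)rebuilt;
}

/* Returns 1 if the rebuilt pointer compares equal to the original and
 * still reads the original object. */
int roundtrip_demo(void)
{
    int x = 42;
    int *q = roundtrip_through_halves(&x);
    return q == &x && *q == 42;
}
```

In this sketch `lo`, `hi`, and `rebuilt` are ordinary integers as far as 
the source is concerned, yet the program only works if they carry enough 
information to reconstruct a usable pointer.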

Since you seem to find this sort of thing compelling, please note that 
even a simple assignment like `char c2 = c1` technically promotes 
through `int` in C, and so `int` must be able to carry pointer 
information if `char` can.
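
The integer promotion rule that both sides of this exchange refer to can 
be observed directly for arithmetic operands. The sketch below assumes a 
C11 compiler (for `_Generic`) and a platform where `int` can represent 
every `char` value; `char_sum_is_int` and `unary_plus_is_int` are 
hypothetical names. It only checks the type of the promoted expressions 
and shows nothing about how a plain assignment is lowered:

```c
/* Returns 1 if the expression `c1 + c2` has type int, i.e. both char
 * operands were promoted before the addition. */
int char_sum_is_int(void)
{
    char c1 = 1, c2 = 2;
    return _Generic(c1 + c2, int: 1, default: 0);
}

/* Returns 1 if unary plus on a char also yields an int. */
int unary_plus_is_int(void)
{
    char c = 1;
    return _Generic(+c, int: 1, default: 0);
}
```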

John.