[llvm-dev] [RFC] Introducing a byte type to LLVM

Ralf Jung via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 14 04:04:30 PDT 2021


>>> I don't dispute that but I am still not understanding the need for bytes. 
>>> None of the examples I have seen so far
>>> clearly made the point that it is the byte types that provide a substantial 
>>> benefit. The AA example below does neither.
>> I hope <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html> makes 
>> a convincing case that under the current semantics, when one does an "i64" 
>> load of a value that was stored at pointer type, we have to say that this load 
>> returns poison. In particular, saying that this implicitly performs a 
>> "ptrtoint" is inconsistent with optimizations that are probably too important 
>> to be changed to accommodate this implicit "ptrtoint".
> I think it is in fact rather obvious that, if this optimization as currently 
> written is indeed in conflict with the current semantics, it is the optimization 
> that will have to give.  If the optimization is too important for performance to 
> give up entirely, we will simply have to find some more restricted pattern that 
> we can still soundly optimize.

That is certainly a reasonable approach.
However, judging from how reluctant LLVM is to remove optimizations that are 
much more convincingly wrong [1], my impression is that it is easier to 
complicate the semantics than to remove an optimization that LLVM already performs.

[1]: https://bugs.llvm.org/show_bug.cgi?id=34548;
     see https://www.ralfj.de/blog/2020/12/14/provenance.html for a more
     detailed explanation.

> Perhaps the clearest reason is that, if we did declare that integer types cannot 
> carry pointers and so introduced byte types that could, C frontends would have 
> to switch to byte types for their integer types, and so we would immediately 
> lose this supposedly important optimization for C-like languages, and so, since 
> optimizing C is very important, we would immediately need to find some 
> restricted pattern under which we could soundly apply this optimization to byte 
> types.  That’s assuming that this optimization is actually significant, of course.

At least C with strict aliasing enabled (i.e., standard C) would only need to use 
the byte type for "(un)signed char"; the other integer types remain unaffected. 
There is no arithmetic directly on these types ("char + char" is subject to 
integer promotion), so the IR overhead would consist of a few "bytecast" 
instructions next to / replacing the existing sign extensions that convert 
"char" to "int" before the arithmetic is performed.

Kind regards,
