[llvm-dev] [RFC] Introducing a byte type to LLVM

Juneyoung Lee via llvm-dev llvm-dev at lists.llvm.org
Mon Jun 14 22:49:21 PDT 2021


On Tue, Jun 15, 2021 at 1:08 AM John McCall via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back
> together), adding numbers to it, and so on. So no, as a C compiler writer,
> I do not have a choice; I will have to use a type that can validly carry
> pointer information for integers in C.
>
An int->ptr cast can reconstruct the pointer information, so making integer
types not carry pointer information does not necessarily mean that
dereferencing a pointer cast from an integer is UB.

For example, the definition of cast_ival_to_ptrval in the n2676 proposal
shows that a pointer's provenance is reconstructed from an integer.
(Whether n2676's cast_ival_to_ptrval can also be used for LLVM's inttoptr
semantics is a different question, though.)
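
To make the scenario concrete, here is a minimal C sketch of the pattern John
describes (split the integer, put it back together, cast back). The names
rebuild_from_halves and roundtrip_demo are hypothetical and only for
illustration. ISO C leaves the result of the final cast
implementation-defined, so this assumes a mainstream implementation with a
flat address space. Under a reading like n2676's, the provenance of q would
come from the int->ptr cast itself (the address of x was exposed by the
earlier ptr->int cast), not from anything carried inside the integer.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: split a pointer's integer value into two halves,
   put the halves back together, and cast the result back to a pointer. */
int *rebuild_from_halves(int *p) {
    uintptr_t bits = (uintptr_t)(void *)p;        /* ptr -> int cast */
    unsigned half = (unsigned)(sizeof(uintptr_t) * CHAR_BIT / 2);
    uintptr_t mask = ((uintptr_t)1 << half) - 1;
    uintptr_t lo = bits & mask;                   /* low half of the address */
    uintptr_t hi = bits >> half;                  /* high half of the address */
    uintptr_t rebuilt = (hi << half) | lo;        /* plain integer arithmetic */
    assert(rebuilt == bits);
    return (int *)(void *)rebuilt;                /* int -> ptr cast */
}

/* Hypothetical demo: write to a local through the rebuilt pointer. */
int roundtrip_demo(void) {
    int x = 41;
    int *q = rebuild_from_halves(&x);
    *q += 1;                                      /* access through rebuilt pointer */
    return x;
}
```

On the implementations I am aware of, roundtrip_demo() returns 42. The point
is only that the intermediate values are ordinary integers.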

> Since you seem to find this sort of thing compelling, please note that
> even a simple assignment like char c2 = c1 technically promotes through
> int in C, and so int must be able to carry pointer information if char
> can.
>
IIUC, integer promotion is performed when a value is used as an operand of an
arithmetic operation or as a switch's controlling expression, so I think a
plain assignment is okay.
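
A small C11 sketch that checks this with _Generic, which selects on the type
of an expression without evaluating it. TYPE_NAME and check_promotion are
hypothetical names for this illustration. It only shows the types of the
expressions as the compiler sees them: char + char and unary plus have type
int, while the assignment expression keeps type char.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical macro: name the type of an expression (C11 _Generic).
   The controlling expression is not evaluated. */
#define TYPE_NAME(e) _Generic((e), \
    char: "char", \
    signed char: "signed char", \
    unsigned char: "unsigned char", \
    int: "int", \
    default: "other")

/* Hypothetical check: returns 1 if all type expectations hold. */
int check_promotion(void) {
    char c1 = 'a';
    char c2 = 0;
    assert(strcmp(TYPE_NAME(c1), "char") == 0);       /* plain char object */
    assert(strcmp(TYPE_NAME(c1 + c1), "int") == 0);   /* arithmetic promotes */
    assert(strcmp(TYPE_NAME(+c1), "int") == 0);       /* unary plus promotes */
    assert(strcmp(TYPE_NAME(c2 = c1), "char") == 0);  /* assignment stays char */
    c2 = c1;                                          /* the actual assignment */
    return c2 == 'a';
}
```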

Juneyoung


On Tue, Jun 15, 2021 at 1:08 AM John McCall via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> On 14 Jun 2021, at 7:04, Ralf Jung wrote:
>
> Hi,
>
> I don't dispute that but I am still not understanding the need for bytes.
> None of the examples I have seen so far
> clearly made the point that it is the byte types that provide a
> substantial benefit. The AA example below does neither.
>
> I hope <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html>
> makes a convincing case that under the current semantics, when one does an
> "i64" load of a value that was stored at pointer type, we have to say that
> this load returns poison. In particular, saying that this implicitly
> performs a "ptrtoint" is inconsistent with optimizations that are probably
> too important to be changed to accommodate this implicit "ptrtoint".
>
> I think it is in fact rather obvious that, if this optimization as currently
> written is indeed in conflict with the current semantics, it is the
> optimization that will have to give.  If the optimization is too important
> for performance to give up entirely, we will simply have to find some more
> restricted pattern that we can still soundly optimize.
>
> That is certainly a reasonable approach.
> However, judging from how reluctant LLVM is to remove optimizations that
> are much more convincingly wrong [1], my impression was that it is easier
> to complicate the semantics than to remove an optimization that LLVM
> already performs.
>
> [1]: https://bugs.llvm.org/show_bug.cgi?id=34548,
> https://bugs.llvm.org/show_bug.cgi?id=35229;
> see https://www.ralfj.de/blog/2020/12/14/provenance.html for a
> more detailed explanation
>
> Perhaps the clearest reason is that, if we did declare that integer types
> cannot carry pointers and so introduced byte types that could, C frontends
> would have to switch to byte types for their integer types, and so we would
> immediately lose this supposedly important optimization for C-like
> languages, and so, since optimizing C is very important, we would
> immediately need to find some restricted pattern under which we could
> soundly apply this optimization to byte types.  That’s assuming that this
> optimization is actually significant, of course.
>
> At least C with strict aliasing enabled (i.e., standard C) only needs to
> use the byte type for "(un)signed char". The other integer types remain
> unaffected. There is no arithmetic on these types ("char + char" is subject
> to integer promotion), so the IR overhead would consist in a few "bytecast"
> instructions next to / replacing the existing sign extensions that convert
> "char" to "int" before performing the arithmetic.
>
> The semantics you seem to want are that LLVM’s integer types cannot carry
> information from pointers. But I can cast a pointer to an integer in C and
> vice-versa, and compilers have de facto defined the behavior of subsequent
> operations like breaking the integer up (and then putting it back
> together), adding numbers to it, and so on. So no, as a C compiler writer,
> I do not have a choice; I will have to use a type that can validly carry
> pointer information for integers in C.
>
> Since you seem to find this sort of thing compelling, please note that
> even a simple assignment like char c2 = c1 technically promotes through
> int in C, and so int must be able to carry pointer information if char
> can.
>
> John.


-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University