[llvm-dev] [RFC] Introducing a byte type to LLVM

Sun Jun 13 08:26:59 PDT 2021

Hi Johannes,

>> I think Joshua gave a very nice motivation already.
> 
> I don't dispute that but I am still not understanding the need for bytes. None 
> of the examples I have seen so far
> clearly made the point that it is the byte types that provide a substantial 
> benefit. The AA example below does neither.

I hope <https://lists.llvm.org/pipermail/llvm-dev/2021-June/151110.html> makes a 
convincing case that under the current semantics, when one does an "i64" load of 
a value that was stored at pointer type, we have to say that this load returns 
poison. In particular, saying that this implicitly performs a "ptrtoint" is 
inconsistent with optimizations that are probably too important to be changed to 
accommodate this implicit "ptrtoint".

If/once we agree on that, "byte" is a fairly natural proposal IMO: we need some 
type at which to perform a load that does *not* turn some kinds of data into 
poison. "byte" is that type.

Kind regards,
Ralf

> 
> 
>> I'll just complement with an alias analysis example. (plus see my reply to Chris)
>>
>> define @f() {
>>    %p = alloca i8
>>    %q = alloca i8**
>>    store i8 42, %p
>>    store i8* %p, i8** %q
>>
>>    %pp = some ptr arithmetic; full may-alias
>>    %v = load i64, %pp
>>    call @g(%v)
>>
>>    %w = load i8, %p  ; call we store-forward & RAUW 42?
>> }
>>
>> So, we pass an integer to some function. If we don't distinguish between 
>> pointers and integers (current memory model), we need to assume that %v may 
>> carry the address of %p, as the load may have done an implicit 
>> pointer-to-integer cast. Therefore, pointer %p is escaped. Thus, in this model 
>> we cannot do store forwarding of 42 to the load of %w.
> 
> The conclusion that we cannot forward 42 is true if %pp has access to %p or %q, 
> which is not clear in the above code.
> If no use of %p or %q somehow makes it's way into %pp, nor a load of %q, %pp can 
> be shown to be independent of %q already, regardless if we already perform this 
> reasoning or not.
> 
> 
> 
>>   Except that LLVM doesn't consider integer loads as escaping pointers. This 
>> is wrong, and that's one of the reasons why we miscompile integer/pointer 
>> casts. The other reason is what I explained in the reply to John & Chris, 
>> which is that some integer optimizations aren't correct for pointers, so we 
>> cannot represent pointers as integers (without appropriate p2i/i2p casts).
> 
> At least me (and probably others) are arguing for the appropriate and explicit 
> p2i/i2p casts instead of a entirely new type that does the same job.
> 
> 
>> Another model is to say that integers can only carry integers. Above there's 
>> no pointer-to-integer cast of %p, hence %v cannot have the address of %p, and 
>> thus store-forwarding can happen.
>> This model requires a "universal holder" different than (ab)using integers.
> 
> Storing %p into %q causes %p to be escaped by default, IMHO. With the newest 
> definition of escapes, we allow analyses to prove %q is not used in "a weird 
> way" and therefore we can pretend %p is not escaped. This effectively requires 
> us to look at all the loads of %q and assume they are potential copies of %p. If 
> such a load is (potentially) an integer, you'd need to decide if the (potential) 
> integer use can escape %p, same as you would decide any other load of %q would. 
> I fail to see the need for byte types here.
> 
> 
>> Introducing a byte type means we only need to careful with char loads, but not 
>> with short, int, long, etc loads. Clang would lower char loads to byte loads, 
>> and the remaining integer types straight to i<n> loads. So we can go full 
>> speed on integers, and throttle down just on chars, which is all we can do 
>> because the semantics of chars in C.
> 
> This will cause a lot of side effects, in addition to the work required to make 
> this happen in the first place.
> Even if I assume we really want byte types for AA, which I am not sure of given 
> the lack of a convincing example, we should determine what the impact would be 
> here, e.g., restrict some optimizations in the presence of i8 operation and get 
> some idea of the cost.
> 
> ~ Johannes
> 
> 
>>
>> Nuno
>>
>>
>> -----Original Message-----
>> From: Johannes Doerfert <johannesdoerfert at gmail.com>
>> Sent: 04 June 2021 22:12
>> To: Nuno Lopes <nunoplopes at sapo.pt>; 'George Mitenkov' <georgemitenk0v at gmail.com>
>> Cc: 'Ralf Jung' <jung at mpi-sws.org>; cfe-dev at lists.llvm.org; 
>> llvm-dev at lists.llvm.org
>> Subject: Re: [RFC] Introducing a byte type to LLVM
>>
>> I think the lack of a end-to-end example is making this harder (for me) to 
>> understand.
>>
>>
>> On 6/4/21 2:17 PM, Nuno Lopes wrote:
>>> It's true: an alternative to introducing a new byte type is to make alias 
>>> analysis more conservative. But I think it's way worse than you think.
>>> Making AA conservative means that every single integer load must be 
>>> considered to be escaping a pointer.
>> What pointer is escaping if I have an integer load. I'm sorry for my
>> questions but I assume I'm not the only one that might
>> want to understand this properly. So far, I've assumed if I store a
>> pointer away w/o knowing what happens to the memory the
>> pointer escapes. This is our current model and it's not that bad.
>>
>>
>>>    Unless you know the last stored value was an integer, you would have to 
>>> conservatively assume that the load may be escaping some pointer (even if 
>>> only partially). This sounds to me like terrible. Type punning through memory 
>>> is probably very uncommon, and it seems we would be slowing down all programs 
>>> because of such uncommon case.
>>>
>>> Therefore I feel it's best if we introduce a way to explicitly mark potential 
>>> type punning situations, such that AA (and optimizers) only need to be 
>>> conservative when needed. It's not the case that right now we can detect the 
>>> few potential type punning cases and be conservative just there. We would 
>>> need to be conservative in e.g. every function that loads an argument pointer.
>>> I agree it's scary to introduce a new type in the IR; we don't do that very 
>>> often. But we need to fix the bugs in LLVM.
>> What doesn't add up for me is that one the one side the argument
>> "everything needs to be conservative" w/o byte types, but at the same
>> time we can somehow determine when to use byte types locally? If the
>> latter would not be the case, how would we know when we need a byte type?
>>
>> Why would we not be conservative/explicit about type punning whenever we
>> would introduce byte types with this proposal? I unfortunately don't
>> understand what the byte types give us. My best guess is that they
>> explicitly mark byte operations on pointers, which we could do otherwise
>> I assume (explicit p2i, i2p). If there is more to it, I think we need an
>> end to end example that shows how we cannot make type punning explicit
>> with current means and a new type is conceptually better.
>>
>> ~ Johannes
>>
>>
>>> Nuno
>>>
>>> -----Original Message-----
>>> From: Johannes Doerfert via llvm-dev
>>> Sent: 04 June 2021 17:56
>>> Subject: Re: [llvm-dev] [cfe-dev] [RFC] Introducing a byte type to LLVM
>>>
>>>
>>> On 6/4/21 11:35 AM, George Mitenkov wrote:
>>>> Hi Johannes,
>>>>
>>>> Sure! The underlying problem is that raw-memory access handlers are
>>>> treated as integers, while they are not really integers. Especially
>>>> std::byte that specifically states that it has raw-memory access
>>>> semantics. This semantic mismatch can make AA wrong and a pointer to
>>>> escape.
>>> I would assume being conservative is a possibility here, right?
>>> So if someone wants to type-pun a pointer we would do very little 
>>> optimizations around it.
>>>
>>>
>>>> Consider the following LLVM IR that copies a pointer:
>>>>
>>>> %src8 = bitcast i8** %src to i8*
>>>>
>>>> %dst8 = bitcast i8** %dst to i8*
>>>>
>>>> call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst8, i8* %src8, i32 8, i1
>>>> false)
>>>>
>>>> %load = load i8*, i8** %dst
>>>>
>>>> %addr = ptrtoint i8* %load to i64
>>>>
>>>> ret i64 %addr
>>>>
>>>>
>>>> If we optimize the call to memcpy, then the IR becomes
>>>>
>>>>
>>>> %src64 = bitcast i8** %src to i64*
>>>>
>>>> %dst64 = bitcast i8** %dst to i64*
>>>>
>>>> %addr = load i64, i64* %src64, align 1
>>>>
>>>> store i64 %addr, i64* %dst64, align 1
>>>>
>>>> ret i64 %addr
>>>>
>>>>
>>>> Since there is no "data" type in LLVM like a byte, the ptrtoint is
>>>> optimized out and the pointer escapes. If we have used a pair of byte
>>>> load/store that would not happen.
>>> The pointer (load %src) does escape in this example, doesn't it?
>>>
>>> I don't get why "that would not happen" with byte load/store and why that 
>>> would be correct at all.
>>>
>>>
>>>> The mentioned bug is just an example, but this pattern can be
>>>>
>>>> replicated in other cases that optimize memcpy as integer load/stores and
>>>> drop ptrtoint,
>>>>
>>>> undermining alias analysis. To illustrate further, consider loading a
>>>> pointer like:
>>>>
>>>>
>>>> %v1 = load i8*, %p
>>>>
>>>> %v2 = load i8*, %p
>>>>
>>>>
>>>> Instcombine optimizes this into
>>>>
>>>>
>>>> %v1 = load i64, %p
>>>>
>>>> %v2 = inttoptr %v1
>>>>
>>>>
>>>> changing the underlying provenance (again, not good for AA).
>>> This example looks somehow broken (I'll assuming %v1 has integer type).
>>>
>>> I agree this is not a great optimization, but is it wrong? We should have
>>> done the opposite (ptr2int for v1) but even then I'm not sure what AA could
>>> derive here. We load a pointer and then we escape it into an integer. If the
>>> ptr2int is not removed, I'm not sure what AA should do about this. I'm also
>>> not sure how practical this is. Do we have some idea how often such
>>> cases exist?
>>>
>>>
>>>>     If we had a
>>>> byte type instead,
>>>>
>>>> then we could
>>>>
>>>> 1. preserve provenance
>>>>
>>>> 2. differentiate between integers and the raw data copied
>>>>
>>>>
>>>> This might be beneficial for some backends (e.g CHERI) that want to
>>>> differentiate between
>>>>
>>>> true integers and bytes that can be integers. The optimizations for known
>>>> integers
>>>>
>>>> can become more aggressive in this case.
>>>>
>>> It sounds to me like the problem is our optimization in the presence of
>>> int2ptr and ptr2int are (still) unsound. We should at least not remove
>>> ptr2int,
>>> is that it?
>>>
>>> That said, I still don't get how a byte type helps for these examples.
>>> More concretely:
>>>
>>> ref 1) How does a byte type preserve provenance? If it is somehow
>>> implicitly attached
>>>           would that not mean we need to now be careful doing
>>> optimizations on bytes (as we
>>>           should now be careful doing optimizations around p2i and i2p)?
>>> Literally shifting
>>>           the problem from looking for p2i and i2p to looking for bytecast?
>>>
>>> ref 2) What optimization can I do if I have the differentiation of
>>> integers and raw data?
>>>           Can we somehow get an idea of the optimization potential?
>>>
>>>
>>> ~ Johannes
>>>
>>>
>>>
>>>> Thanks,
>>>>
>>>> George
>>>>
>>>> On Fri, Jun 4, 2021 at 7:03 PM Johannes Doerfert <johannesdoerfert at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi George,
>>>>>
>>>>> On 6/4/21 10:24 AM, George Mitenkov via cfe-dev wrote:
>>>>>> Hi all,
>>>>>>
>>>>>> Together with Nuno Lopes and Juneyoung Lee we propose to add a new byte
>>>>>> type to LLVM to fix miscompilations due to load type punning. Please see
>>>>>> the proposal below. It would be great to hear the
>>>>>> feedback/comments/suggestions!
>>>>>>
>>>>>>
>>>>>> Motivation
>>>>>> ==========
>>>>>>
>>>>>> char and unsigned char are considered to be universal holders in C. They
>>>>>> can access raw memory and are used to implement memcpy. i8 is the LLVM’s
>>>>>> counterpart but it does not have such semantics, which is also not
>>>>>> desirable as it would disable many optimizations.
>>>>>>
>>>>>> We therefore propose to introduce a new byte type that would have a
>>>>> raw-memory
>>>>>> access semantics and can be differentiated from i8. This can help to fix
>>>>>> unsound optimizations and make the lowering of char, unsigned char or
>>>>>> std::byte correct. Currently, we denote the byte type as b<N>, where N is
>>>>>> the number of bits.
>>>>>>
>>>>>> In our semantics, byte type carries provenance and copying bytes does not
>>>>>> escape pointers, thereby benefiting alias analysis. If the memory
>>>>> contains
>>>>>> poison bits, then the result of loading a byte from it is bitwise poison.
>>>>> Could you elaborate on the motivation a bit. I'm not sure I follow.
>>>>> Maybe examples would be good. The only one I found is in
>>>>> https://bugs.llvm.org/show_bug.cgi?id=37469. For that one I'm
>>>>> wondering how much we would in reality loose if we make the required
>>>>> ptr2int explicit when people cast a pointer to an integer. In general,
>>>>> explicit encoding seems to me preferable and overall less work (new
>>>>> types kinda scare me TBH).
>>>>>
>>>>> ~ Johannes
>>>>>
>>>>>
>>>>>> Benefits
>>>>>> ========
>>>>>>
>>>>>> The small calls to memcpy and memmove that are transformed by Instcombine
>>>>>> into integer load/store pairs would be correctly transformed into the
>>>>>> loads/stores of the new byte type no matter what underlying value is
>>>>> being
>>>>>> copied (integral value or a pointer value).
>>>>>>
>>>>>>        1.
>>>>>>
>>>>>>        The new byte type solves the problem of copying the padded data,
>>>>> with no
>>>>>>        contamination of the loaded value due to poison bits of the padding.
>>>>>>        2.
>>>>>>
>>>>>>        When copying pointers as bytes implicit pointer-to-integer casts are
>>>>>>        avoided.The current setting of performing a memcpy using i8s leads to
>>>>>>        miscompilations (example: bug report 37469
>>>>>>        <https://bugs.llvm.org/show_bug.cgi?id=37469>) and is bad for alias
>>>>>>        analysis.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Lowering of char, unsigned char and std::byte
>>>>>>
>>>>>> ======================================
>>>>>>
>>>>>> For any function that takes char, unsigned char or std::byte as
>>>>> arguments,
>>>>>> we lower these to the b8 instead of i8. The motivation comes from the
>>>>> fact
>>>>>> that any pointer can be explicitly copied in bytes, and each byte can be
>>>>>> passed as an argument to the copying function. Since a copy like that can
>>>>>> escape a pointer, we need to generate the byte type to avoid the issue.
>>>>>>
>>>>>> Example
>>>>>>
>>>>>> void foo(unsigned char arg1, char arg2) {...}
>>>>>>
>>>>>>                 =>
>>>>>>
>>>>>> void @foo(zeroext b8 %arg1, signext b8 %arg2) {...}
>>>>>>
>>>>>>
>>>>>>
>>>>>> std::byte is defined as enum class byte : unsigned char {} and therefore
>>>>>> all bitwise operations over it can be lowered equivalently to operations
>>>>> on
>>>>>> unsigned char (operation is performed over zero-extended to i32 operands,
>>>>>> and the result is truncated back).
>>>>>>
>>>>>>
>>>>>> ==============================
>>>>>>
>>>>>> Byte Type Semantics (Version 1)
>>>>>>
>>>>>> ==============================
>>>>>>
>>>>>>
>>>>>> [allowed] alloca/load/store
>>>>>>
>>>>>> =====================
>>>>>>
>>>>>> The byte type is allowed to be allocated, stored and loaded. No semantic
>>>>>> changes needed.
>>>>>>
>>>>>> [allowed] zext/sext/trunc
>>>>>>
>>>>>> ====================
>>>>>>
>>>>>> In order to easily adapt the current replacement of i8 with b8, we want
>>>>> to
>>>>>> extend zext/sext/trunc instructions’ semantics to allow the first operand
>>>>>> as a byte type as well. This will reduce the number of instructions
>>>>> needed
>>>>>> for the cast.
>>>>>>
>>>>>> Example
>>>>>>
>>>>>> trunc i32 %int to b8
>>>>>>
>>>>>> zext i8 %char to b32
>>>>>>
>>>>>> [modified] bitcast
>>>>>>
>>>>>> ==============
>>>>>>
>>>>>> We modify the semantics of the bitcast to allow casts from and to the
>>>>> byte
>>>>>> types. If the byte has a value of the same kind (integral or a pointer)
>>>>> as
>>>>>> the other type, then the bitcast is a noop. Otherwise, the result is
>>>>> poison.
>>>>>> Example
>>>>>>
>>>>>> bitcast i<N> %val to b<N>
>>>>>>
>>>>>> bitcast i<N>* %val to b64
>>>>>>
>>>>>> bitcast b<N> %byte to i<N>     [byte has an integral value => noop]
>>>>>>
>>>>>>
>>>>>>
>>>>>> [new] bytecast
>>>>>>
>>>>>> ============
>>>>>>
>>>>>> During IR generation, we cannot deduce whether the byte value contains a
>>>>>> pointer or an integral value, and therefore we cannot create a bitcast.
>>>>>>
>>>>>> Example
>>>>>>
>>>>>> int @cast(char c) { return (int) c; }
>>>>>>
>>>>>> Current
>>>>>>
>>>>>> Proposed
>>>>>>
>>>>>> i32 @cast(i8 signext %c) {
>>>>>>
>>>>>>       %1 = alloca i8, align 1
>>>>>>
>>>>>>       store i8 %c, i8* %1, align 1
>>>>>>
>>>>>>       %2 = load i8, i8* %1, align 1
>>>>>>
>>>>>>       %3 = sext i8 %2 to i32
>>>>>>
>>>>>>       ret i32 %3
>>>>>>
>>>>>> }
>>>>>>
>>>>>> i32 @cast(b8 %c) {
>>>>>>
>>>>>>       %1 = alloca b8
>>>>>>
>>>>>>       store b8 %c, b8* %11
>>>>>>
>>>>>>       %2 = load b8, b8* %1
>>>>>>
>>>>>>       %3 =    ?    b8 %2 to i32
>>>>>>
>>>>>>       ret i32 %3
>>>>>>
>>>>>> }
>>>>>>
>>>>>> In this example, the operation  ?  can be sext (b8 is i8) or can be
>>>>>> ptrtoint if the byte has a pointer value.
>>>>>>
>>>>>> We therefore introduce a new bytecast instruction. The frontend will
>>>>> always
>>>>>> produce a bytecast, which can then be optimized into a more specific cast
>>>>>> if necessary. Bytecast operands must have the same bitwidth (or size).
>>>>>>
>>>>>> The semantics of bytecast instruction is:
>>>>>>
>>>>>> bytecast b<N> %byte to T
>>>>>>
>>>>>> The kind of the value of the byte
>>>>>>
>>>>>> T
>>>>>>
>>>>>> Semantics
>>>>>>
>>>>>> Pointer
>>>>>>
>>>>>> Integral
>>>>>>
>>>>>> ptrtoint
>>>>>>
>>>>>> Pointer
>>>>>>
>>>>>> Pointer
>>>>>>
>>>>>> bitcast
>>>>>>
>>>>>> Integral
>>>>>>
>>>>>> Integral
>>>>>>
>>>>>> integral casts (zext, sext, etc.)
>>>>>>
>>>>>> Integral
>>>>>>
>>>>>> Pointer
>>>>>>
>>>>>> inttoptr
>>>>>>
>>>>>>
>>>>>>
>>>>>> Essentially, this instruction is a selection of the right casts. Once the
>>>>>> compiler has been able to prove the kind of the byte’s value, the
>>>>> bytecast
>>>>>> can be replaced with appropriate cast or become a noop.
>>>>>>
>>>>>> Example
>>>>>>
>>>>>> bytecast b64 %bytes to i32*
>>>>>>
>>>>>> bytecast b8 %byte to i8
>>>>>>
>>>>>>
>>>>>>
>>>>>> [disabled] and/or/xor
>>>>>>
>>>>>> =================
>>>>>>
>>>>>> We do not allow bitwise operations over byte types because it’s not
>>>>> trivial
>>>>>> to define all cases, particularly the ones involving pointers. If these
>>>>>> operations are useful for optimizations and/or some frontends, semantics
>>>>>> for these can be considered separately.
>>>>>>
>>>>>> If the operation is desired on the frontend level, then the default
>>>>>> generated code can always cast the byte type to an integer to operate on
>>>>>> integer values.
>>>>>>
>>>>>>
>>>>>>
>>>>>> [disabled] arithmetic
>>>>>>
>>>>>> =================
>>>>>>
>>>>>> Performing arithmetic operations over the byte type is similarly not
>>>>>> allowed (A valid question is what does it mean to add to bytes of raw
>>>>>> memory?). If we want to perform arithmetic, we need to cast a byte to an
>>>>>> integer (via sext/zext/trunc explicitly).
>>>>>>
>>>>>>
>>>>>>
>>>>>> [modified] comparison
>>>>>>
>>>>>> ==================
>>>>>>
>>>>>> We allow performing comparisons, as we may potentially want to compare
>>>>> the
>>>>>> ordering of the memory instances, check for null, etc. Comparison is also
>>>>>> needed since char types are commonly compared. We define the following
>>>>>> semantic of the byte type comparison.
>>>>>>
>>>>>> Case 1: The values of the bytes have the same kinds: compare the values.
>>>>>>
>>>>>> Case 2: The values of the bytes have the different kinds: do a bytecast
>>>>> of
>>>>>> the non-integral type to the other type and compare the integral values.
>>>>>>
>>>>>>
>>>>>> Example
>>>>>>
>>>>>> =======
>>>>>>
>>>>>> unsigned char sum(unsigned char a, unsigned char b) { return a + b; }
>>>>>>
>>>>>> Current
>>>>>>
>>>>>> Proposed
>>>>>>
>>>>>> zeroext i8 @sum(i8 zeroext %a, i8 zeroext %b) {
>>>>>>
>>>>>>       %1 = alloca i8
>>>>>>
>>>>>>       %2 = alloca i8
>>>>>>
>>>>>>       store i8 %a, i8* %1
>>>>>>
>>>>>>       store i8 %b, i8* %2
>>>>>>
>>>>>>       %3 = load i8, i8* %1
>>>>>>
>>>>>>       %4 = zext i8 %3 to i32
>>>>>>
>>>>>>       %5 = load i8, i8* %2
>>>>>>
>>>>>>       %6 = zext i8 %5 to i32
>>>>>>
>>>>>>       %7 = add nsw i32 %4, %6
>>>>>>
>>>>>>       %8 = trunc i32 %7 to i8
>>>>>>
>>>>>>       ret i8 %8
>>>>>>
>>>>>> }
>>>>>>
>>>>>> zeroext b8 @sum(b8 zeroext %a, b8 zeroext %b) {
>>>>>>
>>>>>>       %1 = alloca b8
>>>>>>
>>>>>>       %2 = alloca b8
>>>>>>
>>>>>>       store b8 %a, b8* %1
>>>>>>
>>>>>>       store b8 %b, b8* %2
>>>>>>
>>>>>>       %3 = load b8, b8* %1
>>>>>>
>>>>>>       %4 = bytecast b8 %3 to i8
>>>>>>
>>>>>>       %5 = zext i8 %4 to i32
>>>>>>
>>>>>>       %6 = load b8, b8* %2
>>>>>>
>>>>>>       %7 = bytecast b8 %6 to i8
>>>>>>
>>>>>>       %8 = zext i8 %7 to i32
>>>>>>
>>>>>>       %9 = add nsw i32 %5, %8
>>>>>>
>>>>>>       %10 = trunc i32 %9 to b8
>>>>>>
>>>>>>       ret b8 %10
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> George
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> cfe-dev mailing list
>>>>>> cfe-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>

-- 
Website: https://people.mpi-sws.org/~jung/