[llvm-dev] Undef and Poison round table follow-up & a plan
Juneyoung Lee via llvm-dev
llvm-dev at lists.llvm.org
Fri Oct 9 18:29:35 PDT 2020
>
> Okay, it's just not immediately undefined behaviour. The C model has more
> issues because of the problem with how "trap representation" is defined
> (which precludes trap representations for unsigned char, two's complement
> signed char, etc.).
This interpretation is further stressed because C only explicitly ascribes
> undefined behaviour to trap representations on loads and stores.
In this case, freeze(poison) can be used to represent an uninitialized
value, because using freeze(poison) never raises UB.
However, I couldn't find relevant statements from C standard about these
two. Could you elaborate a bit more please?
Thanks for raising this. So, it can be that there are:
> - values that induce broader undefined behaviour (with certain operations),
> - values that may change on their own, and
> - values that become defined to an unspecified value (not of the first
> category) at some point in time.
The unspecified values put into padding in C appear to be in the last
> category. Is that how LLVM treats them?
With the current version of LLVM, there is slightly a gap, and with our
suggestion this becomes correct.
https://godbolt.org/z/nxhnTK
Reading padding is optimized to undef, but according to the previous link (
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787), this
isn't valid because unspecified value is a 'stable' value.
In our suggestion, well-defined bits are stored into padding at object
creation, so it becomes okay.
Juneyoung
On Fri, Oct 9, 2020 at 11:58 PM Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:
> On Thu, Oct 8, 2020 at 11:54 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
> wrote:
>
>> // Members are initialized to poison at object creation.
>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison
>>>> p[0] is an i8, so it shouldn't be poison?
>>>
>>>
>> My interpretation of standard is that reading uninitialized char can also
>> yield trap representation.
>> If uninitialized, char variable has indeterminate value, and C/C++ does
>> not seem to forbid reading trap representation from it.
>>
> Okay, it's just not immediately undefined behaviour. The C model has more
> issues because of the problem with how "trap representation" is defined
> (which precludes trap representations for unsigned char, two's complement
> signed char, etc.). This interpretation is further stressed because C only
> explicitly ascribes undefined behaviour to trap representations on loads
> and stores.
>
>
>> C++14 explicitly has an example that shows it is indeterminate value at
>> 3.3.2.1 :
>> ```
>> The point of declaration for a name is immediately after its complete
>> declarator (Clause 8) and before its initializer (if any), except as noted
>> below. [Example:
>> unsigned char x = 12;
>> { unsigned char x = x; }
>> Here the second x is initialized with its own (*indeterminate*) value.
>> —end example]
>> ```
>>
>> It seems there was a phrase saying that reading indeterminate value as an
>> unsigned char should yield unspecified value in the C++14 draft in the
>> past, but it is removed:
>> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787
>> The removed phrase did not exist in C++11, so I believe it is fine to use
>> poison for uninitialized char types.
>>
> Thanks for raising this. So, it can be that there are:
> - values that induce broader undefined behaviour (with certain operations),
> - values that may change on their own, and
> - values that become defined to an unspecified value (not of the first
> category) at some point in time.
>
> The unspecified values put into padding in C appear to be in the last
> category. Is that how LLVM treats them?
>
>
>>
>> Juneyoung
>>
>> On Fri, Oct 9, 2020 at 12:19 PM Hubert Tong <
>> hubert.reinterpretcast at gmail.com> wrote:
>>
>>> On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee <
>>> juneyoung.lee at sf.snu.ac.kr> wrote:
>>>
>>>> It is UB when a poison is passed to certain operations that raise UB on
>>>> poison, such as division by poison/dereferencing poison pointer/branching
>>>> on poison condition/etc.
>>>>
>>> Got it. Thanks.
>>>
>>>
>>>> Otherwise, poison is simply propagated, but it does not raise UB
>>>> Copying poison bytes is okay:
>>>>
>>>> // Members are initialized to poison at object creation.
>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison
>>>>
>>> p[0] is an i8, so it shouldn't be poison?
>>>
>>>
>>>> q = alloca {i8, i32} // we want to copy p to q
>>>> v = load i8* p[0] // v is poison
>>>> store i8 v, i8* q[0] // poison is simply copied; no UB happened
>>>>
>>>> Similarly, passing/returning poison is allowed as well.
>>>>
>>>> Juneyoung
>>>>
>>>> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong <
>>>> hubert.reinterpretcast at gmail.com> wrote:
>>>>
>>>>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <
>>>>> juneyoung.lee at sf.snu.ac.kr> wrote:
>>>>>
>>>>>> > It is important to note that this applies to trap representations
>>>>>> and not to unspecified values. A structure or union never has a trap
>>>>>> representation.
>>>>>> Yes, nondeterministic bits would work for padding of struct/union, as
>>>>>> described in (3) The third case is the value of struct/union padding.
>>>>>> For the members of struct/union, it is allowed to have trap
>>>>>> representation, so poison can be used.
>>>>>>
>>>>> At what point are the members considered poison? For
>>>>> copying/passing/returning a struct or union, there is no UB even if some
>>>>> members are uninitialized.
>>>>>
>>>>>
>>>>>>
>>>>>> Juneyoung
>>>>>>
>>>>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
>>>>>> hubert.reinterpretcast at gmail.com> wrote:
>>>>>>
>>>>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>>
>>>>>>>> Thank everyone who participated in the (impromptu) round table
>>>>>>>> discussion on Tuesday.
>>>>>>>> For those who are interested, I share the summary of the discussion.
>>>>>>>> Also, I share a short-term plan regarding this issue and relevant
>>>>>>>> patches.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Fixing Miscompilations using Freeze*
>>>>>>>> -----------------------------------
>>>>>>>>
>>>>>>>> To reduce the cost of fixing miscompilations using freeze
>>>>>>>> instruction, we need to
>>>>>>>> optimize freeze away whenever possible.
>>>>>>>> Using the no-undef/poison assumption from the source language
>>>>>>>> (C/C++ in
>>>>>>>> this context) can play a significant role.
>>>>>>>> To make use the assumptions, here are short-term goals:
>>>>>>>>
>>>>>>>> *1. Preserve no-undef/poison assumption of function arguments from
>>>>>>>> C/C++ when valid.*
>>>>>>>>
>>>>>>>> There is an ongoing relevant patch (that is written by others):
>>>>>>>> https://reviews.llvm.org/D81678
>>>>>>>>
>>>>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++
>>>>>>>> when valid.*
>>>>>>>>
>>>>>>>> Reading an indeterminate value from an lvalue that does not have
>>>>>>>> char or
>>>>>>>> std::byte type is UB [1].
>>>>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest
>>>>>>>> attaching a new
>>>>>>>> !noundef metadata to such `load`s.
>>>>>>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>>>>>>> The clang-side change is going to be made after D81678 is reviewed,
>>>>>>>> because it is likely
>>>>>>>> that this patch will have a lot of changes in clang tests.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Replacing Undef with Poison*
>>>>>>>> ---------------------------
>>>>>>>>
>>>>>>>> Since undef is known to be the source of many optimizations due to
>>>>>>>> its complexity,
>>>>>>>> we'd like to suggest gradually moving towards using poison only.
>>>>>>>> To make it, (1) `poison` constant should be introduced into LLVM IR
>>>>>>>> first, and (2)
>>>>>>>> transformations that introduce `undef` should be updated to
>>>>>>>> introduce `poison` instead.
>>>>>>>>
>>>>>>>> For the step (2), we need an experimental result showing that it
>>>>>>>> does not cause
>>>>>>>> performance degradation. This relies on better support for freeze
>>>>>>>> (the
>>>>>>>> no-undef/poison analysis patches).
>>>>>>>>
>>>>>>>> *1. Introduce a new `poison` constant into IR*:
>>>>>>>> https://reviews.llvm.org/D71126
>>>>>>>>
>>>>>>>> Note that `poison` constant can be used as a true placeholder value
>>>>>>>> as well.
>>>>>>>> Undef cannot be used in general because it is less undefined than
>>>>>>>> poison.
>>>>>>>>
>>>>>>>> *2. Update transformations that introduce `undef` to introduce
>>>>>>>> `poison` instead*
>>>>>>>>
>>>>>>>> (1) There are transformations that introduce `undef` as a
>>>>>>>> placeholder (e.g. phi operand
>>>>>>>> from an unreachable block).
>>>>>>>> For these, `poison` can be used instead.
>>>>>>>>
>>>>>>>> (2) The value of an uninitialized object (automatic or dynamic).
>>>>>>>> They are indeterminate values in C/C++, so okay to use poison
>>>>>>>> instead.
>>>>>>>> A tricky case is a bitfield access, and we have two
>>>>>>>> possible solutions:
>>>>>>>>
>>>>>>>> - i. Introduce a very-packed struct type
>>>>>>>> ```
>>>>>>>> <C>
>>>>>>>> struct {
>>>>>>>> int a:2, b:6;
>>>>>>>> } s;
>>>>>>>>
>>>>>>>> v = s.a;
>>>>>>>>
>>>>>>>> =>
>>>>>>>>
>>>>>>>> <IR>
>>>>>>>>
>>>>>>>> s = alloca
>>>>>>>>
>>>>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>>>>>>> v = extractvalue tmp, 0
>>>>>>>> ```
>>>>>>>> * Pros: Can be used to precisely lower C/C++'s struct typed
>>>>>>>> function argument into IR
>>>>>>>> (currently clang coerces a struct into int if small enough; I'll
>>>>>>>> explain about this detail if anyone requests)
>>>>>>>> * Cons: Since optimizations aren’t aware of the new type, they
>>>>>>>> should be updated
>>>>>>>>
>>>>>>>> - ii. Use load-freeze
>>>>>>>> ```
>>>>>>>> <C>
>>>>>>>> struct {
>>>>>>>> int a:2, b:6;
>>>>>>>> } s;
>>>>>>>>
>>>>>>>> v = s.a;
>>>>>>>>
>>>>>>>> =>
>>>>>>>>
>>>>>>>> <IR>
>>>>>>>> s = alloca
>>>>>>>>
>>>>>>>> // Poison bits are frozen and returned
>>>>>>>> tmp = *load freeze* i8* s
>>>>>>>> v = tmp & 3
>>>>>>>> ```
>>>>>>>> * Pros: The change is simpler
>>>>>>>> * Cons: Store forwarding isn’t free; needs insertion of freeze
>>>>>>>> (store x, p; v = load freeze p => store x, p; v = freeze x)
>>>>>>>>
>>>>>>>>
>>>>>>>> (3) The third case is the value of struct/union padding.
>>>>>>>> Padding is filled with unspecified value in C, so it is too
>>>>>>>> undefined to use poison.
>>>>>>>> We can fill it with defined bits nondeterministically chosen at
>>>>>>>> allocation time (freeze poison).
>>>>>>>>
>>>>>>>> ```
>>>>>>>> <C>
>>>>>>>> struct {
>>>>>>>> char a; // 3 bytes padding
>>>>>>>> int b;
>>>>>>>> } s;
>>>>>>>>
>>>>>>>> v = s.b;
>>>>>>>>
>>>>>>>> =>
>>>>>>>>
>>>>>>>> <IR>
>>>>>>>> s = alloca {i8, i32} // alloca initializes bytes in a
>>>>>>>> type-dependent manner
>>>>>>>> // s[0], s[4~7]: poison
>>>>>>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>>>>>>
>>>>>>>> s2 = gep (bitcast s to i8*), 4
>>>>>>>> v = load i32 s2
>>>>>>>> ```
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Juneyoung
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> [1]
>>>>>>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>>>>>>> representation and is read by an lvalue expression that does not have
>>>>>>>> character type, the behavior is undefined.
>>>>>>>> (Similarly, C17 6.2.6.1.5)
>>>>>>>>
>>>>>>> It is important to note that this applies to trap representations
>>>>>>> and not to unspecified values. A structure or union never has a trap
>>>>>>> representation.
>>>>>>>
>>>>>>>
>>>>>>>> C++14 8.5.12: If an indeterminate value is produced by an
>>>>>>>> evaluation, the behavior is undefined except in the following cases: If an
>>>>>>>> indeterminate value of unsigned narrow character type ...
>>>>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>>>>>>
>>>>>>> While loading undef for the unsigned character type case merely
>>>>>>> produces undef, for C++, operations such as sign-extend or zero-extend on
>>>>>>> an undef i8 is also undefined behaviour.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> LLVM Developers mailing list
>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Juneyoung Lee
>>>>>> Software Foundation Lab, Seoul National University
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Juneyoung Lee
>>>> Software Foundation Lab, Seoul National University
>>>>
>>>
>>
>> --
>>
>> Juneyoung Lee
>> Software Foundation Lab, Seoul National University
>>
>
--
Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201010/67634be8/attachment.html>
More information about the llvm-dev
mailing list