[llvm-dev] Undef and Poison round table follow-up & a plan

Juneyoung Lee via llvm-dev llvm-dev at lists.llvm.org
Sun Oct 11 21:48:43 PDT 2020


>
> The definition of trap representation in C17 subclause 6.2.6.1 paragraph 5
> refers to object representations that do not represent a value of the type.
> No such object representations can exist for an unsigned char because all
> of the bits in the object representation of that type are value bits and
> every combination of value bits represent a value of the type (see C17
> subclause 6.2.6.2 paragraph 1). It seems C17 still allows trap
> representations for two's complement signed char, but that is only if
> SCHAR_MIN == -SCHAR_MAX.

I see, thank you for the info.

Juneyoung

On Sat, Oct 10, 2020 at 1:08 PM Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:

> On Fri, Oct 9, 2020 at 9:29 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
> wrote:
>
>> Okay, it's just not immediately undefined behaviour. The C model has more
>>> issues because of the problem with how "trap representation" is defined
>>> (which precludes trap representations for unsigned char, two's complement
>>> signed char, etc.).
>>
>> This interpretation is further stressed because C only explicitly
>>> ascribes undefined behaviour to trap representations on loads and stores.
>>
>>
>> In this case, freeze(poison) can be used to represent an uninitialized
>> value, because using freeze(poison) never raises UB.
>> However, I couldn't find relevant statements from C standard about these
>> two. Could you elaborate a bit more please?
>>
> The definition of trap representation in C17 subclause 6.2.6.1 paragraph 5
> refers to object representations that do not represent a value of the type.
> No such object representations can exist for an unsigned char because all
> of the bits in the object representation of that type are value bits and
> every combination of value bits represent a value of the type (see C17
> subclause 6.2.6.2 paragraph 1). It seems C17 still allows trap
> representations for two's complement signed char, but that is only if
> SCHAR_MIN == -SCHAR_MAX.
>
>
>> Thanks for raising this. So, it can be that there are:
>>> - values that induce broader undefined behaviour (with certain
>>> operations),
>>> - values that may change on their own, and
>>> - values that become defined to an unspecified value (not of the first
>>> category) at some point in time.
>>
>>
>>
>> The unspecified values put into padding in C appear to be in the last
>>> category. Is that how LLVM treats them?
>>
>>
>> With the current version of LLVM, there is slightly a gap, and with our
>> suggestion this becomes correct.
>> https://godbolt.org/z/nxhnTK
>> Reading padding is optimized to undef, but according to the previous link
>> (http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787),
>> this isn't valid because unspecified value is a 'stable' value.
>> In our suggestion, well-defined bits are stored into padding at object
>> creation, so it becomes okay.
>>
>> Juneyoung
>>
>> On Fri, Oct 9, 2020 at 11:58 PM Hubert Tong <
>> hubert.reinterpretcast at gmail.com> wrote:
>>
>>> On Thu, Oct 8, 2020 at 11:54 PM Juneyoung Lee <
>>> juneyoung.lee at sf.snu.ac.kr> wrote:
>>>
>>>> // Members are initialized to poison at object creation.
>>>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison
>>>>>> p[0] is an i8, so it shouldn't be poison?
>>>>>
>>>>>
>>>> My interpretation of standard is that reading uninitialized char can
>>>> also yield trap representation.
>>>> If uninitialized, char variable has indeterminate value, and C/C++ does
>>>> not seem to forbid reading trap representation from it.
>>>>
>>> Okay, it's just not immediately undefined behaviour. The C model has
>>> more issues because of the problem with how "trap representation" is
>>> defined (which precludes trap representations for unsigned char, two's
>>> complement signed char, etc.). This interpretation is further stressed
>>> because C only explicitly ascribes undefined behaviour to trap
>>> representations on loads and stores.
>>>
>>>
>>>> C++14 explicitly has an example that shows it is indeterminate value at
>>>> 3.3.2.1 :
>>>> ```
>>>> The point of declaration for a name is immediately after its complete
>>>> declarator (Clause 8) and before its initializer (if any), except as noted
>>>> below. [Example:
>>>>   unsigned char x = 12;
>>>>   { unsigned char x = x; }
>>>> Here the second x is initialized with its own (*indeterminate*) value.
>>>> —end example]
>>>> ```
>>>>
>>>> It seems there was a phrase saying that reading indeterminate value as
>>>> an unsigned char should yield unspecified value in the C++14 draft in the
>>>> past, but it is removed:
>>>> http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787
>>>> The removed phrase did not exist in C++11, so I believe it is fine to
>>>> use poison for uninitialized char types.
>>>>
>>> Thanks for raising this. So, it can be that there are:
>>> - values that induce broader undefined behaviour (with certain
>>> operations),
>>> - values that may change on their own, and
>>> - values that become defined to an unspecified value (not of the first
>>> category) at some point in time.
>>>
>>> The unspecified values put into padding in C appear to be in the last
>>> category. Is that how LLVM treats them?
>>>
>>>
>>>>
>>>> Juneyoung
>>>>
>>>> On Fri, Oct 9, 2020 at 12:19 PM Hubert Tong <
>>>> hubert.reinterpretcast at gmail.com> wrote:
>>>>
>>>>> On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee <
>>>>> juneyoung.lee at sf.snu.ac.kr> wrote:
>>>>>
>>>>>> It is UB when a poison is passed to certain operations that raise UB
>>>>>> on poison, such as division by poison/dereferencing poison
>>>>>> pointer/branching on poison condition/etc.
>>>>>>
>>>>> Got it. Thanks.
>>>>>
>>>>>
>>>>>> Otherwise, poison is simply propagated, but it does not raise UB
>>>>>> Copying poison bytes is okay:
>>>>>>
>>>>>> // Members are initialized to poison at object creation.
>>>>>> p = alloca {i8, i32} // p[0], p[4~7] are poison
>>>>>>
>>>>> p[0] is an i8, so it shouldn't be poison?
>>>>>
>>>>>
>>>>>> q = alloca {i8, i32} // we want to copy p to q
>>>>>> v = load i8* p[0] // v is poison
>>>>>> store i8 v, i8* q[0] // poison is simply copied; no UB happened
>>>>>>
>>>>>> Similarly, passing/returning poison is allowed as well.
>>>>>>
>>>>>> Juneyoung
>>>>>>
>>>>>> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong <
>>>>>> hubert.reinterpretcast at gmail.com> wrote:
>>>>>>
>>>>>>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <
>>>>>>> juneyoung.lee at sf.snu.ac.kr> wrote:
>>>>>>>
>>>>>>>> > It is important to note that this applies to trap representations
>>>>>>>> and not to unspecified values. A structure or union never has a trap
>>>>>>>> representation.
>>>>>>>> Yes, nondeterministic bits would work for padding of struct/union,
>>>>>>>> as described in (3) The third case is the value of struct/union padding.
>>>>>>>> For the members of struct/union, it is allowed to have trap
>>>>>>>> representation, so poison can be used.
>>>>>>>>
>>>>>>> At what point are the members considered poison? For
>>>>>>> copying/passing/returning a struct or union, there is no UB even if some
>>>>>>> members are uninitialized.
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> Juneyoung
>>>>>>>>
>>>>>>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
>>>>>>>> hubert.reinterpretcast at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>>>>>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>>>>>
>>>>>>>>>> Hello all,
>>>>>>>>>>
>>>>>>>>>> Thank everyone who participated in the (impromptu) round table
>>>>>>>>>> discussion on Tuesday.
>>>>>>>>>> For those who are interested, I share the summary of the
>>>>>>>>>> discussion.
>>>>>>>>>> Also, I share a short-term plan regarding this issue and relevant
>>>>>>>>>> patches.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Fixing Miscompilations using Freeze*
>>>>>>>>>> -----------------------------------
>>>>>>>>>>
>>>>>>>>>> To reduce the cost of fixing miscompilations using freeze
>>>>>>>>>> instruction, we need to
>>>>>>>>>> optimize freeze away whenever possible.
>>>>>>>>>> Using the no-undef/poison assumption from the source language
>>>>>>>>>> (C/C++ in
>>>>>>>>>> this context) can play a significant role.
>>>>>>>>>> To make use the assumptions, here are short-term goals:
>>>>>>>>>>
>>>>>>>>>> *1. Preserve no-undef/poison assumption of function arguments
>>>>>>>>>> from C/C++ when valid.*
>>>>>>>>>>
>>>>>>>>>> There is an ongoing relevant patch (that is written by others):
>>>>>>>>>> https://reviews.llvm.org/D81678
>>>>>>>>>>
>>>>>>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++
>>>>>>>>>> when valid.*
>>>>>>>>>>
>>>>>>>>>> Reading an indeterminate value from an lvalue that does not have
>>>>>>>>>> char or
>>>>>>>>>> std::byte type is UB [1].
>>>>>>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest
>>>>>>>>>> attaching a new
>>>>>>>>>> !noundef metadata to such `load`s.
>>>>>>>>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>>>>>>>>> The clang-side change is going to be made after D81678 is
>>>>>>>>>> reviewed, because it is likely
>>>>>>>>>> that this patch will have a lot of changes in clang tests.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> *Replacing Undef with Poison*
>>>>>>>>>> ---------------------------
>>>>>>>>>>
>>>>>>>>>> Since undef is known to be the source of many optimizations due
>>>>>>>>>> to its complexity,
>>>>>>>>>> we'd like to suggest gradually moving towards using poison only.
>>>>>>>>>> To make it, (1) `poison` constant should be introduced into LLVM
>>>>>>>>>> IR first, and (2)
>>>>>>>>>> transformations that introduce `undef` should be updated to
>>>>>>>>>> introduce `poison` instead.
>>>>>>>>>>
>>>>>>>>>> For the step (2), we need an experimental result showing that it
>>>>>>>>>> does not cause
>>>>>>>>>> performance degradation. This relies on better support for freeze
>>>>>>>>>> (the
>>>>>>>>>> no-undef/poison analysis patches).
>>>>>>>>>>
>>>>>>>>>> *1. Introduce a new `poison` constant into IR*:
>>>>>>>>>> https://reviews.llvm.org/D71126
>>>>>>>>>>
>>>>>>>>>> Note that `poison` constant can be used as a true placeholder
>>>>>>>>>> value as well.
>>>>>>>>>> Undef cannot be used in general because it is less undefined than
>>>>>>>>>> poison.
>>>>>>>>>>
>>>>>>>>>> *2. Update transformations that introduce `undef` to introduce
>>>>>>>>>> `poison` instead*
>>>>>>>>>>
>>>>>>>>>> (1) There are transformations that introduce `undef` as a
>>>>>>>>>> placeholder (e.g. phi operand
>>>>>>>>>> from an unreachable block).
>>>>>>>>>> For these, `poison` can be used instead.
>>>>>>>>>>
>>>>>>>>>> (2) The value of an uninitialized object (automatic or dynamic).
>>>>>>>>>> They are indeterminate values in C/C++, so okay to use poison
>>>>>>>>>> instead.
>>>>>>>>>> A tricky case is a bitfield access, and we have two
>>>>>>>>>> possible solutions:
>>>>>>>>>>
>>>>>>>>>> - i. Introduce a very-packed struct type
>>>>>>>>>> ```
>>>>>>>>>> <C>
>>>>>>>>>> struct {
>>>>>>>>>>   int a:2, b:6;
>>>>>>>>>> } s;
>>>>>>>>>>
>>>>>>>>>> v = s.a;
>>>>>>>>>>
>>>>>>>>>> =>
>>>>>>>>>>
>>>>>>>>>> <IR>
>>>>>>>>>>
>>>>>>>>>> s = alloca
>>>>>>>>>>
>>>>>>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>>>>>>>>> v = extractvalue tmp, 0
>>>>>>>>>> ```
>>>>>>>>>>   * Pros: Can be used to precisely lower C/C++'s struct typed
>>>>>>>>>> function argument into IR
>>>>>>>>>> (currently clang coerces a struct into int if small enough; I'll
>>>>>>>>>> explain about this detail if anyone requests)
>>>>>>>>>>   * Cons: Since optimizations aren’t aware of the new type, they
>>>>>>>>>> should be updated
>>>>>>>>>>
>>>>>>>>>> - ii. Use load-freeze
>>>>>>>>>> ```
>>>>>>>>>> <C>
>>>>>>>>>> struct {
>>>>>>>>>>   int a:2, b:6;
>>>>>>>>>> } s;
>>>>>>>>>>
>>>>>>>>>> v = s.a;
>>>>>>>>>>
>>>>>>>>>> =>
>>>>>>>>>>
>>>>>>>>>> <IR>
>>>>>>>>>> s = alloca
>>>>>>>>>>
>>>>>>>>>> // Poison bits are frozen and returned
>>>>>>>>>> tmp = *load freeze* i8* s
>>>>>>>>>> v = tmp & 3
>>>>>>>>>> ```
>>>>>>>>>>   * Pros: The change is simpler
>>>>>>>>>>   * Cons: Store forwarding isn’t free; needs insertion of freeze
>>>>>>>>>>     (store x, p; v = load freeze p => store x, p; v = freeze x)
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> (3) The third case is the value of struct/union padding.
>>>>>>>>>> Padding is filled with unspecified value in C, so it is too
>>>>>>>>>> undefined to use poison.
>>>>>>>>>> We can fill it with defined bits nondeterministically chosen at
>>>>>>>>>> allocation time (freeze poison).
>>>>>>>>>>
>>>>>>>>>> ```
>>>>>>>>>> <C>
>>>>>>>>>> struct {
>>>>>>>>>>   char a; // 3 bytes padding
>>>>>>>>>>   int b;
>>>>>>>>>> } s;
>>>>>>>>>>
>>>>>>>>>> v = s.b;
>>>>>>>>>>
>>>>>>>>>> =>
>>>>>>>>>>
>>>>>>>>>> <IR>
>>>>>>>>>> s = alloca {i8, i32} // alloca initializes bytes in a
>>>>>>>>>> type-dependent manner
>>>>>>>>>> // s[0], s[4~7]: poison
>>>>>>>>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>>>>>>>>
>>>>>>>>>> s2 = gep (bitcast s to i8*), 4
>>>>>>>>>> v = load i32 s2
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Juneyoung
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> [1]
>>>>>>>>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>>>>>>>>> representation and is read by an lvalue expression that does not have
>>>>>>>>>> character type, the behavior is undefined.
>>>>>>>>>> (Similarly, C17 6.2.6.1.5)
>>>>>>>>>>
>>>>>>>>> It is important to note that this applies to trap representations
>>>>>>>>> and not to unspecified values. A structure or union never has a trap
>>>>>>>>> representation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> C++14 8.5.12: If an indeterminate value is produced by an
>>>>>>>>>> evaluation, the behavior is undefined except in the following cases: If an
>>>>>>>>>> indeterminate value of unsigned narrow character type ...
>>>>>>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>>>>>>>>
>>>>>>>>> While loading undef for the unsigned character type case merely
>>>>>>>>> produces undef, for C++, operations such as sign-extend or zero-extend on
>>>>>>>>> an undef i8 is also undefined behaviour.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> LLVM Developers mailing list
>>>>>>>>>> llvm-dev at lists.llvm.org
>>>>>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>
>>>>>>>> Juneyoung Lee
>>>>>>>> Software Foundation Lab, Seoul National University
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>>
>>>>>> Juneyoung Lee
>>>>>> Software Foundation Lab, Seoul National University
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Juneyoung Lee
>>>> Software Foundation Lab, Seoul National University
>>>>
>>>
>>
>> --
>>
>> Juneyoung Lee
>> Software Foundation Lab, Seoul National University
>>
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201012/f0a1a247/attachment.html>


More information about the llvm-dev mailing list