[llvm-dev] Undef and Poison round table follow-up & a plan

Juneyoung Lee via llvm-dev llvm-dev at lists.llvm.org
Thu Oct 8 20:54:20 PDT 2020


>
> // Members are initialized to poison at object creation.
>> p = alloca {i8, i32} // p[0], p[4~7] are poison
>> p[0] is an i8, so it shouldn't be poison?
>
>
My interpretation of standard is that reading uninitialized char can also
yield trap representation.
If uninitialized, char variable has indeterminate value, and C/C++ does not
seem to forbid reading trap representation from it.
C++14 explicitly has an example that shows it is indeterminate value at
3.3.2.1 :
```
The point of declaration for a name is immediately after its complete
declarator (Clause 8) and before its initializer (if any), except as noted
below. [Example:
  unsigned char x = 12;
  { unsigned char x = x; }
Here the second x is initialized with its own (*indeterminate*) value. —end
example]
```

It seems there was a phrase saying that reading indeterminate value as an
unsigned char should yield unspecified value in the C++14 draft in the
past, but it is removed:
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_defects.html#1787
The removed phrase did not exist in C++11, so I believe it is fine to use
poison for uninitialized char types.

Juneyoung

On Fri, Oct 9, 2020 at 12:19 PM Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:

> On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
> wrote:
>
>> It is UB when a poison is passed to certain operations that raise UB on
>> poison, such as division by poison/dereferencing poison pointer/branching
>> on poison condition/etc.
>>
> Got it. Thanks.
>
>
>> Otherwise, poison is simply propagated, but it does not raise UB
>> Copying poison bytes is okay:
>>
>> // Members are initialized to poison at object creation.
>> p = alloca {i8, i32} // p[0], p[4~7] are poison
>>
> p[0] is an i8, so it shouldn't be poison?
>
>
>> q = alloca {i8, i32} // we want to copy p to q
>> v = load i8* p[0] // v is poison
>> store i8 v, i8* q[0] // poison is simply copied; no UB happened
>>
>> Similarly, passing/returning poison is allowed as well.
>>
>> Juneyoung
>>
>> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong <
>> hubert.reinterpretcast at gmail.com> wrote:
>>
>>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
>>> wrote:
>>>
>>>> > It is important to note that this applies to trap representations and
>>>> not to unspecified values. A structure or union never has a trap
>>>> representation.
>>>> Yes, nondeterministic bits would work for padding of struct/union, as
>>>> described in (3) The third case is the value of struct/union padding.
>>>> For the members of struct/union, it is allowed to have trap
>>>> representation, so poison can be used.
>>>>
>>> At what point are the members considered poison? For
>>> copying/passing/returning a struct or union, there is no UB even if some
>>> members are uninitialized.
>>>
>>>
>>>>
>>>> Juneyoung
>>>>
>>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
>>>> hubert.reinterpretcast at gmail.com> wrote:
>>>>
>>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>>>>> llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>>> Hello all,
>>>>>>
>>>>>> Thank everyone who participated in the (impromptu) round table
>>>>>> discussion on Tuesday.
>>>>>> For those who are interested, I share the summary of the discussion.
>>>>>> Also, I share a short-term plan regarding this issue and relevant
>>>>>> patches.
>>>>>>
>>>>>>
>>>>>> *Fixing Miscompilations using Freeze*
>>>>>> -----------------------------------
>>>>>>
>>>>>> To reduce the cost of fixing miscompilations using freeze
>>>>>> instruction, we need to
>>>>>> optimize freeze away whenever possible.
>>>>>> Using the no-undef/poison assumption from the source language (C/C++
>>>>>> in
>>>>>> this context) can play a significant role.
>>>>>> To make use the assumptions, here are short-term goals:
>>>>>>
>>>>>> *1. Preserve no-undef/poison assumption of function arguments from
>>>>>> C/C++ when valid.*
>>>>>>
>>>>>> There is an ongoing relevant patch (that is written by others):
>>>>>> https://reviews.llvm.org/D81678
>>>>>>
>>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when
>>>>>> valid.*
>>>>>>
>>>>>> Reading an indeterminate value from an lvalue that does not have char
>>>>>> or
>>>>>> std::byte type is UB [1].
>>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest
>>>>>> attaching a new
>>>>>> !noundef metadata to such `load`s.
>>>>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>>>>> The clang-side change is going to be made after D81678 is reviewed,
>>>>>> because it is likely
>>>>>> that this patch will have a lot of changes in clang tests.
>>>>>>
>>>>>>
>>>>>> *Replacing Undef with Poison*
>>>>>> ---------------------------
>>>>>>
>>>>>> Since undef is known to be the source of many optimizations due to
>>>>>> its complexity,
>>>>>> we'd like to suggest gradually moving towards using poison only.
>>>>>> To make it, (1) `poison` constant should be introduced into LLVM IR
>>>>>> first, and (2)
>>>>>> transformations that introduce `undef` should be updated to introduce
>>>>>> `poison` instead.
>>>>>>
>>>>>> For the step (2), we need an experimental result showing that it does
>>>>>> not cause
>>>>>> performance degradation. This relies on better support for freeze (the
>>>>>> no-undef/poison analysis patches).
>>>>>>
>>>>>> *1. Introduce a new `poison` constant into IR*:
>>>>>> https://reviews.llvm.org/D71126
>>>>>>
>>>>>> Note that `poison` constant can be used as a true placeholder value
>>>>>> as well.
>>>>>> Undef cannot be used in general because it is less undefined than
>>>>>> poison.
>>>>>>
>>>>>> *2. Update transformations that introduce `undef` to introduce
>>>>>> `poison` instead*
>>>>>>
>>>>>> (1) There are transformations that introduce `undef` as a placeholder
>>>>>> (e.g. phi operand
>>>>>> from an unreachable block).
>>>>>> For these, `poison` can be used instead.
>>>>>>
>>>>>> (2) The value of an uninitialized object (automatic or dynamic).
>>>>>> They are indeterminate values in C/C++, so okay to use poison instead.
>>>>>> A tricky case is a bitfield access, and we have two
>>>>>> possible solutions:
>>>>>>
>>>>>> - i. Introduce a very-packed struct type
>>>>>> ```
>>>>>> <C>
>>>>>> struct {
>>>>>>   int a:2, b:6;
>>>>>> } s;
>>>>>>
>>>>>> v = s.a;
>>>>>>
>>>>>> =>
>>>>>>
>>>>>> <IR>
>>>>>>
>>>>>> s = alloca
>>>>>>
>>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>>>>> v = extractvalue tmp, 0
>>>>>> ```
>>>>>>   * Pros: Can be used to precisely lower C/C++'s struct typed
>>>>>> function argument into IR
>>>>>> (currently clang coerces a struct into int if small enough; I'll
>>>>>> explain about this detail if anyone requests)
>>>>>>   * Cons: Since optimizations aren’t aware of the new type, they
>>>>>> should be updated
>>>>>>
>>>>>> - ii. Use load-freeze
>>>>>> ```
>>>>>> <C>
>>>>>> struct {
>>>>>>   int a:2, b:6;
>>>>>> } s;
>>>>>>
>>>>>> v = s.a;
>>>>>>
>>>>>> =>
>>>>>>
>>>>>> <IR>
>>>>>> s = alloca
>>>>>>
>>>>>> // Poison bits are frozen and returned
>>>>>> tmp = *load freeze* i8* s
>>>>>> v = tmp & 3
>>>>>> ```
>>>>>>   * Pros: The change is simpler
>>>>>>   * Cons: Store forwarding isn’t free; needs insertion of freeze
>>>>>>     (store x, p; v = load freeze p => store x, p; v = freeze x)
>>>>>>
>>>>>>
>>>>>> (3) The third case is the value of struct/union padding.
>>>>>> Padding is filled with unspecified value in C, so it is too undefined
>>>>>> to use poison.
>>>>>> We can fill it with defined bits nondeterministically chosen at
>>>>>> allocation time (freeze poison).
>>>>>>
>>>>>> ```
>>>>>> <C>
>>>>>> struct {
>>>>>>   char a; // 3 bytes padding
>>>>>>   int b;
>>>>>> } s;
>>>>>>
>>>>>> v = s.b;
>>>>>>
>>>>>> =>
>>>>>>
>>>>>> <IR>
>>>>>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent
>>>>>> manner
>>>>>> // s[0], s[4~7]: poison
>>>>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>>>>
>>>>>> s2 = gep (bitcast s to i8*), 4
>>>>>> v = load i32 s2
>>>>>> ```
>>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Juneyoung
>>>>>>
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>>>>> representation and is read by an lvalue expression that does not have
>>>>>> character type, the behavior is undefined.
>>>>>> (Similarly, C17 6.2.6.1.5)
>>>>>>
>>>>> It is important to note that this applies to trap representations and
>>>>> not to unspecified values. A structure or union never has a trap
>>>>> representation.
>>>>>
>>>>>
>>>>>> C++14 8.5.12: If an indeterminate value is produced by an evaluation,
>>>>>> the behavior is undefined except in the following cases: If an
>>>>>> indeterminate value of unsigned narrow character type ...
>>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>>>>
>>>>> While loading undef for the unsigned character type case merely
>>>>> produces undef, for C++, operations such as sign-extend or zero-extend on
>>>>> an undef i8 is also undefined behaviour.
>>>>>
>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> LLVM Developers mailing list
>>>>>> llvm-dev at lists.llvm.org
>>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>>
>>>>>
>>>>
>>>> --
>>>>
>>>> Juneyoung Lee
>>>> Software Foundation Lab, Seoul National University
>>>>
>>>
>>
>> --
>>
>> Juneyoung Lee
>> Software Foundation Lab, Seoul National University
>>
>

-- 

Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/e42a3335/attachment.html>


More information about the llvm-dev mailing list