[llvm-dev] Undef and Poison round table follow-up & a plan
Hubert Tong via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 8 20:19:15 PDT 2020
On Thu, Oct 8, 2020 at 11:09 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
wrote:
> It is UB when a poison is passed to certain operations that raise UB on
> poison, such as division by poison/dereferencing poison pointer/branching
> on poison condition/etc.
>
Got it. Thanks.
> Otherwise, poison is simply propagated, but it does not raise UB
> Copying poison bytes is okay:
>
> // Members are initialized to poison at object creation.
> p = alloca {i8, i32} // p[0], p[4~7] are poison
>
p[0] is an i8, so it shouldn't be poison?
> q = alloca {i8, i32} // we want to copy p to q
> v = load i8* p[0] // v is poison
> store i8 v, i8* q[0] // poison is simply copied; no UB happened
>
> Similarly, passing/returning poison is allowed as well.
>
> Juneyoung
>
> On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong <
> hubert.reinterpretcast at gmail.com> wrote:
>
>> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
>> wrote:
>>
>>> > It is important to note that this applies to trap representations and
>>> not to unspecified values. A structure or union never has a trap
>>> representation.
>>> Yes, nondeterministic bits would work for padding of struct/union, as
>>> described in (3) The third case is the value of struct/union padding.
>>> For the members of struct/union, it is allowed to have trap
>>> representation, so poison can be used.
>>>
>> At what point are the members considered poison? For
>> copying/passing/returning a struct or union, there is no UB even if some
>> members are uninitialized.
>>
>>
>>>
>>> Juneyoung
>>>
>>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
>>> hubert.reinterpretcast at gmail.com> wrote:
>>>
>>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>>>> llvm-dev at lists.llvm.org> wrote:
>>>>
>>>>> Hello all,
>>>>>
>>>>> Thank everyone who participated in the (impromptu) round table
>>>>> discussion on Tuesday.
>>>>> For those who are interested, I share the summary of the discussion.
>>>>> Also, I share a short-term plan regarding this issue and relevant
>>>>> patches.
>>>>>
>>>>>
>>>>> *Fixing Miscompilations using Freeze*
>>>>> -----------------------------------
>>>>>
>>>>> To reduce the cost of fixing miscompilations using freeze instruction,
>>>>> we need to
>>>>> optimize freeze away whenever possible.
>>>>> Using the no-undef/poison assumption from the source language (C/C++ in
>>>>> this context) can play a significant role.
>>>>> To make use the assumptions, here are short-term goals:
>>>>>
>>>>> *1. Preserve no-undef/poison assumption of function arguments from
>>>>> C/C++ when valid.*
>>>>>
>>>>> There is an ongoing relevant patch (that is written by others):
>>>>> https://reviews.llvm.org/D81678
>>>>>
>>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when
>>>>> valid.*
>>>>>
>>>>> Reading an indeterminate value from an lvalue that does not have char
>>>>> or
>>>>> std::byte type is UB [1].
>>>>> Since reading an lvalue is lowered to `load` in IR, we suggest
>>>>> attaching a new
>>>>> !noundef metadata to such `load`s.
>>>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>>>> The clang-side change is going to be made after D81678 is reviewed,
>>>>> because it is likely
>>>>> that this patch will have a lot of changes in clang tests.
>>>>>
>>>>>
>>>>> *Replacing Undef with Poison*
>>>>> ---------------------------
>>>>>
>>>>> Since undef is known to be the source of many optimizations due to its
>>>>> complexity,
>>>>> we'd like to suggest gradually moving towards using poison only.
>>>>> To make it, (1) `poison` constant should be introduced into LLVM IR
>>>>> first, and (2)
>>>>> transformations that introduce `undef` should be updated to introduce
>>>>> `poison` instead.
>>>>>
>>>>> For the step (2), we need an experimental result showing that it does
>>>>> not cause
>>>>> performance degradation. This relies on better support for freeze (the
>>>>> no-undef/poison analysis patches).
>>>>>
>>>>> *1. Introduce a new `poison` constant into IR*:
>>>>> https://reviews.llvm.org/D71126
>>>>>
>>>>> Note that `poison` constant can be used as a true placeholder value as
>>>>> well.
>>>>> Undef cannot be used in general because it is less undefined than
>>>>> poison.
>>>>>
>>>>> *2. Update transformations that introduce `undef` to introduce
>>>>> `poison` instead*
>>>>>
>>>>> (1) There are transformations that introduce `undef` as a placeholder
>>>>> (e.g. phi operand
>>>>> from an unreachable block).
>>>>> For these, `poison` can be used instead.
>>>>>
>>>>> (2) The value of an uninitialized object (automatic or dynamic).
>>>>> They are indeterminate values in C/C++, so okay to use poison instead.
>>>>> A tricky case is a bitfield access, and we have two possible solutions:
>>>>>
>>>>> - i. Introduce a very-packed struct type
>>>>> ```
>>>>> <C>
>>>>> struct {
>>>>> int a:2, b:6;
>>>>> } s;
>>>>>
>>>>> v = s.a;
>>>>>
>>>>> =>
>>>>>
>>>>> <IR>
>>>>>
>>>>> s = alloca
>>>>>
>>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>>>> v = extractvalue tmp, 0
>>>>> ```
>>>>> * Pros: Can be used to precisely lower C/C++'s struct typed function
>>>>> argument into IR
>>>>> (currently clang coerces a struct into int if small enough; I'll
>>>>> explain about this detail if anyone requests)
>>>>> * Cons: Since optimizations aren’t aware of the new type, they
>>>>> should be updated
>>>>>
>>>>> - ii. Use load-freeze
>>>>> ```
>>>>> <C>
>>>>> struct {
>>>>> int a:2, b:6;
>>>>> } s;
>>>>>
>>>>> v = s.a;
>>>>>
>>>>> =>
>>>>>
>>>>> <IR>
>>>>> s = alloca
>>>>>
>>>>> // Poison bits are frozen and returned
>>>>> tmp = *load freeze* i8* s
>>>>> v = tmp & 3
>>>>> ```
>>>>> * Pros: The change is simpler
>>>>> * Cons: Store forwarding isn’t free; needs insertion of freeze
>>>>> (store x, p; v = load freeze p => store x, p; v = freeze x)
>>>>>
>>>>>
>>>>> (3) The third case is the value of struct/union padding.
>>>>> Padding is filled with unspecified value in C, so it is too undefined
>>>>> to use poison.
>>>>> We can fill it with defined bits nondeterministically chosen at
>>>>> allocation time (freeze poison).
>>>>>
>>>>> ```
>>>>> <C>
>>>>> struct {
>>>>> char a; // 3 bytes padding
>>>>> int b;
>>>>> } s;
>>>>>
>>>>> v = s.b;
>>>>>
>>>>> =>
>>>>>
>>>>> <IR>
>>>>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent
>>>>> manner
>>>>> // s[0], s[4~7]: poison
>>>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>>>
>>>>> s2 = gep (bitcast s to i8*), 4
>>>>> v = load i32 s2
>>>>> ```
>>>>>
>>>>>
>>>>> Thanks,
>>>>> Juneyoung
>>>>>
>>>>>
>>>>>
>>>>> [1]
>>>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>>>> representation and is read by an lvalue expression that does not have
>>>>> character type, the behavior is undefined.
>>>>> (Similarly, C17 6.2.6.1.5)
>>>>>
>>>> It is important to note that this applies to trap representations and
>>>> not to unspecified values. A structure or union never has a trap
>>>> representation.
>>>>
>>>>
>>>>> C++14 8.5.12: If an indeterminate value is produced by an evaluation,
>>>>> the behavior is undefined except in the following cases: If an
>>>>> indeterminate value of unsigned narrow character type ...
>>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>>>
>>>> While loading undef for the unsigned character type case merely
>>>> produces undef, for C++, operations such as sign-extend or zero-extend on
>>>> an undef i8 is also undefined behaviour.
>>>>
>>>>
>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>
>>>
>>> --
>>>
>>> Juneyoung Lee
>>> Software Foundation Lab, Seoul National University
>>>
>>
>
> --
>
> Juneyoung Lee
> Software Foundation Lab, Seoul National University
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201008/531b61ac/attachment.html>
More information about the llvm-dev
mailing list