[llvm-dev] Undef and Poison round table follow-up & a plan
Juneyoung Lee via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 8 20:08:50 PDT 2020
It is UB when a poison is passed to certain operations that raise UB on
poison, such as division by poison/dereferencing poison pointer/branching
on poison condition/etc.
Otherwise, poison is simply propagated, but it does not raise UB
Copying poison bytes is okay:
// Members are initialized to poison at object creation.
p = alloca {i8, i32} // p[0], p[4~7] are poison
q = alloca {i8, i32} // we want to copy p to q
v = load i8* p[0] // v is poison
store i8 v, i8* q[0] // poison is simply copied; no UB happened
Similarly, passing/returning poison is allowed as well.
Juneyoung
On Fri, Oct 9, 2020 at 10:45 AM Hubert Tong <
hubert.reinterpretcast at gmail.com> wrote:
> On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
> wrote:
>
>> > It is important to note that this applies to trap representations and
>> not to unspecified values. A structure or union never has a trap
>> representation.
>> Yes, nondeterministic bits would work for padding of struct/union, as
>> described in (3) The third case is the value of struct/union padding.
>> For the members of struct/union, it is allowed to have trap
>> representation, so poison can be used.
>>
> At what point are the members considered poison? For
> copying/passing/returning a struct or union, there is no UB even if some
> members are uninitialized.
>
>
>>
>> Juneyoung
>>
>> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
>> hubert.reinterpretcast at gmail.com> wrote:
>>
>>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>>> llvm-dev at lists.llvm.org> wrote:
>>>
>>>> Hello all,
>>>>
>>>> Thank everyone who participated in the (impromptu) round table
>>>> discussion on Tuesday.
>>>> For those who are interested, I share the summary of the discussion.
>>>> Also, I share a short-term plan regarding this issue and relevant
>>>> patches.
>>>>
>>>>
>>>> *Fixing Miscompilations using Freeze*
>>>> -----------------------------------
>>>>
>>>> To reduce the cost of fixing miscompilations using freeze instruction,
>>>> we need to
>>>> optimize freeze away whenever possible.
>>>> Using the no-undef/poison assumption from the source language (C/C++ in
>>>> this context) can play a significant role.
>>>> To make use the assumptions, here are short-term goals:
>>>>
>>>> *1. Preserve no-undef/poison assumption of function arguments from
>>>> C/C++ when valid.*
>>>>
>>>> There is an ongoing relevant patch (that is written by others):
>>>> https://reviews.llvm.org/D81678
>>>>
>>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when
>>>> valid.*
>>>>
>>>> Reading an indeterminate value from an lvalue that does not have char or
>>>> std::byte type is UB [1].
>>>> Since reading an lvalue is lowered to `load` in IR, we suggest
>>>> attaching a new
>>>> !noundef metadata to such `load`s.
>>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>>> The clang-side change is going to be made after D81678 is reviewed,
>>>> because it is likely
>>>> that this patch will have a lot of changes in clang tests.
>>>>
>>>>
>>>> *Replacing Undef with Poison*
>>>> ---------------------------
>>>>
>>>> Since undef is known to be the source of many optimizations due to its
>>>> complexity,
>>>> we'd like to suggest gradually moving towards using poison only.
>>>> To make it, (1) `poison` constant should be introduced into LLVM IR
>>>> first, and (2)
>>>> transformations that introduce `undef` should be updated to introduce
>>>> `poison` instead.
>>>>
>>>> For the step (2), we need an experimental result showing that it does
>>>> not cause
>>>> performance degradation. This relies on better support for freeze (the
>>>> no-undef/poison analysis patches).
>>>>
>>>> *1. Introduce a new `poison` constant into IR*:
>>>> https://reviews.llvm.org/D71126
>>>>
>>>> Note that `poison` constant can be used as a true placeholder value as
>>>> well.
>>>> Undef cannot be used in general because it is less undefined than
>>>> poison.
>>>>
>>>> *2. Update transformations that introduce `undef` to introduce `poison`
>>>> instead*
>>>>
>>>> (1) There are transformations that introduce `undef` as a placeholder
>>>> (e.g. phi operand
>>>> from an unreachable block).
>>>> For these, `poison` can be used instead.
>>>>
>>>> (2) The value of an uninitialized object (automatic or dynamic).
>>>> They are indeterminate values in C/C++, so okay to use poison instead.
>>>> A tricky case is a bitfield access, and we have two possible solutions:
>>>>
>>>> - i. Introduce a very-packed struct type
>>>> ```
>>>> <C>
>>>> struct {
>>>> int a:2, b:6;
>>>> } s;
>>>>
>>>> v = s.a;
>>>>
>>>> =>
>>>>
>>>> <IR>
>>>>
>>>> s = alloca
>>>>
>>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>>> v = extractvalue tmp, 0
>>>> ```
>>>> * Pros: Can be used to precisely lower C/C++'s struct typed function
>>>> argument into IR
>>>> (currently clang coerces a struct into int if small enough; I'll
>>>> explain about this detail if anyone requests)
>>>> * Cons: Since optimizations aren’t aware of the new type, they should
>>>> be updated
>>>>
>>>> - ii. Use load-freeze
>>>> ```
>>>> <C>
>>>> struct {
>>>> int a:2, b:6;
>>>> } s;
>>>>
>>>> v = s.a;
>>>>
>>>> =>
>>>>
>>>> <IR>
>>>> s = alloca
>>>>
>>>> // Poison bits are frozen and returned
>>>> tmp = *load freeze* i8* s
>>>> v = tmp & 3
>>>> ```
>>>> * Pros: The change is simpler
>>>> * Cons: Store forwarding isn’t free; needs insertion of freeze
>>>> (store x, p; v = load freeze p => store x, p; v = freeze x)
>>>>
>>>>
>>>> (3) The third case is the value of struct/union padding.
>>>> Padding is filled with unspecified value in C, so it is too undefined
>>>> to use poison.
>>>> We can fill it with defined bits nondeterministically chosen at
>>>> allocation time (freeze poison).
>>>>
>>>> ```
>>>> <C>
>>>> struct {
>>>> char a; // 3 bytes padding
>>>> int b;
>>>> } s;
>>>>
>>>> v = s.b;
>>>>
>>>> =>
>>>>
>>>> <IR>
>>>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent
>>>> manner
>>>> // s[0], s[4~7]: poison
>>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>>
>>>> s2 = gep (bitcast s to i8*), 4
>>>> v = load i32 s2
>>>> ```
>>>>
>>>>
>>>> Thanks,
>>>> Juneyoung
>>>>
>>>>
>>>>
>>>> [1]
>>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>>> representation and is read by an lvalue expression that does not have
>>>> character type, the behavior is undefined.
>>>> (Similarly, C17 6.2.6.1.5)
>>>>
>>> It is important to note that this applies to trap representations and
>>> not to unspecified values. A structure or union never has a trap
>>> representation.
>>>
>>>
>>>> C++14 8.5.12: If an indeterminate value is produced by an evaluation,
>>>> the behavior is undefined except in the following cases: If an
>>>> indeterminate value of unsigned narrow character type ...
>>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>>
>>> While loading undef for the unsigned character type case merely produces
>>> undef, for C++, operations such as sign-extend or zero-extend on an undef
>>> i8 is also undefined behaviour.
>>>
>>>
>>>>
>>>> _______________________________________________
>>>> LLVM Developers mailing list
>>>> llvm-dev at lists.llvm.org
>>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>
>>>
>>
>> --
>>
>> Juneyoung Lee
>> Software Foundation Lab, Seoul National University
>>
>
--
Juneyoung Lee
Software Foundation Lab, Seoul National University
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201009/a7ae424f/attachment-0001.html>
More information about the llvm-dev
mailing list