[llvm-dev] Undef and Poison round table follow-up & a plan
Hubert Tong via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 8 18:45:42 PDT 2020
On Thu, Oct 8, 2020 at 7:13 PM Juneyoung Lee <juneyoung.lee at sf.snu.ac.kr>
wrote:
> > It is important to note that this applies to trap representations and
> not to unspecified values. A structure or union never has a trap
> representation.
> Yes, nondeterministic bits would work for padding of struct/union, as
> described in (3) The third case is the value of struct/union padding.
> For the members of struct/union, it is allowed to have trap
> representation, so poison can be used.
>
At what point are the members considered poison? For
copying/passing/returning a struct or union, there is no UB even if some
members are uninitialized.
>
> Juneyoung
>
> On Fri, Oct 9, 2020 at 5:37 AM Hubert Tong <
> hubert.reinterpretcast at gmail.com> wrote:
>
>> On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hello all,
>>>
>>> Thank everyone who participated in the (impromptu) round table
>>> discussion on Tuesday.
>>> For those who are interested, I share the summary of the discussion.
>>> Also, I share a short-term plan regarding this issue and relevant
>>> patches.
>>>
>>>
>>> *Fixing Miscompilations using Freeze*
>>> -----------------------------------
>>>
>>> To reduce the cost of fixing miscompilations using freeze instruction,
>>> we need to
>>> optimize freeze away whenever possible.
>>> Using the no-undef/poison assumption from the source language (C/C++ in
>>> this context) can play a significant role.
>>> To make use the assumptions, here are short-term goals:
>>>
>>> *1. Preserve no-undef/poison assumption of function arguments from C/C++
>>> when valid.*
>>>
>>> There is an ongoing relevant patch (that is written by others):
>>> https://reviews.llvm.org/D81678
>>>
>>> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when
>>> valid.*
>>>
>>> Reading an indeterminate value from an lvalue that does not have char or
>>> std::byte type is UB [1].
>>> Since reading an lvalue is lowered to `load` in IR, we suggest attaching
>>> a new
>>> !noundef metadata to such `load`s.
>>> The IR-side change is here: https://reviews.llvm.org/D89050
>>> The clang-side change is going to be made after D81678 is reviewed,
>>> because it is likely
>>> that this patch will have a lot of changes in clang tests.
>>>
>>>
>>> *Replacing Undef with Poison*
>>> ---------------------------
>>>
>>> Since undef is known to be the source of many optimizations due to its
>>> complexity,
>>> we'd like to suggest gradually moving towards using poison only.
>>> To make it, (1) `poison` constant should be introduced into LLVM IR
>>> first, and (2)
>>> transformations that introduce `undef` should be updated to introduce
>>> `poison` instead.
>>>
>>> For the step (2), we need an experimental result showing that it does
>>> not cause
>>> performance degradation. This relies on better support for freeze (the
>>> no-undef/poison analysis patches).
>>>
>>> *1. Introduce a new `poison` constant into IR*:
>>> https://reviews.llvm.org/D71126
>>>
>>> Note that `poison` constant can be used as a true placeholder value as
>>> well.
>>> Undef cannot be used in general because it is less undefined than poison.
>>>
>>> *2. Update transformations that introduce `undef` to introduce `poison`
>>> instead*
>>>
>>> (1) There are transformations that introduce `undef` as a placeholder
>>> (e.g. phi operand
>>> from an unreachable block).
>>> For these, `poison` can be used instead.
>>>
>>> (2) The value of an uninitialized object (automatic or dynamic).
>>> They are indeterminate values in C/C++, so okay to use poison instead.
>>> A tricky case is a bitfield access, and we have two possible solutions:
>>>
>>> - i. Introduce a very-packed struct type
>>> ```
>>> <C>
>>> struct {
>>> int a:2, b:6;
>>> } s;
>>>
>>> v = s.a;
>>>
>>> =>
>>>
>>> <IR>
>>>
>>> s = alloca
>>>
>>> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
>>> v = extractvalue tmp, 0
>>> ```
>>> * Pros: Can be used to precisely lower C/C++'s struct typed function
>>> argument into IR
>>> (currently clang coerces a struct into int if small enough; I'll explain
>>> about this detail if anyone requests)
>>> * Cons: Since optimizations aren’t aware of the new type, they should
>>> be updated
>>>
>>> - ii. Use load-freeze
>>> ```
>>> <C>
>>> struct {
>>> int a:2, b:6;
>>> } s;
>>>
>>> v = s.a;
>>>
>>> =>
>>>
>>> <IR>
>>> s = alloca
>>>
>>> // Poison bits are frozen and returned
>>> tmp = *load freeze* i8* s
>>> v = tmp & 3
>>> ```
>>> * Pros: The change is simpler
>>> * Cons: Store forwarding isn’t free; needs insertion of freeze
>>> (store x, p; v = load freeze p => store x, p; v = freeze x)
>>>
>>>
>>> (3) The third case is the value of struct/union padding.
>>> Padding is filled with unspecified value in C, so it is too undefined to
>>> use poison.
>>> We can fill it with defined bits nondeterministically chosen at
>>> allocation time (freeze poison).
>>>
>>> ```
>>> <C>
>>> struct {
>>> char a; // 3 bytes padding
>>> int b;
>>> } s;
>>>
>>> v = s.b;
>>>
>>> =>
>>>
>>> <IR>
>>> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent
>>> manner
>>> // s[0], s[4~7]: poison
>>> // s[1~3]: let's fill these bytes with nondet. bits
>>>
>>> s2 = gep (bitcast s to i8*), 4
>>> v = load i32 s2
>>> ```
>>>
>>>
>>> Thanks,
>>> Juneyoung
>>>
>>>
>>>
>>> [1]
>>> C11 6.2.6.1.5: If the stored value of an object has such a
>>> representation and is read by an lvalue expression that does not have
>>> character type, the behavior is undefined.
>>> (Similarly, C17 6.2.6.1.5)
>>>
>> It is important to note that this applies to trap representations and not
>> to unspecified values. A structure or union never has a trap representation.
>>
>>
>>> C++14 8.5.12: If an indeterminate value is produced by an evaluation,
>>> the behavior is undefined except in the following cases: If an
>>> indeterminate value of unsigned narrow character type ...
>>> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>>>
>> While loading undef for the unsigned character type case merely produces
>> undef, for C++, operations such as sign-extend or zero-extend on an undef
>> i8 is also undefined behaviour.
>>
>>
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>
> --
>
> Juneyoung Lee
> Software Foundation Lab, Seoul National University
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201008/fc759927/attachment.html>
More information about the llvm-dev
mailing list