[llvm-dev] Undef and Poison round table follow-up & a plan
Hubert Tong via llvm-dev
llvm-dev at lists.llvm.org
Thu Oct 8 13:37:18 PDT 2020
On Thu, Oct 8, 2020 at 12:12 PM Juneyoung Lee via llvm-dev <
llvm-dev at lists.llvm.org> wrote:
> Hello all,
>
> Thank everyone who participated in the (impromptu) round table discussion
> on Tuesday.
> For those who are interested, I share the summary of the discussion.
> Also, I share a short-term plan regarding this issue and relevant patches.
>
>
> *Fixing Miscompilations using Freeze*
> -----------------------------------
>
> To reduce the cost of fixing miscompilations using freeze instruction, we
> need to
> optimize freeze away whenever possible.
> Using the no-undef/poison assumption from the source language (C/C++ in
> this context) can play a significant role.
> To make use the assumptions, here are short-term goals:
>
> *1. Preserve no-undef/poison assumption of function arguments from C/C++
> when valid.*
>
> There is an ongoing relevant patch (that is written by others):
> https://reviews.llvm.org/D81678
>
> *2. Preserve no-undef/poison assumption of lvalue reads in C/C++ when
> valid.*
>
> Reading an indeterminate value from an lvalue that does not have char or
> std::byte type is UB [1].
> Since reading an lvalue is lowered to `load` in IR, we suggest attaching a
> new
> !noundef metadata to such `load`s.
> The IR-side change is here: https://reviews.llvm.org/D89050
> The clang-side change is going to be made after D81678 is reviewed,
> because it is likely
> that this patch will have a lot of changes in clang tests.
>
>
> *Replacing Undef with Poison*
> ---------------------------
>
> Since undef is known to be the source of many optimizations due to its
> complexity,
> we'd like to suggest gradually moving towards using poison only.
> To make it, (1) `poison` constant should be introduced into LLVM IR first,
> and (2)
> transformations that introduce `undef` should be updated to introduce
> `poison` instead.
>
> For the step (2), we need an experimental result showing that it does not
> cause
> performance degradation. This relies on better support for freeze (the
> no-undef/poison analysis patches).
>
> *1. Introduce a new `poison` constant into IR*:
> https://reviews.llvm.org/D71126
>
> Note that `poison` constant can be used as a true placeholder value as
> well.
> Undef cannot be used in general because it is less undefined than poison.
>
> *2. Update transformations that introduce `undef` to introduce `poison`
> instead*
>
> (1) There are transformations that introduce `undef` as a placeholder
> (e.g. phi operand
> from an unreachable block).
> For these, `poison` can be used instead.
>
> (2) The value of an uninitialized object (automatic or dynamic).
> They are indeterminate values in C/C++, so okay to use poison instead.
> A tricky case is a bitfield access, and we have two possible solutions:
>
> - i. Introduce a very-packed struct type
> ```
> <C>
> struct {
> int a:2, b:6;
> } s;
>
> v = s.a;
>
> =>
>
> <IR>
>
> s = alloca
>
> tmp = load *{{i2, i6}}** s ; load as a very packed struct type
> v = extractvalue tmp, 0
> ```
> * Pros: Can be used to precisely lower C/C++'s struct typed function
> argument into IR
> (currently clang coerces a struct into int if small enough; I'll explain
> about this detail if anyone requests)
> * Cons: Since optimizations aren’t aware of the new type, they should be
> updated
>
> - ii. Use load-freeze
> ```
> <C>
> struct {
> int a:2, b:6;
> } s;
>
> v = s.a;
>
> =>
>
> <IR>
> s = alloca
>
> // Poison bits are frozen and returned
> tmp = *load freeze* i8* s
> v = tmp & 3
> ```
> * Pros: The change is simpler
> * Cons: Store forwarding isn’t free; needs insertion of freeze
> (store x, p; v = load freeze p => store x, p; v = freeze x)
>
>
> (3) The third case is the value of struct/union padding.
> Padding is filled with unspecified value in C, so it is too undefined to
> use poison.
> We can fill it with defined bits nondeterministically chosen at allocation
> time (freeze poison).
>
> ```
> <C>
> struct {
> char a; // 3 bytes padding
> int b;
> } s;
>
> v = s.b;
>
> =>
>
> <IR>
> s = alloca {i8, i32} // alloca initializes bytes in a type-dependent manner
> // s[0], s[4~7]: poison
> // s[1~3]: let's fill these bytes with nondet. bits
>
> s2 = gep (bitcast s to i8*), 4
> v = load i32 s2
> ```
>
>
> Thanks,
> Juneyoung
>
>
>
> [1]
> C11 6.2.6.1.5: If the stored value of an object has such a representation
> and is read by an lvalue expression that does not have character type, the
> behavior is undefined.
> (Similarly, C17 6.2.6.1.5)
>
It is important to note that this applies to trap representations and not
to unspecified values. A structure or union never has a trap representation.
> C++14 8.5.12: If an indeterminate value is produced by an evaluation, the
> behavior is undefined except in the following cases: If an indeterminate
> value of unsigned narrow character type ...
> (Similarly, C++17 11.6.12 , C++11 4.1.1)
>
While loading undef for the unsigned character type case merely produces
undef, for C++, operations such as sign-extend or zero-extend on an undef
i8 is also undefined behaviour.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20201008/c992b897/attachment-0001.html>
More information about the llvm-dev
mailing list