[PATCH] D106053: [CodeGen] Remove pending AssertZext AssertSext in promoting FP_TO_INT
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sat Jul 17 18:51:45 PDT 2021
efriedma added inline comments.
================
Comment at: llvm/test/CodeGen/X86/fptoui-may-overflow.ll:5
+; We can't only generate "cvttps2dq %xmm0, %xmm0" for this function, because
+; there may be overflow case in fptoui. The old DAG optimization will optimize
+; "%f = fptoui <4 x float> %arg to <4 x i8>" to
----------------
xiangzhangllvm wrote:
> aqjune wrote:
> > xiangzhangllvm wrote:
> > > aqjune wrote:
> > > > xiangzhangllvm wrote:
> > > > > efriedma wrote:
> > > > > > aqjune wrote:
> > > > > > > According to LangRef (https://llvm.org/docs/LangRef.html#fptoui-to-instruction), the overflow returns `poison` which means that using the value is invalid.
> > > > > > >
> > > > > > > This is analogous to using an uninitialized variable in C/C++; there is no guarantee that the compiled program will have a reasonable behavior.
> > > > > > > ```
> > > > > > > int x; // not initialized
> > > > > > > printf("%d", x); // Assume that this printed 0xDEADBEEF
> > > > > > > ... (don't update x)
> > > > > > > printf("%d", x); // There is no guarantee that this will also print 0xDEADBEEF
> > > > > > > ```
> > > > > > The comment is wrong. But the CHECK lines are correct. LangRef and alive2 say the following transform is invalid:
> > > > > >
> > > > > > ```
> > > > > > define <16 x i8> @src(<4 x float> %arg1) {
> > > > > > %ss = shufflevector <16 x i8> poison, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 1, i32 21, i32 22, i32 23, i32 2, i32 25, i32 26, i32 27, i32 3, i32 29, i32 30, i32 31>
> > > > > > ret <16 x i8> %ss
> > > > > > }
> > > > > > =>
> > > > > > define <16 x i8> @tgt(<4 x float> %arg1) {
> > > > > > ret <16 x i8> poison
> > > > > > }
> > > > > > ```
> > > > > >
> > > > > > (On a side-note, alive2 gives some confusing results for fptoui; apparently it thinks `fptoui float 31.5 to i32` is poison.)
> > > > > Sorry, not much understand, Here the test didn't use poison elements, and we don't know the fptoui will be overflow or not, it is runtime.
> > > > >
> > > > >
> > > > @efriedma Thank you for the report, I made a pull request at Alive2 that fixes the bugs. @nlopes will be back in a few weeks and have a look at it.
> > > >
> > > > @xiangzhangllvm poison is kind of a conceptual value that appears in LLVM IR's abstract machine.
> > > > It is used to carry guarantees from C/C++ that e.g., casting big floats to signed integer is not legal.
> > > > Of course, one cannot statically determine if a C program will do such cast or not.
> > > > As double free raises segmentation fault, the execution can fail or print a bogus value; integer overflow is just less visible to users.
> > > Hi @aqjune ,let's talk on the test. The current problem is
> > >
> > > "%f = fptoui <4 x float> %arg to <4 x i8>" will be convert to "fptosi <4 x float> %1 to <4 x i16> + AssertZext"
> > >
> > > Do you mean in overflow case, %f should be poison value and report error ? (I agree)
> > > But this convert self is not correct.
> > > And even we mark some elements of %f is poison, but the following shuffle didn't use (read) this poison element. It just write 0 into these elements.
> > > And this action of "write 0" will be remove by AssertZext. So this patch is try to removing this AssertZext.
> > >
> > > "%f = fptoui <4 x float> %arg to <4 x i8>" will be convert to "fptosi <4 x float> %1 to <4 x i16> + AssertZext"
> >
> > To me the transformation seems correct; let me explain why I think so.
> > I'll use <1 x float> instead of <4 x float> to avoid confusion; it seems the number of elements doesn't matter here.
> >
> > Let's assume that %arg[0] fit in [0,256); then AssertZext is fine, because 8 MSBs of i16 is anyway zero.
> > The problematic case is when %arg[0] doesn't fit in [0, 256).
> > Then, %f[0] is poison; using %f[0] will raise undefined behavior.
> > Then, the optimized code can do anything including using the register which isn't filled with zero bits...
> > So in either case, having AssertZext is fine.
> >
> > To me, efriedma's shufflevector transformation (https://reviews.llvm.org/D106053#2882183) looks fishier.
> > If 2a419a0b9957 is the root cause of the shufflevector transformation, and reverting it (or fixing it using freeze) solves the problem, then what do you think about the alternative solution?
> > 2a419a0b9957 has small diffs in the tests which implies that it will have a small impact on performance, maybe.
> **I think the key of our discussion should base on the correctness of the transformation.**
>
> The logic of [[ https://reviews.llvm.org/rG2a419a0b9957ebac9e11e4b43bc9fbe42a9207df | 2a419a0b9957]] self is no problem, it base on the how the computeKnownBits handle the AssertZext.
> Now the IR already be
> ```
> "fptosi <4 x float> %1 to <4 x i16> + AssertZext i8"
> ```
> If you think the transformation is correct, how can we know there is poison bits ?
> AssertZext self do not contain the "poison" meaning that the **bits 8-15** is poison.
> it should just tell a truth that the high N bits is zero.
It's hard to give good examples involving poison in SelectionDAG, generally. Most of the focus of poison-based optimizations has been at the IR level, IR has better documentation, and Alive2 also only works at the IR level. So it's easier to discuss examples in IR as much as possible.
For now, let's put aside the discussion of the semantics of AssertZext.
In IR, the following transform is illegal, according to LangRef, and Alive2:
```
define <16 x i8> @src() {
%and = and <4 x i32> poison, <i32 255, i32 255, i32 255, i32 255>
%bitcast = bitcast <4 x i32> %and to <16 x i8>
%ss = shufflevector <16 x i8> %bitcast, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 1, i32 21, i32 22, i32 23, i32 2, i32 25, i32 26, i32 27, i32 3, i32 29, i32 30, i32 31>
ret <16 x i8> %ss
}
define <16 x i8> @tgt() {
%and = and <4 x i32> poison, <i32 255, i32 255, i32 255, i32 255>
%bitcast = bitcast <4 x i32> %and to <16 x i8>
ret <16 x i8> %bitcast
}
```
Does this make sense?
I think 2a419a0b9957 performs the equivalent transform on SelectionDAG nodes. Do you think my understanding here is correct? Or is the SelectionDAG transform different somehow?
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D106053/new/
https://reviews.llvm.org/D106053
More information about the llvm-commits
mailing list