[PATCH] D106053: [CodeGen] Remove pending AssertZext AssertSext in promoting FP_TO_INT

Sat Jul 17 09:22:44 PDT 2021

aqjune added inline comments.

================
Comment at: llvm/test/CodeGen/X86/fptoui-may-overflow.ll:5
+; We can't only generate "cvttps2dq %xmm0, %xmm0" for this function, because
+; there may be overflow case in fptoui. The old DAG optimization will optimize
+; "%f = fptoui <4 x float> %arg to <4 x i8>" to
----------------
xiangzhangllvm wrote:
> aqjune wrote:
> > xiangzhangllvm wrote:
> > > efriedma wrote:
> > > > aqjune wrote:
> > > > > According to LangRef (https://llvm.org/docs/LangRef.html#fptoui-to-instruction), the overflow returns `poison` which means that using the value is invalid.
> > > > > 
> > > > > This is analogous to using an uninitialized variable in C/C++; there is no guarantee that the compiled program will have a reasonable behavior.
> > > > > ```
> > > > > int x; // not initialized
> > > > > printf("%d", x); // Assume that this printed 0xDEADBEEF
> > > > > ... (don't update x)
> > > > > printf("%d", x); // There is no guarantee that this will also print 0xDEADBEEF
> > > > > ```
> > > > The comment is wrong.  But the CHECK lines are correct.  LangRef and alive2 say the following transform is invalid:
> > > > 
> > > > ```
> > > > define <16 x i8> @src(<4 x float> %arg1) {
> > > >   %ss = shufflevector <16 x i8> poison, <16 x i8> zeroinitializer, <16 x i32> <i32 0, i32 17, i32 18, i32 19, i32 1, i32 21, i32 22, i32 23, i32 2, i32 25, i32 26, i32 27, i32 3, i32 29, i32 30, i32 31>
> > > >   ret <16 x i8> %ss
> > > > }
> > > > =>
> > > > define <16 x i8> @tgt(<4 x float> %arg1) {
> > > >   ret <16 x i8> poison
> > > > }
> > > > ```
> > > > 
> > > > (On a side-note, alive2 gives some confusing results for fptoui; apparently it thinks `fptoui float 31.5 to i32` is poison.)
> > > Sorry, not much understand, Here the test didn't use poison elements, and we don't know the fptoui will be overflow or not, it is runtime.
> > > 
> > > 
> > @efriedma Thank you for the report, I made a pull request at Alive2 that fixes the bugs. @nlopes will be back in a few weeks and have a look at it.
> > 
> > @xiangzhangllvm poison is kind of a conceptual value that appears in LLVM IR's abstract machine. 
> > It is used to carry guarantees from C/C++ that e.g., casting big floats to signed integer is not legal.
> > Of course, one cannot statically determine if a C program will do such cast or not.
> > As double free raises segmentation fault, the execution can fail or print a bogus value; integer overflow is just less visible to users.
> Hi @aqjune ,let's talk on the test. The current problem is 
> 
> "%f = fptoui <4 x float> %arg to <4 x i8>" will be convert to "fptosi <4 x float> %1 to <4 x i16>  +  AssertZext"
> 
> Do you mean in overflow case, %f should be poison value and report error ? (I agree)
> But this convert self is not correct.
> And even we mark some elements of %f is poison, but the following shuffle didn't use (read) this poison element. It just write 0 into these elements.
> And this action of "write 0" will be remove by AssertZext.  So this patch is try to removing this AssertZext.
> 
> "%f = fptoui <4 x float> %arg to <4 x i8>" will be convert to "fptosi <4 x float> %1 to <4 x i16> + AssertZext"

To me the transformation seems correct; let me explain why I think so.
I'll use <1 x float> instead of <4 x float> to avoid confusion; it seems the number of elements doesn't matter here.

Let's assume that %arg[0] fit in [0,256); then AssertZext is fine, because 8 MSBs of i16 is anyway zero.
The problematic case is when %arg[0] doesn't fit in [0, 256).
Then, %f[0] is poison; using %f[0] will raise undefined behavior.
Then, the optimized code can do anything including using the register which isn't filled with zero bits...
So in either case, having AssertZext is fine.

To me, efriedma's shufflevector transformation (https://reviews.llvm.org/D106053#2882183) looks fishier. 
If 2a419a0b9957 is the root cause of the shufflevector transformation, and reverting it (or fixing it using freeze) solves the problem, then what do you think about the alternative solution?
2a419a0b9957 has small diffs in the tests which implies that it will have a small impact on performance, maybe.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106053/new/

https://reviews.llvm.org/D106053