[PATCH] D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC
Nicolai Hähnle via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jul 16 06:42:19 PDT 2019
nhaehnle added a comment.
In D64726#1587457 <https://reviews.llvm.org/D64726#1587457>, @arsenm wrote:
> In D64726#1587233 <https://reviews.llvm.org/D64726#1587233>, @nhaehnle wrote:
>
> > This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)
>
>
> My current understanding of G_TRUNC is it's a no-op, and supposed to always be legal. This is supposed to be the legalized MIR, so theoretically this was generated by something that knew the original argument was zeroext from i1.

Where do you get this from? And if this is the case, what is the real `trunc` instruction lowered to? This smells extremely fishy to me.
Case in point, take this IR:
  define amdgpu_ps float @foo(i32 %a) {
    %cc = trunc i32 %a to i1
    %r = zext i1 %cc to i32
    %r.f = bitcast i32 %r to float
    ret float %r.f
  }
which becomes, before legalization:
  bb.1 (%ir-block.0):
    liveins: $vgpr0
    %0:_(s32) = COPY $vgpr0
    %1:_(s1) = G_TRUNC %0:_(s32)
    %2:_(s32) = G_ZEXT %1:_(s1)
    $vgpr0 = COPY %2:_(s32)
    SI_RETURN_TO_EPILOG $vgpr0
So it is very plain here that we cannot assume the high bits of the input to be zero.
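To make the concern concrete, here is a minimal sketch in Python that models the arithmetic only. It is not LLVM code, and the helper names (`trunc_to_i1`, `zext_i1_to_i32`, `foo_correct`, `foo_trunc_as_noop`) are hypothetical, chosen for illustration. It compares the IR semantics of `@foo` with what you would get if the truncation were treated as a plain copy:

```python
MASK32 = 0xFFFFFFFF  # model i32 values as unsigned 32-bit integers


def trunc_to_i1(x: int) -> int:
    """Model of `trunc i32 %a to i1`: only bit 0 survives."""
    return x & 1


def zext_i1_to_i32(b: int) -> int:
    """Model of `zext i1 %cc to i32`: the high 31 bits become zero."""
    return b & 1


def foo_correct(a: int) -> int:
    """IR semantics of @foo, before the final bitcast to float."""
    return zext_i1_to_i32(trunc_to_i1(a & MASK32))


def foo_trunc_as_noop(a: int) -> int:
    """Hypothetical lowering that treats the trunc/zext pair as a copy.

    This is only valid if the caller guarantees the high bits are zero.
    """
    return a & MASK32


if __name__ == "__main__":
    for a in (0x0, 0x1, 0x10, 0x3):
        print(hex(a), foo_correct(a), foo_trunc_as_noop(a))
```

For an input of 0x10 the first function returns 0 while the second returns 16, so the two only agree when the input is already known to be 0 or 1.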
Now //maybe// an exception can be made here due to something that happens earlier in legalization, but it does seem highly suspicious and it would be good to have a test all the way from IR to understand why this is happening -- having different semantics for the same opcode at different stages of the compilation is not cool.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D64726/new/
https://reviews.llvm.org/D64726