[PATCH] D64726: AMDGPU/GlobalISel: Fix not constraining result reg of copies to VCC

Nicolai Hähnle via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jul 16 06:42:19 PDT 2019


nhaehnle added a comment.

In D64726#1587457 <https://reviews.llvm.org/D64726#1587457>, @arsenm wrote:

> In D64726#1587233 <https://reviews.llvm.org/D64726#1587233>, @nhaehnle wrote:
>
> > This seems incorrect, doesn't it? The truncation disappeared.... (e.g., what if $sgpr0 is 0x10)
>
>
> My current understanding of G_TRUNC is it's a no-op, and supposed to always be legal. This is supposed to be the legalized MIR, so theoretically this was generated by something that knew the original argument was zeroext from i1


Where do you get this from? And if this is the case, what is the real `trunc` instruction lowered to? This smells extremely fishy to me.

Case in point, take this IR:

  define amdgpu_ps float @foo(i32 %a) {
    %cc = trunc i32 %a to i1
    %r = zext i1 %cc to i32
    %r.f = bitcast i32 %r to float
    ret float %r.f
  }

which becomes, before legalization:

  bb.1 (%ir-block.0):
    liveins: $vgpr0
    %0:_(s32) = COPY $vgpr0
    %1:_(s1) = G_TRUNC %0:_(s32)
    %2:_(s32) = G_ZEXT %1:_(s1)
    $vgpr0 = COPY %2:_(s32)
    SI_RETURN_TO_EPILOG $vgpr0

So it is very plain here that we cannot assume the high bits of the input to be zero: %0 is copied straight from $vgpr0, which holds an arbitrary 32-bit argument.
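
To make the concern concrete, here is a minimal sketch in Python that models the IR-level semantics of `foo` next to a hypothetical variant in which the truncation is treated as a plain copy. The function names are made up for illustration, and this only models what the IR means, not what the AMDGPU backend actually emits:

```python
MASK32 = 0xFFFFFFFF


def trunc_i32_to_i1(x):
    """Model of `trunc i32 %a to i1`: keep only bit 0."""
    return x & 1


def zext_i1_to_i32(b):
    """Model of `zext i1 %cc to i32`: the high 31 bits become zero."""
    return b & 1


def foo(a):
    """IR semantics of @foo (ignoring the final bitcast to float)."""
    return zext_i1_to_i32(trunc_i32_to_i1(a & MASK32))


def foo_trunc_as_copy(a):
    """Hypothetical result if the truncation were dropped and treated as a copy."""
    return a & MASK32


if __name__ == "__main__":
    for a in (0x0, 0x1, 0x10, 0x11):
        print(hex(a), hex(foo(a)), hex(foo_trunc_as_copy(a)))
```

The two functions agree only when the input already has all of its high bits clear. For an input of 0x10 the IR semantics give 0, while the copy variant passes 0x10 through unchanged.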

Now //maybe// an exception can be made here because of something that happens earlier in legalization, but it does seem highly suspicious. It would be good to have a test that goes all the way from IR, so that we can understand why this is happening. Having different semantics for the same opcode at different stages of the compilation is not cool.



https://reviews.llvm.org/D64726




