[PATCH] D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)

Tue Oct 10 03:17:56 PDT 2017

nhaehnle added a comment.

A bunch of inline comments, but also some higher level things that don't really fit anywhere. This is a useful feature, but I don't think we've ever gotten the design of kill just right, because kill is really an implicit control flow intrinsic.

So, for example, if you have

  %v = llvm.amdgcn.icmp(...) ; ballot-type instruction
  kill(...)
  use %v

then LLVM is free to move the ballot-type instruction to after the kill according to the LLVM IR semantics, even though that is incorrect.

This isn't a problem in practice yet, because the instructions most likely to be affected by this are image sample intrinsics. Those are IntrReadMem, and `kill` itself has arbitrary side effects today, so sample intrinsics cannot be moved past a `kill`. Still, it might lead to problems in the future with shaders that use ballot and DPP / reduction intrinsics. So I've been wondering if we couldn't perhaps use kill like this:

    br i1 %cond, label %kill_bb, label %cont
  kill_bb:
    call noret void @llvm.amdgcn.kill()
    ret undef
  cont:
    ...

or perhaps better:

    %kill = call i1 @llvm.amdgcn.ps.kill(%cond)
    br i1 %kill, label %kill_bb, label %cont
  kill_bb:
    call noret void @llvm.amdgcn.kill()
    ret undef
  cont:
    ...

In this second variant, the `ps.kill` intrinsic would update the live mask and return `true` for all threads that can exit, i.e. `ps.kill` would internally do the WQM vote.

The advantage is that the control flow aspect of kill is properly modeled at LLVM IR level and so we can't run into issues with convergent intrinsics moving past it. I'd feel much more comfortable with an approach like that.

================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:757
+// If false, set EXEC=0 for the current thread until the end of program.
+def int_amdgcn_kill : Intrinsic<[], [llvm_i1_ty], []>;
+
----------------
mareko wrote:
> arsenm wrote:
> > This should be convergent (and no mem?)
> amdgcn.kill shouldn't be moved across ds.bpermute, ds.swizzle, and image_sample opcodes. Does IntrNoMem or IntrConvergent assure that?
If we do stick with the simple intrinsic-based approach, I think we should keep the attributes as they are right now, i.e. keep them identical to AMDGPU.kill. That will give us fewer surprises...

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3002
     return emitIndirectDst(MI, *BB, *getSubtarget());
-  case AMDGPU::SI_KILL:
+  case AMDGPU::SI_KILL_F32_GE_0_PSEUDO:
+  case AMDGPU::SI_KILL_I1_PSEUDO:
----------------
mareko wrote:
> arsenm wrote:
> > I don't think we should have a family of different kill opcodes just for the different compare types. Can we just have SI_KILL with the condition register input? We could then have an optimization pass look for the 
> > %cond = V_CMP_*
> > SI_KILL %cond -> V_CMPX_*_term pattern. We would probably have to introduce the new _term variants of the CMPX instructions.
> I tried to do that but it's too much work. Eventually we'd like all flavors of V_CMPX, but it's not a high priority right now.
I think Matt is right. It would be cleaner and give us the benefit of optimizing more cases.

================
Comment at: test/CodeGen/AMDGPU/kill.ll:1
+; RUN: opt -S -mtriple=amdgcn-- -instcombine < %s | llc -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck -check-prefix=SI %s
+; RUN: opt -S -mtriple=amdgcn-- -instcombine < %s | llc -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefix=SI %s
----------------
mareko wrote:
> arsenm wrote:
> > This should just run llc. The instcombine test should be separate in test/Transforms/InstCombine/AMDGPU
> > 
> > Also should use -enable-var-scope for FileCheck
> What's the deal with not running -instcombine as part of AMDGCN tests? It seems like it would be convenient everywhere.
Fewer moving parts, I think. If instcombine is run on the test as well, there's a higher chance of tests being "broken" (in a false positive way) by random unrelated changes.

https://reviews.llvm.org/D38544