[PATCH] D38544: AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)

Wed Oct 4 15:15:15 PDT 2017

mareko added inline comments.

================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:757
+// If false, set EXEC=0 for the current thread until the end of program.
+def int_amdgcn_kill : Intrinsic<[], [llvm_i1_ty], []>;
+
----------------
arsenm wrote:
> This should be convergent (and no mem?)
amdgcn.kill shouldn't be moved across ds.bpermute, ds.swizzle, and image_sample opcodes. Does IntrNoMem or IntrConvergent assure that?

================
Comment at: lib/Target/AMDGPU/AMDGPUInstructions.td:170-175
+def COND_EQ_NONANS : PatLeaf <(cond), [{return (N->get() & 0x7) == ISD::SETOEQ;}]>;
+def COND_NE_NONANS : PatLeaf <(cond), [{return (N->get() & 0x7) == ISD::SETONE;}]>;
+def COND_GT_NONANS : PatLeaf <(cond), [{return (N->get() & 0x7) == ISD::SETOGT;}]>;
+def COND_GE_NONANS : PatLeaf <(cond), [{return (N->get() & 0x7) == ISD::SETOGE;}]>;
+def COND_LT_NONANS : PatLeaf <(cond), [{return (N->get() & 0x7) == ISD::SETOLT;}]>;
+def COND_LE_NONANS : PatLeaf <(cond), [{return (N->get() & 0x7) == ISD::SETOLE;}]>;
----------------
arsenm wrote:
> I'm not sure what this means by NONANS. I think this is just doing the same thing as the existing COND_O* patleafs by accepting the ordered and unspecified compares as ordered.
It means I'm lazy to handle OGE and !OGE separately. I'd still like to handle !OGE in a simple way and not care about NaNs. Alternatively, I can use add 2 patterns, one for COND_OGE and one for COND_UGE, both mapping to SI_KILL_F32_GE_0).

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:3002
     return emitIndirectDst(MI, *BB, *getSubtarget());
-  case AMDGPU::SI_KILL:
+  case AMDGPU::SI_KILL_F32_GE_0_PSEUDO:
+  case AMDGPU::SI_KILL_I1_PSEUDO:
----------------
arsenm wrote:
> I don't think we should have a family of different kill opcodes just for the different compare types. Can we just have SI_KILL with the condition register input? We could then have an optimization pass look for the 
> %cond = V_CMP_*
> SI_KILL %cond -> V_CMPX_*_term pattern. We would probably have to introduce the new _term variants of the CMPX instructions.
I tried to do that but it's too much work. Eventually we'd like all flavors of V_CMPX, but it's not a high priority right now.

================
Comment at: lib/Target/AMDGPU/SIInsertSkips.cpp:204-206
     BuildMI(MBB, &MI, DL, TII->get(AMDGPU::V_CMPX_LE_F32_e32))
         .addImm(0)
         .add(Op);
----------------
arsenm wrote:
> If the condition is an SGPR this will violate the operand constraints, so this should be creating the _e64 version. The problem with that is this pass runs after SIShrinkInstructions, so this won't be optimized down in the common case which is part of why this expansion should probably be done earlier.
This is not new code. It's the previous code moved by a few lines.

================
Comment at: lib/Transforms/InstCombine/InstCombineCalls.cpp:3543-3550
+  case Intrinsic::amdgcn_kill: {
+    const ConstantInt *C = dyn_cast<ConstantInt>(II->getArgOperand(0));
+    if (!C || !C->getZExtValue())
+      break;
+
+    // amdgcn.kill(i1 1) is a no-op
+    return eraseInstFromFunction(CI);
----------------
arsenm wrote:
> Test missing for this part
It's the "kill_true" test.

================
Comment at: test/CodeGen/AMDGPU/kill.ll:1
+; RUN: opt -S -mtriple=amdgcn-- -instcombine < %s | llc -march=amdgcn -mcpu=verde -verify-machineinstrs | FileCheck -check-prefix=SI %s
+; RUN: opt -S -mtriple=amdgcn-- -instcombine < %s | llc -march=amdgcn -mcpu=tonga -verify-machineinstrs | FileCheck -check-prefix=SI %s
----------------
arsenm wrote:
> This should just run llc. The instcombine test should be separate in test/Transforms/InstCombine/AMDGPU
> 
> Also should use -enable-var-scope for FileCheck
What's the deal with not running -instcombine as part of AMDGCN tests? It seems like it would be convenient everywhere.

https://reviews.llvm.org/D38544