[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Thu Apr 6 16:37:26 PDT 2017

arsenm added inline comments.

================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:117-121
+// Set EXEC according to a thread count packed in an SGPR input:
+//    thread_count = (input >> bitoffset) & 0x7f;
+// This is always moved to the beginning of the basic block.
+def int_amdgcn_init_exec_from_input : Intrinsic<[],
+  [llvm_i32_ty,       // 32-bit SGPR input
----------------
mareko wrote:
> arsenm wrote:
> > Why can't you emit this sequence and feed that into the first intrinsic?
> There are several reasons:
> - It's easier this way, because the custom inserter only has to move the COPY opcode to the beginning instead of the whole expression.
> - LLVM can't select S_BFM_B64.
> - LLVM likely can't select S_CMP_U32_EQ in this case.
> - LLVM can't select S_CMOV_B64.
I don't think we should be adding intrinsics for the sake of working around codegen defects. A better workaround would be to only ever use init_exec and then have AMDGPUCodeGenPrepare insert calls to the second intrinsic until we fix the various SCC handling issues.
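
For reference, the sequence under discussion can be modeled on the host roughly as follows. This is only a sketch, not the patch's lowering: it assumes a 64-wide wave, assumes S_BFM_B64 uses only the low 6 bits of its size and offset operands (which would be why the compare and conditional move are needed for a count of 64), and the helper names bfm_b64 and exec_from_input are made up for illustration.

```cpp
#include <cstdint>

// Assumed model of S_BFM_B64: build a mask of `size` ones starting at bit
// `offset`, with both operands truncated to 6 bits.
constexpr uint64_t bfm_b64(uint32_t size, uint32_t offset) {
  return ((uint64_t{1} << (size & 63)) - 1) << (offset & 63);
}

// Hypothetical helper: compute the EXEC mask from a thread count packed in
// a 32-bit SGPR input. `bitoffset` must be less than 32.
constexpr uint64_t exec_from_input(uint32_t input, uint32_t bitoffset) {
  // thread_count = (input >> bitoffset) & 0x7f, as in the .td comment.
  uint32_t thread_count = (input >> bitoffset) & 0x7f;

  // Enable the low `thread_count` lanes.
  uint64_t exec = bfm_b64(thread_count, 0);

  // A count of 64 truncates to 0 inside bfm_b64, so it is handled with a
  // compare plus conditional move to all ones.
  if (thread_count == 64)
    exec = ~uint64_t{0};

  return exec;
}
```

With this model, a count of 3 packed at bit offset 8 enables lanes 0 through 2, and a count of 64 enables every lane.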

https://reviews.llvm.org/D31762