[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Marek Olšák via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 10 11:05:29 PDT 2017


mareko added inline comments.


================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:117-121
+// Set EXEC according to a thread count packed in an SGPR input:
+//    thread_count = (input >> bitoffset) & 0x7f;
+// This is always moved to the beginning of the basic block.
+def int_amdgcn_init_exec_from_input : Intrinsic<[],
+  [llvm_i32_ty,       // 32-bit SGPR input
----------------
arsenm wrote:
> mareko wrote:
> > arsenm wrote:
> > > Why can't you emit this sequence and feed that into the first intrinsic?
> > There are several reasons:
> > - It's easier this way, because the custom inserter only has to move the COPY opcode to the beginning instead of the whole expression.
> > - LLVM can't select S_BFM_B64.
> > - LLVM likely can't select S_CMP_U32_EQ in this case.
> > - LLVM can't select S_CMOV_B64.
> I don't think we should be adding intrinsics for the sake of working around codegen defects. A better workaround would be to only ever use init_exec and then have AMDGPUCodeGenPrepare insert calls to the second intrinsic until we fix the various SCC handling issues
That makes sense, however if we look at it from the hw perperctive, amdgcn.init.exec.from.input is all we need. We don't need any SCC handling, S_CMOV_B64 selection, etc. It would be desirable for have those, but not for init.exec. In fact, the init.exec instrinsic is already too flexible. The driver will only ever pass -1 into it. With all those in mind, it's unnecessary for init.exec to be more flexible. There is no use case for such flexibility. Would you implement something that nobody would ever use?


https://reviews.llvm.org/D31762





More information about the llvm-commits mailing list