[PATCH] D31762: AMDGPU: Add new amdgcn.init.exec intrinsics

Marek Olšák via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Apr 6 10:06:10 PDT 2017


mareko added inline comments.


================
Comment at: include/llvm/IR/IntrinsicsAMDGPU.td:117-121
+// Set EXEC according to a thread count packed in an SGPR input:
+//    thread_count = (input >> bitoffset) & 0x7f;
+// This is always moved to the beginning of the basic block.
+def int_amdgcn_init_exec_from_input : Intrinsic<[],
+  [llvm_i32_ty,       // 32-bit SGPR input
----------------
arsenm wrote:
> Why can't you emit this sequence and feed that into the first intrinsic?
There are several reasons:
- It's easier this way, because the custom inserter only has to move the COPY opcode to the beginning of the basic block instead of moving the whole expression.
- LLVM can't select S_BFM_B64.
- LLVM likely can't select S_CMP_EQ_U32 in this case.
- LLVM can't select S_CMOV_B64.
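
For illustration only, the computation under discussion can be modeled on the host as below. This is a sketch, not the code in the patch: `execFromInput` is a hypothetical name, and the mapping of S_BFM_B64 plus the compare and conditional move onto these C++ steps is an assumption based on the thread_count formula in the quoted comment and the opcodes listed above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical host-side model (not an LLVM API) of the EXEC mask that the
// discussed instruction sequence is assumed to produce.
uint64_t execFromInput(uint32_t input, uint32_t bitOffset) {
  // Assumption: the bit offset addresses a bit inside the 32-bit SGPR input.
  assert(bitOffset < 32);

  // thread_count = (input >> bitoffset) & 0x7f, as in the intrinsic's comment.
  uint32_t threadCount = (input >> bitOffset) & 0x7f;

  // S_BFM_B64-style bitfield mask: only the low 6 bits of the width are used,
  // so a count of 64 wraps to a width of 0 and yields an empty mask.
  uint32_t width = threadCount & 63;
  uint64_t exec = (uint64_t{1} << width) - 1;

  // Compare plus conditional move: a full wave of 64 threads enables all lanes.
  if (threadCount == 64)
    exec = ~uint64_t{0};

  return exec;
}
```

The special case for 64 is the reason a single bitfield-mask instruction is not enough here, and why the reply lists a compare and a conditional move in addition to S_BFM_B64.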


https://reviews.llvm.org/D31762




