[PATCH] D76772: [AMDGPU] Add __builtin_amdgcn_workgroup_size_x/y/z

Wed Mar 25 09:10:11 PDT 2020

arsenm added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13428
+    return Call;
+  return CGF.Builder.CreateAddrSpaceCast(Call, RetTy);
+}
----------------
Why is this necessary? Doesn't the builtin always have the same return type?

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13435
+  auto *DP = EmitAMDGPUDispatchPtr(CGF);
+  auto *Offset = llvm::ConstantInt::get(CGF.Int32Ty, XOffset + Index * 2);
+  auto *GEP = CGF.Builder.CreateGEP(DP, Offset);
----------------
Add a comment that this is indexing into the hsa_kernel_dispatch_packet struct?
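
For reference, a minimal sketch of what such a comment would be documenting. The struct below is a local mirror of the leading fields of the HSA kernel dispatch packet, not the real header, and the value XOffset = 4 is an assumption based on that layout rather than something shown in the quoted hunk. The helper name workgroupSizeOffset is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Local mirror of the first fields of the HSA kernel dispatch packet.
// Assumption: field order and widths follow the HSA runtime specification;
// this is an illustration, not the real hsa.h definition.
struct DispatchPacketPrefix {
  uint16_t header;
  uint16_t setup;
  uint16_t workgroup_size_x;
  uint16_t workgroup_size_y;
  uint16_t workgroup_size_z;
  uint16_t reserved0;
  uint32_t grid_size_x;
};

// Assumed byte offset of workgroup_size_x within the packet.
constexpr unsigned XOffset = 4;

// Hypothetical helper: byte offset of the workgroup size for dimension
// Index (0 = x, 1 = y, 2 = z). Each field is 16 bits wide, hence "* 2".
constexpr unsigned workgroupSizeOffset(unsigned Index) {
  return XOffset + Index * 2;
}

// Check the arithmetic against the mirrored layout at compile time.
static_assert(offsetof(DispatchPacketPrefix, workgroup_size_x) ==
                  workgroupSizeOffset(0), "x offset mismatch");
static_assert(offsetof(DispatchPacketPrefix, workgroup_size_y) ==
                  workgroupSizeOffset(1), "y offset mismatch");
static_assert(offsetof(DispatchPacketPrefix, workgroup_size_z) ==
                  workgroupSizeOffset(2), "z offset mismatch");
```

If the layout assumption holds, the GEP in the patch is a byte offset of 4, 6, or 8 from the dispatch pointer, followed by a 16-bit load.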

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13442
+  llvm::MDBuilder MDHelper(CGF.getLLVMContext());
+  llvm::MDNode *RNode = MDHelper.createRange(APInt(16, 1), APInt(16, 1025));
+  LD->setMetadata(llvm::LLVMContext::MD_range, RNode);
----------------
I thought I had a patch to include the maximum group size in AMDGPUTargetInfo to avoid hardcoding it, but I guess it was never committed
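
Roughly what I have in mind, as a standalone sketch: derive the range bounds from one named limit instead of writing 1025 inline. The constant kMaxWorkGroupSize and both helpers are hypothetical names, and 1024 is only inferred from the exclusive upper bound used in the patch; this is not an existing AMDGPUTargetInfo API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical named limit; 1024 is inferred from the hardcoded 1025 above.
constexpr uint32_t kMaxWorkGroupSize = 1024;

// Range metadata is half-open: [Lo, Hi).
struct HalfOpenRange {
  uint32_t Lo;
  uint32_t Hi;
};

// Hypothetical helper: a workgroup size is at least 1 and at most Max,
// so the exclusive upper bound is Max + 1.
constexpr HalfOpenRange workgroupSizeRange(uint32_t Max) {
  return {1u, Max + 1u};
}

// True if V lies inside the half-open range R.
constexpr bool contains(HalfOpenRange R, uint32_t V) {
  return V >= R.Lo && V < R.Hi;
}

// The derived upper bound matches the value hardcoded in the patch and
// still fits in the 16-bit APInt used for the metadata.
static_assert(workgroupSizeRange(kMaxWorkGroupSize).Hi == 1025,
              "unexpected upper bound");
static_assert(workgroupSizeRange(kMaxWorkGroupSize).Hi <= UINT16_MAX,
              "bound does not fit in 16 bits");
```

The two bounds would then feed the APInt(16, ...) arguments to createRange in place of the literals.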

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13443
+  llvm::MDNode *RNode = MDHelper.createRange(APInt(16, 1), APInt(16, 1025));
+  LD->setMetadata(llvm::LLVMContext::MD_range, RNode);
+  return LD;
----------------
Also mark the load as invariant.

================
Comment at: clang/test/CodeGenOpenCL/builtins-amdgcn.cl:539
+void test_get_workgroup_size(int d, global int *out)
+{
+	switch (d) {
----------------
Also run this in a HIP test, or some other case where the addrspacecast is needed?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76772/new/

https://reviews.llvm.org/D76772