[PATCH] D71221: [HIP] Add option --gpu-max-threads-per-block=n

Fri Dec 20 05:58:46 PST 2019

yaxunl marked 2 inline comments as done.
yaxunl added a comment.

In D71221#1791802 <https://reviews.llvm.org/D71221#1791802>, @tra wrote:

> What's the use case for this flag?

If a kernel is launched with a block size greater than the default max block size, explicit launch bound is required.

Different projects have different block size usages.

If a project mostly uses block size 1024, it prefers to use 1024 as the default max block size to avoid adding explicit launch bounds to each kernel.

If a project mostly uses block size 256, it prefers to use 256 as the default max block size.

Another situation is that at the initial development stage people prefer a default max block size that works for all possible launching block sizes. Then they can just let the max block size be 1024 by using this option. Later on, they can add launch bounds and choose a different max block size.

On the other hand, we cannot simply let the default max block size be 1024 since we have large sets of existing projects assuming default max block size be 256. Changing the default max block size to 1024 will cause perf degradation in the existing projects. Adding this options provides an option for backward compatibility in case we want to change the default max block size.

================
Comment at: clang/lib/CodeGen/TargetInfo.cpp:8067
+    unsigned MaxThreadsPerBlock =
+        IsHIPKernel ? M.getLangOpts().GPUMaxThreadsPerBlock : 256;
+    std::string AttrVal = std::string("1,") + llvm::utostr(MaxThreadsPerBlock);
----------------
tra wrote:
> The magic value of 256 should be defined as a constant or macro somewhere -- you're using it in multiple places.
> Alternatively, always set LangOpts.GPUMaxThreadsPerBlock to something and skip figuring out the default everywhere else.
For the default value of LangOpts.GPUMaxThreadsPerBlock, it tends to be target dependent. I am thinking probably should add TargetInfo.getDefaultMaxThreadsPerBlock() and use it to set the default value for LangOpts.GPUMaxThreadsPerBlock.

================
Comment at: clang/test/CodeGenCUDA/amdgpu-kernel-attrs.cu:19
+
 __attribute__((amdgpu_flat_work_group_size(32, 64))) // expected-no-diagnostics
 __global__ void flat_work_group_size_32_64() {
----------------
tra wrote:
> Is this the attribute that `__launch_bounds__()` expands to in HIP?
> If __launch_bounds__ is a separate attribute, then, I guess, it should be tested, too.
yes.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D71221/new/

https://reviews.llvm.org/D71221