[PATCH] D142393: [OpenMP] Add 'amdgpu-flat-work-group-size' to OpenMP kernels

Mon Jan 23 13:47:06 PST 2023

yaxunl added inline comments.

================
Comment at: clang/lib/CodeGen/TargetInfo.cpp:9552
     F->addFnAttr("uniform-work-group-size", "true");
+  if (IsOpenMPkernel)
+    F->addFnAttr("amdgpu-flat-work-group-size",
----------------
jhuber6 wrote:
> arsenm wrote:
> > jhuber6 wrote:
> > > arsenm wrote:
> > > > Probably shouldn’t check the language, just whether it’s a kernel. Also shouldn’t emit this if it’s the default 1024. I’ve been trying to cut down on the superfluous attribute spam.
> > > There's a section for HIP above that does the same. We could probably consolidate here for all "AMDGPU" kernels and get rid of the redundant attribute. Maybe in a separate patch?
> > All the isCUDA || HIP || OpenMP checks scattered around are driving me crazy. A bunch of the out-of-tree divergent patches are just adding to them. We should replace every language check with a check for the actual feature and stop putting language names in things.
> OpenCL is the odd one out as far as I know; HIP and OpenMP are mostly equivalent as far as attributes go.
OpenCL uses 256 as its default max block size. This is to avoid performance regressions for existing apps. HIP uses 1024 by default.
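
A minimal sketch of the direction discussed in this thread: key the attribute on "is a kernel" rather than on the source language, and skip it when the bound equals the default. `FakeFunction`, `setFlatWorkGroupSize`, and `DefaultMaxFlatWorkGroupSize` are hypothetical names for illustration, not the code in this patch; the "min,max" value format and the 1024 default are assumptions based on the comments above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for llvm::Function; it only models string attributes.
struct FakeFunction {
  std::map<std::string, std::string> Attrs;
  void addFnAttr(const std::string &Kind, const std::string &Val) {
    Attrs[Kind] = Val;
  }
  bool hasFnAttr(const std::string &Kind) const {
    return Attrs.count(Kind) != 0;
  }
};

// Assumed default upper bound, per the "default 1024" remark in the thread.
constexpr unsigned DefaultMaxFlatWorkGroupSize = 1024;

// Attach "amdgpu-flat-work-group-size" to any kernel, independent of the
// source language, but only when the bound differs from the assumed default.
void setFlatWorkGroupSize(FakeFunction &F, bool IsKernel, unsigned MaxThreads) {
  assert(MaxThreads > 0 && "work-group size must be positive");
  if (!IsKernel || MaxThreads == DefaultMaxFlatWorkGroupSize)
    return; // Not a kernel, or the attribute would be redundant.
  F.addFnAttr("amdgpu-flat-work-group-size",
              "1," + std::to_string(MaxThreads));
}
```

With this shape, a language that wants a different default (such as the 256 used for OpenCL) would only need to pass a different `MaxThreads`; the helper itself would not need to know which language produced the kernel.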

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:61
+        DriverArgs.getLastArgValue(options::OPT_gpu_max_threads_per_block_EQ,
+                                   "1024")));
+
----------------
We should keep the default value in Options.td instead of having multiple copies in different places. Same as below.
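
To illustrate the concern, here is a small standalone sketch of the "single source of truth" idea. It is not the Options.td mechanism itself: `kDefaultGpuMaxThreadsPerBlock` and `resolveMaxThreadsPerBlock` are hypothetical names, and the value 1024 is taken from the hunk above.

```cpp
#include <cassert>
#include <string>

// Hypothetical single definition of the default. In the real tree this would
// live with the option definition rather than in each toolchain file.
constexpr unsigned kDefaultGpuMaxThreadsPerBlock = 1024;

// Returns the user-supplied value, or the shared default when the option was
// not given (modelled here as a null pointer). Callers never spell out "1024".
unsigned resolveMaxThreadsPerBlock(const char *Arg) {
  if (!Arg)
    return kDefaultGpuMaxThreadsPerBlock;
  unsigned long Parsed = std::stoul(std::string(Arg)); // throws on bad input
  assert(Parsed > 0 && "max threads per block must be positive");
  return static_cast<unsigned>(Parsed);
}
```

Every call site then goes through the one helper, so changing the default touches a single line.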

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D142393/new/

https://reviews.llvm.org/D142393