[PATCH] D99949: [AMDGPU][OpenMP] Add amdgpu-arch tool to list AMD GPUs installed

Tue Apr 6 06:32:41 PDT 2021

JonChesterfield added a comment.

This change is partly motivated by wanting to check in runtime tests for openmp that execute on whatever hardware is available locally. It is functionally similar to an out of tree bash script called mygpu that contains manually curated tables of pci.ids and to a python script called rocm_agent_enumerator that calls a c++ tool called rocminfo and tries to parse the output, with a different table of pci.ids for when that fails.

Ultimately, the bottom of this stack is a map from pci.id to gfx number stored in the user space driver thunk library, roct. That is linked into hsa. It would be simpler programming to copy&paste that map from roct into the openmp clang driver at the cost of inevitable divergence between the architecture clang detects and the architecture the runtime libraries detect. Spawning a process and reading stdout is a bit messy, but it beats copying the table, and it beats linking the gpu driver into clang in order to get at the table of numbers. This seems the right balance to me.

It should be possible to do something similar with cuda for nvptx, but that should be a separate executable. Partly to handle the permutations of cuda / hsa that may be present on a system. I haven't found the corresponding API call in cuda. The standalone tool nvidia-smi might be willing to emit sm_70 or similar to stdout, but I haven't found the right command line flags for that either. Rocminfo does not appear to be configurable, and is not necessarily present when compiling for amdgpu.

A bunch of comments inline, mostly style. I think there's a use-after-free bug.

It looks like our existing command line handling could be more robust. In particular, there should be error messages about march where there seem to be asserts. That makes it slightly unclear how we should handle things like the helper tool returning a string clang doesn't recognise (e.g. doesn't start with gfx). Something to revise separate to this patch I think.

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:78
+detectSystemGPUs(const ToolChain &T) {
+  auto Program = T.GetProgramPath("amdgpu-arch");
+  llvm::SmallVector<llvm::StringRef, 8> execArgs;
----------------
AMDGPU_ARCH_PROGRAM_NAME?

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:83
+                                     OutputFile);
+  llvm::FileRemover OutputRemover(OutputFile.c_str());
+  llvm::Optional<llvm::StringRef> Redirects[] = {
----------------
This looks like there are too many stringrefs. Redirecting stdout to a temporary file seems reasonable, but I'd expect the pointers into OutputBuf to be invalid when it drops out of scope. Perhaps return a smallvector of smallstrings instead?

Also, we're probably expecting fewer than 8 different gpus, probably as few as 1 in the most common case, so maybe a smallvector<type,1>

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:90
+
+  if (const int RC =
+          llvm::sys::ExecuteAndWait(Program.c_str(), execArgs, {}, Redirects)) {
----------------
`s/const int RC =//`

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:91
+  if (const int RC =
+          llvm::sys::ExecuteAndWait(Program.c_str(), execArgs, {}, Redirects)) {
+    return {};
----------------
can we pass {} for execArgs here?

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:111
+  }
+  if (GPUArchs.size() > 1) {
+    bool AllSame = std::all_of(
----------------
Perhaps run all_of against the whole range and drop if size () > 1 test?

================
Comment at: clang/lib/Driver/ToolChains/AMDGPUOpenMP.cpp:208
   assert(GPUArch.startswith("gfx") && "Unsupported sub arch");
+  assert(!GPUArch.empty() && "Unable to detect system GPU");

----------------
We shouldn't be handling unknown or missing march= fields with asserts. I see that this is already the case in multiple places, so let's go with a matching assert for this and aspire to fix that in a separate patch.

================
Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:32
+
+    HSAAgentCollector *Self = static_cast<HSAAgentCollector *>(Data);
+    char GPUName[64];
----------------
Simpler code if we drop the class and pass in the vector<string> itself as the void*

================
Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:34
+    char GPUName[64];
+    Status = hsa_agent_get_info(Agent, HSA_AGENT_INFO_NAME, GPUName);
+    if (Status != HSA_STATUS_SUCCESS) {
----------------
Does this null terminate for any length of GPU name? Wondering if we should explicitly zero out the last char.

================
Comment at: clang/tools/amdgpu-arch/AMDGPUArch.cpp:45
+    if (Status != HSA_STATUS_SUCCESS) {
+      fprintf(stderr, "Unable to initialize HSA\n");
+    }
----------------
Unsure these should be writing to stderr. We capture stdout, stderr probably goes to the user. We could exit 1 instead as clang is going to treat any failure to guess the arch identically 

================
Comment at: clang/tools/amdgpu-arch/CMakeLists.txt:11
+
+find_package(hsa-runtime64 QUIET 1.2.0 HINTS ${CMAKE_INSTALL_PREFIX} PATHS /opt/rocm)
+if (NOT ${hsa-runtime64_FOUND})
----------------
Assuming this matches the logic in amdgpu openmp plugin

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99949/new/

https://reviews.llvm.org/D99949