[PATCH] D99235: [HIP] Change to code object v4

Tue May 18 11:54:28 PDT 2021

gregrodgers added inline comments.

================
Comment at: clang/lib/Driver/ToolChains/HIP.cpp:116
+  if (getOrCheckAMDGPUCodeObjectVersion(C.getDriver(), Args) >= 4)
+    OffloadKind = OffloadKind + "v4";
   for (const auto &II : Inputs) {
----------------
yaxunl wrote:
> tra wrote:
> > We do not do it for v2/v3. Could you elaborate on what makes v4 special that it needs its own offload kind? 
> > 
> > Will you need to target different object versions simultaneously?
> > If yes, how? AFAICT, the version specified is currently global and applies to all sub-compilations.
> > If not, then do we really need to encode the version in the offload target name?
> Introducing hipv4 is to differentiate with code object version 2 and 3 which are used by HIP applications compiled by older version of clang. ROCm platform is required to keep binary backward compatibility, i.e., old HIP applications built by ROCm 4.0 should run on ROCm 4.1. The bundle ID has different interpretation depending on whether it is version 2/3 or version 4, e.g. 'gfx906' implies xnack and sramecc off with code object v2/3 but implies xnack and sramecc ANY with v4. Since code object version 2/3 uses 'hip', code object version 4 needs to be different, therefore it uses 'hipv4'.
We need to start thinking in terms of the offload requirements of a compiled image versus the capabilities of a particular active runtime on a particular GPU. This concept can eliminate the need for a new offload kind. For AMD, we would add the requirement code object v4 (cov4) to any image built for code object v4 or greater, which means the image can only run on a system with that capability. This concept works well with the requirements xnack+, xnack-, sramecc+ and sramecc-. The bundle entry ID is then the offload kind, the triple, and the list of image requirements. The GPU type (offload-arch) is really an image requirement.

In this model, there is no need for an xnack-any requirement. The absence of both xnack+ and xnack- implies "any", which means the image can run on any capable machine.

This is a general model that is extensible. To make it work, a runtime must be able to detect the capability for any requirement that could be tagged on an image. In fact, for an offload image to be usable, the runtime must detect a matching capability for every requirement of that embedded image. However, a system's runtime could have more capabilities than an image requires. So in the case of xnack, an image with neither xnack- nor xnack+ is acceptable no matter what the xnack capability of the runtime is. If the compiler driver puts the requirement cov4 in the requirements field of the bundle entry ID, the runtime will not run that image unless the GPU loader supports v4 or greater.
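
To make the matching rule concrete, here is a minimal sketch of how a runtime might test an image against its detected capabilities. The names OffloadImage, isImageUsable and selectImage are hypothetical and exist only for illustration, and requirements and capabilities are assumed to be plain strings such as "gfx906", "cov4" or "xnack+". None of this is an existing clang or ROCm API.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one bundle entry (illustrative names only).
struct OffloadImage {
  std::string OffloadKind;               // e.g. "hip"
  std::string Triple;                    // e.g. "amdgcn-amd-amdhsa"
  std::vector<std::string> Requirements; // e.g. {"gfx906", "cov4", "xnack+"}
};

// An image is usable only if every one of its requirements is among the
// capabilities detected by the runtime. Extra capabilities are harmless.
bool isImageUsable(const OffloadImage &Image,
                   const std::set<std::string> &Capabilities) {
  return std::all_of(
      Image.Requirements.begin(), Image.Requirements.end(),
      [&](const std::string &Req) { return Capabilities.count(Req) != 0; });
}

// Returns the first usable image in bundle order, or nullptr if none of the
// images can run on this runtime.
const OffloadImage *selectImage(const std::vector<OffloadImage> &Images,
                                const std::set<std::string> &Capabilities) {
  for (const OffloadImage &Image : Images)
    if (isImageUsable(Image, Capabilities))
      return &Image;
  return nullptr;
}
```

Under these assumptions, an image tagged {"gfx906", "cov4"} is usable on a runtime reporting {"gfx906", "cov4", "xnack+"} but not on one reporting only {"gfx906", "xnack-"}, and an image with no xnack requirement is usable in either xnack mode.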

The clang driver can create the requirement xnack- for code object < 4 on those GPUs that support either xnack mode. This ensures that the image either fails gracefully or is passed over in favor of an alternative image if the runtime capability is xnack+.

But the cov4 requirement is mostly unrelated to xnack. It is about the capability of the GPU loader. If the code object version is >= 4, the image will be tagged with the cov4 requirement. This would prevent an old system that does not have a newer software stack from running an image with a cov4 requirement.
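
On the driver side, the tagging rules described here could be sketched roughly as follows. The function deriveRequirements, the XnackMode enum and the ArchSupportsBothXnackModes flag are hypothetical and do not correspond to anything in HIP.cpp. The sketch assumes the driver already knows the code object version and whether the target GPU supports both xnack modes.

```cpp
#include <string>
#include <vector>

// Hypothetical: the xnack setting the user asked for, if any.
enum class XnackMode { Unspecified, On, Off };

// Builds the requirement list for one image (illustrative only).
std::vector<std::string> deriveRequirements(const std::string &OffloadArch,
                                            unsigned CodeObjectVersion,
                                            bool ArchSupportsBothXnackModes,
                                            XnackMode Requested) {
  std::vector<std::string> Reqs;
  // The GPU type is itself treated as an image requirement.
  Reqs.push_back(OffloadArch);
  // cov4 is about the GPU loader, independent of xnack.
  if (CodeObjectVersion >= 4)
    Reqs.push_back("cov4");
  if (Requested == XnackMode::On)
    Reqs.push_back("xnack+");
  else if (Requested == XnackMode::Off)
    Reqs.push_back("xnack-");
  else if (CodeObjectVersion < 4 && ArchSupportsBothXnackModes)
    // Older code objects on GPUs that support either mode get xnack-.
    Reqs.push_back("xnack-");
  // Otherwise no xnack requirement is added, which means "any".
  return Reqs;
}
```

With these assumptions, gfx906 at code object v3 with no explicit xnack setting yields {"gfx906", "xnack-"}, while the same target at v4 yields {"gfx906", "cov4"} and leaves xnack unconstrained.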

This general notion of image requirements and runtime capabilities is extensible to other offload architectures. Suppose CUDA version 12 compilation REQUIRES a CUDA version 12 runtime. Old runtimes would never display the cuv12 capability and would fail to run any image created with the requirement cuv12.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99235/new/

https://reviews.llvm.org/D99235