[PATCH] D82930: [HIP] Fix rocm detection

Yaxun Liu via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 6 14:39:24 PDT 2020


yaxunl marked 4 inline comments as done.
yaxunl added inline comments.


================
Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:167
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> VersionFile =
+        FS.getBufferForFile(BinPath + "/.hipVersion");
+    if (!VersionFile)
----------------
yaxunl wrote:
> tra wrote:
> > arsenm wrote:
> > > yaxunl wrote:
> > > > arsenm wrote:
> > > > > arsenm wrote:
> > > > > > yaxunl wrote:
> > > > > > > arsenm wrote:
> > > > > > > > I don't think the detection should fail if this is missing
> > > > > > > why? this file is unique to HIP installation. If it is missing, the installation is broken.
> > > > > > Because I should be able to use this without a complete hip installation. Without a specified version, it should assume the most modern layout. This will for example break pointing --rocm-path at the device library build directory for library tests
> > > > > I also don't see what value checking the version really provides; it may be informative to print it, but I don't think it's useful to derive information from it
> > > > what is the directory structure of your most modern layout?
> > > /opt/rocm/amdgcn/bitcode/foo.bc
> > > 
> > > The plan is to remove this and rely on symlinks in the resource directory though
> > > I also don't see what value checking the version really provides; it may be informative to print it, but I don't think it's useful to derive information from it
> > 
> > In CUDA it's used to detect which back-end features to enable (they depend on what's supported by ptxas supplied by that CUDA version). I don't think that would be relevant to AMDGPU as it does not need external dependencies to generate GPU code.
> > 
> > It may be useful for filesystem layout detection. E.g. where to find bitcode files and what names to expect. This part seems to be relevant for ROCm, but I'm not sure if the layout is closely tied to the version.
> > 
> We are required to support previous ROCm releases for certain time range. To do that we need to detect HIP version and enable/disable certain HIP features based on HIP version when necessary.
> 
> Therefore if we have a new directory structure for ROCm, that directory structure should contain a HIP version file so that we can detect HIP version.
> 
> 
Currently clang includes some wrapper headers by default, which does not work with ROCm 3.5 since those device functions are defined in HIP headers of ROCm 3.5. To support ROCm 3.5, we have to disable including those wrapper headers. This is one example why we need detect HIP version.


================
Comment at: clang/lib/Driver/ToolChains/AMDGPU.cpp:167
+    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> VersionFile =
+        FS.getBufferForFile(BinPath + "/.hipVersion");
+    if (!VersionFile)
----------------
yaxunl wrote:
> yaxunl wrote:
> > tra wrote:
> > > arsenm wrote:
> > > > yaxunl wrote:
> > > > > arsenm wrote:
> > > > > > arsenm wrote:
> > > > > > > yaxunl wrote:
> > > > > > > > arsenm wrote:
> > > > > > > > > I don't think the detection should fail if this is missing
> > > > > > > > why? this file is unique to HIP installation. If it is missing, the installation is broken.
> > > > > > > Because I should be able to use this without a complete hip installation. Without a specified version, it should assume the most modern layout. This will for example break pointing --rocm-path at the device library build directory for library tests
> > > > > > I also don't see what value checking the version really provides; it may be informative to print it, but I don't think it's useful to derive information from it
> > > > > what is the directory structure of your most modern layout?
> > > > /opt/rocm/amdgcn/bitcode/foo.bc
> > > > 
> > > > The plan is to remove this and rely on symlinks in the resource directory though
> > > > I also don't see what value checking the version really provides; it may be informative to print it, but I don't think it's useful to derive information from it
> > > 
> > > In CUDA it's used to detect which back-end features to enable (they depend on what's supported by ptxas supplied by that CUDA version). I don't think that would be relevant to AMDGPU as it does not need external dependencies to generate GPU code.
> > > 
> > > It may be useful for filesystem layout detection. E.g. where to find bitcode files and what names to expect. This part seems to be relevant for ROCm, but I'm not sure if the layout is closely tied to the version.
> > > 
> > We are required to support previous ROCm releases for certain time range. To do that we need to detect HIP version and enable/disable certain HIP features based on HIP version when necessary.
> > 
> > Therefore if we have a new directory structure for ROCm, that directory structure should contain a HIP version file so that we can detect HIP version.
> > 
> > 
> Currently clang includes some wrapper headers by default, which does not work with ROCm 3.5 since those device functions are defined in HIP headers of ROCm 3.5. To support ROCm 3.5, we have to disable including those wrapper headers. This is one example why we need detect HIP version.
I think we need to separate HIP runtime detection and device library detection and also separate hasValidHIPRuntime and hasValidDeviceLibrary. ROCm toolchain only need to make sure hasValidDeviceLibrary but HIP toolchain need both hasValidDeviceLibrary and hasValidHIPRuntime.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D82930/new/

https://reviews.llvm.org/D82930





More information about the cfe-commits mailing list