[clang] [amdgpu-arch] Replace use of HSA with reading sysfs directly (PR #116651)

Mon Nov 18 09:54:12 PST 2024

yxsamliu wrote:

> > > @jhuber6 can you comment on "lot of overhead" and if that matters? Also, not sure why the HSA library dependence is a problem. This seems to be exposing amdgpu-arch to more maintenance overhead.
> > 
> > 
> > Sometimes the driver will hang and since this is used inside of `clang` to support `--offload-arch=native` I've had cases where the compiler hangs forever, so I added a timeout to keep it from doing that in the past. This removes that possibility entirely. I have also had reports from cluster users that it becomes very slow when others are stressing the GPU. It's faster and since this will be installed on every single LLVM build, not everyone has ROCm so it would be nice for this to work. I think it's fair to do this as the fast path on Linux systems and then fall back to HIP if something goes terribly wrong.
> 
> I don't really understand why cluster users are compiling on a system where the GPUs are being stressed, and I still don't see why it's a good idea to break layering for this case. Also, I wasn't aware that the "native" offload arch is supported by ROCm.

The issue also happens on machines with a single GPU if amdgpu-arch is executed multiple times in quick succession, due to a limitation of the driver. --offload-arch=native calls amdgpu-arch to get the actual GPU archs.
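
For illustration, here is a rough sketch of what "reading sysfs directly" could look like. It is not the code from the PR. It assumes the KFD topology is exposed under `/sys/devices/virtual/kfd/kfd/topology/nodes/<N>/properties`, that each properties file has a `gfx_target_version` line, and that the value encodes major * 10000 + minor * 100 + stepping (with 0 meaning a CPU-only node). The function names are made up for the example.

```cpp
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Find "gfx_target_version <value>" in the text of one properties file.
// Assumption: the file consists of "<key> <value>" lines.
std::optional<unsigned> parseGfxTargetVersion(const std::string &Properties) {
  std::istringstream In(Properties);
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Fields(Line);
    std::string Key;
    unsigned Value = 0;
    if ((Fields >> Key >> Value) && Key == "gfx_target_version")
      return Value;
  }
  return std::nullopt;
}

// Turn the numeric version into an arch name, e.g. 90010 -> "gfx90a".
// Assumption: major and minor print as decimal, stepping as hex.
std::string formatGfxName(unsigned Version) {
  unsigned Major = Version / 10000;
  unsigned Minor = (Version / 100) % 100;
  unsigned Step = Version % 100;
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "gfx%u%u%x", Major, Minor, Step);
  return Buf;
}

// Walk the topology nodes and collect one arch name per GPU node.
// No driver call is made, so nothing here can block on a busy GPU.
std::vector<std::string> listNativeArchs(
    const fs::path &Root = "/sys/devices/virtual/kfd/kfd/topology/nodes") {
  std::vector<std::string> Archs;
  std::error_code EC;
  for (fs::directory_iterator It(Root, EC), End; !EC && It != End;
       It.increment(EC)) {
    std::ifstream File(It->path() / "properties");
    if (!File)
      continue; // Unreadable node: skip it rather than fail.
    std::stringstream Text;
    Text << File.rdbuf();
    std::optional<unsigned> Version = parseGfxTargetVersion(Text.str());
    if (Version && *Version != 0) // 0 is assumed to mean "not a GPU".
      Archs.push_back(formatGfxName(*Version));
  }
  return Archs;
}
```

If `listNativeArchs()` returns an empty vector (no KFD nodes, or a non-Linux host), a caller could then fall back to the runtime-based query, which matches the fast-path / fall-back split suggested in the quoted comment.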

https://github.com/llvm/llvm-project/pull/116651