[clang] [amdgpu-arch] Replace use of HSA with reading sysfs directly (PR #116651)
Jon Chesterfield via cfe-commits
cfe-commits at lists.llvm.org
Mon Nov 18 11:13:28 PST 2024
JonChesterfield wrote:
Oh. I now see there was a bunch of discussion about this, will add some context.
The driver has a hard limit on how many processes can have it open at a time, and clang calls this utility to ask which GPU to compile for by default. Put those together and a parallel build on a vaguely modern desktop immediately blows through that process limit and fails. The user then has to deliberately build slowly, or specify the GPU by hand, to work around our tooling falling over.
The limit on the number of processes is reasonable in general: launching hundreds of processes that all open the GPU, allocate queues and launch kernels is usually a disaster, and having the kernel return an equivalent to "too many open" is great. However, in the specific case where we are only asking for trivial information, which doesn't need to allocate a queue or do anything whatsoever with the GPU, this is a spurious and annoying limitation.
I suppose it could be "fixed" in the driver - some sort of reference count which is incremented when you do something non-trivial instead of on open - but I'd expect the kernel people to tell us to stop opening hundreds of processes when none of them do any work.
One clumsy outstanding thing: this should now be code in a header that clang includes, so that instead of shelling out to a subprocess to handle arch=native, clang just looks up the information directly.
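As a rough illustration of what such a header could look like, here is a minimal sketch, not the code in the PR. It assumes the KFD topology is exposed under /sys/devices/virtual/kfd/kfd/topology/nodes, that each node has a "properties" file with one "key value" pair per line, and that the "gfx_target_version" value encodes major * 10000 + minor * 100 + stepping, with 0 meaning a CPU node. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <optional>
#include <sstream>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: find "gfx_target_version <n>" in the text of one
// node's properties file (assumed format: one "key value" pair per line).
inline std::optional<uint32_t>
parseGfxTargetVersion(const std::string &Props) {
  std::istringstream In(Props);
  std::string Line;
  while (std::getline(In, Line)) {
    std::istringstream Fields(Line);
    std::string Key;
    uint32_t Value = 0;
    if (Fields >> Key >> Value && Key == "gfx_target_version")
      return Value;
  }
  return std::nullopt;
}

// Assumed encoding: major * 10000 + minor * 100 + stepping, where minor and
// stepping are each printed as one hex digit in the target name.
inline std::string decodeGfxTarget(uint32_t Version) {
  unsigned Major = Version / 10000;
  unsigned Minor = (Version / 100) % 100;
  unsigned Step = Version % 100;
  char Buf[32];
  std::snprintf(Buf, sizeof(Buf), "gfx%u%x%x", Major, Minor, Step);
  return Buf;
}

// Walk the topology nodes and collect GPU target names. This only reads
// files; it never opens the GPU device, so it does not count against the
// driver's process limit. The default path is an assumption.
inline std::vector<std::string> listAmdGpuArchs(
    const std::filesystem::path &NodesDir =
        "/sys/devices/virtual/kfd/kfd/topology/nodes") {
  std::vector<std::string> Archs;
  std::error_code EC;
  for (std::filesystem::directory_iterator It(NodesDir, EC), End;
       !EC && It != End; It.increment(EC)) {
    std::ifstream File(It->path() / "properties");
    std::stringstream Text;
    Text << File.rdbuf();
    std::optional<uint32_t> V = parseGfxTargetVersion(Text.str());
    if (V && *V != 0) // 0 is assumed to mark a CPU-only node
      Archs.push_back(decodeGfxTarget(*V));
  }
  return Archs;
}
```

If the directory is missing (no amdgpu driver loaded, say) the function returns an empty list rather than failing. Directory iteration order is unspecified, so a real implementation would probably sort by node index before picking a default.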
@b-sumner does the additional context make the design choice clear?
https://github.com/llvm/llvm-project/pull/116651