[clang] [Clang] Add timeout for GPU detection utilities (PR #94751)

Joseph Huber via cfe-commits cfe-commits at lists.llvm.org
Fri Jun 7 11:37:18 PDT 2024


jhuber6 wrote:

> Ooh... I think I know exactly what may be causing this.

I've observed this a few times. In my case it usually happens when some application hangs on the GPU and no one notices; these tools then hang forever, and it takes a while to realize why. I figured an error is friendlier, since I highly doubt these tools will take over ten seconds to run even in the worst case.
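To illustrate the idea only (this is not the code in the PR, which lives in the Clang driver), here is a minimal Python sketch of running a detection utility under a timeout and turning a hang into an error. The names `query_gpu_archs` and `DetectionError` are hypothetical, the ten-second default simply mirrors the figure mentioned above, and the sketch assumes the tool prints one architecture per line.

```python
import subprocess
import sys


class DetectionError(RuntimeError):
    """Hypothetical error type: the detection tool failed or timed out."""


def query_gpu_archs(cmd, timeout_s=10.0):
    """Run a detection command and return its non-empty output lines.

    `cmd` is an argument list for whatever tool prints detected architectures.
    A hang is reported as a DetectionError instead of blocking forever.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired on timeout.
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=False
        )
    except subprocess.TimeoutExpired as exc:
        raise DetectionError(f"'{cmd[0]}' timed out after {timeout_s} s") from exc
    except OSError as exc:
        raise DetectionError(f"could not run '{cmd[0]}': {exc}") from exc

    if result.returncode != 0:
        raise DetectionError(f"'{cmd[0]}' exited with code {result.returncode}")
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in for a real detection tool: a child process that prints one name.
    print(query_gpu_archs([sys.executable, "-c", "print('gfx90a')"]))
```

The point of raising rather than waiting is that the caller gets a diagnosable failure naming the tool that hung, instead of a build that silently stalls.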

> On machines where NVIDIA GPUs are used for compute only (e.g. a headless server machine), NVIDIA drivers are not always loaded by default and may not have driver persistence enabled.

What's the config to set this by default without any graphics? It would be nice not to have to worry about it on my dev machine.



> For the GPU detection, we may be able to work around the issue by leaving the detection app running for the duration of the compilation, and prevent driver unloading, but it's a rather gross hack.

I know that for AMD we used to just probe the PCI connections, but that leaked a lot of information, so running a detection tool is the easier way to do it. I wonder what `__nvcc_device_query` does internally.

https://github.com/llvm/llvm-project/pull/94751

