[libcxx-commits] [libcxx] [libcxx] Add testing configuration for GPU targets (PR #104515)
Joseph Huber via libcxx-commits
libcxx-commits at lists.llvm.org
Fri Aug 30 08:31:47 PDT 2024
jhuber6 wrote:
> > > Is it possible to get the added CI setup alongside this?
> >
> >
> > > What would that look like? I've talked w/ @jplehr and @Artem-B about getting a build for these targets at least set up. I don't think a full CI tester will be available for a while -- there are still lots of failing tests and it takes a _long_ time to run.
>
> How long does it take to run the tests? What's the reason for it being that slow?
There are a lot of factors that contribute to it being really slow.
1. GPU backends in general are slow due to a lot of extra IPO passes (attributor) and instruction scheduling being more complicated.
2. Everything needs to be done through LTO because there's no backwards compatibility. This effectively allows us to defer the final architecture until it's linked in by the user. The `libc` test suite actually has an entirely separate target that's used for testing, so we could build that separately for a single target. Normally I'd say AMDGPU ELF linking isn't supported at all, but there's a good chance it would work anyway since all these massive unit tests aren't using anything problematic and likely use 100% of the register budget anyways.
3. These tests are all running on a single GPU thread. This makes it really easy to test things on the GPU, but GPUs aren't known for their single-threaded performance. Performance-wise, you're probably looking at two orders of magnitude slower than a server CPU. This is exacerbated by the fact that I force all of these jobs to be run _serially_ (this basically just claims a file lock internally so only one test can use the GPU at a time). The reason is that the GPU drivers are prone to locking up and spuriously failing if you spam them with >64 threads trying to use the GPU at once.
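To illustrate the file lock idea from point 3, here is a minimal sketch in Python of serializing test runs behind an exclusive lock. It is not the actual implementation used for these tests; the lock path and the names `gpu_lock` and `run_serialized` are hypothetical, and it assumes a POSIX system where `fcntl.flock` is available.

```python
import fcntl
import os
import subprocess
import sys
import tempfile
from contextlib import contextmanager

# Hypothetical lock location; the real setup may keep its lock elsewhere.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "gpu-test.lock")


@contextmanager
def gpu_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "w") as lock_file:
        # Blocks until no other process (or open file) holds the lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_serialized(cmd, path=LOCK_PATH):
    """Run `cmd` while holding the lock and return its exit code."""
    with gpu_lock(path):
        return subprocess.run(cmd).returncode
```

With something like this, a test runner can still launch many jobs in parallel, but each one waits its turn before touching the GPU, e.g. `run_serialized(["./some_test_executable"])` (a placeholder command).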
From my current configuration, I'd say an average test run takes about an hour on my server. The bot that runs the NVPTX tests, for example, is nowhere near as powerful as my computer, so I'd expect that to take something like 10 hours.
https://github.com/llvm/llvm-project/pull/104515