[libc-commits] [PATCH] D152486: [libc] Begin implementing a 'libmgpu.a' for math on the GPU
Jon Chesterfield via Phabricator via libc-commits
libc-commits at lists.llvm.org
Fri Jun 9 07:04:48 PDT 2023
JonChesterfield added a comment.
In D152486#4408674 <https://reviews.llvm.org/D152486#4408674>, @arsenm wrote:
> In D152486#4408610 <https://reviews.llvm.org/D152486#4408610>, @JonChesterfield wrote:
>
>> @Matt good news on ocml, thanks. I think we should add a wave size intrinsic, unconditionally expand it somewhere in clang and the backend and replace the current magic variable in ocml with it.
>
> We used to do that, and it doesn't work. We can't codegen different subtargets within the same function like that. Really having wave32 and wave64 coexist in the same module leads to a variety of untenable situations
I don't think that's quite what I'm proposing.
Let ocml be wavesize agnostic, modulo a call to a clang intrinsic, __builtin_amdgcn_wavesize() or whatever we end up calling it. Get the ocml IR out of clang by whatever means; some of it is handwritten IR, some is OpenCL.
Let clang take an argument for wavesize, or default it from the arch, or however we want to pick between 32 and 64. Write that information in the IR somewhere and/or pass it to the backend.
Expand the wavesize intrinsic with something that robustly deletes the dead basic block. At no point does an IR module contain both wave32 and wave64 code, but some of the library code doesn't know which it's going to be.
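To make the shape of that concrete, here is a rough host-side sketch in C++17, not something taken from ocml or from this patch. Everything in it is an assumption for illustration: SKETCH_WAVESIZE stands in for whatever clang would record from its argument or arch default, sketch_wavesize() stands in for the proposed intrinsic, and the two helper functions are made-up library code.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION: SKETCH_WAVESIZE is a hypothetical macro standing in for the
// wave size that clang would pick from an argument or from the arch.
#ifndef SKETCH_WAVESIZE
#define SKETCH_WAVESIZE 64
#endif

// ASSUMPTION: hypothetical stand-in for the proposed wave size intrinsic.
// It is a compile-time constant, so it is "expanded" before any code is emitted.
constexpr unsigned sketch_wavesize() { return SKETCH_WAVESIZE; }

static_assert(sketch_wavesize() == 32 || sketch_wavesize() == 64,
              "this sketch only models wave32 and wave64");

// Library code written once, without knowing the wave size in advance.
// The branch that does not match the wave size is discarded at compile time,
// so a given build only ever contains one of the two paths.
constexpr std::uint64_t sketch_full_lane_mask() {
  if constexpr (sketch_wavesize() == 32) {
    return 0xFFFFFFFFull;  // one bit per lane, 32 lanes
  } else {
    return ~0ull;          // one bit per lane, 64 lanes
  }
}

// Bit corresponding to a single lane; the lane index must be in range
// for the wave size this translation unit was built with.
inline std::uint64_t sketch_lane_bit(unsigned lane) {
  assert(lane < sketch_wavesize());
  return std::uint64_t{1} << lane;
}
```

Building the same source with -DSKETCH_WAVESIZE=32 selects the other path. In this model a mixed wavesize application is simply two separate compilations, which is the analogue of building separate code objects.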
I'm aware that it's possible to have a wave32 kernel and a wave64 kernel in the same IR module. I consider that so full of hazards that clang should reject it up front; people who want to run mixed wavesize kernels from a single application can build them into separate code objects.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D152486/new/
https://reviews.llvm.org/D152486