[clang] [AMDGPU] Do not emit arch dependent macros with unspecified cpu (PR #79660)

Mon Jan 29 09:00:03 PST 2024

jhuber6 wrote:

This seems to have perturbed the HIP build. https://lab.llvm.org/staging/#/builders/22/builds/22

The problem is that we used to define `__AMDGCN_WAVEFRONTSIZE` for the host compilation as well, and a bunch of the wave function macros in the headers rely on it. I think that this is just poor programming, because the host compilation has no way of knowing what the wave size is: `--offload-arch=gfx1030,gfx90a` is totally legal and would result in two conflicting wavefront sizes from the perspective of the host. The old behavior would just default this to `64` all the time.

We have two options: fix the headers so they don't rely on a device-only macro, or update this patch to emit a dummy macro on the host.
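
As a rough sketch of what the first option could look like on the header side (all names other than `__AMDGCN_WAVEFRONTSIZE` are hypothetical and not taken from the actual HIP headers), the macro is only consulted when the compiler defines it, and the host path has to supply its own fallback explicitly:

```cpp
// Hypothetical header fragment; kWaveSizeKnown, kDeviceWaveSize and
// wave_size_or are illustrative names, not part of any shipped header.
#if defined(__AMDGCN_WAVEFRONTSIZE)
// Device compilation for a specific CPU: the compiler provides the real size.
constexpr bool kWaveSizeKnown = true;
constexpr int kDeviceWaveSize = __AMDGCN_WAVEFRONTSIZE;
#else
// Host compilation, or no CPU specified: there is no single correct value.
constexpr bool kWaveSizeKnown = false;
constexpr int kDeviceWaveSize = 0;
#endif

// Returns the compile-time wave size when it is known, otherwise the
// fallback chosen by the caller (e.g. 64 to mimic the old default).
constexpr int wave_size_or(int fallback) {
  return kWaveSizeKnown ? kDeviceWaveSize : fallback;
}
```

With something along these lines, host code that wants the old behavior would have to write `wave_size_or(64)` itself, which at least makes the assumption visible instead of hiding it behind a macro that looks authoritative.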

I like the first better, but given that we've shipped this with ROCm in the past it's likely to require a workaround. @yxsamliu What do you think?

https://github.com/llvm/llvm-project/pull/79660