[clang] [AMDGPU] Introduce 'amdgpu_num_workgroups_{xyz}' builtin (PR #83927)

Tue Mar 5 20:06:00 PST 2024

jhuber6 wrote:

> I think we would be better off teaching an IR optimizer pass to recognize the divide pattern and remap it to the load from the new location, rather than forcing the complexity into every frontend

That's fair. I would've argued that this version should've been the builtin and the grid size should've been the computed value, but it's definitely not ideal to have multiple versions of this. I'll try to find a place to do this peephole optimization.
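To make the peephole idea concrete, here is a minimal sketch over a hypothetical toy IR. The `Intrinsic` and `UDiv` classes and the `grid_size`, `workgroup_size`, and `num_workgroups` names are illustrative stand-ins, not LLVM's actual API or intrinsic names. The sketch assumes that the loaded workgroup count equals the quotient exactly. A real pass would need to confirm that the division's rounding matches what the stored value means before rewriting.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical toy IR nodes; NOT LLVM's real IR classes or intrinsic names.
@dataclass(frozen=True)
class Intrinsic:
    name: str  # e.g. "grid_size", "workgroup_size", "num_workgroups"
    dim: str   # "x", "y" or "z"


@dataclass(frozen=True)
class UDiv:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Intrinsic, UDiv, int]


def peephole(e: Expr) -> Expr:
    """Recursively rewrite grid_size(d) / workgroup_size(d) into num_workgroups(d).

    Assumes (hypothetically) that num_workgroups(d) is exactly equal to
    the unsigned quotient; a real pass must verify this holds.
    """
    if isinstance(e, UDiv):
        # Simplify operands first so nested divides are also matched.
        lhs, rhs = peephole(e.lhs), peephole(e.rhs)
        if (isinstance(lhs, Intrinsic) and isinstance(rhs, Intrinsic)
                and lhs.name == "grid_size"
                and rhs.name == "workgroup_size"
                and lhs.dim == rhs.dim):
            # Replace the divide with a single load of the stored count.
            return Intrinsic("num_workgroups", lhs.dim)
        return UDiv(lhs, rhs)
    return e


if __name__ == "__main__":
    expr = UDiv(Intrinsic("grid_size", "x"), Intrinsic("workgroup_size", "x"))
    print(peephole(expr))  # Intrinsic(name='num_workgroups', dim='x')
```

Only a divide whose operands refer to the same dimension is rewritten, and anything that doesn't match is left untouched. That keeps frontends emitting the plain divide and confines the extra knowledge to one optimizer pattern.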

https://github.com/llvm/llvm-project/pull/83927