[PATCH] D104911: [OpenMP] Match initial thread pattern on AMDGPU

Fri Jun 25 07:07:40 PDT 2021

jhuber6 added a comment.

In D104911#2840699 <https://reviews.llvm.org/D104911#2840699>, @JonChesterfield wrote:

> Ah, nice catch. I have not been paying enough attention to OpenMPOpt, this pattern will indeed miss on amdgcn.
>
> __kmpc_amdgcn_gpu_num_threads is a library function because there is no corresponding intrinsic. I think each of the nvptx intrinsics has either a corresponding amdgcn intrinsic or a corresponding function call, but there might also be some things that are a scalar constant on one arch and a function returning a constant on the other.

I was debating whether to add this as an intrinsic, or at least as an RTL function in OMPKinds.def, and just settled on this ugly string comparison.

> For this patch, I'm wondering if we can use a single pattern, preceded by:
> auto &&m_BlockSize = nvidia ? m_Intrinsic<Intrinsic::nvvm_read_ptx_sreg_ntid_x>() : m_Intrinsic<Intrinsic::some-amd-name>();

The patterns differ slightly even apart from how the block size is found: AMD uses a constant bit mask, while Nvidia derives the mask from the warp size.
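
To illustrate the difference, here is a rough sketch in plain C++. It is not the code from the patch or the runtime. It assumes the "master" thread is the first thread of the last warp in the block, and that the AMD wavefront size is fixed at 64. The function and constant names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; they are not part of
// OpenMPOpt or the device runtime.
// Assumption: the "master" thread is the first thread of the last warp.

// Nvidia-style: the warp size is a value read at run time, so the mask
// has to be computed from it (warpSize must be a power of two).
constexpr std::uint32_t masterThreadId(std::uint32_t numThreads,
                                       std::uint32_t warpSize) {
  return (numThreads - 1) & ~(warpSize - 1);
}

// AMD-style: with an assumed fixed wavefront size of 64, the mask folds
// to a literal constant, so the IR contains no warp-size computation.
constexpr std::uint32_t kAssumedWavefrontSize = 64;

constexpr std::uint32_t masterThreadIdConstMask(std::uint32_t numThreads) {
  return (numThreads - 1) & ~(kAssumedWavefrontSize - 1);
}
```

Under these assumptions both forms compute the same thread ID, but the instruction sequences feeding the final `and` differ, which is why a single matcher that only swaps out the block-size call would not cover both targets.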

> In the general case, I'd like the Opt layer to be more architecture agnostic than this. Could we insert functions at codegen like 'amdgpu_get_block_size', pattern match those in the IR opt, and lower them to the nvptx or amdgcn intrinsics later on?

This is planned for when we switch over to the new device runtime library, where it will be a simple comparison of a TID function against zero. Right now it needs some weird calls to determine whether a thread is inside the "master warp" for the runtime library.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D104911