[PATCH] D104911: [OpenMP] Match initial thread pattern on AMDGPU

Fri Jun 25 06:49:15 PDT 2021

JonChesterfield added a comment.

Ah, nice catch. I have not been paying enough attention to OpenMPOpt; this pattern will indeed miss on amdgcn.

__kmpc_amdgcn_gpu_num_threads is a library function because there is no corresponding intrinsic. I think each of the nvptx intrinsics has either a corresponding amdgcn intrinsic or a corresponding function call, but there might also be some things that are a scalar constant on one arch and a function returning a constant on the other.

For this patch, I'm wondering if we can use a single pattern, preceded by:
  auto &&m_BlockSize = nvidia ? m_Intrinsic<Intrinsic::nvvm_read_ptx_sreg_ntid_x>() : m_Intrinsic<Intrinsic::some-amd-name>();
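
To make the shape of that suggestion concrete, here is a rough, self-contained toy model. It does not use LLVM's PatternMatch at all: ToyCall, Arch, blockSizeCallee and containsBlockSizeQuery are hypothetical names invented for the sketch, and treating __kmpc_amdgcn_gpu_num_threads as the amdgcn counterpart of the ntid.x intrinsic is an assumption based on the paragraph above. The only point is that the target is consulted once, up front, and the pattern itself stays shared.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an IR call instruction; this is not an LLVM type.
struct ToyCall {
  std::string Callee;
};

enum class Arch { NVPTX, AMDGCN };

// Select the name of the block-size query once, based on the target.
// Assumption: on amdgcn the query is the library call mentioned above.
const char *blockSizeCallee(Arch A) {
  switch (A) {
  case Arch::NVPTX:
    return "llvm.nvvm.read.ptx.sreg.ntid.x";
  case Arch::AMDGCN:
    return "__kmpc_amdgcn_gpu_num_threads";
  }
  assert(false && "unknown architecture");
  return "";
}

// The single shared "pattern": true if any call is the block-size query
// for the given target. Nothing below the selection is target-specific.
bool containsBlockSizeQuery(const std::vector<ToyCall> &Calls, Arch A) {
  const std::string BlockSize = blockSizeCallee(A);
  for (const ToyCall &C : Calls)
    if (C.Callee == BlockSize)
      return true;
  return false;
}
```

In the real pass the selected value would be a matcher rather than a string, and since the amdgcn side is a function call rather than an intrinsic, the two arms of the ternary may not end up with the same type.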

In the general case, I'd like the Opt layer to be more architecture-agnostic than this. Could we insert functions at codegen like 'amdgpu_get_block_size', pattern match those in the IR opt, and lower them to the nvptx or amdgcn intrinsics later on?
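
A minimal sketch of that two-phase idea, again as a toy model rather than real LLVM code. Call, Target, countBlockSizeQueries and lowerPlaceholders are hypothetical names; the placeholder name is the one floated above, and the amdgcn lowering target is assumed to be the library call discussed earlier.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class Target { NVPTX, AMDGCN };

// Toy stand-in for an IR call instruction; this is not an LLVM type.
struct Call {
  std::string Callee;
};

// Placeholder emitted at codegen; the name is only the example from the
// comment above, not an existing function.
const std::string GenericBlockSize = "amdgpu_get_block_size";

// "Opt layer": looks only at the placeholder, never at a target name.
int countBlockSizeQueries(const std::vector<Call> &Calls) {
  int N = 0;
  for (const Call &C : Calls)
    if (C.Callee == GenericBlockSize)
      ++N;
  return N;
}

// Late lowering: rewrite the placeholder to the per-target spelling.
void lowerPlaceholders(std::vector<Call> &Calls, Target T) {
  const char *Lowered = nullptr;
  switch (T) {
  case Target::NVPTX:
    Lowered = "llvm.nvvm.read.ptx.sreg.ntid.x";
    break;
  case Target::AMDGCN:
    Lowered = "__kmpc_amdgcn_gpu_num_threads"; // assumed counterpart
    break;
  }
  assert(Lowered && "unknown target");
  for (Call &C : Calls)
    if (C.Callee == GenericBlockSize)
      C.Callee = Lowered;
}
```

The optimization step never mentions a target, so a new architecture would only touch the lowering switch.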

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D104911/new/

https://reviews.llvm.org/D104911