[PATCH] D80142: [mlir][gpu][rocdl] Introduce GPUToROCm conversion passes.

Tue May 19 09:17:00 PDT 2020

whchung marked an inline comment as done.
whchung added inline comments.

================
Comment at: mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp:50
+/// The function body is erased.
+class GpuKernelToHsacoPass
+    : public PassWrapper<GpuKernelToHsacoPass,
----------------
herhut wrote:
> This is a lot of common code. Could we have a base class, e.g., `GpuKernelToLLVMGeneratedBlobPass', that factors out all the common logic but has extensions points for the LLVM target to use, etc.?
@herhut  I would argue factoring out common logic to a base class is a premature optimization at this point.

In D80167, logic for lowering `gpu.launch` to runtime API invocations are more or less the same between the two platforms so I think it's good to make the logic generic.

In this patch however,  passes which get GPUModule to binary blobs essentially do following tasks:

1) acquire GPUModule, and acquire a LLVM context lock.
2) initialize LLVM backend.
3) translate the GPUModule to llvm/nvvm/rocdl dialect.
4) invoke LLVM backend, specify triple / datalayout / target features.
5) invoke platform-specific binary creator (on NV platform is cubin generator, on AMD platform it's LLVM lld).
6) convert the binary into a StringAttr.

1) and 6) are platform-neutral, but 2) - 5) are highly platform-dependent and may change from time to time.

Therefore, I'd like to keep both paths separate for now. Once ROCm side of logic become more matured we can consider factoring out common logic in subsequent patches.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80142/new/

https://reviews.llvm.org/D80142