[PATCH] D80142: [mlir][gpu][rocdl] Introduce GPUToROCm conversion passes.

Wed May 20 03:45:32 PDT 2020

herhut added inline comments.

================
Comment at: mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp:50
+/// The function body is erased.
+class GpuKernelToHsacoPass
+    : public PassWrapper<GpuKernelToHsacoPass,
----------------
whchung wrote:
> herhut wrote:
> > This is a lot of common code. Could we have a base class, e.g., `GpuKernelToLLVMGeneratedBlobPass', that factors out all the common logic but has extensions points for the LLVM target to use, etc.?
> @herhut  I would argue factoring out common logic to a base class is a premature optimization at this point.
> 
> In D80167, logic for lowering `gpu.launch` to runtime API invocations are more or less the same between the two platforms so I think it's good to make the logic generic.
> 
> In this patch however,  passes which get GPUModule to binary blobs essentially do following tasks:
> 
> 1) acquire GPUModule, and acquire a LLVM context lock.
> 2) initialize LLVM backend.
> 3) translate the GPUModule to llvm/nvvm/rocdl dialect.
> 4) invoke LLVM backend, specify triple / datalayout / target features.
> 5) invoke platform-specific binary creator (on NV platform is cubin generator, on AMD platform it's LLVM lld).
> 6) convert the binary into a StringAttr.
> 
> 1) and 6) are platform-neutral, but 2) - 5) are highly platform-dependent and may change from time to time.
> 
> Therefore, I'd like to keep both paths separate for now. Once ROCm side of logic become more matured we can consider factoring out common logic in subsequent patches.
The code is pretty much identical, except for

1. the target triple
2. the need for initializing an LLVM Target because of the triple,
3. the use of a different annotation name
4. a different compiler callback from ISA to blob.

1. The target could also be passed in as a parameter.

3. I am not even sure we need a different name, but that could just be a parameter to the pass.

4. is already passed in and goes from string -> vector. So there is nothing platform dependent there.

Remains 2, which we could also be a callback that is passed in, or the user has to make sure that the target is initialized. It does not need to be part of the pass itself.

The only future extension I can envision is a different configuration of the lowering pipeline during code generation. That could also be done via a callback that gets a LegacyPassManager as parameter. Similar to how the LLVM lowering can be configured.

Also, if we split this now, it will likely diverge, making later refactoring harder.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80142/new/

https://reviews.llvm.org/D80142