[PATCH] D80142: [mlir][gpu][rocdl] Introduce GPUToROCm conversion passes.

Wed May 20 10:54:43 PDT 2020

whchung marked an inline comment as done.
whchung added inline comments.

================
Comment at: mlir/lib/Conversion/GPUToROCm/ConvertKernelFuncToHsaco.cpp:50
+/// The function body is erased.
+class GpuKernelToHsacoPass
+    : public PassWrapper<GpuKernelToHsacoPass,
----------------
herhut wrote:
> whchung wrote:
> > herhut wrote:
> > > This is a lot of common code. Could we have a base class, e.g., `GpuKernelToLLVMGeneratedBlobPass', that factors out all the common logic but has extensions points for the LLVM target to use, etc.?
> > @herhut  I would argue factoring out common logic to a base class is a premature optimization at this point.
> > 
> > In D80167, logic for lowering `gpu.launch` to runtime API invocations are more or less the same between the two platforms so I think it's good to make the logic generic.
> > 
> > In this patch however,  passes which get GPUModule to binary blobs essentially do following tasks:
> > 
> > 1) acquire GPUModule, and acquire a LLVM context lock.
> > 2) initialize LLVM backend.
> > 3) translate the GPUModule to llvm/nvvm/rocdl dialect.
> > 4) invoke LLVM backend, specify triple / datalayout / target features.
> > 5) invoke platform-specific binary creator (on NV platform is cubin generator, on AMD platform it's LLVM lld).
> > 6) convert the binary into a StringAttr.
> > 
> > 1) and 6) are platform-neutral, but 2) - 5) are highly platform-dependent and may change from time to time.
> > 
> > Therefore, I'd like to keep both paths separate for now. Once ROCm side of logic become more matured we can consider factoring out common logic in subsequent patches.
> The code is pretty much identical, except for
> 
> 1. the target triple
> 2. the need for initializing an LLVM Target because of the triple,
> 3. the use of a different annotation name
> 4. a different compiler callback from ISA to blob.
> 
> 1. The target could also be passed in as a parameter.
> 
> 3. I am not even sure we need a different name, but that could just be a parameter to the pass.
> 
> 4. is already passed in and goes from string -> vector. So there is nothing platform dependent there.
> 
> Remains 2, which we could also be a callback that is passed in, or the user has to make sure that the target is initialized. It does not need to be part of the pass itself.
> 
> The only future extension I can envision is a different configuration of the lowering pipeline during code generation. That could also be done via a callback that gets a LegacyPassManager as parameter. Similar to how the LLVM lowering can be configured.
> 
> Also, if we split this now, it will likely diverge, making later refactoring harder.
> 
@herhut Points taken. I'm working on finalizing the implementation of `mlir-rocm-runner`. Let me revise the patch after its done in a couple of days.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80142/new/

https://reviews.llvm.org/D80142