[PATCH] D75287: [mlir][GPU] Expose the functionality to create a gpu.GPUFuncOp from a gpu.GPULaunchOp

Mahesh Ravishankar via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 2 13:44:07 PST 2020


mravishankar marked an inline comment as done.
mravishankar added inline comments.


================
Comment at: mlir/lib/Dialect/GPU/Transforms/KernelOutlining.cpp:77
+    Operation *operandOp = operand.getDefiningOp();
+    if (!operandOp || !isInliningBeneficiary(operandOp))
+      continue;
----------------
mravishankar wrote:
> mehdi_amini wrote:
> > mravishankar wrote:
> > > mehdi_amini wrote:
> > > > herhut wrote:
> > > > > ftynse wrote:
> > > > > > mehdi_amini wrote:
> > > > > > > This whole sinking transformation does not seem safe in general: this should check *legality* rather than "benefit". 
> > > > > > > > Also, it isn't clear to me why this is done during the outlining and not as a pre-pass. The launch operation with the region abstraction seems perfectly suited to model this. I would rather have this exposed in a separate API / as a separate step.
> > > > > > > This whole sinking transformation does not seem safe in general: this should check *legality* rather than "benefit".
> > > > > > 
> > > > > > The function just seems misnamed; it should be something like `shouldSink`, because it mixes validity and benefit. In practice, it only returns `true` for `constant` and `dim` operations that don't have side effects.
> > > > > > This whole sinking transformation does not seem safe in general: this should check *legality* rather than "benefit".
> > > > > 
> > > > > Well, it should check both. You do not want to move every legal operation either :)
> > > > > 
> > > > > > Also, it isn't clear to me why this is done during the outlining and not as a pre-pass. The launch operation with the region abstraction seems perfectly suited to model this. I would rather have this exposed in a separate API / as a separate step
> > > > > 
> > > > > This has purely historical reasons. Not long ago, `gpu.launch` was isolated from above, so this transformation was done when moving to function form. I have a separate pass for this in a local client, which I can send out next week. It just needs tests.
> > > > > 
> > > > > It was implemented as a "post transformation" to the outlining and I would prefer if we do not mix it into the outlining transformation itself. When written separately, the transformations are trivial.
> > > > > It was implemented as a "post transformation" to the outlining and
> > > > 
> > > > Pre-outlining seems easier to manage because it stays a region-based rather than an inter-procedural transformation (and it can also be kept a function pass).
> > > > 
> > > > > I would prefer if we do not mix it into the outlining transformation itself. When written separately, the transformations are trivial.
> > > > 
> > > > Seems like we're in agreement :)
> > > A pre-pass is fine, but I think it would be better to leave it here. Eventually, it would be good if all transformations could be expressed as a pattern match and rewrite. This "outlining" is essentially converting a gpu.launch op to a gpu.launch_func op. If you need a separate pass to sink the instructions, then it breaks the ability to go from Loops -> GPU -> NVVM/SPIR-V in a single conversion. I am not saying anybody does this today (not doing this in IREE), but in general it seems like it would be beneficial to have transformations as patterns, and passes as just light-weight wrappers around patterns.
> > > 
> > > Re: being able to keep it as a function pass: this is related to where the gpu.module is created. As set up right now, it is put outside of the function that the gpu.launch operation lives in. That is a very specific choice, and it would be very useful to allow "clients" of the outlining to decide where to put the gpu.module.
> > >  If you need a separate pass to sink the instructions, then it breaks the ability to go from Loops -> GPU -> NVVM/SPIR-V in a single conversion.
> > 
> > I don't understand what you mean, can you elaborate?
> > 
> > > but in general it seems like it would be beneficial to have transformations as patterns, and passes as just light-weight wrappers around patterns.
> > 
> > This mixes an optimization into an unrelated transformation: it just does not belong here IMO.
> > 
> > > Re: being able to keep it as a function pass: this is related to where the gpu.module is created.
> > 
> > I don't know what you mean or how it answers the point about the function pass right now.
> Re : separate pass to sink instructions.
> 
> The dialect conversion framework is designed to go from A -> B -> C. If I want to target SPIR-V/NVVM from the Linalg dialect via the Loop and GPU dialects (i.e. Linalg -> Loops -> GPU -> SPIR-V/NVVM), I can add all the patterns for the conversion to the dialect conversion framework. Currently, Loops to GPU is not exposed as a conversion pattern; GPU to SPIR-V is. Adding extra steps as a "pre-condition" will limit the ability to do the entire conversion with the dialect conversion framework (which is what it is built for). You could add the "sinking" as a canonicalization pattern, but it seems to me this sinking is useful only when the gpu.launch region is outlined to create a gpu.func operation. So doing the sinking during the conversion makes sense to me.
> 
> Re: function pass vs module pass
> 
> The current setup of the gpu.launch to gpu.launch_func conversion creates a gpu.module that is inserted just after the function the gpu.launch is in. This makes it a module pass, and this behavior is only relevant for the CUDA/NVVM side of things. For IREE, we are only interested in the device side for now, so we can make this a function pass if we can control where the gpu.module is inserted. See this [[ https://github.com/google/iree/pull/863/commits/5af3e8abe9ad628ca76998aab28fcba22b82baf7 | dependent PR ]] in IREE that uses this change and makes the conversion of gpu.launch to gpu.func a function pass.
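
Re: the A -> B -> C point above, this is roughly the kind of composition I have in mind. A sketch only: populateLoopsToGPUPatterns does not exist today (that is exactly the gap), and the other names/signatures are from memory, so treat them as approximate.

  // Hypothetical single-shot conversion driver (MLIR C++ APIs from
  // memory; populateLoopsToGPUPatterns is made up to illustrate the
  // missing piece).
  OwningRewritePatternList patterns;
  populateLoopsToGPUPatterns(context, patterns);   // does not exist today
  populateGPUToSPIRVPatterns(context, typeConverter, patterns);
  ConversionTarget target(*context);
  target.addLegalDialect<spirv::SPIRVDialect>();
  if (failed(applyFullConversion(module, target, patterns)))
    signalPassFailure(); // assuming this runs inside a pass

If the sinking has to run as a separate pass between two of these steps, the whole flow can no longer be driven by a single applyFullConversion.
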
@mehdi_amini : Update on the function pass vs module pass question. You were right that the outlining can only be a module pass: the gpu.module also has a symbol, so it needs to be added to a module (or an op with a symbol table). So I was wrong about that. I updated the PR shown above to be a module pass as well, but FYI there was no assert when I did it as a function pass.
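
Re: the legality vs benefit naming above, for the record this is the kind of combined check being suggested. A sketch only: `shouldSink` is ftynse's proposed name, not the helper currently in KernelOutlining.cpp, and the op list mirrors what `isInliningBeneficiary` accepts today.

  // Sketch: make both halves of the sink decision explicit.
  static bool shouldSink(Operation *op) {
    // Legality: only side-effect-free ops may be duplicated into the
    // launch region. Benefit: constants and dims are cheap to
    // rematerialize, and sinking them removes kernel arguments.
    return isa<ConstantOp>(op) || isa<DimOp>(op);
  }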

Just filling in some details from the offline discussion. It is true that the sinking could be done as a pre-pass; if so, it would be a separate "pre-processing" pass. It is unclear whether the sinking can be expressed as a pattern.
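
To make the pre-pass alternative concrete, here is a rough sketch of what a standalone sinking step over a gpu.launch region could look like (untested, accessor names from memory, and herhut's local pass may well differ; `shouldSink` is the check sketched above):

  // Sketch: sink cheap defining ops into the launch region before
  // outlining. Assumes mlir/Transforms/RegionUtils.h for
  // getUsedValuesDefinedAbove.
  static void sinkOperationsIntoLaunchOp(gpu::LaunchOp launch) {
    Region &body = launch.body();
    llvm::SetVector<Value> valuesDefinedAbove;
    getUsedValuesDefinedAbove(body, valuesDefinedAbove);
    OpBuilder builder(&body.front().front());
    for (Value value : valuesDefinedAbove) {
      Operation *def = value.getDefiningOp();
      if (!def || !shouldSink(def))
        continue;
      // Clone the op at the top of the launch body and reroute only
      // the uses inside the region; uses above the launch keep the
      // original op.
      Operation *clone = builder.clone(*def);
      value.replaceUsesWithIf(clone->getResult(0), [&](OpOperand &use) {
        return body.isAncestor(use.getOwner()->getParentRegion());
      });
    }
  }

Whether this can be phrased as a RewritePattern is the open question: it does not replace the matched op, it only mutates the region in place.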


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75287/new/

https://reviews.llvm.org/D75287
