[PATCH] D75052: [MLIR][GPU] Properly model step in parallel loop to gpu conversion.

Uday Bondhugula via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 25 07:57:38 PST 2020


bondhugula added inline comments.


================
Comment at: mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir:230
       loop.parallel (%arg5, %arg6) = (%c0, %c0) to (%3, %5) step (%c1, %c1) {
-        %17 = load %6[%arg5, %arg6] : memref<?x?xf32, #map2>
-        %18 = load %11[%arg5, %arg6] : memref<?x?xf32, #map2>
-        %19 = load %16[%arg5, %arg6] : memref<?x?xf32, #map2>
+        %17 = load %6[%arg5, %arg6] : memref<?x?xf32, #map3>
+        %18 = load %11[%arg5, %arg6] : memref<?x?xf32, #map3>
----------------
herhut wrote:
> bondhugula wrote:
> > Side question: why aren't we using affine.load/store instead of load/store and loop.parallel -> affine.parallel here? With the former, you'll get things like store-to-load forwarding, redundant load elimination, composition of ops supplying subscript values into the load/store itself, etc., the infrastructure for all of which already exists whenever you need it. All the mapping metadata should nicely fit into affine.parallel as well.
> It is not pure coincidence that the mapping data fits :)
> 
> My hope is that this mapper will work equally well with affine.parallel. However, I do not want to restrict it to affine and currently the code we feed into this is not based on affine.parallel. I expect that we will generalize things in that direction eventually but would also be very happy if someone else looks into that.
> 
> 
> However, I do not want to restrict it to affine and currently the code 
> we feed into this is not based on affine.parallel. I expect that we will

"The code we feed into this": is the thing that's generating the loop.parallel's available somewhere or is it something that's planned for release in the future?

> generalize things in that direction eventually but would also be very 
> happy if someone else looks into that.

But in order to do that, one would also have to look at the converter that's generating the loop dialect ops and switch it to generate affine dialect ones. IMO, that would avoid duplicating a lot of infrastructure in less powerful ways. All of these examples can be represented and transformed (whether or not you need any analysis) with the affine dialect.
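For concreteness, a rough sketch of what I mean (my own illustration, not code from this patch), assuming %3 and %5 above are valid affine symbols (e.g., results of dim ops); the SSA names are just carried over from the test:

      // Unit step is the default for affine.parallel, so the explicit
      // step (%c1, %c1) from the loop.parallel form is dropped.
      affine.parallel (%arg5, %arg6) = (0, 0) to (%3, %5) {
        %17 = affine.load %6[%arg5, %arg6] : memref<?x?xf32, #map3>
        %18 = affine.load %11[%arg5, %arg6] : memref<?x?xf32, #map3>
        %19 = affine.load %16[%arg5, %arg6] : memref<?x?xf32, #map3>
        ...
      }

In this form the affine analyses and transforms (store-to-load forwarding, redundant load elimination, etc.) apply directly, and the GPU mapping attributes could presumably be attached to affine.parallel the same way they are attached to loop.parallel today.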


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D75052/new/

https://reviews.llvm.org/D75052




