[PATCH] D75052: [MLIR][GPU] Properly model step in parallel loop to gpu conversion.
Stephan Herhut via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 25 04:42:26 PST 2020
herhut added a comment.
Thanks for the review!
================
Comment at: mlir/test/Conversion/LoopsToGPU/parallel_loop.mlir:230
loop.parallel (%arg5, %arg6) = (%c0, %c0) to (%3, %5) step (%c1, %c1) {
- %17 = load %6[%arg5, %arg6] : memref<?x?xf32, #map2>
- %18 = load %11[%arg5, %arg6] : memref<?x?xf32, #map2>
- %19 = load %16[%arg5, %arg6] : memref<?x?xf32, #map2>
+ %17 = load %6[%arg5, %arg6] : memref<?x?xf32, #map3>
+ %18 = load %11[%arg5, %arg6] : memref<?x?xf32, #map3>
----------------
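For context on what the patch title refers to, a rough sketch of the induction-variable computation when mapping a stepped loop.parallel dimension onto a GPU id (names and exact op syntax here are illustrative, not taken from the patch):

```mlir
// Hypothetical sketch: with a non-unit step, the logical induction
// variable must be recovered from the hardware id as
//   %iv = %lb + %id * %step
// rather than assuming step == 1 and using the id directly.
%id = "gpu.block_id"() {dimension = "x"} : () -> index
%scaled = muli %id, %step : index
%iv = addi %lb, %scaled : index
```

The loads in the quoted test then index with %iv, which is why the access maps (#map2 vs. #map3) change once the step is modeled.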
bondhugula wrote:
> Side question: why aren't we using affine.load/store instead of load/store, and loop.parallel -> affine.parallel here? With the former, you'll get things like store-to-load fwd'ing, redundant load elimination, composition of ops supplying subscript values into the load/store itself, etc. — infra for all of which exists whenever you need it. All the mapping metadata should nicely fit into affine.parallel as well.
It is not pure coincidence that the mapping data fits :)
My hope is that this mapper will work equally well with affine.parallel. However, I do not want to restrict it to affine, and the code we currently feed into this is not based on affine.parallel. I expect that we will generalize things in that direction eventually, but I would also be very happy if someone else looks into that.
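For reference, a hypothetical affine-dialect analogue of the loop in the quoted test might look roughly like this (a sketch, not code from the patch; exact bound syntax for dynamic sizes may differ):

```mlir
// Sketch: the affine.parallel + affine.load form bondhugula suggests.
// affine.load/store admit store-to-load forwarding, redundant load
// elimination, and folding of index computations into the access map.
affine.parallel (%i, %j) = (0, 0) to (symbol(%d0), symbol(%d1)) {
  %v = affine.load %buf[%i, %j] : memref<?x?xf32>
}
```

The mapping attributes discussed in this review would then attach to affine.parallel instead of loop.parallel.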
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D75052/new/
https://reviews.llvm.org/D75052