[Mlir-commits] [mlir] [mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (PR #75315)

Guray Ozen llvmlistbot at llvm.org
Sun Dec 17 01:06:30 PST 2023


================
@@ -227,3 +243,20 @@ func.func @make_subgroup_reduce_uniform() {
   }
   return
 }
+
+// -----
+
+// The GPU kernel does not have any side effecting ops, so the entire
----------------
grypp wrote:

Thanks for your patience and explanation. 

`gpu.launch` doesn't write to memory directly; it performs the following actions. It will still need to read the kernel configuration even with an empty body. However, I think it's safe to fold; see the sketch after the list below.

1) Reads host memory for kernel configuration and parameters.
2) Copies kernel parameters to device memory.
3) Waits for stream(s).
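
For concreteness, a minimal sketch of the kind of empty-bodied launch in question (the constant and value names here are illustrative, not taken from the patch):

```
%c1 = arith.constant 1 : index
gpu.launch blocks(%bx, %by, %bz) in (%gsx = %c1, %gsy = %c1, %gsz = %c1)
           threads(%tx, %ty, %tz) in (%bsx = %c1, %bsy = %c1, %bsz = %c1) {
  // Empty body: no side-effecting ops, only the mandatory terminator.
  gpu.terminator
}
```

With `RecursiveMemoryEffects`, such a launch reports no memory effects and so becomes a candidate for erasure.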

Thinking about the 3rd point with the IR below: if `taskC` (which has an empty body) is folded, could `taskD` run before `taskB` because it no longer waits for `stream2`? (Currently, MLIR uses a single stream, but one can imagine implementing support for multiple streams.)


```
%taskA = gpu.launch async [%stream1] ...
%taskB = gpu.launch async [%stream2] ...
%taskC = gpu.launch async [%stream1, %stream2] ...
%taskD = gpu.launch async [%stream1] ...
```
task execution graph:
```
a   b
 \ /
  c
  |
  d
```
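
To make the concern concrete, here is a sketch (in the same hypothetical multi-stream notation as above) of the IR after naively folding the empty `%taskC`:

```
%taskA = gpu.launch async [%stream1] ...
%taskB = gpu.launch async [%stream2] ...
// %taskC folded away: nothing joins %stream1 and %stream2 anymore.
%taskD = gpu.launch async [%stream1] ...
```

In the graph above, `d` would then be ordered only after `a` (via `%stream1`), with no remaining edge from `b`.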

Is it overkill to think about this case?


https://github.com/llvm/llvm-project/pull/75315

