[Mlir-commits] [mlir] [mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (PR #75315)

Matthias Springer llvmlistbot at llvm.org
Sun Dec 17 16:59:06 PST 2023


================
@@ -227,3 +243,20 @@ func.func @make_subgroup_reduce_uniform() {
   }
   return
 }
+
+// -----
+
+// The GPU kernel does not have any side effecting ops, so the entire
----------------
matthias-springer wrote:

The kernel configuration and parameters (e.g., the number of blocks) are passed to `gpu.launch` as integer operands, so the op does not read anything from memory (memref) at this point, right? (The op may be lowered to something that reads from memory; that lowered op would then have a memory side effect.)
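
For illustration, here is a minimal sketch of what I mean (hand-written; the constants and the empty kernel body are just placeholders): all configuration values are SSA index operands of the op, so the launch itself does not touch a memref.

```
%c1 = arith.constant 1 : index
%c32 = arith.constant 32 : index
// Grid and block sizes are plain SSA index operands; nothing is
// loaded from memory to configure the launch.
gpu.launch blocks(%bx, %by, %bz) in (%gx = %c1, %gy = %c1, %gz = %c1)
           threads(%tx, %ty, %tz) in (%sx = %c32, %sy = %c1, %sz = %c1) {
  // Kernel body would go here.
  gpu.terminator
}
```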

`gpu.launch` takes async token arguments and produces an async token result. If there is a particular order in which kernels must be launched (and waited for), I would expect this to be modeled with async tokens. IMO, execution order that arises only from stream availability ("stream is available or not") does not have to be preserved when folding `gpu.launch` ops.

```
%taskA = gpu.launch async [] ...
%taskB = gpu.launch async [] ...
%taskC = gpu.launch async [%taskA, %taskB] ...
%taskD = gpu.launch async [%taskC] ...
```

In the above example, `%taskC` can be canonicalized away if it has no side effects, but then `%taskD` will depend on `%taskA` and `%taskB`.
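
Roughly what I would expect after canonicalization (sketch; this assumes the erased op's async dependencies are forwarded to its users):

```
%taskA = gpu.launch async [] ...
%taskB = gpu.launch async [] ...
// %taskC had no side effects and was folded away; the ops that waited
// on it now wait on its dependencies instead.
%taskD = gpu.launch async [%taskA, %taskB] ...
```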


https://github.com/llvm/llvm-project/pull/75315

