[Mlir-commits] [mlir] [mlir][gpu] Add 'cluster_size' attribute to gpu.subgroup_reduce (PR #104851)
Jakub Kuderski
llvmlistbot at llvm.org
Mon Aug 19 14:02:18 PDT 2024
================
@@ -144,6 +186,33 @@ gpu.module @kernels {
gpu.return
}
+ // CHECK-SHFL-LABEL: gpu.func @kernel4_clustered(
+ // CHECK-SHFL-SAME: %[[ARG0:.+]]: vector<2xf16>)
+ gpu.func @kernel4_clustered(%arg0: vector<2xf16>) kernel {
+ // CHECK-SHFL-DAG: %[[C1:.+]] = arith.constant 1 : i32
+ // CHECK-SHFL-DAG: %[[C2:.+]] = arith.constant 2 : i32
+ // CHECK-SHFL-DAG: %[[C32:.+]] = arith.constant 32 : i32
+
+ // CHECK-SHFL: %[[V0:.+]] = vector.bitcast %[[ARG0]] : vector<2xf16> to vector<1xi32>
+ // CHECK-SHFL: %[[I0:.+]] = vector.extract %[[V0]][0] : i32 from vector<1xi32>
+ // CHECK-SHFL: %[[S0:.+]], %{{.+}} = gpu.shuffle xor %[[I0]], %[[C1]], %[[C32]] : i32
+ // CHECK-SHFL: %[[BR0:.+]] = vector.broadcast %[[S0]] : i32 to vector<1xi32>
+ // CHECK-SHFL: %[[BC0:.+]] = vector.bitcast %[[BR0]] : vector<1xi32> to vector<2xf16>
+ // CHECK-SHFL: %[[ADD0:.+]] = arith.addf %[[ARG0]], %[[BC0]] : vector<2xf16>
+ // CHECK-SHFL: %[[BC1:.+]] = vector.bitcast %[[ADD0]] : vector<2xf16> to vector<1xi32>
+ // CHECK-SHFL: %[[I1:.+]] = vector.extract %[[BC1]][0] : i32 from vector<1xi32>
+ // CHECK-SHFL: %[[SL:.+]], %{{.+}} = gpu.shuffle xor %{{.+}}, %[[C2]], %[[C32]] : i32
+ // CHECK-SHFL: %[[BRL:.+]] = vector.broadcast %[[SL]] : i32 to vector<1xi32>
+ // CHECK-SHFL: %[[BCL:.+]] = vector.bitcast %[[BRL]] : vector<1xi32> to vector<2xf16>
+ // CHECK-SHFL: %[[ADDL:.+]] = arith.addf %{{.+}}, %[[BCL]] : vector<2xf16>
+ // CHECK-SHFL: "test.consume"(%[[ADDL]]) : (vector<2xf16>) -> ()
----------------
kuhar wrote:
Do we care about the exact lowering here, or would it be enough to check that we emit N `gpu.shuffle xor`s?
https://github.com/llvm/llvm-project/pull/104851
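
For illustration only, a relaxed version of the checks quoted above might look roughly like the sketch below. It assumes N = 2 for this kernel, matching the two `gpu.shuffle xor` lines (offsets 1 and 2) in the hunk, and relies on FileCheck's `COUNT` and `NOT` directive suffixes with the existing `CHECK-SHFL` prefix. It is not taken from the PR.

```mlir
// Hypothetical relaxed checks for @kernel4_clustered (assumes N = 2).
// CHECK-SHFL-LABEL: gpu.func @kernel4_clustered(
// Match two consecutive-in-order xor shuffles without pinning the
// surrounding bitcast/extract/broadcast sequence.
// CHECK-SHFL-COUNT-2: gpu.shuffle xor {{.*}} : i32
// Ensure no further shuffle appears before the result is consumed.
// CHECK-SHFL-NOT: gpu.shuffle
// CHECK-SHFL: "test.consume"
```

The trade-off is that this form no longer verifies the shuffle offsets or that the final `arith.addf` result feeds `test.consume`, so it only fits if those details are covered elsewhere.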