[Mlir-commits] [mlir] [mlir][gpu] Allow subgroup reductions over 1-d vector types (PR #76015)

Wed Dec 20 08:13:42 PST 2023

kuhar wrote:

> PTX only allows shuffling 32-bit registers, so 4xf16 needs 2xshuffle. Can the PR support that?

@grypp Sure, the lowering chain that I described can be parametrized with the largest shuffle type available.

> So my question is, how do you plan to lower large vectors (ie. `vector<2xf32>`)?
> 
> I'd prefer only accepting `vector<1xi32>`, and let the user use `vector.bitcast`.

@fabianmcg By having patterns that break them down into 'shufflable' chunks. To continue with your example, we would do something like:

1. `%a = gpu.subgroup_reduce add %x : (vector<2xf32>) -> vector<2xf32>`
2.
  ```mlir
  // Split the vector into scalars and reduce each one separately.
  %a = vector.extract %x[0] : f32 from vector<2xf32> // or extract_strided_slice of vector<1xf32>
  %b = vector.extract %x[1] : f32 from vector<2xf32>
  %c = gpu.subgroup_reduce add %a : (f32) -> f32
  %d = gpu.subgroup_reduce add %b : (f32) -> f32
  // Reassemble the per-element results into the result vector.
  %e = vector.insert %c ...
  %f = vector.insert %d ...
  ```
3. Lower `gpu.subgroup_reduce add` to shuffles if necessary.
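
As a rough illustration of step 3 for the f16 case raised above, one butterfly step over a `vector<2xf16>` chunk could look something like the following. This is only a sketch and is not taken from the PR: it assumes a 32-bit maximum shuffle width and a subgroup size of 32, and the names `%chunk`, `%packed`, `%word`, `%other`, and `%sum` are made up for the example.

```mlir
// Hypothetical sketch: assumes 32-bit shuffles and a subgroup size of 32.
%c16 = arith.constant 16 : i32
%c32 = arith.constant 32 : i32
// Pack the two f16 elements into a single 32-bit value.
%packed = vector.bitcast %chunk : vector<2xf16> to vector<1xi32>
%word = vector.extract %packed[0] : i32 from vector<1xi32>
// Exchange the packed value with the lane whose id differs in bit 4.
%shfl, %valid = gpu.shuffle xor %word, %c16, %c32 : i32
// Unpack the received value and combine it with the local chunk.
%shfl_vec = vector.broadcast %shfl : i32 to vector<1xi32>
%other = vector.bitcast %shfl_vec : vector<1xi32> to vector<2xf16>
%sum = arith.addf %chunk, %other : vector<2xf16>
// Repeat with offsets 8, 4, 2, and 1 to finish the reduction.
```

With this shape, a `vector<4xf16>` input would be split into two such chunks, so each butterfly step costs two 32-bit shuffles rather than four.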

https://github.com/llvm/llvm-project/pull/76015