[Mlir-commits] [mlir] [mlir][gpu] Allow subgroup reductions over 1-d vector types (PR #76015)

Wed Dec 20 08:30:04 PST 2023

kuhar wrote:

> > If we only allowed `vector<1xi32>` we would lose the semantic information that the lowering should perform reduction over `f32`. This is fine for `gpu.shuffle` as the reduction part happens in between the data movement, but not for `subgroup_reduce` which does perform reductions.
> 
> You're right, I was only thinking about the shuffling part. To be more specific, my concern is things like `vector<4xf64>` or larger. Don't get me wrong, I'm all for supporting things like `vector<4xi8>`, but I'm not convinced it should support all sizes.

The main point of supporting more sizes in `gpu.subgroup_reduce` (not `gpu.shuffle`) is to allow for a gradual lowering path. Also, SPIR-V is less restrictive here: see https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformFAdd and https://registry.khronos.org/SPIR-V/specs/unified1/SPIRV.html#OpGroupNonUniformShuffleXor. This is another reason why I don't want to encode this NVIDIA/AMD GPU-specific detail into the type system: we can always legalize these ops based on the target we lower to.
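
To make the "legalize based on what we target" idea concrete, here is a rough Python model of the decision, not actual MLIR code. The names `VectorType` and `break_down_reduction` and the `max_shuffle_bits` parameter are all hypothetical and made up for illustration. It assumes a target either accepts any 1-d vector reduction (as SPIR-V appears to) or has a fixed shuffle width that larger reductions must be split to fit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VectorType:
    """Hypothetical stand-in for a 1-d vector type such as vector<4xi8>."""
    num_elements: int
    element_bits: int


def break_down_reduction(ty: VectorType,
                         max_shuffle_bits: Optional[int]) -> List[int]:
    """Return the element counts of the smaller reductions to emit.

    max_shuffle_bits=None models a target with no width restriction,
    so the reduction is kept as a single op. Otherwise each chunk holds
    as many elements as fit in max_shuffle_bits (at least one element,
    i.e. a scalar reduction that a later step may split further).
    """
    if max_shuffle_bits is None:
        return [ty.num_elements]
    per_chunk = max(1, max_shuffle_bits // ty.element_bits)
    chunks = []
    remaining = ty.num_elements
    while remaining > 0:
        n = min(per_chunk, remaining)
        chunks.append(n)
        remaining -= n
    return chunks
```

Under these assumptions, `vector<4xi8>` stays a single reduction on a 32-bit-shuffle target, `vector<4xf64>` becomes four scalar reductions, and an unrestricted target keeps either one whole. The type itself never has to encode the restriction.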

https://github.com/llvm/llvm-project/pull/76015