[Mlir-commits] [mlir] [mlir][gpu] Add `broadcast_lane` op (PR #152808)

Sat Aug 9 06:58:13 PDT 2025

kuhar wrote:

> Can broadcast lane semantics be implemented by gpu.shuffle down + gpu.shuffle idx?

This op is inspired by the operations in the Vulkan subgroup extension: https://www.khronos.org/blog/vulkan-subgroup-tutorial#:~:text=T-,subgroupBroadcast,-(T%20value%2C%20uint and is meant to have a performant lowering that uses much cheaper instructions than shuffles.

Shuffles are not universally the best abstraction for modern chips, which have much more performant primitives with narrower semantics (e.g., https://github.com/nod-ai/shark-ai/blob/main/docs/amdgpu_kernel_optimization_guide.md#data-parallel-primitives-and-warp-level-reduction, plus the new v_permlane instructions in CDNA4). If we restrict ourselves to shuffles, we have to do heavy pattern matching in the lowering to recover the cheaper primitives, and make sure that the emitter caters to those patterns.
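
To illustrate the "narrower semantics" point, here is a rough host-side Python model, not MLIR and not the actual lowering. The function names `shuffle_idx` and `broadcast_lane` below are hypothetical helpers that only mimic the per-lane data movement: a general indexed shuffle lets every lane pick its own source lane, whereas a broadcast uses a single source lane shared by the whole subgroup.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def shuffle_idx(values: Sequence[T], src_lanes: Sequence[int]) -> List[T]:
    """Model of a general indexed shuffle: lane i reads the value held by
    lane src_lanes[i]. The source lane may differ for every lane."""
    if len(values) != len(src_lanes):
        raise ValueError("one source index is needed per lane")
    return [values[src] for src in src_lanes]


def broadcast_lane(values: Sequence[T], lane: int) -> List[T]:
    """Model of a lane broadcast: every lane reads the value held by the
    same source lane, which is uniform across the subgroup."""
    if not 0 <= lane < len(values):
        raise ValueError("source lane out of range")
    return [values[lane]] * len(values)


# A toy 4-lane subgroup; real subgroups are wider (e.g., 32 or 64 lanes).
values = [10, 11, 12, 13]

# A broadcast is the special case of a shuffle with a uniform index...
uniform = shuffle_idx(values, [2, 2, 2, 2])
broadcast = broadcast_lane(values, 2)

# ...but a shuffle also has to support arbitrary permutations.
reversed_lanes = shuffle_idx(values, [3, 2, 1, 0])
```

In this model the shuffle can express everything the broadcast can, but going the other way requires proving that the index is uniform across lanes, which is the kind of pattern matching a dedicated op avoids.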

https://github.com/llvm/llvm-project/pull/152808