kuhar wrote: In my view, broadcasting a lane is a fundamental primitive at the level of both the SIMT programming model and the hardware. It can be emulated with shuffles, but not efficiently unless the compiler does some idiom recognition. https://github.com/llvm/llvm-project/pull/152808
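
To make the distinction concrete, here is a minimal host-side sketch in C++ that models a subgroup as a plain array. Everything in it (`kLanes`, `Subgroup`, `shuffle`, `broadcast`, `broadcastViaShuffle`, the 32-lane width) is a hypothetical name chosen for illustration; none of it comes from the linked PR or from any real GPU or LLVM API. It only shows that a broadcast is a shuffle whose source index happens to be the same in every lane.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model of a subgroup; the width of 32 is an assumption.
constexpr std::size_t kLanes = 32;
using Subgroup = std::array<std::uint32_t, kLanes>;
using LaneIndices = std::array<std::size_t, kLanes>;

// General shuffle: each lane reads from its own, possibly different, source lane.
Subgroup shuffle(const Subgroup& values, const LaneIndices& src) {
    Subgroup out{};
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        out[lane] = values[src[lane] % kLanes];
    }
    return out;
}

// Broadcast: every lane reads from one source lane, known to be uniform.
Subgroup broadcast(const Subgroup& values, std::size_t srcLane) {
    Subgroup out{};
    out.fill(values[srcLane % kLanes]);
    return out;
}

// Emulation: the same result expressed as a shuffle with a uniform index vector.
// The uniformity is no longer visible in the operation itself, only in its operand.
Subgroup broadcastViaShuffle(const Subgroup& values, std::size_t srcLane) {
    LaneIndices src{};
    src.fill(srcLane);
    return shuffle(values, src);
}
```

In this model the two forms return identical results, but only `broadcast` states in its signature that the source lane is uniform. With the shuffle form, a compiler would presumably have to prove that the index operand is uniform before it could pick a cheaper broadcast lowering, which is the idiom recognition the comment refers to.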