https://github.com/lialan approved this pull request. Overall LGTM. The next step is to link it with the lowering of the coalesced gather DMA op. On gfx1250, we emit the async version instead. https://github.com/llvm/llvm-project/pull/189279