[llvm] [amdgpu] Add llvm.amdgcn.init.whole.wave intrinsic (PR #105822)

via llvm-commits llvm-commits at lists.llvm.org
Tue Sep 3 19:44:11 PDT 2024


ruiling wrote:

I don't think we can simply remove the init.exec.wave in the entry block. Consider a case like the one below, where the tail block uses a function argument:
```
define amdgpu_cs_chain void @basic(<3 x i32> inreg %sgpr, ptr inreg %callee, i32 inreg %exec, { i32, ptr addrspace(5), i32, i32 } %vgpr) {
entry:
  %entry_exec = call i1 @llvm.amdgcn.init.whole.wave()
  br i1 %entry_exec, label %shader, label %tail

shader:
   ...
  br label %tail

tail:
  %x = extractvalue { i32, ptr addrspace(5), i32, i32 } %vgpr, 0
  %y = add i32 %x, 32    ;<-- the instruction should operate on all the lanes of %x
  %vgpr.new = insertvalue { i32, ptr addrspace(5), i32, i32 } %vgpr, i32 %y, 0
  call void(ptr, i32, <3 x i32>, { i32, ptr addrspace(5), i32, i32 }, i32, ...) @llvm.amdgcn.cs.chain(ptr %callee, i32 %exec, <3 x i32> inreg %sgpr, { i32, ptr addrspace(5), i32, i32 } %vgpr.new, i32 0)
  unreachable
}
```
This would very likely be lowered to something like the following (sorry, it is in very rough shape):
```
 entry:
   ; setup exec for whole wave mode
   %arg = COPY $vgpr8
   br %old_exec, %shader, %tail

shader:
 ....

tail:
 %new_arg = V_ADD_I32 %arg, 32
 SI_CS_CHAIN .... %new_arg
```
The point is that when an instruction in the tail block wants to operate on a function argument, I think the expectation is that it (the `V_ADD_I32` here) should operate on (read/write) all the lanes. If we remove the exec setup in the entry block, the instruction would not be able to see the values in the lanes that were inactive at function start. Does this make sense?
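
To illustrate the concern, here is a toy Python model of a vector add running under an exec mask. It is only a sketch: `v_add`, `WAVE_SIZE`, and the mask values are made up for this example and do not correspond to any LLVM or hardware API. It assumes the usual behaviour that lanes disabled in the exec mask are neither read nor written.

```python
# Toy model of per-lane execution under an exec mask (illustrative only).
WAVE_SIZE = 8  # assumption: a small wave for readability; real waves are wider


def v_add(dst, src, imm, exec_mask):
    """Return the new register contents; only lanes enabled in exec_mask are written."""
    return [
        (src[lane] + imm) if (exec_mask >> lane) & 1 else dst[lane]
        for lane in range(WAVE_SIZE)
    ]


arg = list(range(100, 100 + WAVE_SIZE))  # incoming VGPR argument, one value per lane
entry_exec = 0b00001111                  # hypothetical: lanes 0-3 active at function start
whole_wave = (1 << WAVE_SIZE) - 1        # every lane enabled

# Without the exec setup in the entry block, lanes 4-7 keep their old values.
partial = v_add(arg, arg, 32, entry_exec)

# With exec set up for whole wave mode, every lane is updated.
full = v_add(arg, arg, 32, whole_wave)
```

In this model `partial` leaves lanes 4-7 untouched, which is the situation the tail block would be in if the exec setup were removed.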

https://github.com/llvm/llvm-project/pull/105822

