[llvm] [AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (PR #128687)

Pankaj Dwivedi via llvm-commits llvm-commits at lists.llvm.org
Tue Feb 25 21:54:01 PST 2025


================
@@ -1,3 +1,4 @@
+; XFAIL: *  
----------------
PankajDwivedi-25 wrote:

In the test case below:

```llvm
define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
  %sign = fpext bfloat %sign.bf16 to float
  %op = call float @llvm.copysign.f32(float %mag, float %sign)
  %cast = bitcast float %op to i32
  %readlane = call i32 @llvm.amdgcn.readfirstlane(i32 %cast)
  ret i32 %readlane
}
```

this pass folds away the **readfirstlane** call because uniformity analysis reports its argument as uniform, leaving:

```llvm
define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
  %sign = fpext bfloat %sign.bf16 to float
  %op = call float @llvm.copysign.f32(float %mag, float %sign)
  %cast = bitcast float %op to i32
  ret i32 %cast
}
```

**ISel:**
Later, the shader calling convention expects the return value in an SGPR, but `V_BFI_B32` produces its result in a VGPR, so **instruction selection** introduces a direct VGPR-to-SGPR **COPY**, which is illegal here:

```
bb.0 (%ir-block.0):
  liveins: $sgpr0, $sgpr1
  %1:sgpr_32 = COPY $sgpr1
  %0:sgpr_32 = COPY $sgpr0
  %3:sreg_32 = S_MOV_B32 2147483647
  %5:vgpr_32 = COPY %0:sgpr_32
  %6:vgpr_32 = COPY %1:sgpr_32
  %4:vgpr_32 = V_BFI_B32_e64 killed %3:sreg_32, %5:vgpr_32, %6:vgpr_32, implicit $exec
  $sgpr0 = COPY %4:vgpr_32   ; illegal VGPR-to-SGPR copy
  SI_RETURN_TO_EPILOG $sgpr0
```




https://github.com/llvm/llvm-project/pull/128687
