[llvm] [AMDGPU] Enable "amdgpu-uniform-intrinsic-combine" pass in pipeline. (PR #128687)
Pankaj Dwivedi via llvm-commits
llvm-commits at lists.llvm.org
Tue Feb 25 21:54:01 PST 2025
================
@@ -1,3 +1,4 @@
+; XFAIL: *
----------------
PankajDwivedi-25 wrote:
In the test case below:

    define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
      %sign = fpext bfloat %sign.bf16 to float
      %op = call float @llvm.copysign.f32(float %mag, float %sign)
      %cast = bitcast float %op to i32
      %readlane = call i32 @llvm.amdgcn.readfirstlane(i32 %cast)
      ret i32 %readlane
    }
This pass optimizes away the **readfirstlane** call because uniformity analysis reports its argument as uniform, producing:
    define amdgpu_ps i32 @s_copysign_f32_bf16(float inreg %mag, bfloat inreg %sign.bf16) {
      %sign = fpext bfloat %sign.bf16 to float
      %op = call float @llvm.copysign.f32(float %mag, float %sign)
      %cast = bitcast float %op to i32
      ret i32 %cast
    }
**ISel:**
Later, during instruction selection, the shader calling convention expects the return value in an SGPR, so a VGPR-to-SGPR **COPY** is introduced, which is illegal/wrong here:
    bb.0 (%ir-block.0):
      liveins: $sgpr0, $sgpr1
      %1:sgpr_32 = COPY $sgpr1
      %0:sgpr_32 = COPY $sgpr0
      %3:sreg_32 = S_MOV_B32 2147483647
      %5:vgpr_32 = COPY %0:sgpr_32
      %6:vgpr_32 = COPY %1:sgpr_32
      %4:vgpr_32 = V_BFI_B32_e64 killed %3:sreg_32, %5:vgpr_32, %6:vgpr_32, implicit $exec
      **$sgpr0 = COPY %4:vgpr_32**
      SI_RETURN_TO_EPILOG $sgpr0
https://github.com/llvm/llvm-project/pull/128687