[llvm] AMDGPU: Improve codegen for intrinsic llvm.fptrunc.round (PR #104486)
Changpeng Fang via llvm-commits
llvm-commits at lists.llvm.org
Fri Aug 16 10:04:50 PDT 2024
================
@@ -3,6 +3,15 @@
; RUN: llc -global-isel=0 -mtriple=amdgcn -mcpu=gfx1010 < %s | FileCheck -check-prefixes=CHECK,SDAG %s
; RUN: llc -global-isel -mtriple=amdgcn -mcpu=gfx1030 < %s | FileCheck -check-prefixes=CHECK,GISEL %s
+define amdgpu_gs half @v_fptrunc_round_f32_to_f16_tonearest(float %a) {
+; CHECK-LABEL: v_fptrunc_round_f32_to_f16_tonearest:
+; CHECK: ; %bb.0:
+; CHECK-NEXT: v_cvt_f16_f32_e32 v0, v0
+; CHECK-NEXT: ; return to shader part epilog
+ %res = call half @llvm.fptrunc.round.f16.f32(float %a, metadata !"round.tonearest")
----------------
changpeng wrote:
When custom lowering the intrinsic, we can turn it into fp_round, which also generates the expected output for "round.tonearest". However, the lowering of fp_round seems to ignore the rounding mode operand in most cases (it just assumes the default, round-to-nearest-even).
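To make the concern concrete: the sketch below is plain Python, not LLVM code. It truncates an f32 to an f16 bit pattern under two rounding modes and shows that they can disagree. If lowering silently assumes round-to-nearest-even, a non-default mode would produce the wrong bits. The function name and mode constants are hypothetical. The constants only mirror the metadata strings used by `llvm.fptrunc.round`. Only finite values in the f16 normal range are handled.

```python
import struct

# Hypothetical mode names mirroring the metadata strings of llvm.fptrunc.round.
TONEAREST = "round.tonearest"
TOWARDZERO = "round.towardzero"


def fptrunc_f32_to_f16_bits(x: float, mode: str) -> int:
    """Return the IEEE half bit pattern of f32 `x` rounded per `mode`.

    Assumption: x is finite and within the f16 normal range, so NaN,
    infinity, overflow and subnormal results are not handled.
    """
    # Round the Python float to f32 and reinterpret its bits as uint32.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    sign = bits >> 31
    exp16 = ((bits >> 23) & 0xFF) - 127 + 15  # rebias exponent for f16
    mant32 = bits & 0x7FFFFF

    # Keep the top 10 of the 23 mantissa bits; rem holds the 13 dropped bits.
    mag = (exp16 << 10) | (mant32 >> 13)
    rem = mant32 & 0x1FFF
    halfway = 0x1000

    if mode == TONEAREST:
        # Round up above halfway, or on an exact tie when the LSB is odd.
        # A mantissa carry propagates into the exponent field as it should.
        if rem > halfway or (rem == halfway and mag & 1):
            mag += 1
    elif mode != TOWARDZERO:
        raise ValueError(f"unsupported mode: {mode}")
    # round.towardzero: discarding the low bits truncates the magnitude.

    return (sign << 15) | mag


# 1 + 3*2^-12 lies between the f16 neighbours 1.0 (0x3C00) and
# 1 + 2^-10 (0x3C01), and is closer to the upper one.
a = 1.0 + 3 * 2.0**-12
print(hex(fptrunc_f32_to_f16_bits(a, TONEAREST)))   # 0x3c01
print(hex(fptrunc_f32_to_f16_bits(a, TOWARDZERO)))  # 0x3c00
```

For "round.tonearest", the result agrees with an ordinary round-to-nearest-even conversion. That is why a plain fp_round, or a single `v_cvt_f16_f32` under the default mode, matches the test above. The two modes only diverge for other rounding-mode strings.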
https://github.com/llvm/llvm-project/pull/104486