[llvm] [NVPTX] fold movs into loads and stores (PR #144581)
Princeton Ferro via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 17 14:42:37 PDT 2025
================
@@ -514,15 +512,15 @@ define half @reduce_fmin_half_reassoc_nonpow2(<7 x half> %in) {
; CHECK-NEXT: // %bb.0:
; CHECK-NEXT: ld.param.b32 %r1, [reduce_fmin_half_reassoc_nonpow2_param_0+8];
; CHECK-NEXT: mov.b32 {%rs5, %rs6}, %r1;
-; CHECK-NEXT: ld.param.v4.b16 {%rs1, %rs2, %rs3, %rs4}, [reduce_fmin_half_reassoc_nonpow2_param_0];
-; CHECK-NEXT: mov.b32 %r2, {%rs1, %rs2};
-; CHECK-NEXT: mov.b32 %r3, {%rs3, %rs4};
+; CHECK-NEXT: ld.param.v2.b32 {%r2, %r3}, [reduce_fmin_half_reassoc_nonpow2_param_0];
+; CHECK-NEXT: mov.b32 {%rs3, %rs4}, %r3;
+; CHECK-NEXT: mov.b32 {%rs1, %rs2}, %r2;
----------------
Prince781 wrote:
The way this vector type (`<7 x half>`) is lowered creates 7 intermediate `CopyToReg` instructions. These are normally optimized out, but this test case runs at `-O0`, so they remain. This is a pre-existing issue that could be addressed in a separate change.
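
For readers unfamiliar with the fold in the hunk above, here is a minimal sketch in Python of why the old and new load sequences produce the same 32-bit registers. It models only the PTX semantics, not the LLVM change itself. It assumes little-endian parameter memory and that `mov.b32 d, {a, b}` places `a` in the low 16 bits. The helper names (`pack_b32`, `load_v4_b16`, `load_v2_b32`) and the sample bit patterns are hypothetical, chosen for illustration.

```python
import struct

def pack_b32(lo: int, hi: int) -> int:
    """Model of PTX `mov.b32 d, {lo, hi}`: lo fills bits 0-15, hi bits 16-31."""
    return (lo & 0xFFFF) | ((hi & 0xFFFF) << 16)

def load_v4_b16(buf: bytes) -> tuple:
    """Model of `ld.param.v4.b16`: four 16-bit elements, little-endian (assumed)."""
    return struct.unpack_from("<4H", buf, 0)

def load_v2_b32(buf: bytes) -> tuple:
    """Model of `ld.param.v2.b32`: two 32-bit elements from the same address."""
    return struct.unpack_from("<2I", buf, 0)

# Hypothetical parameter bytes: the half-precision bit patterns of 1.0, 2.0, 3.0, 4.0.
param = struct.pack("<4H", 0x3C00, 0x4000, 0x4200, 0x4400)

# Old sequence: one v4.b16 load followed by two packing movs.
rs1, rs2, rs3, rs4 = load_v4_b16(param)
old = (pack_b32(rs1, rs2), pack_b32(rs3, rs4))

# New sequence: a single v2.b32 load with no packing movs.
new = load_v2_b32(param)

assert old == new
```

In the test above, the unpacking `mov.b32 {%rs3, %rs4}, %r3` lines that remain after the fold are the inverse of `pack_b32`.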
https://github.com/llvm/llvm-project/pull/144581