[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)

Princeton Ferro via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 17 23:07:06 PDT 2025


================
@@ -1943,10 +1954,12 @@ define <2 x half> @test_copysign_f32(<2 x half> %a, <2 x float> %b) #0 {
 ; CHECK-F16:       {
 ; CHECK-F16-NEXT:    .reg .b16 %rs<3>;
 ; CHECK-F16-NEXT:    .reg .b32 %r<8>;
+; CHECK-F16-NEXT:    .reg .b64 %rd<2>;
 ; CHECK-F16-EMPTY:
 ; CHECK-F16-NEXT:  // %bb.0:
-; CHECK-F16-NEXT:    ld.param.v2.b32 {%r2, %r3}, [test_copysign_f32_param_1];
 ; CHECK-F16-NEXT:    ld.param.b32 %r1, [test_copysign_f32_param_0];
+; CHECK-F16-NEXT:    ld.param.v2.b32 {%r2, %r3}, [test_copysign_f32_param_1];
+; CHECK-F16-NEXT:    mov.b64 %rd1, {%r2, %r3};
----------------
Prince781 wrote:

I've found that this is due to leftover `CopyToReg` nodes, which are optimized away at `-O3`; this test case runs at `-O0`, so they remain.

https://github.com/llvm/llvm-project/pull/126337

