[llvm] [NVPTX] support packed f32 instructions for sm_100+ (PR #126337)
Alex MacLean via llvm-commits
llvm-commits at lists.llvm.org
Tue Jun 10 09:30:41 PDT 2025
================
@@ -1943,10 +1954,12 @@ define <2 x half> @test_copysign_f32(<2 x half> %a, <2 x float> %b) #0 {
; CHECK-F16: {
; CHECK-F16-NEXT: .reg .b16 %rs<3>;
; CHECK-F16-NEXT: .reg .b32 %r<8>;
+; CHECK-F16-NEXT: .reg .b64 %rd<2>;
; CHECK-F16-EMPTY:
; CHECK-F16-NEXT: // %bb.0:
-; CHECK-F16-NEXT: ld.param.v2.b32 {%r2, %r3}, [test_copysign_f32_param_1];
; CHECK-F16-NEXT: ld.param.b32 %r1, [test_copysign_f32_param_0];
+; CHECK-F16-NEXT: ld.param.v2.b32 {%r2, %r3}, [test_copysign_f32_param_1];
+; CHECK-F16-NEXT: mov.b64 %rd1, {%r2, %r3};
----------------
AlexMaclean wrote:
What is going on here? Why is this unused `mov` getting generated? In a case this simple I assume that `ptxas` can handle it but it's a bit concerning to see this and seems like something we should avoid.
https://github.com/llvm/llvm-project/pull/126337
More information about the llvm-commits
mailing list