[llvm] [NVPTX] fold movs into loads and stores (PR #144581)

Tue Jun 17 12:28:38 PDT 2025

================
@@ -138,9 +138,9 @@ define ptx_kernel void @foo13(ptr noalias readonly %from, ptr %to) {
 }
 
 ; SM20-LABEL: .visible .entry foo14(
-; SM20: ld.global.v4.b16
+; SM20: ld.global.v2.b32
 ; SM35-LABEL: .visible .entry foo14(
-; SM35: ld.global.nc.v4.b16
+; SM35: ld.global.nc.v2.b32
----------------
Artem-B wrote:

This switch from v4.b16 ld/st to v2.b32 seems to go in the opposite direction from the other test changes, which appear to prefer smaller elements to avoid moves. This test case probably ended up with 32-bit loads and stores because there were no per-element moves in between.
It's probably fine, but it would be nice to see all the generated instructions.
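Here is a rough illustration of the trade-off described above. It is a sketch, not taken from the PR. Both forms move the same 8 bytes, so choosing one over the other only matters for how the elements are consumed afterwards. If the kernel uses the 16-bit values individually, the v2.b32 form needs extra unpacking per element. If it only copies the data, which the reviewer suspects is the case for foo14, the 32-bit form costs nothing extra. The byte values and the `unpack_b32` helper are hypothetical. The sketch models PTX memory as little-endian.

```python
import struct

# Eight bytes in global memory (hypothetical values, not from the test).
data = bytes([0x11, 0x11, 0x22, 0x22, 0x33, 0x33, 0x44, 0x44])

# ld.global.v4.b16 view: four 16-bit lanes (little-endian).
v4_b16 = struct.unpack("<4H", data)
# ld.global.v2.b32 view: the same bytes as two 32-bit lanes.
v2_b32 = struct.unpack("<2I", data)

def unpack_b32(word):
    """Hypothetical helper: split a 32-bit lane into (low, high) 16-bit
    halves, i.e. the extra per-element work a consumer of 16-bit values
    would need after a v2.b32 load."""
    return word & 0xFFFF, word >> 16

# Re-splitting the 32-bit lanes recovers exactly the 16-bit lanes.
resplit = tuple(half for w in v2_b32 for half in unpack_b32(w))

print([hex(x) for x in v4_b16])  # ['0x1111', '0x2222', '0x3333', '0x4444']
print([hex(x) for x in v2_b32])  # ['0x22221111', '0x44443333']
print(resplit == v4_b16)         # True
```

The bytes are identical either way. The choice of element width is therefore a question of which form avoids the extra moves given how the loaded values are used.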

https://github.com/llvm/llvm-project/pull/144581