[PATCH] D140069: [DAGCombiner] Scalarize vectorized loads that are splatted
Luke Lau via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Dec 20 12:11:45 PST 2022
luke added subscribers: ruiling, foad.
luke added inline comments.
================
Comment at: llvm/test/CodeGen/AMDGPU/vector_shuffle.packed.ll:1905-1914
define <2 x half> @hi16bits(ptr addrspace(1) %x0, ptr addrspace(1) %x1) {
; GFX9-LABEL: hi16bits:
; GFX9: ; %bb.0: ; %entry
; GFX9-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX9-NEXT: global_load_dword v4, v[0:1], off
-; GFX9-NEXT: global_load_dword v5, v[2:3], off
-; GFX9-NEXT: s_mov_b32 s4, 0x7060302
+; GFX9-NEXT: global_load_dword v2, v[2:3], off
; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: global_load_short_d16 v2, v[0:1], off offset:2
----------------
@foad @ruiling Apologies if I'm pinging the wrong people here; I just wanted to get some AMDGPU eyes over this.
From what I understand this looks like a regression, since the two loads are no longer dispatched in tandem and there are now separate waits. Are there any suggestions on how to avoid this, or any target info hooks that might be relevant here?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D140069/new/
https://reviews.llvm.org/D140069