[llvm] [AMDGPU] SelectionDAG support for vector type set 0 to multiple sgpr64 (PR #128017)

Fri Feb 21 08:11:22 PST 2025

JanekvO wrote:

> I've been working on enabling MachineCSE to work for subregister extracts, and improving folds of copies through sub registers. In doing this, I see a number of places where we get better coalescing and re-assembled materialization of 64-bit inline immediates.

This is exactly what would fix the aforementioned regressions. In the case before s_mov_b64 materialization, MachineCSE eliminates the duplicate s_mov_b32, but it doesn't have the logic to CSE immediates that are materialized through subregisters.

> I've been thinking we should teach foldImmediate and/or SIFoldOperands to reconstruct 64-bit moves. e.g. foldImmediate can see the use instruction is a reg_sequence with the same register used twice, and replace it with an s_mov_b64. It also currently skips any constants with multiple uses, but the correct heuristic is probably more refined (like only skipping multiple uses for non-inline immediates).
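For illustration, here is a rough standalone sketch of the check such a fold might make before turning two s_mov_b32 feeding a reg_sequence into one s_mov_b64. It is not LLVM's foldImmediate code: `combineHalves`, `isIntInlineImm64` and `canFoldToSMovB64` are hypothetical names, and it assumes only the integer inline-constant range [-16, 64] matters, ignoring the floating-point inline constants.

```cpp
#include <cstdint>

// Hypothetical helper: combine the two 32-bit halves of a reg_sequence
// (sub0 = low, sub1 = high) into the 64-bit value an s_mov_b64 would need.
constexpr std::uint64_t combineHalves(std::uint32_t Lo, std::uint32_t Hi) {
  return (static_cast<std::uint64_t>(Hi) << 32) | Lo;
}

// Assumption: only the integer inline-constant range [-16, 64] is checked;
// the floating-point inline constants are deliberately left out.
constexpr bool isIntInlineImm64(std::uint64_t Bits) {
  const std::int64_t V = static_cast<std::int64_t>(Bits);
  return V >= -16 && V <= 64;
}

// Hypothetical heuristic: only fold to s_mov_b64 when the combined value is
// an inline immediate, so no 64-bit literal gets introduced.
constexpr bool canFoldToSMovB64(std::uint32_t Lo, std::uint32_t Hi) {
  return isIntInlineImm64(combineHalves(Lo, Hi));
}
```

When the same register is used for both halves (a splat of a 32-bit value V), the combined value is (V << 32) | V. That is an integer inline immediate only for V == 0 or V == 0xFFFFFFFF (-1), which covers the "set 0" case this PR targets.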

One of the reasons I tried to solve this in SelectionDAG is that it has non-opaque, well-described helpers for checking whether a vector is a splat. Additionally, my plan was to add s_mov_b64 -> v_mov_b64 folding (for gfx942+) in SIFoldOperands, and this way I wouldn't have to fit both s_mov_b32 x2 -> s_mov_b64 and s_mov_b64 -> v_mov_b64 into the same pass.

Anyway, I'll look into some of the mentioned alternatives, since this approach also requires a separate change for GlobalISel to do the same.

https://github.com/llvm/llvm-project/pull/128017