[llvm] [AMDGPU] SIPeepholeSDWA: Add REG_SEQUENCE support (PR #133087)
Matt Arsenault via llvm-commits
llvm-commits at lists.llvm.org
Fri Mar 28 08:36:01 PDT 2025
================
@@ -382,6 +390,97 @@ uint64_t SDWASrcOperand::getSrcMods(const SIInstrInfo *TII,
return Mods;
}
+// The following functions are helpers for dealing with REG_SEQUENCE
+// instructions. Those instructions are used to represent copies to
+// subregisters in SSA form.
+//
+// This pass should be able to peek through REG_SEQUENCE
+// instructions. An access to a subregister of a register defined
+// by a REG_SEQUENCE should be handled as if the register
+// that is being copied to the subregister was accessed.
+// Consider the following example:
+// %1:vgpr_32 = V_ADD_U32_e64 %0, 10, 0
+// %2:vgpr_32 = V_ADD_U32_e64 %0, 20, 0
+// %3:sreg_32 = S_MOV_B32 255
+// %4:vgpr_32 = V_AND_B32_e64 %2, %3
+// %5:vgpr_32, %6:sreg_64_xexec = V_ADD_CO_U32_e64 %1, %4, 0
+//
+// The V_ADD_CO_U32_e64 instruction will be combined with the
+// V_AND_B32_e64 into an SDWA instruction.
+//
+// If one or more of the operands of V_ADD_CO_U32_e64 are accessed
+// through the subregisters of a REG_SEQUENCE as in the following
+// variation of the previous example, the optimization should still be
+// able to proceed in the same way:
+//
+// [...]
+// %4:vreg_64 = REG_SEQUENCE %1, %subreg.sub0, %3, %subreg.sub1
+// %5:sreg_32 = S_MOV_B32 255
+// %6:vgpr_32 = V_AND_B32_e64 %2, %5
+// %7:vreg_64 = REG_SEQUENCE %6, %subreg.sub0, %3, %subreg.sub1
+// %8:vgpr_32, %9:sreg_64_xexec = V_ADD_CO_U32_e64 %4.sub0, %7.sub0, 0
+//
+// To this end, the SDWASrcOperand implementation uses the following
+// functions to find out the register that is used as the source of
+// the subregister value and it uses this register directly instead of
+// the REG_SEQUENCE subregister.
+
+/// Return the subregister of the REG_SEQUENCE \p RegSeq
+/// which is copied from \p Op, i.e. the operand following
+/// \p Op in the operands of \p RegSeq, or nullopt if
+/// \p Op is not an operand of \p RegSeq.
+static std::optional<unsigned> regSequenceFindSubreg(const MachineInstr &RegSeq,
+                                                     Register Reg) {
+  if (!RegSeq.isRegSequence())
+    return {};
+
+  auto *End = RegSeq.operands_end();
+  // Operand pair at indices (i+1, i+2) is (register, subregister)
+  for (auto *It = RegSeq.operands_begin() + 1; It != End; It += 2) {
+    if (isSameReg(*It, Reg))
+      return (It + 1)->getImm();
----------------
arsenm wrote:
Looking for an exact match of the register in the input is weird. Normal uses of reg_sequence look for compatibility of the subregister with the use index.
Why does the specific register matter? Usually you're just looking up the chain of copies for the original 32-bit source register. Maybe it's worth extracting ValueTracker out of PeepholeOpt as a utility?
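
A rough sketch of the index-based lookup described above, written against toy stand-in types rather than the real LLVM classes. `ToyKind`, `ToyDef`, `ToyDefMap`, `NoSubReg`, and `traceSource` are hypothetical names made up for illustration; this is not the ValueTracker code in PeepholeOpt and it ignores register classes, composed subregister indices, and partial copies. It starts from a use such as `%7.sub0`, matches the use index against the REG_SEQUENCE inputs, and keeps walking through full copies until it reaches a defining instruction that is not copy-like.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <utility>
#include <vector>

// Toy model of a defining instruction (assumption: not an LLVM type).
enum class ToyKind { Other, Copy, RegSequence };

struct ToyDef {
  ToyKind Kind = ToyKind::Other;
  // Copy: exactly one (source register, NoSubReg) entry.
  // RegSequence: (source register, subregister index) pairs.
  std::vector<std::pair<unsigned, unsigned>> Inputs;
};

// Maps a virtual register number to its single (SSA) definition.
using ToyDefMap = std::map<unsigned, ToyDef>;

// Index 0 means "no subregister", mirroring the usual LLVM convention.
constexpr unsigned NoSubReg = 0;

// Follow a use of Reg.SubIdx back through REG_SEQUENCE and full COPY
// definitions. Returns the first register whose definition is not
// copy-like, or nullopt if the chain cannot be followed in this toy model.
// Assumes SSA form, so the definition chain has no cycles.
std::optional<unsigned> traceSource(const ToyDefMap &Defs, unsigned Reg,
                                    unsigned SubIdx) {
  for (;;) {
    auto It = Defs.find(Reg);
    if (It == Defs.end())
      return std::nullopt; // Unknown definition.
    const ToyDef &Def = It->second;

    if (Def.Kind == ToyKind::Copy && SubIdx == NoSubReg) {
      assert(Def.Inputs.size() == 1 && "toy COPY has exactly one input");
      Reg = Def.Inputs[0].first;
      continue;
    }

    if (Def.Kind == ToyKind::RegSequence && SubIdx != NoSubReg) {
      // Select the input by the *use index*, not by a known register.
      bool Found = false;
      for (const auto &[SrcReg, Idx] : Def.Inputs) {
        if (Idx == SubIdx) {
          Reg = SrcReg;
          SubIdx = NoSubReg;
          Found = true;
          break;
        }
      }
      if (!Found)
        return std::nullopt; // No input covers the requested index.
      continue;
    }

    // Not copy-like: only a full-register use can be resolved here.
    if (SubIdx != NoSubReg)
      return std::nullopt;
    return Reg;
  }
}
```

With the second example from the patch comment modeled this way, tracing `%7.sub0` would yield `%6` (the V_AND_B32_e64 result) and tracing `%4.sub0` would yield `%1`, without the caller having to know either register in advance.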
https://github.com/llvm/llvm-project/pull/133087