[all-commits] [llvm/llvm-project] 2652db: Handling ADD|SUB U64 decomposed Pseudos not gettin...

Wed Nov 16 20:32:31 PST 2022

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 2652db4d68ea3d39ce97b200d0475f4fed804433
      https://github.com/llvm/llvm-project/commit/2652db4d68ea3d39ce97b200d0475f4fed804433
  Author: Yashwant Singh <Yashwant.Singh at amd.com>
  Date:   2022-11-17 (Thu, 17 Nov 2022)

  Changed paths:
    M llvm/lib/Target/AMDGPU/SIPeepholeSDWA.cpp
    M llvm/test/CodeGen/AMDGPU/sdwa-ops.mir
    M llvm/test/CodeGen/AMDGPU/v_add_u64_pseudo_sdwa.ll
    M llvm/test/CodeGen/AMDGPU/v_sub_u64_pseudo_sdwa.ll

  Log Message:
  -----------
  Handling ADD|SUB U64 decomposed Pseudos not getting lowered to SDWA form

This patch fixes some cases where V_ADD/SUB_U64_PSEUDO instructions are not converted to their SDWA form.
We still get the following patterns in the generated code:
v_and_b32_e32 v0, 0xff, v0
v_add_co_u32_e32 v0, vcc, v1, v0
v_addc_co_u32_e64 v1, s[0:1], 0, 0, vcc

and,
v_and_b32_e32 v2, 0xff, v2
v_add_co_u32_e32 v0, vcc, v0, v2
v_addc_co_u32_e32 v1, vcc, 0, v1, vcc

In both examples above, the first and second instructions should have been folded into a single SDWA add with a BYTE_0 source operand.

The reason is that the pseudo instruction is broken down into a VOP3 instruction pair, V_ADD_CO_U32_e64 and V_ADDC_U32_e64.
The SDWA pass attempts to lower them to their VOP2 form before converting them into SDWA instructions. However, V_ADDC_U32_e64
cannot be shrunk to its VOP2 form if it has a non-register src1 operand.
This change attempts to fix that problem by shrinking only the V_ADD_CO_U32_e64 instruction.
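
The decision described above can be illustrated with a small standalone model. This is only a sketch and not the code in SIPeepholeSDWA.cpp: the Inst struct, the OperandKind enum and the helpers hasRegisterSrc1, shrinkPairTogether and shrinkLowHalfOnly are hypothetical names introduced for illustration, and it assumes that the old behaviour gave up on the whole pair whenever the carry instruction could not be shrunk.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration; this is not the LLVM MachineInstr API.
enum class Opcode { V_ADD_CO_U32_e64, V_ADDC_U32_e64 };
enum class OperandKind { Register, Immediate };

struct Inst {
  Opcode Op;
  OperandKind Src0;
  OperandKind Src1;
};

// Per the log message, an instruction with a non-register src1 operand
// cannot be shrunk from VOP3 to VOP2. Only that one condition is modeled.
bool hasRegisterSrc1(const Inst &I) {
  return I.Src1 == OperandKind::Register;
}

// Assumed model of the old behaviour: both halves of the decomposed
// pseudo had to be shrinkable, otherwise nothing was converted.
bool shrinkPairTogether(const Inst &Lo, const Inst &Hi) {
  return hasRegisterSrc1(Lo) && hasRegisterSrc1(Hi);
}

// Model of the behaviour after this change: only the low-half add is
// considered, so the carry instruction no longer blocks the conversion.
bool shrinkLowHalfOnly(const Inst &Lo) {
  assert(Lo.Op == Opcode::V_ADD_CO_U32_e64 && "expected the low-half add");
  return hasRegisterSrc1(Lo);
}
```

For the first pattern above, the low-half add has two register sources while the carry instruction has the operands 0, 0. In this model shrinkPairTogether rejects the pair, whereas shrinkLowHalfOnly accepts the low-half add on its own.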

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D136663