[llvm] [AMDGPU] Allow shrinking instruction with dead sdst (PR #68028)

Stanislav Mekhanoshin via llvm-commits llvm-commits at lists.llvm.org
Mon Oct 2 13:15:11 PDT 2023


https://github.com/rampitec created https://github.com/llvm/llvm-project/pull/68028

The pre-RA instruction-shrinking pass does not shrink instructions with an sdst carry-out or carry-in. Instead it sets the register allocation hint to VCC and leaves the instruction until after RA.

We can still shrink such an instruction before RA if the sdst is dead and the carry-in is an immediate.

Some other instructions will no longer be shrunk after RA, however, because more instructions now clobber VCC, and if the carry is used after such a clobber then RA has no choice but to allocate a non-VCC SGPR. The net effect still seems positive: in the affected tests we previously had 1231 _e64 and 1162 _e32 instructions, and now there are 1010 _e64 and 1359 _e32 according to the diff (the totals do not add up exactly because some checks are collapsed across different targets). That is roughly an 8% improvement in shrinking. Also note that the regression cases are old targets without no-carry add/sub instructions, and many of those tests target GFX6.
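The decision above can be sketched as a small standalone predicate. This is a hypothetical simplification for illustration only (the real pass queries MachineOperand/MachineRegisterInfo; the struct and function names here are invented): shrink to the VOP2 form pre-RA only when the carry-out is a dead virtual register and the carry-in is not a register operand.

```cpp
#include <cassert>

// Hypothetical model of the pre-RA shrinking decision. Field names are
// invented for this sketch; they stand in for MachineOperand queries.
struct CarryOperands {
  bool SDstIsDead;     // carry-out result is unused
  bool SDstIsPhysical; // sdst is already a physical register
  bool HasSrc2;        // instruction has a carry-in operand (src2)
  bool Src2IsReg;      // carry-in is a register rather than an immediate
};

// Returns true when it is safe to shrink to the VOP2 form right now.
// Mirrors the patch's guard: skip shrinking (and hint to VCC instead)
// unless sdst is a dead virtual register and src2, if present, is not
// a register -- the VOP2 form would read vcc, which may be undefined.
bool canShrinkPreRA(const CarryOperands &Ops) {
  if (!Ops.SDstIsDead || Ops.SDstIsPhysical)
    return false; // carry-out still matters: keep VOP3, hint to VCC
  if (Ops.HasSrc2 && Ops.Src2IsReg)
    return false; // register carry-in: VOP2 would read undefined vcc
  return true;
}
```

Under these assumptions, only the dead-sdst, immediate-carry-in case shrinks early; every other combination falls through to the existing VCC-hinting path.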

From 38d1bbe9ddc656356ab57ce7de1c788ab793e6f8 Mon Sep 17 00:00:00 2001
From: Stanislav Mekhanoshin <Stanislav.Mekhanoshin at amd.com>
Date: Mon, 2 Oct 2023 12:12:24 -0700
Subject: [PATCH] [AMDGPU] Allow shrinking instruction with dead sdst

The pre-RA instruction-shrinking pass does not shrink instructions
with an sdst carry-out or carry-in. Instead it sets the register
allocation hint to VCC and leaves the instruction until after RA.

We can still shrink such an instruction before RA if the sdst is
dead and the carry-in is an immediate.

Some other instructions will no longer be shrunk after RA, however,
because more instructions now clobber VCC, and if the carry is used
after such a clobber then RA has no choice but to allocate a non-VCC
SGPR. The net effect still seems positive: in the affected tests we
previously had 1231 _e64 and 1162 _e32 instructions, and now there
are 1010 _e64 and 1359 _e32 according to the diff (the totals do not
add up exactly because some checks are collapsed across different
targets). That is roughly an 8% improvement in shrinking. Also note
that the regression cases are old targets without no-carry add/sub
instructions, and many of those tests target GFX6.
---
 .../Target/AMDGPU/SIShrinkInstructions.cpp    |  38 +-
 .../AMDGPU/GlobalISel/llvm.amdgcn.sbfe.ll     |  10 +-
 .../test/CodeGen/AMDGPU/GlobalISel/saddsat.ll | 514 ++++++-------
 .../CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll     | 160 ++--
 .../CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll     | 544 ++++++-------
 .../test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll | 336 ++++----
 .../CodeGen/AMDGPU/GlobalISel/srem.i32.ll     |  16 +-
 .../CodeGen/AMDGPU/GlobalISel/srem.i64.ll     | 440 +++++------
 .../test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll | 514 ++++++-------
 .../CodeGen/AMDGPU/GlobalISel/udiv.i32.ll     | 152 ++--
 .../CodeGen/AMDGPU/GlobalISel/udiv.i64.ll     | 458 +++++------
 .../test/CodeGen/AMDGPU/GlobalISel/udivrem.ll | 358 ++++-----
 .../CodeGen/AMDGPU/GlobalISel/urem.i64.ll     | 441 ++++++-----
 .../CodeGen/AMDGPU/ds-combine-large-stride.ll |   2 +-
 llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll    | 198 ++---
 llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll         | 724 +++++++++---------
 .../CodeGen/AMDGPU/llvm.is.fpclass.f16.ll     |   8 +-
 llvm/test/CodeGen/AMDGPU/med3-knownbits.ll    |   8 +-
 llvm/test/CodeGen/AMDGPU/shrink-dead-sdst.mir |  12 +
 19 files changed, 2453 insertions(+), 2480 deletions(-)
 create mode 100644 llvm/test/CodeGen/AMDGPU/shrink-dead-sdst.mir

diff --git a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
index 4159dc694c1e037..bba533a4ca0599a 100644
--- a/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
+++ b/llvm/lib/Target/AMDGPU/SIShrinkInstructions.cpp
@@ -961,26 +961,34 @@ bool SIShrinkInstructions::runOnMachineFunction(MachineFunction &MF) {
                                                         AMDGPU::OpName::sdst);
 
       if (SDst) {
-        bool Next = false;
-
-        if (SDst->getReg() != VCCReg) {
-          if (SDst->getReg().isVirtual())
-            MRI->setRegAllocationHint(SDst->getReg(), 0, VCCReg);
-          Next = true;
-        }
-
         // All of the instructions with carry outs also have an SGPR input in
         // src2.
         const MachineOperand *Src2 = TII->getNamedOperand(MI,
                                                           AMDGPU::OpName::src2);
-        if (Src2 && Src2->getReg() != VCCReg) {
-          if (Src2->getReg().isVirtual())
-            MRI->setRegAllocationHint(Src2->getReg(), 0, VCCReg);
-          Next = true;
-        }
 
-        if (Next)
-          continue;
+        // We can shrink the instruction right now if sdst is dead anyway and
+        // carry-in is not a register. If it is a register, the VOP2 form will
+        // have it tied to the same vcc register and we may end up reading an
+        // undefined vcc.
+        if (!SDst->isDead() || SDst->getReg().isPhysical() ||
+            (Src2 && Src2->isReg())) {
+          bool Next = false;
+
+          if (SDst->getReg() != VCCReg) {
+            if (SDst->getReg().isVirtual())
+              MRI->setRegAllocationHint(SDst->getReg(), 0, VCCReg);
+            Next = true;
+          }
+
+          if (Src2 && Src2->getReg() != VCCReg) {
+            if (Src2->getReg().isVirtual())
+              MRI->setRegAllocationHint(Src2->getReg(), 0, VCCReg);
+            Next = true;
+          }
+
+          if (Next)
+            continue;
+        }
       }
 
       // Pre-GFX10, shrinking VOP3 instructions pre-RA gave us the chance to
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sbfe.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sbfe.ll
index 6eed92ba1d71ccc..e0754f62208a023 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sbfe.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/llvm.amdgcn.sbfe.ll
@@ -688,12 +688,12 @@ define amdgpu_kernel void @simplify_demanded_bfe_sdiv(ptr addrspace(1) %out, ptr
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v1
 ; GFX6-NEXT:    v_mul_hi_u32 v0, s0, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 1, v0
-; GFX6-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, s0, v1
-; GFX6-NEXT:    v_cmp_le_u32_e32 vcc, 2, v1
-; GFX6-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; GFX6-NEXT:    v_subrev_i32_e64 v2, s[0:1], 2, v1
-; GFX6-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GFX6-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
+; GFX6-NEXT:    v_cmp_le_u32_e64 s[0:1], 2, v1
+; GFX6-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[0:1]
+; GFX6-NEXT:    v_subrev_i32_e32 v2, vcc, 2, v1
+; GFX6-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; GFX6-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; GFX6-NEXT:    v_cmp_le_u32_e32 vcc, 2, v1
 ; GFX6-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
index cded5c94edf8cc3..d28ef2ead4bb057 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/saddsat.ll
@@ -506,47 +506,44 @@ define i32 @v_saddsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v3, 16, v0
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v4, 24, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v0, 24, v0
-; GFX6-NEXT:    s_brev_b32 s5, 1
-; GFX6-NEXT:    v_min_i32_e32 v10, 0, v0
+; GFX6-NEXT:    v_min_i32_e32 v9, 0, v0
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v5, 8, v1
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v6, 16, v1
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v7, 24, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
 ; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v8, 0, v0
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, s5, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x80000000, v9
 ; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, s4, v8
-; GFX6-NEXT:    v_max_i32_e32 v1, v10, v1
+; GFX6-NEXT:    v_max_i32_e32 v1, v9, v1
 ; GFX6-NEXT:    v_min_i32_e32 v1, v1, v8
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 24, v2
 ; GFX6-NEXT:    v_min_i32_e32 v8, 0, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 24, v5
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, s5, v8
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, s4, v5
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0x80000000, v8
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v2, v8, v2
 ; GFX6-NEXT:    v_min_i32_e32 v2, v2, v5
 ; GFX6-NEXT:    v_add_i32_e32 v1, vcc, v1, v2
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 24, v3
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 24, v6
 ; GFX6-NEXT:    v_min_i32_e32 v6, 0, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v9, -2
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v2
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, s5, v6
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v9, v5
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0x80000000, v6
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v3, v6, v3
 ; GFX6-NEXT:    v_min_i32_e32 v3, v3, v5
 ; GFX6-NEXT:    v_add_i32_e32 v2, vcc, v2, v3
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 24, v4
-; GFX6-NEXT:    v_bfrev_b32_e32 v11, 1
 ; GFX6-NEXT:    v_min_i32_e32 v6, 0, v3
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 24, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 24, v7
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v11, v6
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0x80000000, v6
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v0, 24, v0
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v9, v5
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v4, v6, v4
 ; GFX6-NEXT:    v_and_b32_e32 v1, 0xff, v1
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v2, 24, v2
@@ -1395,7 +1392,7 @@ define <3 x i32> @v_saddsat_v3i32(<3 x i32> %lhs, <3 x i32> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v3
 ; GFX6-NEXT:    v_max_i32_e32 v3, 0, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, s5, v6
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, s4, v3
+; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, 0x7fffffff, v3
 ; GFX6-NEXT:    v_max_i32_e32 v4, v6, v4
 ; GFX6-NEXT:    v_min_i32_e32 v3, v4, v3
 ; GFX6-NEXT:    v_min_i32_e32 v4, 0, v2
@@ -1423,7 +1420,7 @@ define <3 x i32> @v_saddsat_v3i32(<3 x i32> %lhs, <3 x i32> %rhs) {
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, v0, v3
 ; GFX8-NEXT:    v_max_i32_e32 v3, 0, v1
 ; GFX8-NEXT:    v_sub_u32_e32 v6, vcc, s5, v6
-; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
+; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, 0x7fffffff, v3
 ; GFX8-NEXT:    v_max_i32_e32 v4, v6, v4
 ; GFX8-NEXT:    v_min_i32_e32 v3, v4, v3
 ; GFX8-NEXT:    v_min_i32_e32 v4, 0, v2
@@ -1736,7 +1733,7 @@ define <5 x i32> @v_saddsat_v5i32(<5 x i32> %lhs, <5 x i32> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v5
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, s5, v10
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, s4, v5
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v6, v10, v6
 ; GFX6-NEXT:    v_min_i32_e32 v5, v6, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, 0, v2
@@ -1779,7 +1776,7 @@ define <5 x i32> @v_saddsat_v5i32(<5 x i32> %lhs, <5 x i32> %rhs) {
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, v0, v5
 ; GFX8-NEXT:    v_max_i32_e32 v5, 0, v1
 ; GFX8-NEXT:    v_sub_u32_e32 v10, vcc, s5, v10
-; GFX8-NEXT:    v_sub_u32_e32 v5, vcc, s4, v5
+; GFX8-NEXT:    v_sub_u32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX8-NEXT:    v_max_i32_e32 v6, v10, v6
 ; GFX8-NEXT:    v_min_i32_e32 v5, v6, v5
 ; GFX8-NEXT:    v_min_i32_e32 v6, 0, v2
@@ -1949,246 +1946,238 @@ define <16 x i32> @v_saddsat_v16i32(<16 x i32> %lhs, <16 x i32> %rhs) {
 ; GFX6-LABEL: v_saddsat_v16i32:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-NEXT:    s_brev_b32 s4, 1
 ; GFX6-NEXT:    v_min_i32_e32 v31, 0, v0
-; GFX6-NEXT:    v_sub_i32_e32 v31, vcc, s4, v31
+; GFX6-NEXT:    v_sub_i32_e32 v31, vcc, 0x80000000, v31
 ; GFX6-NEXT:    v_max_i32_e32 v16, v31, v16
-; GFX6-NEXT:    s_brev_b32 s5, -2
 ; GFX6-NEXT:    v_max_i32_e32 v31, 0, v0
-; GFX6-NEXT:    v_sub_i32_e32 v31, vcc, s5, v31
+; GFX6-NEXT:    v_sub_i32_e32 v31, vcc, 0x7fffffff, v31
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v31
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v16
 ; GFX6-NEXT:    v_min_i32_e32 v16, 0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, s4, v16
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x80000000, v16
 ; GFX6-NEXT:    v_max_i32_e32 v16, v16, v17
 ; GFX6-NEXT:    v_max_i32_e32 v17, 0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, s5, v17
+; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, 0x7fffffff, v17
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX6-NEXT:    v_add_i32_e32 v1, vcc, v1, v16
 ; GFX6-NEXT:    v_min_i32_e32 v16, 0, v2
-; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, s4, v16
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x80000000, v16
 ; GFX6-NEXT:    v_max_i32_e32 v17, 0, v2
 ; GFX6-NEXT:    v_max_i32_e32 v16, v16, v18
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, s5, v17
+; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, 0x7fffffff, v17
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX6-NEXT:    v_add_i32_e32 v2, vcc, v2, v16
-; GFX6-NEXT:    v_bfrev_b32_e32 v16, 1
-; GFX6-NEXT:    v_min_i32_e32 v17, 0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v16, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_bfrev_b32_e32 v18, -2
-; GFX6-NEXT:    v_max_i32_e32 v19, 0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v18, v19
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_add_i32_e32 v3, vcc, v3, v17
-; GFX6-NEXT:    v_min_i32_e32 v17, 0, v4
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v16, v17
-; GFX6-NEXT:    v_max_i32_e32 v19, 0, v4
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v18, v19
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_add_i32_e32 v4, vcc, v4, v17
-; GFX6-NEXT:    v_min_i32_e32 v17, 0, v5
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v16, v17
-; GFX6-NEXT:    v_max_i32_e32 v19, 0, v5
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v21
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v18, v19
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_add_i32_e32 v5, vcc, v5, v17
-; GFX6-NEXT:    v_min_i32_e32 v17, 0, v6
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v16, v17
-; GFX6-NEXT:    v_max_i32_e32 v19, 0, v6
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v22
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v18, v19
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    buffer_load_dword v19, off, s[0:3], s32
-; GFX6-NEXT:    v_add_i32_e32 v6, vcc, v6, v17
-; GFX6-NEXT:    v_min_i32_e32 v17, 0, v7
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v16, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, 0, v7
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v23
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v18, v20
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v8
-; GFX6-NEXT:    v_add_i32_e32 v7, vcc, v7, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v8
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v24
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v9
-; GFX6-NEXT:    v_add_i32_e32 v8, vcc, v8, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v9
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v25
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v10
-; GFX6-NEXT:    v_add_i32_e32 v9, vcc, v9, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v10
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v26
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v11
-; GFX6-NEXT:    v_add_i32_e32 v10, vcc, v10, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v11
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v27
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v12
-; GFX6-NEXT:    v_add_i32_e32 v11, vcc, v11, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v12
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v28
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v13
-; GFX6-NEXT:    v_add_i32_e32 v12, vcc, v12, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v13
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v29
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_min_i32_e32 v20, 0, v14
-; GFX6-NEXT:    v_add_i32_e32 v13, vcc, v13, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v14
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v16, v20
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_max_i32_e32 v20, v20, v30
-; GFX6-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX6-NEXT:    v_add_i32_e32 v14, vcc, v14, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, 0, v15
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v18, v17
-; GFX6-NEXT:    v_min_i32_e32 v18, 0, v15
-; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, v16, v18
-; GFX6-NEXT:    s_waitcnt vmcnt(0)
+; GFX6-NEXT:    v_min_i32_e32 v16, 0, v3
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x80000000, v16
+; GFX6-NEXT:    v_max_i32_e32 v17, 0, v3
 ; GFX6-NEXT:    v_max_i32_e32 v16, v16, v19
+; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, 0x7fffffff, v17
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX6-NEXT:    v_add_i32_e32 v3, vcc, v3, v16
+; GFX6-NEXT:    v_min_i32_e32 v16, 0, v4
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x80000000, v16
+; GFX6-NEXT:    v_max_i32_e32 v17, 0, v4
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v20
+; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, 0x7fffffff, v17
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX6-NEXT:    buffer_load_dword v17, off, s[0:3], s32
+; GFX6-NEXT:    v_add_i32_e32 v4, vcc, v4, v16
+; GFX6-NEXT:    v_min_i32_e32 v16, 0, v5
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x80000000, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, 0, v5
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v21
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x7fffffff, v18
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v6
+; GFX6-NEXT:    v_add_i32_e32 v5, vcc, v5, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v6
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v22
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v7
+; GFX6-NEXT:    v_add_i32_e32 v6, vcc, v6, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v7
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v23
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v8
+; GFX6-NEXT:    v_add_i32_e32 v7, vcc, v7, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v8
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v24
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v9
+; GFX6-NEXT:    v_add_i32_e32 v8, vcc, v8, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v9
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v25
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v10
+; GFX6-NEXT:    v_add_i32_e32 v9, vcc, v9, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v10
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v26
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v11
+; GFX6-NEXT:    v_add_i32_e32 v10, vcc, v10, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v11
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v27
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v12
+; GFX6-NEXT:    v_add_i32_e32 v11, vcc, v11, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v12
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v28
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v13
+; GFX6-NEXT:    v_add_i32_e32 v12, vcc, v12, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v13
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v29
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v14
+; GFX6-NEXT:    v_add_i32_e32 v13, vcc, v13, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v14
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v18, v18, v30
+; GFX6-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, 0, v15
+; GFX6-NEXT:    v_add_i32_e32 v14, vcc, v14, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, 0, v15
+; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    s_waitcnt vmcnt(0)
+; GFX6-NEXT:    v_max_i32_e32 v17, v18, v17
+; GFX6-NEXT:    v_min_i32_e32 v16, v17, v16
 ; GFX6-NEXT:    v_add_i32_e32 v15, vcc, v15, v16
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_saddsat_v16i32:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    s_brev_b32 s4, 1
 ; GFX8-NEXT:    v_min_i32_e32 v31, 0, v0
-; GFX8-NEXT:    v_sub_u32_e32 v31, vcc, s4, v31
+; GFX8-NEXT:    v_sub_u32_e32 v31, vcc, 0x80000000, v31
 ; GFX8-NEXT:    v_max_i32_e32 v16, v31, v16
-; GFX8-NEXT:    s_brev_b32 s5, -2
 ; GFX8-NEXT:    v_max_i32_e32 v31, 0, v0
-; GFX8-NEXT:    v_sub_u32_e32 v31, vcc, s5, v31
+; GFX8-NEXT:    v_sub_u32_e32 v31, vcc, 0x7fffffff, v31
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v31
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, v0, v16
 ; GFX8-NEXT:    v_min_i32_e32 v16, 0, v1
-; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, s4, v16
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x80000000, v16
 ; GFX8-NEXT:    v_max_i32_e32 v16, v16, v17
 ; GFX8-NEXT:    v_max_i32_e32 v17, 0, v1
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, s5, v17
+; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, 0x7fffffff, v17
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX8-NEXT:    v_add_u32_e32 v1, vcc, v1, v16
 ; GFX8-NEXT:    v_min_i32_e32 v16, 0, v2
-; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, s4, v16
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x80000000, v16
 ; GFX8-NEXT:    v_max_i32_e32 v17, 0, v2
 ; GFX8-NEXT:    v_max_i32_e32 v16, v16, v18
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, s5, v17
+; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, 0x7fffffff, v17
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX8-NEXT:    v_add_u32_e32 v2, vcc, v2, v16
-; GFX8-NEXT:    v_bfrev_b32_e32 v16, 1
-; GFX8-NEXT:    v_min_i32_e32 v17, 0, v3
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v16, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_bfrev_b32_e32 v18, -2
-; GFX8-NEXT:    v_max_i32_e32 v19, 0, v3
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v18, v19
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v3, v17
-; GFX8-NEXT:    v_min_i32_e32 v17, 0, v4
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v16, v17
-; GFX8-NEXT:    v_max_i32_e32 v19, 0, v4
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v18, v19
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_add_u32_e32 v4, vcc, v4, v17
-; GFX8-NEXT:    v_min_i32_e32 v17, 0, v5
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v16, v17
-; GFX8-NEXT:    v_max_i32_e32 v19, 0, v5
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v21
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v18, v19
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_add_u32_e32 v5, vcc, v5, v17
-; GFX8-NEXT:    v_min_i32_e32 v17, 0, v6
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v16, v17
-; GFX8-NEXT:    v_max_i32_e32 v19, 0, v6
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v22
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v18, v19
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    buffer_load_dword v19, off, s[0:3], s32
-; GFX8-NEXT:    v_add_u32_e32 v6, vcc, v6, v17
-; GFX8-NEXT:    v_min_i32_e32 v17, 0, v7
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v16, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, 0, v7
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v23
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v18, v20
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v8
-; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v7, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v8
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v24
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v9
-; GFX8-NEXT:    v_add_u32_e32 v8, vcc, v8, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v9
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v25
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v10
-; GFX8-NEXT:    v_add_u32_e32 v9, vcc, v9, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v10
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v26
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v11
-; GFX8-NEXT:    v_add_u32_e32 v10, vcc, v10, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v11
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v27
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v12
-; GFX8-NEXT:    v_add_u32_e32 v11, vcc, v11, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v12
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v28
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v13
-; GFX8-NEXT:    v_add_u32_e32 v12, vcc, v12, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v13
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v29
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_min_i32_e32 v20, 0, v14
-; GFX8-NEXT:    v_add_u32_e32 v13, vcc, v13, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v14
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v16, v20
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_max_i32_e32 v20, v20, v30
-; GFX8-NEXT:    v_min_i32_e32 v17, v20, v17
-; GFX8-NEXT:    v_add_u32_e32 v14, vcc, v14, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, 0, v15
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v18, v17
-; GFX8-NEXT:    v_min_i32_e32 v18, 0, v15
-; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, v16, v18
-; GFX8-NEXT:    s_waitcnt vmcnt(0)
+; GFX8-NEXT:    v_min_i32_e32 v16, 0, v3
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x80000000, v16
+; GFX8-NEXT:    v_max_i32_e32 v17, 0, v3
 ; GFX8-NEXT:    v_max_i32_e32 v16, v16, v19
+; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, 0x7fffffff, v17
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v3, v16
+; GFX8-NEXT:    v_min_i32_e32 v16, 0, v4
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x80000000, v16
+; GFX8-NEXT:    v_max_i32_e32 v17, 0, v4
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v20
+; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, 0x7fffffff, v17
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX8-NEXT:    buffer_load_dword v17, off, s[0:3], s32
+; GFX8-NEXT:    v_add_u32_e32 v4, vcc, v4, v16
+; GFX8-NEXT:    v_min_i32_e32 v16, 0, v5
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x80000000, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, 0, v5
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v21
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x7fffffff, v18
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v6
+; GFX8-NEXT:    v_add_u32_e32 v5, vcc, v5, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v6
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v22
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v7
+; GFX8-NEXT:    v_add_u32_e32 v6, vcc, v6, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v7
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v23
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v8
+; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v7, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v8
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v24
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v9
+; GFX8-NEXT:    v_add_u32_e32 v8, vcc, v8, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v9
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v25
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v10
+; GFX8-NEXT:    v_add_u32_e32 v9, vcc, v9, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v10
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v26
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v11
+; GFX8-NEXT:    v_add_u32_e32 v10, vcc, v10, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v11
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v27
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v12
+; GFX8-NEXT:    v_add_u32_e32 v11, vcc, v11, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v12
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v28
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v13
+; GFX8-NEXT:    v_add_u32_e32 v12, vcc, v12, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v13
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v29
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v14
+; GFX8-NEXT:    v_add_u32_e32 v13, vcc, v13, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v14
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_max_i32_e32 v18, v18, v30
+; GFX8-NEXT:    v_min_i32_e32 v16, v18, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, 0, v15
+; GFX8-NEXT:    v_add_u32_e32 v14, vcc, v14, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, 0, v15
+; GFX8-NEXT:    v_sub_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
+; GFX8-NEXT:    v_max_i32_e32 v17, v18, v17
+; GFX8-NEXT:    v_min_i32_e32 v16, v17, v16
 ; GFX8-NEXT:    v_add_u32_e32 v15, vcc, v15, v16
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -3056,42 +3045,39 @@ define <2 x float> @v_saddsat_v4i16(<4 x i16> %lhs, <4 x i16> %rhs) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
-; GFX6-NEXT:    s_brev_b32 s5, 1
-; GFX6-NEXT:    v_min_i32_e32 v10, 0, v0
+; GFX6-NEXT:    v_min_i32_e32 v9, 0, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v8, 0, v0
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, s5, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x80000000, v9
 ; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, s4, v8
-; GFX6-NEXT:    v_max_i32_e32 v4, v10, v4
+; GFX6-NEXT:    v_max_i32_e32 v4, v9, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v8
 ; GFX6-NEXT:    v_min_i32_e32 v8, 0, v1
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v5
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, s5, v8
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, s4, v5
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0x80000000, v8
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v4, v8, v4
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v5
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v2
 ; GFX6-NEXT:    v_add_i32_e32 v1, vcc, v1, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v6
 ; GFX6-NEXT:    v_min_i32_e32 v6, 0, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v9, -2
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v2
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, s5, v6
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v9, v5
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0x80000000, v6
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v4, v6, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v11, 1
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, 0, v3
 ; GFX6-NEXT:    v_add_i32_e32 v2, vcc, v2, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v7
 ; GFX6-NEXT:    v_max_i32_e32 v5, 0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v11, v6
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v9, v5
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, 0x80000000, v6
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_max_i32_e32 v4, v6, v4
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v5
@@ -3320,42 +3306,38 @@ define <3 x float> @v_saddsat_v6i16(<6 x i16> %lhs, <6 x i16> %rhs) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
-; GFX6-NEXT:    s_brev_b32 s5, 1
-; GFX6-NEXT:    v_min_i32_e32 v14, 0, v0
+; GFX6-NEXT:    v_min_i32_e32 v13, 0, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v6
-; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v12, 0, v0
-; GFX6-NEXT:    v_sub_i32_e32 v14, vcc, s5, v14
-; GFX6-NEXT:    v_sub_i32_e32 v12, vcc, s4, v12
-; GFX6-NEXT:    v_max_i32_e32 v6, v14, v6
+; GFX6-NEXT:    v_sub_i32_e32 v13, vcc, 0x80000000, v13
+; GFX6-NEXT:    v_sub_i32_e32 v12, vcc, 0x7fffffff, v12
+; GFX6-NEXT:    v_max_i32_e32 v6, v13, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v12
 ; GFX6-NEXT:    v_min_i32_e32 v12, 0, v1
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v7
 ; GFX6-NEXT:    v_max_i32_e32 v7, 0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v12, vcc, s5, v12
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, s4, v7
+; GFX6-NEXT:    v_sub_i32_e32 v12, vcc, 0x80000000, v12
+; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_max_i32_e32 v6, v12, v6
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v7
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v2
 ; GFX6-NEXT:    v_add_i32_e32 v1, vcc, v1, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v8
 ; GFX6-NEXT:    v_min_i32_e32 v8, 0, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v13, -2
 ; GFX6-NEXT:    v_max_i32_e32 v7, 0, v2
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, s5, v8
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v13, v7
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0x80000000, v8
+; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_max_i32_e32 v6, v8, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v15, 1
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, 0, v3
 ; GFX6-NEXT:    v_add_i32_e32 v2, vcc, v2, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v9
 ; GFX6-NEXT:    v_max_i32_e32 v7, 0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v15, v8
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v13, v7
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0x80000000, v8
+; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_max_i32_e32 v6, v8, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v7
@@ -3363,8 +3345,8 @@ define <3 x float> @v_saddsat_v6i16(<6 x i16> %lhs, <6 x i16> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v3, vcc, v3, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v10
 ; GFX6-NEXT:    v_max_i32_e32 v7, 0, v4
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v15, v8
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v13, v7
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0x80000000, v8
+; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_max_i32_e32 v6, v8, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v7
@@ -3372,9 +3354,9 @@ define <3 x float> @v_saddsat_v6i16(<6 x i16> %lhs, <6 x i16> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v4, vcc, v4, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v11
 ; GFX6-NEXT:    v_max_i32_e32 v7, 0, v5
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v15, v8
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 16, v1
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v13, v7
+; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_max_i32_e32 v6, v8, v6
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v0, 16, v0
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v7
@@ -3674,42 +3656,38 @@ define <4 x float> @v_saddsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
-; GFX6-NEXT:    s_brev_b32 s5, 1
-; GFX6-NEXT:    v_min_i32_e32 v18, 0, v0
+; GFX6-NEXT:    v_min_i32_e32 v17, 0, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v8
-; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v16, 0, v0
-; GFX6-NEXT:    v_sub_i32_e32 v18, vcc, s5, v18
-; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, s4, v16
-; GFX6-NEXT:    v_max_i32_e32 v8, v18, v8
+; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, 0x80000000, v17
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_max_i32_e32 v8, v17, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v16
 ; GFX6-NEXT:    v_min_i32_e32 v16, 0, v1
 ; GFX6-NEXT:    v_add_i32_e32 v0, vcc, v0, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v9
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v1
-; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, s5, v16
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, s4, v9
+; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, 0x80000000, v16
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v16, v8
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v9
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v2
 ; GFX6-NEXT:    v_add_i32_e32 v1, vcc, v1, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v10
 ; GFX6-NEXT:    v_min_i32_e32 v10, 0, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v17, -2
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v2
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, s5, v10
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v17, v9
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, 0x80000000, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v10, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v19, 1
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, 0, v3
 ; GFX6-NEXT:    v_add_i32_e32 v2, vcc, v2, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v11
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v3
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v19, v10
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v17, v9
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, 0x80000000, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v10, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v9
@@ -3717,8 +3695,8 @@ define <4 x float> @v_saddsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v3, vcc, v3, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v12
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v4
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v19, v10
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v17, v9
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, 0x80000000, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v10, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v9
@@ -3726,8 +3704,8 @@ define <4 x float> @v_saddsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v4, vcc, v4, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v13
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v5
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v19, v10
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v17, v9
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, 0x80000000, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v10, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v6
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v9
@@ -3735,8 +3713,8 @@ define <4 x float> @v_saddsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v5, vcc, v5, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v14
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v6
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v19, v10
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v17, v9
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, 0x80000000, v10
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v10, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v7, 16, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v9
@@ -3745,9 +3723,9 @@ define <4 x float> @v_saddsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6-NEXT:    v_add_i32_e32 v6, vcc, v6, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v15
 ; GFX6-NEXT:    v_max_i32_e32 v9, 0, v7
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v19, v10
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v0, 16, v0
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v17, v9
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_max_i32_e32 v8, v10, v8
 ; GFX6-NEXT:    v_and_b32_e32 v1, 0xffff, v1
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v2, 16, v2
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll
index ab000d91a3ef23d..934ca55075f0207 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i32.ll
@@ -26,10 +26,10 @@ define i32 @v_sdiv_i32(i32 %num, i32 %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v5, v4, v1
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v5, s[4:5], v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v5, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v5, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v5, vcc, 1, v4
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v5, vcc
@@ -60,10 +60,10 @@ define i32 @v_sdiv_i32(i32 %num, i32 %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -98,10 +98,10 @@ define amdgpu_ps i32 @s_sdiv_i32(i32 inreg %num, i32 inreg %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v1, v0, s4
 ; GISEL-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, s0, v1
-; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; GISEL-NEXT:    v_subrev_i32_e64 v2, s[0:1], s4, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-NEXT:    v_cmp_le_u32_e64 s[0:1], s4, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[0:1]
+; GISEL-NEXT:    v_subrev_i32_e32 v2, vcc, s4, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; GISEL-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -132,10 +132,10 @@ define amdgpu_ps i32 @s_sdiv_i32(i32 inreg %num, i32 inreg %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v0, s2
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, s0, v1
-; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s2, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CGP-NEXT:    v_subrev_i32_e64 v2, s[0:1], s2, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CGP-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[0:1]
+; CGP-NEXT:    v_subrev_i32_e32 v2, vcc, s2, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s2, v1
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -190,15 +190,15 @@ define <2 x i32> @v_sdiv_v2i32(<2 x i32> %num, <2 x i32> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v11, vcc, 1, v5
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v10
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v11, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v11, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -251,15 +251,15 @@ define <2 x i32> @v_sdiv_v2i32(<2 x i32> %num, <2 x i32> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v11, vcc, 1, v5
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v10
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v11, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v11, s[6:7]
+; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -342,7 +342,7 @@ define <2 x i32> @v_sdiv_v2i32_pow2k_denom(<2 x i32> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v9
 ; GISEL-NEXT:    v_cmp_le_u32_e64 s[4:5], s8, v0
 ; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v8, s[4:5]
-; GISEL-NEXT:    v_subrev_i32_e32 v7, vcc, s8, v0
+; GISEL-NEXT:    v_subrev_i32_e32 v7, vcc, 0x1000, v0
 ; GISEL-NEXT:    v_cmp_le_u32_e64 s[6:7], s8, v1
 ; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v10, s[6:7]
 ; GISEL-NEXT:    v_subrev_i32_e32 v8, vcc, s8, v1
@@ -395,7 +395,7 @@ define <2 x i32> @v_sdiv_v2i32_pow2k_denom(<2 x i32> %num) {
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v9
 ; CGP-NEXT:    v_cmp_le_u32_e64 s[4:5], s8, v0
 ; CGP-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[4:5]
-; CGP-NEXT:    v_subrev_i32_e32 v7, vcc, s8, v0
+; CGP-NEXT:    v_subrev_i32_e32 v7, vcc, 0x1000, v0
 ; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v5
 ; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v10, s[6:7]
 ; CGP-NEXT:    v_subrev_i32_e32 v8, vcc, 0x1000, v1
@@ -483,7 +483,7 @@ define <2 x i32> @v_sdiv_v2i32_oddk_denom(<2 x i32> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
 ; GISEL-NEXT:    v_cmp_le_u32_e64 s[4:5], s8, v0
 ; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
-; GISEL-NEXT:    v_subrev_i32_e32 v6, vcc, s8, v0
+; GISEL-NEXT:    v_subrev_i32_e32 v6, vcc, 0x12d8fb, v0
 ; GISEL-NEXT:    v_cmp_le_u32_e64 s[6:7], s8, v1
 ; GISEL-NEXT:    v_cndmask_b32_e64 v3, v3, v9, s[6:7]
 ; GISEL-NEXT:    v_subrev_i32_e32 v7, vcc, s8, v1
@@ -536,7 +536,7 @@ define <2 x i32> @v_sdiv_v2i32_oddk_denom(<2 x i32> %num) {
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v9
 ; CGP-NEXT:    v_cmp_le_u32_e64 s[4:5], s8, v0
 ; CGP-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[4:5]
-; CGP-NEXT:    v_subrev_i32_e32 v7, vcc, s8, v0
+; CGP-NEXT:    v_subrev_i32_e32 v7, vcc, 0x12d8fb, v0
 ; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v5
 ; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v10, s[6:7]
 ; CGP-NEXT:    v_subrev_i32_e32 v8, vcc, 0x12d8fb, v1
@@ -580,10 +580,10 @@ define i32 @v_sdiv_i32_pow2_shl_denom(i32 %x, i32 %y) {
 ; CHECK-NEXT:    v_mul_lo_u32 v5, v4, v1
 ; CHECK-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
 ; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v4, v4, v6, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v5, s[4:5], v0, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v5, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v4, v4, v6, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v5, vcc, v0, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v5, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v5, vcc, 1, v4
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v4, v5, vcc
@@ -640,15 +640,15 @@ define <2 x i32> @v_sdiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) {
 ; GISEL-NEXT:    v_add_i32_e32 v11, vcc, 1, v7
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v8
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v10
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; GISEL-NEXT:    v_cndmask_b32_e32 v6, v6, v9, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v8, s[4:5], v0, v2
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, v7, v11, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v9, s[6:7], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v8, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, v6, v9, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, v0, v2
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, v7, v11, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v9, vcc, v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v8, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v8, vcc, 1, v6
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v9, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v9, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v9, vcc, 1, v7
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v6, v8, vcc
@@ -703,15 +703,15 @@ define <2 x i32> @v_sdiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v11, vcc, 1, v6
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v7
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v10
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v5, v5, v9, vcc
-; CGP-NEXT:    v_sub_i32_e64 v7, s[4:5], v0, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v6, v6, v11, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v9, s[6:7], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v7, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v0, v2
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v6, v6, v11, s[6:7]
+; CGP-NEXT:    v_sub_i32_e32 v9, vcc, v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v7, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
-; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v9, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v9, s[6:7]
 ; CGP-NEXT:    v_add_i32_e32 v9, vcc, 1, v6
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v5, v7, vcc
@@ -751,10 +751,10 @@ define i32 @v_sdiv_i32_24bit(i32 %num, i32 %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v5, v4, v1
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v5, s[4:5], v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v5, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v5, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v5, vcc, 1, v4
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v5, vcc
@@ -780,10 +780,10 @@ define i32 @v_sdiv_i32_24bit(i32 %num, i32 %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -840,15 +840,15 @@ define <2 x i32> @v_sdiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v11, vcc, 1, v5
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v10
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v11, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v11, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -891,15 +891,15 @@ define <2 x i32> @v_sdiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
index 4248f7b6a158312..ef7648c0c76024d 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll
@@ -123,19 +123,19 @@ define i64 @v_sdiv_i64(i64 %num, i64 %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v5, vcc, v6, v5
 ; CHECK-NEXT:    v_add_i32_e32 v6, vcc, v9, v5
 ; CHECK-NEXT:    v_mad_u64_u32 v[4:5], s[4:5], v2, v6, v[4:5]
-; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v8, v3
 ; CHECK-NEXT:    v_mad_u64_u32 v[4:5], s[4:5], v1, v7, v[4:5]
-; CHECK-NEXT:    v_subb_u32_e64 v5, s[4:5], v11, v4, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v4, s[4:5], v11, v4
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v1
-; CHECK-NEXT:    v_subb_u32_e32 v4, vcc, v4, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v2
+; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v8, v3
+; CHECK-NEXT:    v_subb_u32_e64 v5, vcc, v11, v4, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, v11, v4
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v5, v1
+; CHECK-NEXT:    v_cndmask_b32_e32 v5, v8, v9, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v4, vcc, v4, v1, s[4:5]
 ; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v3, v2
-; CHECK-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], v5, v1
 ; CHECK-NEXT:    v_subbrev_u32_e32 v4, vcc, 0, v4, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, v8, v9, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v8, vcc, 1, v7
 ; CHECK-NEXT:    v_addc_u32_e32 v9, vcc, 0, v6, vcc
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
@@ -174,10 +174,10 @@ define i64 @v_sdiv_i64(i64 %num, i64 %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v0, v2
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v4, v1
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v1, v2
-; CHECK-NEXT:    v_cndmask_b32_e32 v1, v1, v3, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v1, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v1, v1, v3, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -300,22 +300,22 @@ define amdgpu_ps i64 @s_sdiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, v5, v2
 ; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s10, v2, v[1:2]
 ; CHECK-NEXT:    v_mov_b32_e32 v5, s13
-; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, s12, v0
-; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s11, v4, v[1:2]
 ; CHECK-NEXT:    v_mov_b32_e32 v3, s11
-; CHECK-NEXT:    v_subb_u32_e64 v2, s[0:1], v5, v1, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v1, s[0:1], s13, v1
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v2
-; CHECK-NEXT:    v_subb_u32_e32 v1, vcc, v1, v3, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[0:1]
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[0:1], s10, v0
+; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s11, v4, v[1:2]
+; CHECK-NEXT:    v_sub_i32_e64 v0, s[0:1], s12, v0
+; CHECK-NEXT:    v_subb_u32_e64 v2, vcc, v5, v1, s[0:1]
+; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, s13, v1
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s11, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s10, v0
+; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, s11, v2
+; CHECK-NEXT:    v_cndmask_b32_e32 v2, v5, v6, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v1, vcc, v1, v3, s[0:1]
 ; CHECK-NEXT:    v_subrev_i32_e32 v0, vcc, s10, v0
 ; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v4
-; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[0:1]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[0:1], s11, v2
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s11, v1
-; CHECK-NEXT:    v_cndmask_b32_e64 v2, v5, v6, s[0:1]
 ; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s10, v0
 ; CHECK-NEXT:    v_cndmask_b32_e64 v0, 0, -1, vcc
@@ -351,10 +351,10 @@ define amdgpu_ps i64 @s_sdiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v0, s4
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, s2, v1
-; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CHECK-NEXT:    v_subrev_i32_e64 v2, s[0:1], s4, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CHECK-NEXT:    v_cmp_le_u32_e64 s[0:1], s4, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[0:1]
+; CHECK-NEXT:    v_subrev_i32_e32 v2, vcc, s4, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -477,18 +477,18 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v11, vcc, v12, v11
 ; GISEL-NEXT:    v_add_i32_e32 v14, vcc, v14, v11
 ; GISEL-NEXT:    v_mad_u64_u32 v[11:12], s[4:5], v10, v14, v[1:2]
-; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v5, v0
 ; GISEL-NEXT:    v_mad_u64_u32 v[11:12], s[4:5], v4, v13, v[11:12]
-; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v15, v11, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v5, s[4:5], v15, v11
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v10
-; GISEL-NEXT:    v_cndmask_b32_e64 v12, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v1, v4
-; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v5, v4, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v5, v0
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v15, v11, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v15, v11
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v10
+; GISEL-NEXT:    v_cndmask_b32_e64 v12, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v1, v4
+; GISEL-NEXT:    v_cndmask_b32_e32 v12, v11, v12, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v5, v4, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v10
-; GISEL-NEXT:    v_cndmask_b32_e64 v12, v11, v12, s[4:5]
 ; GISEL-NEXT:    v_subbrev_u32_e32 v11, vcc, 0, v1, vcc
 ; GISEL-NEXT:    v_ashrrev_i32_e32 v5, 31, v7
 ; GISEL-NEXT:    v_add_i32_e32 v1, vcc, v6, v5
@@ -609,19 +609,19 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_xor_b32_e32 v8, v12, v13
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v13
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v6, v10, v[3:4]
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v9, v2
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v8, v13, vcc
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v9, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v15, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v15, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v6
-; GISEL-NEXT:    v_subb_u32_e32 v3, vcc, v3, v6, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v7
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v15, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v15, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v6
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v7
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v6
+; GISEL-NEXT:    v_cndmask_b32_e32 v4, v8, v9, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v3, vcc, v3, v6, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v2, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v6
 ; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, v8, v9, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v8, vcc, 1, v10
 ; GISEL-NEXT:    v_addc_u32_e32 v9, vcc, 0, v11, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v6
@@ -757,19 +757,19 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v12, v5
 ; CGP-NEXT:    v_add_i32_e32 v12, vcc, v15, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[4:5], s[4:5], v2, v12, v[4:5]
-; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v11, v3
 ; CGP-NEXT:    v_mad_u64_u32 v[4:5], s[4:5], v1, v14, v[4:5]
-; CGP-NEXT:    v_subb_u32_e64 v5, s[4:5], v10, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v10, v4
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v1
-; CGP-NEXT:    v_subb_u32_e32 v4, vcc, v4, v1, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v2
+; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v11, v3
+; CGP-NEXT:    v_subb_u32_e64 v5, vcc, v10, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v10, v4
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v5, v1
+; CGP-NEXT:    v_cndmask_b32_e32 v5, v10, v11, vcc
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v4, v1, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v3, v2
-; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v5, v1
 ; CGP-NEXT:    v_subbrev_u32_e32 v4, vcc, 0, v4, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v10, v11, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, 1, v14
 ; CGP-NEXT:    v_addc_u32_e32 v11, vcc, 0, v12, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
@@ -809,10 +809,10 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v0, v4
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v10, v1
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v4
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v1, v4
-; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v1, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v4
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -930,19 +930,19 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, v10, v7
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, v13, v7
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v4, v10, v[6:7]
-; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v9, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v3, v12, v[6:7]
-; CGP-NEXT:    v_subb_u32_e64 v7, s[4:5], v8, v6, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v8, v6
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v3
-; CGP-NEXT:    v_subb_u32_e32 v6, vcc, v6, v3, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v4
+; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v9, v5
+; CGP-NEXT:    v_subb_u32_e64 v7, vcc, v8, v6, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v8, v6
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v7, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v7, v8, v9, vcc
+; CGP-NEXT:    v_subb_u32_e64 v6, vcc, v6, v3, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v5, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v7, v3
 ; CGP-NEXT:    v_subbrev_u32_e32 v6, vcc, 0, v6, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v7, v8, v9, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v8, vcc, 1, v12
 ; CGP-NEXT:    v_addc_u32_e32 v9, vcc, 0, v10, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v3
@@ -981,10 +981,10 @@ define <2 x i64> @v_sdiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v6
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v8, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v6
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v3, v6
-; CGP-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v6
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v3, v6
+; CGP-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v6
 ; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
@@ -1092,17 +1092,17 @@ define i64 @v_sdiv_i64_pow2k_denom(i64 %num) {
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, v5, v2
 ; CHECK-NEXT:    v_add_i32_e32 v5, vcc, v9, v2
 ; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[4:5], s6, v5, v[1:2]
-; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v3, v0
-; CHECK-NEXT:    v_subb_u32_e64 v2, s[4:5], v4, v1, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v1, s[4:5], v4, v1
+; CHECK-NEXT:    v_sub_i32_e64 v0, s[4:5], v3, v0
 ; CHECK-NEXT:    v_mov_b32_e32 v6, 0x1000
-; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v6
+; CHECK-NEXT:    v_subb_u32_e64 v2, vcc, v4, v1, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v4, v1
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v6
+; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v2
+; CHECK-NEXT:    v_cndmask_b32_e32 v2, -1, v3, vcc
+; CHECK-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v1, s[4:5]
 ; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
-; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v2
 ; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v2, -1, v3, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v8
 ; CHECK-NEXT:    v_addc_u32_e32 v4, vcc, 0, v5, vcc
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v6
@@ -1224,16 +1224,17 @@ define <2 x i64> @v_sdiv_v2i64_pow2k_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v8, v7
 ; GISEL-NEXT:    v_add_i32_e32 v11, vcc, v11, v7
 ; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], v5, v11, v[1:2]
-; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v9, v0
-; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], 0, v10, v[7:8]
 ; GISEL-NEXT:    s_sub_u32 s6, 0, 0x1000
 ; GISEL-NEXT:    s_subb_u32 s7, 0, 0
-; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v12, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[4:5], v12, v7
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
-; GISEL-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v7, vcc
+; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], 0, v10, v[7:8]
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v9, v0
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v12, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v12, v7
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GISEL-NEXT:    v_cndmask_b32_e32 v8, -1, v8, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v7, s[4:5]
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v7, 0x1000
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
 ; GISEL-NEXT:    v_subbrev_u32_e32 v9, vcc, 0, v1, vcc
@@ -1246,11 +1247,10 @@ define <2 x i64> @v_sdiv_v2i64_pow2k_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_trunc_f32_e32 v6, v6
 ; GISEL-NEXT:    v_mac_f32_e32 v1, 0xcf800000, v6
 ; GISEL-NEXT:    v_cvt_u32_f32_e32 v14, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, -1, v8, s[4:5]
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v5
 ; GISEL-NEXT:    v_cvt_u32_f32_e32 v15, v6
-; GISEL-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], s6, v14, 0
 ; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], s6, v14, 0
 ; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
 ; GISEL-NEXT:    v_cndmask_b32_e32 v9, -1, v7, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], s6, v15, v[1:2]
@@ -1346,16 +1346,16 @@ define <2 x i64> @v_sdiv_v2i64_pow2k_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v4
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v9, v4, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], 0, v10, v[6:7]
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v11, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v12, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v12, v3
-; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v5
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v11, v2
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v12, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v12, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v4
+; GISEL-NEXT:    v_cndmask_b32_e32 v4, -1, v6, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v3, vcc, 0, v3, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v2, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v4
 ; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, -1, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v10
 ; GISEL-NEXT:    v_addc_u32_e32 v7, vcc, 0, v13, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
@@ -1473,17 +1473,17 @@ define <2 x i64> @v_sdiv_v2i64_pow2k_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, v10, v6
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], s6, v10, v[1:2]
-; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v8, v0
+; CGP-NEXT:    v_sub_i32_e64 v0, s[4:5], v8, v0
 ; CGP-NEXT:    v_mov_b32_e32 v4, 0x1000
-; CGP-NEXT:    v_subb_u32_e64 v1, s[4:5], v11, v6, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v11, v6
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
-; CGP-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v6, vcc
+; CGP-NEXT:    v_subb_u32_e64 v1, vcc, v11, v6, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v11, v6
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; CGP-NEXT:    v_cndmask_b32_e32 v8, -1, v7, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v6, s[4:5]
 ; CGP-NEXT:    v_cvt_f32_u32_e32 v6, 0x1000
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v8, -1, v7, s[4:5]
 ; CGP-NEXT:    v_subbrev_u32_e32 v7, vcc, 0, v1, vcc
 ; CGP-NEXT:    v_cvt_f32_ubyte0_e32 v1, 0
 ; CGP-NEXT:    v_mac_f32_e32 v6, 0x4f800000, v1
@@ -1592,16 +1592,16 @@ define <2 x i64> @v_sdiv_v2i64_pow2k_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v6, v5
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, v10, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], s6, v10, v[3:4]
-; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v7, v2
-; CGP-NEXT:    v_subb_u32_e64 v3, s[4:5], v12, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v12, v5
-; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v4
+; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v7, v2
+; CGP-NEXT:    v_subb_u32_e64 v3, vcc, v12, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v12, v5
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v3, -1, v6, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v5, vcc, 0, v5, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v2, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v3
 ; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v3, -1, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v9
 ; CGP-NEXT:    v_addc_u32_e32 v7, vcc, 0, v10, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
@@ -1722,17 +1722,17 @@ define i64 @v_sdiv_i64_oddk_denom(i64 %num) {
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, v5, v2
 ; CHECK-NEXT:    v_add_i32_e32 v5, vcc, v9, v2
 ; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[4:5], s6, v5, v[1:2]
-; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v3, v0
-; CHECK-NEXT:    v_subb_u32_e64 v2, s[4:5], v4, v1, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v1, s[4:5], v4, v1
+; CHECK-NEXT:    v_sub_i32_e64 v0, s[4:5], v3, v0
 ; CHECK-NEXT:    v_mov_b32_e32 v6, 0x12d8fb
-; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v6
+; CHECK-NEXT:    v_subb_u32_e64 v2, vcc, v4, v1, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v4, v1
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v6
+; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v2
+; CHECK-NEXT:    v_cndmask_b32_e32 v2, -1, v3, vcc
+; CHECK-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v1, s[4:5]
 ; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
-; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v2
 ; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v2, -1, v3, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v8
 ; CHECK-NEXT:    v_addc_u32_e32 v4, vcc, 0, v5, vcc
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v6
@@ -1854,16 +1854,17 @@ define <2 x i64> @v_sdiv_v2i64_oddk_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v8, v7
 ; GISEL-NEXT:    v_add_i32_e32 v11, vcc, v11, v7
 ; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], v5, v11, v[1:2]
-; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v9, v0
-; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], 0, v10, v[7:8]
 ; GISEL-NEXT:    s_sub_u32 s6, 0, 0x12d8fb
 ; GISEL-NEXT:    s_subb_u32 s7, 0, 0
-; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v12, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[4:5], v12, v7
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
-; GISEL-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v7, vcc
+; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], 0, v10, v[7:8]
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v9, v0
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v12, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v12, v7
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; GISEL-NEXT:    v_cndmask_b32_e32 v8, -1, v8, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v7, s[4:5]
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v7, 0x12d8fb
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
 ; GISEL-NEXT:    v_subbrev_u32_e32 v9, vcc, 0, v1, vcc
@@ -1876,11 +1877,10 @@ define <2 x i64> @v_sdiv_v2i64_oddk_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_trunc_f32_e32 v6, v6
 ; GISEL-NEXT:    v_mac_f32_e32 v1, 0xcf800000, v6
 ; GISEL-NEXT:    v_cvt_u32_f32_e32 v14, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, -1, v8, s[4:5]
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v5
 ; GISEL-NEXT:    v_cvt_u32_f32_e32 v15, v6
-; GISEL-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], s6, v14, 0
 ; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], s6, v14, 0
 ; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
 ; GISEL-NEXT:    v_cndmask_b32_e32 v9, -1, v7, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], s6, v15, v[1:2]
@@ -1976,16 +1976,16 @@ define <2 x i64> @v_sdiv_v2i64_oddk_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v4
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v9, v4, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], 0, v10, v[6:7]
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v11, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v12, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v12, v3
-; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v5
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v11, v2
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v12, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v12, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v4
+; GISEL-NEXT:    v_cndmask_b32_e32 v4, -1, v6, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v3, vcc, 0, v3, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v2, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v4
 ; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, -1, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v10
 ; GISEL-NEXT:    v_addc_u32_e32 v7, vcc, 0, v13, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
@@ -2103,17 +2103,17 @@ define <2 x i64> @v_sdiv_v2i64_oddk_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, v10, v6
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], s6, v10, v[1:2]
-; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v8, v0
+; CGP-NEXT:    v_sub_i32_e64 v0, s[4:5], v8, v0
 ; CGP-NEXT:    v_mov_b32_e32 v4, 0x12d8fb
-; CGP-NEXT:    v_subb_u32_e64 v1, s[4:5], v11, v6, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v11, v6
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
-; CGP-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v6, vcc
+; CGP-NEXT:    v_subb_u32_e64 v1, vcc, v11, v6, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v11, v6
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v1
+; CGP-NEXT:    v_cndmask_b32_e32 v8, -1, v7, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v6, s[4:5]
 ; CGP-NEXT:    v_cvt_f32_u32_e32 v6, 0x12d8fb
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v8, -1, v7, s[4:5]
 ; CGP-NEXT:    v_subbrev_u32_e32 v7, vcc, 0, v1, vcc
 ; CGP-NEXT:    v_cvt_f32_ubyte0_e32 v1, 0
 ; CGP-NEXT:    v_mac_f32_e32 v6, 0x4f800000, v1
@@ -2222,16 +2222,16 @@ define <2 x i64> @v_sdiv_v2i64_oddk_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v6, v5
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, v10, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], s6, v10, v[3:4]
-; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v7, v2
-; CGP-NEXT:    v_subb_u32_e64 v3, s[4:5], v12, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v12, v5
-; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v4
+; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v7, v2
+; CGP-NEXT:    v_subb_u32_e64 v3, vcc, v12, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v12, v5
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v3, -1, v6, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v5, vcc, 0, v5, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v2, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v3
 ; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v3, -1, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v9
 ; CGP-NEXT:    v_addc_u32_e32 v7, vcc, 0, v10, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
@@ -2374,19 +2374,19 @@ define i64 @v_sdiv_i64_pow2_shl_denom(i64 %x, i64 %y) {
 ; CHECK-NEXT:    v_add_i32_e32 v5, vcc, v6, v5
 ; CHECK-NEXT:    v_add_i32_e32 v6, vcc, v10, v5
 ; CHECK-NEXT:    v_mad_u64_u32 v[4:5], s[4:5], v2, v6, v[4:5]
-; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v7, v3
 ; CHECK-NEXT:    v_mad_u64_u32 v[4:5], s[4:5], v1, v8, v[4:5]
-; CHECK-NEXT:    v_subb_u32_e64 v5, s[4:5], v12, v4, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v4, s[4:5], v12, v4
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v1
-; CHECK-NEXT:    v_subb_u32_e32 v4, vcc, v4, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v2
+; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v7, v3
+; CHECK-NEXT:    v_subb_u32_e64 v5, vcc, v12, v4, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, v12, v4
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v5, v1
+; CHECK-NEXT:    v_cndmask_b32_e32 v5, v7, v10, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v4, vcc, v4, v1, s[4:5]
 ; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v3, v2
-; CHECK-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], v5, v1
 ; CHECK-NEXT:    v_subbrev_u32_e32 v4, vcc, 0, v4, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, v7, v10, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v7, vcc, 1, v8
 ; CHECK-NEXT:    v_addc_u32_e32 v10, vcc, 0, v6, vcc
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
@@ -2425,10 +2425,10 @@ define i64 @v_sdiv_i64_pow2_shl_denom(i64 %x, i64 %y) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v0, v5
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v3, v1
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v2, s[4:5], v1, v5
-; CHECK-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v5
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v2, vcc, v1, v5
+; CHECK-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -2547,21 +2547,21 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v11, v9
 ; GISEL-NEXT:    v_mad_u64_u32 v[9:10], s[6:7], v8, v16, v[1:2]
 ; GISEL-NEXT:    v_lshl_b64 v[11:12], s[4:5], v6
-; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v13, v0
 ; GISEL-NEXT:    v_mad_u64_u32 v[9:10], s[4:5], v5, v15, v[9:10]
-; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v14, v9, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v14, v9
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v1, v5
-; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v6, v5, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v13, v0
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v14, v9, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v14, v9
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v8
+; GISEL-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v1, v5
+; GISEL-NEXT:    v_cndmask_b32_e32 v13, v9, v10, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v6, v5, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v8
 ; GISEL-NEXT:    v_subbrev_u32_e32 v14, vcc, 0, v1, vcc
 ; GISEL-NEXT:    v_ashrrev_i32_e32 v6, 31, v12
 ; GISEL-NEXT:    v_add_i32_e32 v1, vcc, v11, v6
-; GISEL-NEXT:    v_cndmask_b32_e64 v13, v9, v10, s[4:5]
 ; GISEL-NEXT:    v_addc_u32_e32 v9, vcc, v12, v6, vcc
 ; GISEL-NEXT:    v_xor_b32_e32 v10, v1, v6
 ; GISEL-NEXT:    v_xor_b32_e32 v9, v9, v6
@@ -2678,19 +2678,19 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_xor_b32_e32 v8, v8, v7
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v7
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v9, v11, v[3:4]
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v5, v2
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v8, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v5, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v15, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v15, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v9
-; GISEL-NEXT:    v_subb_u32_e32 v3, vcc, v3, v9, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v10
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v15, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v15, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v9
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v10
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v9
+; GISEL-NEXT:    v_cndmask_b32_e32 v4, v5, v7, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v3, vcc, v3, v9, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v2, v10
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v9
 ; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, v5, v7, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v5, vcc, 1, v11
 ; GISEL-NEXT:    v_addc_u32_e32 v7, vcc, 0, v12, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v9
@@ -2828,19 +2828,19 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v8, vcc, v9, v8
 ; CGP-NEXT:    v_add_i32_e32 v12, vcc, v12, v8
 ; CGP-NEXT:    v_mad_u64_u32 v[8:9], s[4:5], v2, v12, v[4:5]
-; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v11, v3
 ; CGP-NEXT:    v_mad_u64_u32 v[8:9], s[4:5], v1, v10, v[8:9]
-; CGP-NEXT:    v_subb_u32_e64 v4, s[4:5], v14, v8, vcc
-; CGP-NEXT:    v_sub_i32_e64 v8, s[4:5], v14, v8
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v1
-; CGP-NEXT:    v_subb_u32_e32 v8, vcc, v8, v1, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v2
+; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v11, v3
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v14, v8, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v14, v8
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
+; CGP-NEXT:    v_cndmask_b32_e32 v4, v9, v11, vcc
+; CGP-NEXT:    v_subb_u32_e64 v8, vcc, v8, v1, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v3, v2
-; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v1
 ; CGP-NEXT:    v_subbrev_u32_e32 v8, vcc, 0, v8, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v4, v9, v11, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v9, vcc, 1, v10
 ; CGP-NEXT:    v_addc_u32_e32 v11, vcc, 0, v12, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v1
@@ -2882,10 +2882,10 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v0, v2
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v8, v1
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v1, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v1, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v3, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -3005,19 +3005,19 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, v8, v7
 ; CGP-NEXT:    v_add_i32_e32 v8, vcc, v11, v7
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v4, v8, v[6:7]
-; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v10, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v3, v9, v[6:7]
-; CGP-NEXT:    v_subb_u32_e64 v7, s[4:5], v13, v6, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v13, v6
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v3
-; CGP-NEXT:    v_subb_u32_e32 v6, vcc, v6, v3, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v4
+; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v10, v5
+; CGP-NEXT:    v_subb_u32_e64 v7, vcc, v13, v6, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v13, v6
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v7, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v7, v10, v11, vcc
+; CGP-NEXT:    v_subb_u32_e64 v6, vcc, v6, v3, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v5, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v7, v3
 ; CGP-NEXT:    v_subbrev_u32_e32 v6, vcc, 0, v6, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v7, v10, v11, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, 1, v9
 ; CGP-NEXT:    v_addc_u32_e32 v11, vcc, 0, v8, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v3
@@ -3056,10 +3056,10 @@ define <2 x i64> @v_sdiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v9
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v5, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v9
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v3, v9
-; CGP-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v9
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v3, v9
+; CGP-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v9
 ; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
@@ -3089,10 +3089,10 @@ define i64 @v_sdiv_i64_24bit(i64 %num, i64 %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -3223,22 +3223,22 @@ define <2 x i64> @v_sdiv_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], v3, v0, v[5:6]
 ; GISEL-NEXT:    v_and_b32_e32 v2, 0xffffff, v6
 ; GISEL-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v1, v9, v[7:8]
-; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v10, v4
-; GISEL-NEXT:    v_subb_u32_e64 v7, s[4:5], v11, v5, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v5, s[4:5], v11, v5
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v6, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 0, v2
-; GISEL-NEXT:    v_addc_u32_e64 v2, s[4:5], 0, 0, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v10, v4
+; GISEL-NEXT:    v_subb_u32_e64 v7, vcc, v11, v5, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v11, v5
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v4, vcc, 0, v2
+; GISEL-NEXT:    v_addc_u32_e64 v2, s[6:7], 0, 0, vcc
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v11, v4
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v13, v2
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v7, v1
-; GISEL-NEXT:    v_subb_u32_e32 v5, vcc, v5, v1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v7, v1
+; GISEL-NEXT:    v_cndmask_b32_e32 v8, v8, v10, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v5, vcc, v5, v1, s[4:5]
 ; GISEL-NEXT:    v_mac_f32_e32 v11, 0x4f800000, v13
 ; GISEL-NEXT:    v_rcp_iflag_f32_e32 v7, v11
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, v8, v10, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v10, vcc, v6, v3
 ; GISEL-NEXT:    v_subbrev_u32_e32 v11, vcc, 0, v5, vcc
 ; GISEL-NEXT:    v_mul_f32_e32 v5, 0x5f7ffffc, v7
@@ -3290,38 +3290,38 @@ define <2 x i64> @v_sdiv_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_addc_u32_e32 v14, vcc, 0, v18, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v16, v10, v[6:7]
 ; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v1, v17, v13, vcc
-; GISEL-NEXT:    v_cndmask_b32_e32 v3, v18, v14, vcc
-; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v8
+; GISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v8
 ; GISEL-NEXT:    v_mul_lo_u32 v7, v11, v5
 ; GISEL-NEXT:    v_mul_lo_u32 v8, v10, v6
+; GISEL-NEXT:    v_cndmask_b32_e32 v1, v17, v13, vcc
 ; GISEL-NEXT:    v_mul_hi_u32 v13, v10, v5
-; GISEL-NEXT:    v_cndmask_b32_e32 v1, v9, v1, vcc
-; GISEL-NEXT:    v_add_i32_e64 v9, s[4:5], 0, v12
-; GISEL-NEXT:    v_addc_u32_e64 v12, s[4:5], 0, 0, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v7, s[4:5], v7, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v7, s[4:5], v7, v13
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e32 v3, v18, v14, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v9, v1, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, 0, v12
+; GISEL-NEXT:    v_addc_u32_e64 v12, s[6:7], 0, 0, vcc
+; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v7, v8
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v7, v13
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, vcc
 ; GISEL-NEXT:    v_mul_lo_u32 v13, v11, v6
 ; GISEL-NEXT:    v_mul_hi_u32 v5, v11, v5
-; GISEL-NEXT:    v_add_i32_e64 v7, s[4:5], v8, v7
+; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v8, v7
 ; GISEL-NEXT:    v_mul_hi_u32 v8, v10, v6
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v13, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v13, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v5, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v8, s[4:5], v13, v8
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v13, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v13, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v5, v8
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v13, v8
 ; GISEL-NEXT:    v_mul_hi_u32 v6, v11, v6
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v5, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v7, s[4:5], v8, v7
-; GISEL-NEXT:    v_add_i32_e64 v6, s[4:5], v6, v7
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v10, v5
-; GISEL-NEXT:    v_addc_u32_e64 v6, s[4:5], v11, v6, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v5, v7
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v8, v7
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v7
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v10, v5
+; GISEL-NEXT:    v_addc_u32_e32 v6, vcc, v11, v6, vcc
 ; GISEL-NEXT:    v_mul_lo_u32 v7, v12, v5
 ; GISEL-NEXT:    v_mul_lo_u32 v8, v9, v6
-; GISEL-NEXT:    v_cndmask_b32_e32 v3, v0, v3, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v3, v0, v3, s[4:5]
 ; GISEL-NEXT:    v_mul_hi_u32 v0, v9, v5
 ; GISEL-NEXT:    v_mul_hi_u32 v5, v12, v5
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v7, v8
@@ -3347,18 +3347,18 @@ define <2 x i64> @v_sdiv_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_subrev_i32_e32 v0, vcc, 0, v1
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v2, v8, v[6:7]
 ; GISEL-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v9, v5
-; GISEL-NEXT:    v_subb_u32_e64 v5, s[4:5], v12, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v12, v6
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v2
-; GISEL-NEXT:    v_subb_u32_e32 v6, vcc, v6, v2, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v4
+; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v9, v5
+; GISEL-NEXT:    v_subb_u32_e64 v5, vcc, v12, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v12, v6
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v5, v2
+; GISEL-NEXT:    v_cndmask_b32_e32 v5, v7, v9, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v6, vcc, v6, v2, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v5, v2
 ; GISEL-NEXT:    v_subbrev_u32_e32 v6, vcc, 0, v6, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v7, v9, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, 1, v8
 ; GISEL-NEXT:    v_addc_u32_e32 v9, vcc, 0, v10, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v2
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
index 4c444f46ff3dddd..8333024d8767a71 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll
@@ -31,16 +31,16 @@ define amdgpu_kernel void @sdivrem_i32(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s7
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s5, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s6, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s6, v2
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s4, v3
 ; GFX8-NEXT:    flat_store_dword v[0:1], v2
@@ -247,18 +247,18 @@ define amdgpu_kernel void @sdivrem_i64(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v5, v2
 ; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s8, v3, v[1:2]
 ; GFX8-NEXT:    v_mov_b32_e32 v6, s11
-; GFX8-NEXT:    v_sub_u32_e32 v0, vcc, s10, v0
-; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s9, v4, v[1:2]
 ; GFX8-NEXT:    v_mov_b32_e32 v5, s9
-; GFX8-NEXT:    v_subb_u32_e64 v2, s[0:1], v6, v1, vcc
-; GFX8-NEXT:    v_sub_u32_e64 v1, s[0:1], s11, v1
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s9, v2
-; GFX8-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v0
-; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_eq_u32_e64 s[0:1], s9, v2
-; GFX8-NEXT:    v_subb_u32_e32 v1, vcc, v1, v5, vcc
-; GFX8-NEXT:    v_cndmask_b32_e64 v6, v6, v7, s[0:1]
+; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s9, v4, v[1:2]
+; GFX8-NEXT:    v_sub_u32_e64 v0, s[0:1], s10, v0
+; GFX8-NEXT:    v_subb_u32_e64 v2, vcc, v6, v1, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e32 v1, vcc, s11, v1
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s9, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v0
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, s9, v2
+; GFX8-NEXT:    v_cndmask_b32_e32 v6, v6, v7, vcc
+; GFX8-NEXT:    v_subb_u32_e64 v1, vcc, v1, v5, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v7, vcc, s8, v0
 ; GFX8-NEXT:    v_subbrev_u32_e64 v8, s[0:1], 0, v1, vcc
 ; GFX8-NEXT:    v_add_u32_e64 v9, s[0:1], 1, v4
@@ -652,18 +652,18 @@ define amdgpu_kernel void @sdivrem_v2i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v0
 ; GFX8-NEXT:    v_mul_hi_u32 v2, v1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s0, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s3, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    s_xor_b32 s0, s12, s2
 ; GFX8-NEXT:    s_ashr_i32 s2, s9, 31
 ; GFX8-NEXT:    s_add_i32 s1, s9, s2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
 ; GFX8-NEXT:    s_xor_b32 s1, s1, s2
 ; GFX8-NEXT:    v_add_u32_e32 v1, vcc, v1, v2
 ; GFX8-NEXT:    v_mul_hi_u32 v1, s1, v1
@@ -671,19 +671,19 @@ define amdgpu_kernel void @sdivrem_v2i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_xor_b32_e32 v0, s0, v0
 ; GFX8-NEXT:    v_subrev_u32_e32 v0, vcc, s0, v0
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s11
-; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s12, v2
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
+; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s12, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s1, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    s_xor_b32 s0, s2, s10
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
 ; GFX8-NEXT:    v_xor_b32_e32 v1, s0, v1
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s4
 ; GFX8-NEXT:    v_subrev_u32_e32 v1, vcc, s0, v1
@@ -880,79 +880,79 @@ define amdgpu_kernel void @sdivrem_v4i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_mul_lo_u32 v2, v0, s3
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s0, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s3, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s3, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s3, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s3, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_mul_lo_u32 v3, s8, v1
 ; GFX8-NEXT:    s_xor_b32 s0, s12, s2
 ; GFX8-NEXT:    s_ashr_i32 s2, s9, 31
 ; GFX8-NEXT:    s_add_i32 s1, s9, s2
 ; GFX8-NEXT:    v_mul_hi_u32 v3, v1, v3
 ; GFX8-NEXT:    s_xor_b32 s1, s1, s2
-; GFX8-NEXT:    v_xor_b32_e32 v0, s0, v0
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s12, v2
+; GFX8-NEXT:    s_ashr_i32 s3, s14, 31
 ; GFX8-NEXT:    v_add_u32_e32 v1, vcc, v1, v3
 ; GFX8-NEXT:    v_mul_hi_u32 v1, s1, v1
-; GFX8-NEXT:    s_ashr_i32 s3, s14, 31
-; GFX8-NEXT:    v_subrev_u32_e32 v0, vcc, s0, v0
-; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s13
+; GFX8-NEXT:    v_xor_b32_e32 v0, s0, v0
 ; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s12, v2
-; GFX8-NEXT:    s_add_i32 s0, s14, s3
+; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s13
+; GFX8-NEXT:    s_add_i32 s8, s14, s3
+; GFX8-NEXT:    v_subrev_u32_e32 v0, vcc, s0, v0
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s1, v3
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s13, v2
-; GFX8-NEXT:    s_xor_b32 s8, s0, s3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s13, v2
+; GFX8-NEXT:    s_xor_b32 s8, s8, s3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v3, s[0:1]
 ; GFX8-NEXT:    v_cvt_f32_u32_e32 v3, s8
-; GFX8-NEXT:    v_subrev_u32_e64 v5, s[0:1], s13, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v5, vcc
+; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, s13, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[0:1]
 ; GFX8-NEXT:    v_rcp_iflag_f32_e32 v3, v3
 ; GFX8-NEXT:    v_add_u32_e32 v5, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s13, v2
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s13, v2
 ; GFX8-NEXT:    v_mul_f32_e32 v3, 0x4f7ffffe, v3
 ; GFX8-NEXT:    v_cvt_u32_f32_e32 v3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v5, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v5, s[0:1], s13, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v5, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, s13, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[0:1]
 ; GFX8-NEXT:    s_sub_i32 s0, 0, s8
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v5, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v5, s0, v3
 ; GFX8-NEXT:    s_ashr_i32 s9, s10, 31
 ; GFX8-NEXT:    s_add_i32 s1, s10, s9
 ; GFX8-NEXT:    s_xor_b32 s1, s1, s9
 ; GFX8-NEXT:    v_mul_hi_u32 v5, v3, v5
-; GFX8-NEXT:    s_xor_b32 s0, s2, s16
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s2, v2
+; GFX8-NEXT:    s_xor_b32 s0, s2, s16
 ; GFX8-NEXT:    v_xor_b32_e32 v1, s0, v1
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v3, v5
 ; GFX8-NEXT:    v_mul_hi_u32 v3, s1, v3
 ; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, s2, v2
 ; GFX8-NEXT:    s_ashr_i32 s2, s15, 31
 ; GFX8-NEXT:    v_mul_lo_u32 v6, v3, s8
+; GFX8-NEXT:    s_add_i32 s10, s15, s2
 ; GFX8-NEXT:    v_subrev_u32_e32 v1, vcc, s0, v1
-; GFX8-NEXT:    s_add_i32 s0, s15, s2
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s1, v6
 ; GFX8-NEXT:    v_add_u32_e32 v6, vcc, 1, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v2
-; GFX8-NEXT:    s_xor_b32 s10, s0, s2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v6, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v2
+; GFX8-NEXT:    s_xor_b32 s10, s10, s2
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v6, s[0:1]
 ; GFX8-NEXT:    v_cvt_f32_u32_e32 v6, s10
-; GFX8-NEXT:    v_subrev_u32_e64 v7, s[0:1], s8, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v7, vcc
+; GFX8-NEXT:    v_subrev_u32_e32 v7, vcc, s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v7, s[0:1]
 ; GFX8-NEXT:    v_rcp_iflag_f32_e32 v6, v6
 ; GFX8-NEXT:    v_add_u32_e32 v7, vcc, 1, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v2
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v2
 ; GFX8-NEXT:    v_mul_f32_e32 v6, 0x4f7ffffe, v6
 ; GFX8-NEXT:    v_cvt_u32_f32_e32 v6, v6
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v7, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v7, s[0:1], s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v7, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v7, vcc, s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, v2, v7, s[0:1]
 ; GFX8-NEXT:    s_sub_i32 s0, 0, s10
-; GFX8-NEXT:    v_cndmask_b32_e32 v7, v2, v7, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v2, s0, v6
 ; GFX8-NEXT:    s_xor_b32 s0, s9, s3
 ; GFX8-NEXT:    s_ashr_i32 s3, s11, 31
@@ -968,15 +968,15 @@ define amdgpu_kernel void @sdivrem_v4i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_subrev_u32_e32 v6, vcc, s9, v3
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s1, v7
 ; GFX8-NEXT:    v_add_u32_e32 v7, vcc, 1, v8
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s10, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v7, v8, v7, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v8, s[0:1], s10, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v8, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s10, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, v8, v7, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v8, vcc, s10, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 1, v7
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s10, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v7, v7, v8, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v8, s[0:1], s10, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v8, v3, v8, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s10, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, v7, v8, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v8, vcc, s10, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v8, v3, v8, s[0:1]
 ; GFX8-NEXT:    s_xor_b32 s0, s3, s2
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s0, v7
 ; GFX8-NEXT:    v_xor_b32_e32 v7, s3, v8
@@ -1383,20 +1383,19 @@ define amdgpu_kernel void @sdivrem_v2i64(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v5, v2
 ; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s12, v3, v[1:2]
 ; GFX8-NEXT:    v_mov_b32_e32 v6, s17
-; GFX8-NEXT:    v_sub_u32_e32 v7, vcc, s16, v0
-; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s13, v4, v[1:2]
 ; GFX8-NEXT:    v_mov_b32_e32 v5, s13
-; GFX8-NEXT:    s_ashr_i32 s16, s3, 31
-; GFX8-NEXT:    v_subb_u32_e64 v6, s[0:1], v6, v1, vcc
-; GFX8-NEXT:    v_sub_u32_e64 v0, s[0:1], s17, v1
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s13, v6
-; GFX8-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s12, v7
-; GFX8-NEXT:    v_subb_u32_e32 v0, vcc, v0, v5, vcc
-; GFX8-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_eq_u32_e64 s[0:1], s13, v6
+; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s13, v4, v[1:2]
+; GFX8-NEXT:    v_sub_u32_e64 v7, s[0:1], s16, v0
+; GFX8-NEXT:    v_subb_u32_e64 v6, vcc, v6, v1, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e32 v0, vcc, s17, v1
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s13, v6
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s12, v7
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, s13, v6
+; GFX8-NEXT:    v_cndmask_b32_e32 v2, v1, v2, vcc
+; GFX8-NEXT:    v_subb_u32_e64 v0, vcc, v0, v5, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v8, vcc, s12, v7
-; GFX8-NEXT:    v_cndmask_b32_e64 v2, v1, v2, s[0:1]
 ; GFX8-NEXT:    v_subbrev_u32_e64 v9, s[0:1], 0, v0, vcc
 ; GFX8-NEXT:    v_add_u32_e64 v1, s[0:1], 1, v4
 ; GFX8-NEXT:    v_addc_u32_e64 v10, s[0:1], 0, v3, s[0:1]
@@ -1408,6 +1407,7 @@ define amdgpu_kernel void @sdivrem_v2i64(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_cndmask_b32_e64 v11, v11, v12, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e64 v12, s[0:1], 1, v1
 ; GFX8-NEXT:    v_addc_u32_e64 v13, s[0:1], 0, v10, s[0:1]
+; GFX8-NEXT:    s_ashr_i32 s16, s3, 31
 ; GFX8-NEXT:    s_add_u32 s0, s14, s6
 ; GFX8-NEXT:    s_addc_u32 s1, s15, s6
 ; GFX8-NEXT:    s_add_u32 s2, s2, s16
@@ -1530,18 +1530,18 @@ define amdgpu_kernel void @sdivrem_v2i64(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_add_u32_e32 v9, vcc, v9, v6
 ; GFX8-NEXT:    v_mad_u64_u32 v[6:7], s[0:1], s2, v9, v[3:4]
 ; GFX8-NEXT:    v_mov_b32_e32 v10, s13
-; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s12, v2
-; GFX8-NEXT:    v_mad_u64_u32 v[6:7], s[0:1], s3, v8, v[6:7]
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s3
-; GFX8-NEXT:    v_subb_u32_e64 v7, s[0:1], v10, v6, vcc
-; GFX8-NEXT:    v_sub_u32_e64 v6, s[0:1], s13, v6
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v7
-; GFX8-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_eq_u32_e64 s[0:1], s3, v7
-; GFX8-NEXT:    v_subb_u32_e32 v6, vcc, v6, v3, vcc
-; GFX8-NEXT:    v_cndmask_b32_e64 v10, v10, v11, s[0:1]
+; GFX8-NEXT:    v_mad_u64_u32 v[6:7], s[0:1], s3, v8, v[6:7]
+; GFX8-NEXT:    v_sub_u32_e64 v2, s[0:1], s12, v2
+; GFX8-NEXT:    v_subb_u32_e64 v7, vcc, v10, v6, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e32 v6, vcc, s13, v6
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v7
+; GFX8-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, s3, v7
+; GFX8-NEXT:    v_cndmask_b32_e32 v10, v10, v11, vcc
+; GFX8-NEXT:    v_subb_u32_e64 v6, vcc, v6, v3, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v11, vcc, s2, v2
 ; GFX8-NEXT:    v_subbrev_u32_e64 v12, s[0:1], 0, v6, vcc
 ; GFX8-NEXT:    v_add_u32_e64 v13, s[0:1], 1, v8
@@ -2226,16 +2226,16 @@ define amdgpu_kernel void @sdiv_i8(ptr addrspace(1) %out0, ptr addrspace(1) %out
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s8
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s6, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s6, v2
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s5, v3
 ; GFX8-NEXT:    flat_store_byte v[0:1], v2
@@ -2375,16 +2375,16 @@ define amdgpu_kernel void @sdivrem_v2i8(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mul_lo_u32 v2, v0, s8
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s0, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s8, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s8, v2
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s8, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    s_sub_i32 s1, 0, s11
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v3, s1, v1
 ; GFX8-NEXT:    s_bfe_i32 s1, s2, 0x80008
 ; GFX8-NEXT:    s_ashr_i32 s2, s1, 31
@@ -2395,23 +2395,23 @@ define amdgpu_kernel void @sdivrem_v2i8(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_xor_b32_e32 v0, s0, v0
 ; GFX8-NEXT:    v_add_u32_e32 v1, vcc, v1, v3
 ; GFX8-NEXT:    v_mul_hi_u32 v1, s1, v1
-; GFX8-NEXT:    v_xor_b32_e32 v2, s9, v2
 ; GFX8-NEXT:    v_subrev_u32_e32 v0, vcc, s0, v0
+; GFX8-NEXT:    v_xor_b32_e32 v2, s9, v2
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s11
-; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s9, v2
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
+; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s9, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s1, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    s_xor_b32 s0, s2, s10
 ; GFX8-NEXT:    v_xor_b32_e32 v1, s0, v1
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
 ; GFX8-NEXT:    v_subrev_u32_e32 v1, vcc, s0, v1
 ; GFX8-NEXT:    v_and_b32_e32 v1, 0xff, v1
 ; GFX8-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
@@ -2635,16 +2635,16 @@ define amdgpu_kernel void @sdiv_i16(ptr addrspace(1) %out0, ptr addrspace(1) %ou
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s8
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s6, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s6, v2
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s5, v3
 ; GFX8-NEXT:    flat_store_short v[0:1], v2
@@ -2784,16 +2784,16 @@ define amdgpu_kernel void @sdivrem_v2i16(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_mul_lo_u32 v2, v0, s9
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s0, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s9, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s9, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s9, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s9, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s9, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s9, v2
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s9, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s9, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    s_sub_i32 s1, 0, s11
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v3, s1, v1
 ; GFX8-NEXT:    s_bfe_i32 s1, s2, 0x100010
 ; GFX8-NEXT:    s_ashr_i32 s2, s1, 31
@@ -2804,23 +2804,23 @@ define amdgpu_kernel void @sdivrem_v2i16(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_xor_b32_e32 v0, s0, v0
 ; GFX8-NEXT:    v_add_u32_e32 v1, vcc, v1, v3
 ; GFX8-NEXT:    v_mul_hi_u32 v1, s1, v1
-; GFX8-NEXT:    v_xor_b32_e32 v2, s3, v2
 ; GFX8-NEXT:    v_subrev_u32_e32 v0, vcc, s0, v0
+; GFX8-NEXT:    v_xor_b32_e32 v2, s3, v2
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s11
-; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s3, v2
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
+; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s3, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s1, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    s_xor_b32 s0, s2, s10
 ; GFX8-NEXT:    v_xor_b32_e32 v1, s0, v1
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
 ; GFX8-NEXT:    v_subrev_u32_e32 v1, vcc, s0, v1
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s2, v3
 ; GFX8-NEXT:    v_and_b32_e32 v1, 0xffff, v1
@@ -3041,16 +3041,16 @@ define amdgpu_kernel void @sdivrem_i3(ptr addrspace(1) %out0, ptr addrspace(1) %
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s8
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s6, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s6, v2
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s5, v3
 ; GFX8-NEXT:    v_and_b32_e32 v2, 7, v2
@@ -3192,16 +3192,16 @@ define amdgpu_kernel void @sdivrem_i27(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s8
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s8, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s8, v3
 ; GFX8-NEXT:    v_xor_b32_e32 v2, s6, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v2, vcc, s6, v2
 ; GFX8-NEXT:    v_xor_b32_e32 v3, s5, v3
 ; GFX8-NEXT:    v_and_b32_e32 v2, 0x7ffffff, v2
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll
index 88ace1c51f5b023..98bf88e41e8388b 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i32.ll
@@ -311,13 +311,13 @@ define <2 x i32> @v_srem_v2i32_pow2k_denom(<2 x i32> %num) {
 ; GISEL-NEXT:    v_lshlrev_b32_e32 v3, 12, v3
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v4
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v3
-; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, 0x1000, v0
 ; GISEL-NEXT:    v_subrev_i32_e32 v4, vcc, s4, v1
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, 0x1000, v0
 ; GISEL-NEXT:    v_subrev_i32_e32 v4, vcc, s4, v1
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -360,13 +360,13 @@ define <2 x i32> @v_srem_v2i32_pow2k_denom(<2 x i32> %num) {
 ; CGP-NEXT:    v_lshlrev_b32_e32 v4, 12, v4
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v4
-; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, 0x1000, v0
 ; CGP-NEXT:    v_subrev_i32_e32 v4, vcc, 0x1000, v1
 ; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
 ; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, 0x1000, v0
 ; CGP-NEXT:    v_subrev_i32_e32 v4, vcc, 0x1000, v1
 ; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -442,13 +442,13 @@ define <2 x i32> @v_srem_v2i32_oddk_denom(<2 x i32> %num) {
 ; GISEL-NEXT:    v_mul_lo_u32 v3, v3, s4
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v4
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v3
-; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, 0x12d8fb, v0
 ; GISEL-NEXT:    v_subrev_i32_e32 v4, vcc, s4, v1
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; GISEL-NEXT:    v_subrev_i32_e32 v3, vcc, 0x12d8fb, v0
 ; GISEL-NEXT:    v_subrev_i32_e32 v4, vcc, s4, v1
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -491,13 +491,13 @@ define <2 x i32> @v_srem_v2i32_oddk_denom(<2 x i32> %num) {
 ; CGP-NEXT:    v_mul_lo_u32 v4, v4, s4
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v4
-; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, 0x12d8fb, v0
 ; CGP-NEXT:    v_subrev_i32_e32 v4, vcc, 0x12d8fb, v1
 ; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
 ; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v0
+; CGP-NEXT:    v_subrev_i32_e32 v3, vcc, 0x12d8fb, v0
 ; CGP-NEXT:    v_subrev_i32_e32 v4, vcc, 0x12d8fb, v1
 ; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s4, v0
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
index d0c55c69f508775..7cf6daa894dab94 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll
@@ -121,17 +121,17 @@ define i64 @v_srem_i64(i64 %num, i64 %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v6, v4
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v8, v4
 ; CHECK-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v0, v4, v[3:4]
-; CHECK-NEXT:    v_sub_i32_e32 v2, vcc, v5, v2
 ; CHECK-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v1, v7, v[3:4]
-; CHECK-NEXT:    v_subb_u32_e64 v4, s[4:5], v10, v3, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v10, v3
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v1
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v0
-; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v1
-; CHECK-NEXT:    v_subb_u32_e32 v3, vcc, v3, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, v5, v6, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e64 v2, s[4:5], v5, v2
+; CHECK-NEXT:    v_subb_u32_e64 v4, vcc, v10, v3, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v10, v3
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v0
+; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
+; CHECK-NEXT:    v_cndmask_b32_e32 v5, v5, v6, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v3, vcc, v3, v1, s[4:5]
 ; CHECK-NEXT:    v_sub_i32_e32 v6, vcc, v2, v0
 ; CHECK-NEXT:    v_subbrev_u32_e64 v7, s[4:5], 0, v3, vcc
 ; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v1
@@ -294,21 +294,21 @@ define amdgpu_ps i64 @s_srem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, v5, v2
 ; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s8, v2, v[1:2]
 ; CHECK-NEXT:    v_mov_b32_e32 v5, s11
-; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, s10, v0
-; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s9, v4, v[1:2]
 ; CHECK-NEXT:    v_mov_b32_e32 v3, s9
-; CHECK-NEXT:    v_subb_u32_e64 v2, s[0:1], v5, v1, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v1, s[0:1], s11, v1
-; CHECK-NEXT:    v_subb_u32_e32 v1, vcc, v1, v3, vcc
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[0:1], s9, v2
+; CHECK-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s9, v4, v[1:2]
+; CHECK-NEXT:    v_sub_i32_e64 v0, s[0:1], s10, v0
+; CHECK-NEXT:    v_subb_u32_e64 v2, vcc, v5, v1, s[0:1]
+; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, s11, v1
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s9, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s8, v0
+; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, s9, v2
+; CHECK-NEXT:    v_cndmask_b32_e32 v2, v4, v5, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v1, vcc, v1, v3, s[0:1]
 ; CHECK-NEXT:    v_subrev_i32_e32 v3, vcc, s8, v0
-; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[0:1]
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[0:1], s8, v0
 ; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[0:1]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[0:1], s9, v2
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s9, v1
-; CHECK-NEXT:    v_cndmask_b32_e64 v2, v4, v5, s[0:1]
 ; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, vcc
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s8, v3
 ; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
@@ -468,16 +468,16 @@ define <2 x i64> @v_srem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v13, v9
 ; GISEL-NEXT:    v_mad_u64_u32 v[9:10], s[4:5], v5, v9, v[1:2]
 ; GISEL-NEXT:    v_mad_u64_u32 v[9:10], s[4:5], v8, v12, v[9:10]
-; GISEL-NEXT:    v_sub_i32_e32 v10, vcc, v11, v0
-; GISEL-NEXT:    v_subb_u32_e64 v11, s[4:5], v14, v9, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v14, v9
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v11, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v11, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v12, v1, v9, s[4:5]
-; GISEL-NEXT:    v_subb_u32_e32 v9, vcc, v0, v8, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v10, s[4:5], v11, v0
+; GISEL-NEXT:    v_subb_u32_e64 v11, vcc, v14, v9, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v14, v9
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v11, v8
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v10, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v11, v8
+; GISEL-NEXT:    v_cndmask_b32_e32 v12, v1, v9, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v9, vcc, v0, v8, s[4:5]
 ; GISEL-NEXT:    v_ashrrev_i32_e32 v0, 31, v7
 ; GISEL-NEXT:    v_add_i32_e32 v1, vcc, v6, v0
 ; GISEL-NEXT:    v_addc_u32_e32 v7, vcc, v7, v0, vcc
@@ -598,16 +598,16 @@ define <2 x i64> @v_srem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v4
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v5, v4, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v7, v13, v[8:9]
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v11, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v12, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v12, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v6
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v7
-; GISEL-NEXT:    v_subb_u32_e32 v3, vcc, v3, v7, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v8, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v11, v2
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v12, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v12, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v7
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v6
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v7
+; GISEL-NEXT:    v_cndmask_b32_e32 v5, v5, v8, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v3, vcc, v3, v7, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, v2, v6
 ; GISEL-NEXT:    v_subbrev_u32_e64 v9, s[4:5], 0, v3, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v7
@@ -743,17 +743,17 @@ define <2 x i64> @v_srem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v5, v4
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v14, v4
 ; CGP-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v0, v4, v[3:4]
-; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v11, v2
 ; CGP-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v1, v13, v[3:4]
-; CGP-NEXT:    v_subb_u32_e64 v4, s[4:5], v10, v3, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v10, v3
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v1
-; CGP-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v0
-; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v1
-; CGP-NEXT:    v_subb_u32_e32 v3, vcc, v3, v1, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v10, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v11, v2
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v10, v3, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v10, v3
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v0
+; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
+; CGP-NEXT:    v_cndmask_b32_e32 v5, v5, v10, vcc
+; CGP-NEXT:    v_subb_u32_e64 v3, vcc, v3, v1, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v10, vcc, v2, v0
 ; CGP-NEXT:    v_subbrev_u32_e64 v11, s[4:5], 0, v3, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v11, v1
@@ -912,17 +912,17 @@ define <2 x i64> @v_srem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v12, v6
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v2, v6, v[5:6]
-; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v9, v4
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v3, v11, v[5:6]
-; CGP-NEXT:    v_subb_u32_e64 v6, s[4:5], v8, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v8, v5
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v6, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v2
-; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v6, v3
-; CGP-NEXT:    v_subb_u32_e32 v5, vcc, v5, v3, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v7, v7, v8, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v9, v4
+; CGP-NEXT:    v_subb_u32_e64 v6, vcc, v8, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v8, v5
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v6, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v7, v7, v8, vcc
+; CGP-NEXT:    v_subb_u32_e64 v5, vcc, v5, v3, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v4, v2
 ; CGP-NEXT:    v_subbrev_u32_e64 v9, s[4:5], 0, v5, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v3
@@ -1203,16 +1203,16 @@ define <2 x i64> @v_srem_v2i64_pow2k_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    s_sub_u32 s6, 0, 0x1000
 ; GISEL-NEXT:    s_subb_u32 s7, 0, 0
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], 0, v8, v[6:7]
-; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, v10, v0
-; GISEL-NEXT:    v_subb_u32_e64 v9, s[4:5], v11, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v11, v6
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v10, -1, v1, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v8, s[4:5], v10, v0
+; GISEL-NEXT:    v_subb_u32_e64 v9, vcc, v11, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v11, v6
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
+; GISEL-NEXT:    v_cndmask_b32_e32 v10, -1, v1, vcc
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v1, 0x1000
 ; GISEL-NEXT:    v_cvt_f32_ubyte0_e32 v6, 0
-; GISEL-NEXT:    v_subbrev_u32_e32 v0, vcc, 0, v0, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v0, vcc, 0, v0, s[4:5]
 ; GISEL-NEXT:    v_mac_f32_e32 v1, 0x4f800000, v6
 ; GISEL-NEXT:    v_rcp_iflag_f32_e32 v1, v1
 ; GISEL-NEXT:    v_sub_i32_e32 v11, vcc, v8, v5
@@ -1321,22 +1321,22 @@ define <2 x i64> @v_srem_v2i64_pow2k_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v4
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v9, v4, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], 0, v11, v[6:7]
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v10, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v12, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v12, v3
-; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v10, v2
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v12, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v12, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v4
+; GISEL-NEXT:    v_cndmask_b32_e32 v6, -1, v6, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v3, vcc, 0, v3, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v2, v5
 ; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v5
 ; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
 ; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v5
 ; GISEL-NEXT:    v_cndmask_b32_e32 v9, -1, v9, vcc
 ; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v7, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v4
 ; GISEL-NEXT:    v_subbrev_u32_e32 v10, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, -1, v6, s[4:5]
 ; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v9
 ; GISEL-NEXT:    v_cndmask_b32_e32 v5, v7, v5, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e32 v3, v3, v10, vcc
@@ -1447,16 +1447,16 @@ define <2 x i64> @v_srem_v2i64_pow2k_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v9, v6
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], s6, v6, v[1:2]
-; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v8, v0
-; CGP-NEXT:    v_subb_u32_e64 v9, s[4:5], v11, v6, vcc
-; CGP-NEXT:    v_sub_i32_e64 v0, s[4:5], v11, v6
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v9
-; CGP-NEXT:    v_cndmask_b32_e64 v10, -1, v1, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v8, s[4:5], v8, v0
+; CGP-NEXT:    v_subb_u32_e64 v9, vcc, v11, v6, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v11, v6
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
+; CGP-NEXT:    v_cndmask_b32_e32 v10, -1, v1, vcc
 ; CGP-NEXT:    v_cvt_f32_u32_e32 v1, 0x1000
 ; CGP-NEXT:    v_cvt_f32_ubyte0_e32 v6, 0
-; CGP-NEXT:    v_subbrev_u32_e32 v0, vcc, 0, v0, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v0, vcc, 0, v0, s[4:5]
 ; CGP-NEXT:    v_mac_f32_e32 v1, 0x4f800000, v6
 ; CGP-NEXT:    v_rcp_iflag_f32_e32 v1, v1
 ; CGP-NEXT:    v_sub_i32_e32 v11, vcc, v8, v4
@@ -1563,22 +1563,22 @@ define <2 x i64> @v_srem_v2i64_pow2k_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v6, v5
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v9, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], s6, v5, v[3:4]
-; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v7, v2
-; CGP-NEXT:    v_subb_u32_e64 v3, s[4:5], v12, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v12, v5
-; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
+; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v7, v2
+; CGP-NEXT:    v_subb_u32_e64 v3, vcc, v12, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v12, v5
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v6, -1, v6, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v5, vcc, 0, v5, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v2, v4
 ; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v4
 ; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v5
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v4
 ; CGP-NEXT:    v_cndmask_b32_e32 v9, -1, v9, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v7, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v3
 ; CGP-NEXT:    v_subbrev_u32_e32 v10, vcc, 0, v5, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v6, -1, v6, s[4:5]
 ; CGP-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v9
 ; CGP-NEXT:    v_cndmask_b32_e32 v4, v7, v4, vcc
 ; CGP-NEXT:    v_cndmask_b32_e32 v5, v5, v10, vcc
@@ -1824,16 +1824,16 @@ define <2 x i64> @v_srem_v2i64_oddk_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    s_sub_u32 s6, 0, 0x12d8fb
 ; GISEL-NEXT:    s_subb_u32 s7, 0, 0
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], 0, v8, v[6:7]
-; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, v10, v0
-; GISEL-NEXT:    v_subb_u32_e64 v9, s[4:5], v11, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v11, v6
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v10, -1, v1, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v8, s[4:5], v10, v0
+; GISEL-NEXT:    v_subb_u32_e64 v9, vcc, v11, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v11, v6
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
+; GISEL-NEXT:    v_cndmask_b32_e32 v10, -1, v1, vcc
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v1, 0x12d8fb
 ; GISEL-NEXT:    v_cvt_f32_ubyte0_e32 v6, 0
-; GISEL-NEXT:    v_subbrev_u32_e32 v0, vcc, 0, v0, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v0, vcc, 0, v0, s[4:5]
 ; GISEL-NEXT:    v_mac_f32_e32 v1, 0x4f800000, v6
 ; GISEL-NEXT:    v_rcp_iflag_f32_e32 v1, v1
 ; GISEL-NEXT:    v_sub_i32_e32 v11, vcc, v8, v5
@@ -1942,22 +1942,22 @@ define <2 x i64> @v_srem_v2i64_oddk_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v4
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v9, v4, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], 0, v11, v[6:7]
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v10, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v12, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v12, v3
-; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v10, v2
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v12, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v12, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v4
+; GISEL-NEXT:    v_cndmask_b32_e32 v6, -1, v6, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v3, vcc, 0, v3, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v2, v5
 ; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v5
 ; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
 ; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v5
 ; GISEL-NEXT:    v_cndmask_b32_e32 v9, -1, v9, vcc
 ; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v7, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v4
 ; GISEL-NEXT:    v_subbrev_u32_e32 v10, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, -1, v6, s[4:5]
 ; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v9
 ; GISEL-NEXT:    v_cndmask_b32_e32 v5, v7, v5, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e32 v3, v3, v10, vcc
@@ -2068,16 +2068,16 @@ define <2 x i64> @v_srem_v2i64_oddk_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v9, v6
 ; CGP-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], s6, v6, v[1:2]
-; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v8, v0
-; CGP-NEXT:    v_subb_u32_e64 v9, s[4:5], v11, v6, vcc
-; CGP-NEXT:    v_sub_i32_e64 v0, s[4:5], v11, v6
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v9
-; CGP-NEXT:    v_cndmask_b32_e64 v10, -1, v1, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v8, s[4:5], v8, v0
+; CGP-NEXT:    v_subb_u32_e64 v9, vcc, v11, v6, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v11, v6
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
+; CGP-NEXT:    v_cndmask_b32_e32 v10, -1, v1, vcc
 ; CGP-NEXT:    v_cvt_f32_u32_e32 v1, 0x12d8fb
 ; CGP-NEXT:    v_cvt_f32_ubyte0_e32 v6, 0
-; CGP-NEXT:    v_subbrev_u32_e32 v0, vcc, 0, v0, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v0, vcc, 0, v0, s[4:5]
 ; CGP-NEXT:    v_mac_f32_e32 v1, 0x4f800000, v6
 ; CGP-NEXT:    v_rcp_iflag_f32_e32 v1, v1
 ; CGP-NEXT:    v_sub_i32_e32 v11, vcc, v8, v4
@@ -2184,22 +2184,22 @@ define <2 x i64> @v_srem_v2i64_oddk_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v6, v5
 ; CGP-NEXT:    v_add_i32_e32 v5, vcc, v9, v5
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], s6, v5, v[3:4]
-; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v7, v2
-; CGP-NEXT:    v_subb_u32_e64 v3, s[4:5], v12, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v12, v5
-; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
+; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v7, v2
+; CGP-NEXT:    v_subb_u32_e64 v3, vcc, v12, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v12, v5
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v6, -1, v6, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v5, vcc, 0, v5, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v2, v4
 ; CGP-NEXT:    v_subbrev_u32_e32 v5, vcc, 0, v5, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v4
 ; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v5
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v4
 ; CGP-NEXT:    v_cndmask_b32_e32 v9, -1, v9, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v7, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v3
 ; CGP-NEXT:    v_subbrev_u32_e32 v10, vcc, 0, v5, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v6, -1, v6, s[4:5]
 ; CGP-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v9
 ; CGP-NEXT:    v_cndmask_b32_e32 v4, v7, v4, vcc
 ; CGP-NEXT:    v_cndmask_b32_e32 v5, v5, v10, vcc
@@ -2336,17 +2336,17 @@ define i64 @v_srem_i64_pow2_shl_denom(i64 %x, i64 %y) {
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v5, v4
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v8, v4
 ; CHECK-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v0, v4, v[3:4]
-; CHECK-NEXT:    v_sub_i32_e32 v2, vcc, v7, v2
 ; CHECK-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v1, v6, v[3:4]
-; CHECK-NEXT:    v_subb_u32_e64 v4, s[4:5], v10, v3, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v10, v3
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v1
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v0
-; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v1
-; CHECK-NEXT:    v_subb_u32_e32 v3, vcc, v3, v1, vcc
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, v5, v6, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e64 v2, s[4:5], v7, v2
+; CHECK-NEXT:    v_subb_u32_e64 v4, vcc, v10, v3, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v10, v3
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v0
+; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
+; CHECK-NEXT:    v_cndmask_b32_e32 v5, v5, v6, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v3, vcc, v3, v1, s[4:5]
 ; CHECK-NEXT:    v_sub_i32_e32 v6, vcc, v2, v0
 ; CHECK-NEXT:    v_subbrev_u32_e64 v7, s[4:5], 0, v3, vcc
 ; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v1
@@ -2504,18 +2504,18 @@ define <2 x i64> @v_srem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v10, v8
 ; GISEL-NEXT:    v_mad_u64_u32 v[8:9], s[6:7], v5, v8, v[1:2]
 ; GISEL-NEXT:    v_lshl_b64 v[10:11], s[4:5], v6
-; GISEL-NEXT:    v_sub_i32_e32 v12, vcc, v12, v0
 ; GISEL-NEXT:    v_mad_u64_u32 v[8:9], s[4:5], v7, v14, v[8:9]
-; GISEL-NEXT:    v_subb_u32_e64 v14, s[4:5], v13, v8, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v13, v8
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v14, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v12, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v14, v7
-; GISEL-NEXT:    v_subb_u32_e32 v9, vcc, v0, v7, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v12, s[4:5], v12, v0
+; GISEL-NEXT:    v_subb_u32_e64 v14, vcc, v13, v8, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v13, v8
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v14, v7
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v12, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v14, v7
+; GISEL-NEXT:    v_cndmask_b32_e32 v13, v1, v6, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v9, vcc, v0, v7, s[4:5]
 ; GISEL-NEXT:    v_ashrrev_i32_e32 v0, 31, v11
-; GISEL-NEXT:    v_cndmask_b32_e64 v13, v1, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v1, vcc, v10, v0
 ; GISEL-NEXT:    v_addc_u32_e32 v8, vcc, v11, v0, vcc
 ; GISEL-NEXT:    v_xor_b32_e32 v6, v1, v0
@@ -2635,16 +2635,16 @@ define <2 x i64> @v_srem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v1, v4
 ; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v5, v4, vcc
 ; GISEL-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v8, v11, v[9:10]
-; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v12, v2
-; GISEL-NEXT:    v_subb_u32_e64 v4, s[4:5], v13, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v13, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v6
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v8
-; GISEL-NEXT:    v_subb_u32_e32 v3, vcc, v3, v8, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v12, v2
+; GISEL-NEXT:    v_subb_u32_e64 v4, vcc, v13, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v13, v3
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v8
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v6
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v8
+; GISEL-NEXT:    v_cndmask_b32_e32 v5, v5, v9, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v3, vcc, v3, v8, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v9, vcc, v2, v6
 ; GISEL-NEXT:    v_subbrev_u32_e64 v10, s[4:5], 0, v3, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v8
@@ -2782,17 +2782,17 @@ define <2 x i64> @v_srem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v10, v4
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v13, v4
 ; CGP-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v0, v4, v[3:4]
-; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v9, v2
 ; CGP-NEXT:    v_mad_u64_u32 v[3:4], s[4:5], v1, v12, v[3:4]
-; CGP-NEXT:    v_subb_u32_e64 v4, s[4:5], v8, v3, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v8, v3
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v1
-; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v0
-; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v4, v1
-; CGP-NEXT:    v_subb_u32_e32 v3, vcc, v3, v1, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v8, v8, v9, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v9, v2
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v8, v3, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v8, v3
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v0
+; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v1
+; CGP-NEXT:    v_cndmask_b32_e32 v8, v8, v9, vcc
+; CGP-NEXT:    v_subb_u32_e64 v3, vcc, v3, v1, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v9, vcc, v2, v0
 ; CGP-NEXT:    v_subbrev_u32_e64 v10, s[4:5], 0, v3, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v1
@@ -2955,17 +2955,17 @@ define <2 x i64> @v_srem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v8, v6
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, v10, v6
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v2, v6, v[5:6]
-; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v7, v4
 ; CGP-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v3, v9, v[5:6]
-; CGP-NEXT:    v_subb_u32_e64 v6, s[4:5], v13, v5, vcc
-; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v13, v5
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v6, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v2
-; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], v6, v3
-; CGP-NEXT:    v_subb_u32_e32 v5, vcc, v5, v3, vcc
-; CGP-NEXT:    v_cndmask_b32_e64 v7, v7, v8, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v7, v4
+; CGP-NEXT:    v_subb_u32_e64 v6, vcc, v13, v5, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v13, v5
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v6, v3
+; CGP-NEXT:    v_cndmask_b32_e32 v7, v7, v8, vcc
+; CGP-NEXT:    v_subb_u32_e64 v5, vcc, v5, v3, s[4:5]
 ; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v4, v2
 ; CGP-NEXT:    v_subbrev_u32_e64 v9, s[4:5], 0, v5, vcc
 ; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v3
@@ -3170,29 +3170,29 @@ define <2 x i64> @v_srem_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mad_u64_u32 v[7:8], s[4:5], v1, v7, v[0:1]
 ; GISEL-NEXT:    v_and_b32_e32 v0, 0xffffff, v6
 ; GISEL-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v3, v9, v[7:8]
-; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, v10, v4
-; GISEL-NEXT:    v_subb_u32_e64 v9, s[4:5], v11, v5, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v5, s[4:5], v11, v5
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], 0, v0
-; GISEL-NEXT:    v_addc_u32_e64 v4, s[4:5], 0, 0, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v8, s[4:5], v10, v4
+; GISEL-NEXT:    v_subb_u32_e64 v9, vcc, v11, v5, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v11, v5
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v9, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v2, vcc, 0, v0
+; GISEL-NEXT:    v_addc_u32_e64 v4, s[6:7], 0, 0, vcc
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v0, v2
 ; GISEL-NEXT:    v_cvt_f32_u32_e32 v10, v4
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v9, v3
-; GISEL-NEXT:    v_subb_u32_e32 v13, vcc, v5, v3, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v9, v3
+; GISEL-NEXT:    v_cndmask_b32_e32 v11, v6, v7, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v13, vcc, v5, v3, s[4:5]
 ; GISEL-NEXT:    v_mac_f32_e32 v0, 0x4f800000, v10
 ; GISEL-NEXT:    v_rcp_iflag_f32_e32 v0, v0
-; GISEL-NEXT:    v_cndmask_b32_e64 v11, v6, v7, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v10, vcc, v8, v1
+; GISEL-NEXT:    v_subbrev_u32_e64 v14, s[4:5], 0, v13, vcc
 ; GISEL-NEXT:    v_mul_f32_e32 v0, 0x5f7ffffc, v0
 ; GISEL-NEXT:    v_mul_f32_e32 v5, 0x2f800000, v0
 ; GISEL-NEXT:    v_trunc_f32_e32 v7, v5
 ; GISEL-NEXT:    v_mac_f32_e32 v0, 0xcf800000, v7
 ; GISEL-NEXT:    v_cvt_u32_f32_e32 v15, v0
-; GISEL-NEXT:    v_subbrev_u32_e64 v14, s[4:5], 0, v13, vcc
 ; GISEL-NEXT:    v_sub_i32_e64 v16, s[4:5], 0, v2
 ; GISEL-NEXT:    v_subb_u32_e64 v17, s[4:5], 0, v4, s[4:5]
 ; GISEL-NEXT:    v_mad_u64_u32 v[5:6], s[4:5], v16, v15, 0
@@ -3238,37 +3238,37 @@ define <2 x i64> @v_srem_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mad_u64_u32 v[0:1], s[4:5], v17, v13, v[0:1]
 ; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v7
 ; GISEL-NEXT:    v_cndmask_b32_e32 v6, v10, v18, vcc
-; GISEL-NEXT:    v_cndmask_b32_e32 v3, v14, v3, vcc
-; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v11
-; GISEL-NEXT:    v_cndmask_b32_e32 v1, v8, v6, vcc
+; GISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v11
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v8, v6, s[4:5]
 ; GISEL-NEXT:    v_mul_lo_u32 v6, v15, v5
 ; GISEL-NEXT:    v_mul_lo_u32 v7, v13, v0
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v13, v5
-; GISEL-NEXT:    v_add_i32_e64 v8, s[4:5], 0, v12
-; GISEL-NEXT:    v_addc_u32_e64 v10, s[4:5], 0, 0, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v6, s[4:5], v6, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v6, s[4:5], v6, v11
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, 1, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e32 v3, v14, v3, vcc
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, 0, v12
+; GISEL-NEXT:    v_addc_u32_e64 v10, s[6:7], 0, 0, vcc
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v7
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v11
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, 1, vcc
 ; GISEL-NEXT:    v_mul_lo_u32 v11, v15, v0
 ; GISEL-NEXT:    v_mul_hi_u32 v5, v15, v5
-; GISEL-NEXT:    v_add_i32_e64 v6, s[4:5], v7, v6
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
 ; GISEL-NEXT:    v_mul_hi_u32 v7, v13, v0
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v11, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v5, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v7, s[4:5], v11, v7
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v11, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v5, v7
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v7, vcc, v11, v7
 ; GISEL-NEXT:    v_mul_hi_u32 v0, v15, v0
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v5, v6
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v6, s[4:5], v7, v6
-; GISEL-NEXT:    v_add_i32_e64 v0, s[4:5], v0, v6
-; GISEL-NEXT:    v_add_i32_e64 v5, s[4:5], v13, v5
-; GISEL-NEXT:    v_addc_u32_e64 v0, s[4:5], v15, v0, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v5, v6
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v7, v6
+; GISEL-NEXT:    v_add_i32_e32 v0, vcc, v0, v6
+; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v13, v5
+; GISEL-NEXT:    v_addc_u32_e32 v0, vcc, v15, v0, vcc
 ; GISEL-NEXT:    v_mul_lo_u32 v6, v10, v5
 ; GISEL-NEXT:    v_mul_lo_u32 v7, v8, v0
-; GISEL-NEXT:    v_cndmask_b32_e32 v3, v9, v3, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v3, v9, v3, s[4:5]
 ; GISEL-NEXT:    v_mul_hi_u32 v9, v8, v5
 ; GISEL-NEXT:    v_mul_hi_u32 v5, v10, v5
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v7
@@ -3294,16 +3294,16 @@ define <2 x i64> @v_srem_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_subrev_i32_e32 v0, vcc, 0, v1
 ; GISEL-NEXT:    v_mad_u64_u32 v[6:7], s[4:5], v4, v9, v[6:7]
 ; GISEL-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v8, v5
-; GISEL-NEXT:    v_subb_u32_e64 v5, s[4:5], v10, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v10, v6
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v2
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v5, v4
-; GISEL-NEXT:    v_subb_u32_e32 v6, vcc, v6, v4, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, v7, v8, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v8, v5
+; GISEL-NEXT:    v_subb_u32_e64 v5, vcc, v10, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v10, v6
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, v5, v4
+; GISEL-NEXT:    v_cndmask_b32_e32 v7, v7, v8, vcc
+; GISEL-NEXT:    v_subb_u32_e64 v6, vcc, v6, v4, s[4:5]
 ; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, v3, v2
 ; GISEL-NEXT:    v_subbrev_u32_e64 v9, s[4:5], 0, v6, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v4
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
index 65455d754be4f53..706c2334175b6a1 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/ssubsat.ll
@@ -512,41 +512,38 @@ define i32 @v_ssubsat_v4i8(i32 %lhs.arg, i32 %rhs.arg) {
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v6, 16, v1
 ; GFX6-NEXT:    v_lshrrev_b32_e32 v7, 24, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 24, v1
-; GFX6-NEXT:    s_brev_b32 s5, 1
 ; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, s4, v8
-; GFX6-NEXT:    v_min_i32_e32 v10, -1, v0
-; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, s5, v10
+; GFX6-NEXT:    v_min_i32_e32 v9, -1, v0
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x80000000, v9
 ; GFX6-NEXT:    v_max_i32_e32 v1, v8, v1
-; GFX6-NEXT:    v_min_i32_e32 v1, v1, v10
+; GFX6-NEXT:    v_min_i32_e32 v1, v1, v9
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 24, v2
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 24, v5
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, s4, v5
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v8, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, s5, v8
+; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_max_i32_e32 v2, v5, v2
 ; GFX6-NEXT:    v_min_i32_e32 v2, v2, v8
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, v1, v2
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 24, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v9, -2
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v2
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 24, v6
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, -1, v2
-; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, s5, v6
+; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, 0x80000000, v6
 ; GFX6-NEXT:    v_max_i32_e32 v3, v5, v3
 ; GFX6-NEXT:    v_min_i32_e32 v3, v3, v6
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, v2, v3
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 24, v4
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v11, 1
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 24, v1
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 24, v7
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, -1, v3
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v0, 24, v0
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v6, v11
+; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, 0x80000000, v6
 ; GFX6-NEXT:    v_max_i32_e32 v4, v5, v4
 ; GFX6-NEXT:    v_and_b32_e32 v1, 0xff, v1
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v2, 24, v2
@@ -1393,7 +1390,7 @@ define <3 x i32> @v_ssubsat_v3i32(<3 x i32> %lhs, <3 x i32> %rhs) {
 ; GFX6-NEXT:    v_min_i32_e32 v3, v3, v7
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
 ; GFX6-NEXT:    v_max_i32_e32 v3, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v3, vcc, s4, v3
+; GFX6-NEXT:    v_subrev_i32_e32 v3, vcc, 0x7fffffff, v3
 ; GFX6-NEXT:    v_min_i32_e32 v6, -1, v1
 ; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, s5, v6
 ; GFX6-NEXT:    v_max_i32_e32 v3, v3, v4
@@ -1421,7 +1418,7 @@ define <3 x i32> @v_ssubsat_v3i32(<3 x i32> %lhs, <3 x i32> %rhs) {
 ; GFX8-NEXT:    v_min_i32_e32 v3, v3, v7
 ; GFX8-NEXT:    v_sub_u32_e32 v0, vcc, v0, v3
 ; GFX8-NEXT:    v_max_i32_e32 v3, -1, v1
-; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s4, v3
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, 0x7fffffff, v3
 ; GFX8-NEXT:    v_min_i32_e32 v6, -1, v1
 ; GFX8-NEXT:    v_subrev_u32_e32 v6, vcc, s5, v6
 ; GFX8-NEXT:    v_max_i32_e32 v3, v3, v4
@@ -1734,7 +1731,7 @@ define <5 x i32> @v_ssubsat_v5i32(<5 x i32> %lhs, <5 x i32> %rhs) {
 ; GFX6-NEXT:    v_min_i32_e32 v5, v5, v12
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, s4, v5
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v1
 ; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, s5, v10
 ; GFX6-NEXT:    v_max_i32_e32 v5, v5, v6
@@ -1777,7 +1774,7 @@ define <5 x i32> @v_ssubsat_v5i32(<5 x i32> %lhs, <5 x i32> %rhs) {
 ; GFX8-NEXT:    v_min_i32_e32 v5, v5, v12
 ; GFX8-NEXT:    v_sub_u32_e32 v0, vcc, v0, v5
 ; GFX8-NEXT:    v_max_i32_e32 v5, -1, v1
-; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, s4, v5
+; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX8-NEXT:    v_min_i32_e32 v10, -1, v1
 ; GFX8-NEXT:    v_subrev_u32_e32 v10, vcc, s5, v10
 ; GFX8-NEXT:    v_max_i32_e32 v5, v5, v6
@@ -1949,246 +1946,238 @@ define <16 x i32> @v_ssubsat_v16i32(<16 x i32> %lhs, <16 x i32> %rhs) {
 ; GFX6-LABEL: v_ssubsat_v16i32:
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v31, -1, v0
-; GFX6-NEXT:    v_subrev_i32_e32 v31, vcc, s4, v31
+; GFX6-NEXT:    v_subrev_i32_e32 v31, vcc, 0x7fffffff, v31
 ; GFX6-NEXT:    v_max_i32_e32 v16, v31, v16
-; GFX6-NEXT:    s_brev_b32 s5, 1
 ; GFX6-NEXT:    v_min_i32_e32 v31, -1, v0
-; GFX6-NEXT:    v_subrev_i32_e32 v31, vcc, s5, v31
+; GFX6-NEXT:    v_subrev_i32_e32 v31, vcc, 0x80000000, v31
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v31
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v16
 ; GFX6-NEXT:    v_max_i32_e32 v16, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, s4, v16
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
 ; GFX6-NEXT:    v_max_i32_e32 v16, v16, v17
 ; GFX6-NEXT:    v_min_i32_e32 v17, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, s5, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, 0x80000000, v17
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, v1, v16
 ; GFX6-NEXT:    v_max_i32_e32 v16, -1, v2
-; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, s4, v16
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
 ; GFX6-NEXT:    v_min_i32_e32 v17, -1, v2
 ; GFX6-NEXT:    v_max_i32_e32 v16, v16, v18
-; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, s5, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, 0x80000000, v17
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, v2, v16
-; GFX6-NEXT:    v_bfrev_b32_e32 v16, -2
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v3
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_bfrev_b32_e32 v18, 1
-; GFX6-NEXT:    v_min_i32_e32 v19, -1, v3
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v19, v18
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, v3, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v4
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v19, -1, v4
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v19, v18
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, v4, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v5
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v19, -1, v5
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v21
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v19, v18
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v6
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v19, -1, v6
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v22
-; GFX6-NEXT:    v_sub_i32_e32 v19, vcc, v19, v18
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX6-NEXT:    buffer_load_dword v19, off, s[0:3], s32
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v6, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v7
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v7
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v23
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v7, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v8
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v8
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v24
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v8, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v9
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v9
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v25
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v10
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v10
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v26
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v11
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v11
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v27
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v11, vcc, v11, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v12
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v12
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v28
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v12, vcc, v12, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v13
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v13
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v29
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v13, vcc, v13, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v14
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v20, -1, v14
-; GFX6-NEXT:    v_sub_i32_e32 v20, vcc, v20, v18
-; GFX6-NEXT:    v_max_i32_e32 v17, v17, v30
-; GFX6-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX6-NEXT:    v_sub_i32_e32 v14, vcc, v14, v17
-; GFX6-NEXT:    v_max_i32_e32 v17, -1, v15
-; GFX6-NEXT:    v_sub_i32_e32 v16, vcc, v17, v16
-; GFX6-NEXT:    v_min_i32_e32 v17, -1, v15
-; GFX6-NEXT:    v_sub_i32_e32 v17, vcc, v17, v18
-; GFX6-NEXT:    s_waitcnt vmcnt(0)
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v3
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v17, -1, v3
 ; GFX6-NEXT:    v_max_i32_e32 v16, v16, v19
+; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, 0x80000000, v17
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, v3, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v4
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v17, -1, v4
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v20
+; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, 0x80000000, v17
 ; GFX6-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX6-NEXT:    buffer_load_dword v17, off, s[0:3], s32
+; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, v4, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v5
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v5
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v21
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v6
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v6
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v22
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v6, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v7
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v7
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v23
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v7, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v8
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v8
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v24
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v8, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v25
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v10
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v10
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v26
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v11
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v11
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v27
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v11, vcc, v11, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v12
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v12
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v28
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v12, vcc, v12, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v13
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v13
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v29
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v13, vcc, v13, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v14
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v14
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v30
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX6-NEXT:    v_sub_i32_e32 v14, vcc, v14, v16
+; GFX6-NEXT:    v_max_i32_e32 v16, -1, v15
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v18, -1, v15
+; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, 0x80000000, v18
+; GFX6-NEXT:    s_waitcnt vmcnt(0)
+; GFX6-NEXT:    v_max_i32_e32 v16, v16, v17
+; GFX6-NEXT:    v_min_i32_e32 v16, v16, v18
 ; GFX6-NEXT:    v_sub_i32_e32 v15, vcc, v15, v16
 ; GFX6-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: v_ssubsat_v16i32:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    s_brev_b32 s4, -2
 ; GFX8-NEXT:    v_max_i32_e32 v31, -1, v0
-; GFX8-NEXT:    v_subrev_u32_e32 v31, vcc, s4, v31
+; GFX8-NEXT:    v_subrev_u32_e32 v31, vcc, 0x7fffffff, v31
 ; GFX8-NEXT:    v_max_i32_e32 v16, v31, v16
-; GFX8-NEXT:    s_brev_b32 s5, 1
 ; GFX8-NEXT:    v_min_i32_e32 v31, -1, v0
-; GFX8-NEXT:    v_subrev_u32_e32 v31, vcc, s5, v31
+; GFX8-NEXT:    v_subrev_u32_e32 v31, vcc, 0x80000000, v31
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v31
 ; GFX8-NEXT:    v_sub_u32_e32 v0, vcc, v0, v16
 ; GFX8-NEXT:    v_max_i32_e32 v16, -1, v1
-; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, s4, v16
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
 ; GFX8-NEXT:    v_max_i32_e32 v16, v16, v17
 ; GFX8-NEXT:    v_min_i32_e32 v17, -1, v1
-; GFX8-NEXT:    v_subrev_u32_e32 v17, vcc, s5, v17
+; GFX8-NEXT:    v_subrev_u32_e32 v17, vcc, 0x80000000, v17
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX8-NEXT:    v_sub_u32_e32 v1, vcc, v1, v16
 ; GFX8-NEXT:    v_max_i32_e32 v16, -1, v2
-; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, s4, v16
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
 ; GFX8-NEXT:    v_min_i32_e32 v17, -1, v2
 ; GFX8-NEXT:    v_max_i32_e32 v16, v16, v18
-; GFX8-NEXT:    v_subrev_u32_e32 v17, vcc, s5, v17
+; GFX8-NEXT:    v_subrev_u32_e32 v17, vcc, 0x80000000, v17
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, v2, v16
-; GFX8-NEXT:    v_bfrev_b32_e32 v16, -2
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v3
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_bfrev_b32_e32 v18, 1
-; GFX8-NEXT:    v_min_i32_e32 v19, -1, v3
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v19, v18
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, v3, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v4
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v19, -1, v4
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v19, v18
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_sub_u32_e32 v4, vcc, v4, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v5
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v19, -1, v5
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v21
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v19, v18
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    v_sub_u32_e32 v5, vcc, v5, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v6
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v19, -1, v6
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v22
-; GFX8-NEXT:    v_sub_u32_e32 v19, vcc, v19, v18
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v19
-; GFX8-NEXT:    buffer_load_dword v19, off, s[0:3], s32
-; GFX8-NEXT:    v_sub_u32_e32 v6, vcc, v6, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v7
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v7
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v23
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v7, vcc, v7, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v8
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v8
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v24
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v8, vcc, v8, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v9
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v9
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v25
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v9, vcc, v9, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v10
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v10
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v26
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v10, vcc, v10, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v11
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v11
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v27
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v11, vcc, v11, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v12
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v12
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v28
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v12, vcc, v12, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v13
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v13
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v29
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v13, vcc, v13, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v14
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v20, -1, v14
-; GFX8-NEXT:    v_sub_u32_e32 v20, vcc, v20, v18
-; GFX8-NEXT:    v_max_i32_e32 v17, v17, v30
-; GFX8-NEXT:    v_min_i32_e32 v17, v17, v20
-; GFX8-NEXT:    v_sub_u32_e32 v14, vcc, v14, v17
-; GFX8-NEXT:    v_max_i32_e32 v17, -1, v15
-; GFX8-NEXT:    v_sub_u32_e32 v16, vcc, v17, v16
-; GFX8-NEXT:    v_min_i32_e32 v17, -1, v15
-; GFX8-NEXT:    v_sub_u32_e32 v17, vcc, v17, v18
-; GFX8-NEXT:    s_waitcnt vmcnt(0)
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v3
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v17, -1, v3
 ; GFX8-NEXT:    v_max_i32_e32 v16, v16, v19
+; GFX8-NEXT:    v_subrev_u32_e32 v17, vcc, 0x80000000, v17
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, v3, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v4
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v17, -1, v4
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v20
+; GFX8-NEXT:    v_subrev_u32_e32 v17, vcc, 0x80000000, v17
 ; GFX8-NEXT:    v_min_i32_e32 v16, v16, v17
+; GFX8-NEXT:    buffer_load_dword v17, off, s[0:3], s32
+; GFX8-NEXT:    v_sub_u32_e32 v4, vcc, v4, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v5
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v5
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v21
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v5, vcc, v5, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v6
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v6
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v22
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v6, vcc, v6, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v7
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v7
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v23
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v7, vcc, v7, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v8
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v8
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v24
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v8, vcc, v8, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v9
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v9
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v25
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v9, vcc, v9, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v10
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v10
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v26
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v10, vcc, v10, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v11
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v11
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v27
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v11, vcc, v11, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v12
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v12
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v28
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v12, vcc, v12, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v13
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v13
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v29
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v13, vcc, v13, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v14
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v14
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v30
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
+; GFX8-NEXT:    v_sub_u32_e32 v14, vcc, v14, v16
+; GFX8-NEXT:    v_max_i32_e32 v16, -1, v15
+; GFX8-NEXT:    v_subrev_u32_e32 v16, vcc, 0x7fffffff, v16
+; GFX8-NEXT:    v_min_i32_e32 v18, -1, v15
+; GFX8-NEXT:    v_subrev_u32_e32 v18, vcc, 0x80000000, v18
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
+; GFX8-NEXT:    v_max_i32_e32 v16, v16, v17
+; GFX8-NEXT:    v_min_i32_e32 v16, v16, v18
 ; GFX8-NEXT:    v_sub_u32_e32 v15, vcc, v15, v16
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -3059,39 +3048,36 @@ define <2 x float> @v_ssubsat_v4i16(<4 x i16> %lhs, <4 x i16> %rhs) {
 ; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v8, -1, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
-; GFX6-NEXT:    s_brev_b32 s5, 1
 ; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, s4, v8
-; GFX6-NEXT:    v_min_i32_e32 v10, -1, v0
-; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, s5, v10
+; GFX6-NEXT:    v_min_i32_e32 v9, -1, v0
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x80000000, v9
 ; GFX6-NEXT:    v_max_i32_e32 v4, v8, v4
-; GFX6-NEXT:    v_min_i32_e32 v4, v4, v10
+; GFX6-NEXT:    v_min_i32_e32 v4, v4, v9
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v5
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, s4, v5
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v8, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, s5, v8
+; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_max_i32_e32 v4, v5, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v9, -2
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v8
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v2
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, v1, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v6
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, -1, v2
-; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, s5, v6
+; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, 0x80000000, v6
 ; GFX6-NEXT:    v_max_i32_e32 v4, v5, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v6
 ; GFX6-NEXT:    v_max_i32_e32 v5, -1, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v11, 1
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, v2, v4
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v7
-; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v5, vcc, 0x7fffffff, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, -1, v3
-; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v6, v11
+; GFX6-NEXT:    v_subrev_i32_e32 v6, vcc, 0x80000000, v6
 ; GFX6-NEXT:    v_max_i32_e32 v4, v5, v4
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_min_i32_e32 v4, v4, v6
@@ -3320,61 +3306,57 @@ define <3 x float> @v_ssubsat_v6i16(<6 x i16> %lhs, <6 x i16> %rhs) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
-; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v12, -1, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v6
-; GFX6-NEXT:    s_brev_b32 s5, 1
-; GFX6-NEXT:    v_subrev_i32_e32 v12, vcc, s4, v12
-; GFX6-NEXT:    v_min_i32_e32 v14, -1, v0
-; GFX6-NEXT:    v_subrev_i32_e32 v14, vcc, s5, v14
+; GFX6-NEXT:    v_subrev_i32_e32 v12, vcc, 0x7fffffff, v12
+; GFX6-NEXT:    v_min_i32_e32 v13, -1, v0
+; GFX6-NEXT:    v_subrev_i32_e32 v13, vcc, 0x80000000, v13
 ; GFX6-NEXT:    v_max_i32_e32 v6, v12, v6
-; GFX6-NEXT:    v_min_i32_e32 v6, v6, v14
+; GFX6-NEXT:    v_min_i32_e32 v6, v6, v13
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v7
 ; GFX6-NEXT:    v_max_i32_e32 v7, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v7, vcc, s4, v7
+; GFX6-NEXT:    v_subrev_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_min_i32_e32 v12, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v12, vcc, s5, v12
+; GFX6-NEXT:    v_subrev_i32_e32 v12, vcc, 0x80000000, v12
 ; GFX6-NEXT:    v_max_i32_e32 v6, v7, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v13, -2
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v12
 ; GFX6-NEXT:    v_max_i32_e32 v7, -1, v2
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, v1, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v8
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v7, v13
+; GFX6-NEXT:    v_subrev_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, -1, v2
-; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, s5, v8
+; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_max_i32_e32 v6, v7, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v8
 ; GFX6-NEXT:    v_max_i32_e32 v7, -1, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v15, 1
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, v2, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v9
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v7, v13
+; GFX6-NEXT:    v_subrev_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, -1, v3
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v8, v15
+; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_max_i32_e32 v6, v7, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v8
 ; GFX6-NEXT:    v_max_i32_e32 v7, -1, v4
 ; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, v3, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v10
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v7, v13
+; GFX6-NEXT:    v_subrev_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, -1, v4
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v8, v15
+; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_max_i32_e32 v6, v7, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v8
 ; GFX6-NEXT:    v_max_i32_e32 v7, -1, v5
 ; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, v4, v6
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v11
-; GFX6-NEXT:    v_sub_i32_e32 v7, vcc, v7, v13
+; GFX6-NEXT:    v_subrev_i32_e32 v7, vcc, 0x7fffffff, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, -1, v5
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 16, v1
-; GFX6-NEXT:    v_sub_i32_e32 v8, vcc, v8, v15
+; GFX6-NEXT:    v_subrev_i32_e32 v8, vcc, 0x80000000, v8
 ; GFX6-NEXT:    v_max_i32_e32 v6, v7, v6
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v0, 16, v0
 ; GFX6-NEXT:    v_min_i32_e32 v6, v6, v8
@@ -3674,69 +3656,65 @@ define <4 x float> @v_ssubsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6:       ; %bb.0:
 ; GFX6-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v0, 16, v0
-; GFX6-NEXT:    s_brev_b32 s4, -2
 ; GFX6-NEXT:    v_max_i32_e32 v16, -1, v0
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v8
-; GFX6-NEXT:    s_brev_b32 s5, 1
-; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, s4, v16
-; GFX6-NEXT:    v_min_i32_e32 v18, -1, v0
-; GFX6-NEXT:    v_subrev_i32_e32 v18, vcc, s5, v18
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x7fffffff, v16
+; GFX6-NEXT:    v_min_i32_e32 v17, -1, v0
+; GFX6-NEXT:    v_subrev_i32_e32 v17, vcc, 0x80000000, v17
 ; GFX6-NEXT:    v_max_i32_e32 v8, v16, v8
-; GFX6-NEXT:    v_min_i32_e32 v8, v8, v18
+; GFX6-NEXT:    v_min_i32_e32 v8, v8, v17
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v0, vcc, v0, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v9
 ; GFX6-NEXT:    v_max_i32_e32 v9, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, s4, v9
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v16, -1, v1
-; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, s5, v16
+; GFX6-NEXT:    v_subrev_i32_e32 v16, vcc, 0x80000000, v16
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v2, 16, v2
-; GFX6-NEXT:    v_bfrev_b32_e32 v17, -2
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v16
 ; GFX6-NEXT:    v_max_i32_e32 v9, -1, v2
 ; GFX6-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v10
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v2
-; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, s5, v10
+; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v3, 16, v3
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v10
 ; GFX6-NEXT:    v_max_i32_e32 v9, -1, v3
-; GFX6-NEXT:    v_bfrev_b32_e32 v19, 1
 ; GFX6-NEXT:    v_sub_i32_e32 v2, vcc, v2, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v11
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v3
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v19
+; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v4, 16, v4
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v10
 ; GFX6-NEXT:    v_max_i32_e32 v9, -1, v4
 ; GFX6-NEXT:    v_sub_i32_e32 v3, vcc, v3, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v12
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v4
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v19
+; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v5, 16, v5
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v10
 ; GFX6-NEXT:    v_max_i32_e32 v9, -1, v5
 ; GFX6-NEXT:    v_sub_i32_e32 v4, vcc, v4, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v13
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v5
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v19
+; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v6, 16, v6
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v10
 ; GFX6-NEXT:    v_max_i32_e32 v9, -1, v6
 ; GFX6-NEXT:    v_sub_i32_e32 v5, vcc, v5, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v14
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v6
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v19
+; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v7, 16, v7
 ; GFX6-NEXT:    v_min_i32_e32 v8, v8, v10
@@ -3744,10 +3722,10 @@ define <4 x float> @v_ssubsat_v8i16(<8 x i16> %lhs, <8 x i16> %rhs) {
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v1, 16, v1
 ; GFX6-NEXT:    v_sub_i32_e32 v6, vcc, v6, v8
 ; GFX6-NEXT:    v_lshlrev_b32_e32 v8, 16, v15
-; GFX6-NEXT:    v_sub_i32_e32 v9, vcc, v9, v17
+; GFX6-NEXT:    v_subrev_i32_e32 v9, vcc, 0x7fffffff, v9
 ; GFX6-NEXT:    v_min_i32_e32 v10, -1, v7
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v0, 16, v0
-; GFX6-NEXT:    v_sub_i32_e32 v10, vcc, v10, v19
+; GFX6-NEXT:    v_subrev_i32_e32 v10, vcc, 0x80000000, v10
 ; GFX6-NEXT:    v_max_i32_e32 v8, v9, v8
 ; GFX6-NEXT:    v_and_b32_e32 v1, 0xffff, v1
 ; GFX6-NEXT:    v_ashrrev_i32_e32 v2, 16, v2
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
index 926c3d59e2e463b..bb0af5ffb67fca2 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i32.ll
@@ -20,10 +20,10 @@ define i32 @v_udiv_i32(i32 %num, i32 %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -44,10 +44,10 @@ define i32 @v_udiv_i32(i32 %num, i32 %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -74,10 +74,10 @@ define amdgpu_ps i32 @s_udiv_i32(i32 inreg %num, i32 inreg %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v1, v0, s1
 ; GISEL-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, s0, v1
-; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s1, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; GISEL-NEXT:    v_subrev_i32_e64 v2, s[2:3], s1, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-NEXT:    v_cmp_le_u32_e64 s[2:3], s1, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[2:3]
+; GISEL-NEXT:    v_subrev_i32_e32 v2, vcc, s1, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[2:3]
 ; GISEL-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; GISEL-NEXT:    v_cmp_le_u32_e32 vcc, s1, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -98,10 +98,10 @@ define amdgpu_ps i32 @s_udiv_i32(i32 inreg %num, i32 inreg %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v0, s1
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, s0, v1
-; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s1, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CGP-NEXT:    v_subrev_i32_e64 v2, s[2:3], s1, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CGP-NEXT:    v_cmp_le_u32_e64 s[2:3], s1, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[2:3]
+; CGP-NEXT:    v_subrev_i32_e32 v2, vcc, s1, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[2:3]
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_cmp_le_u32_e32 vcc, s1, v1
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -140,15 +140,15 @@ define <2 x i32> @v_udiv_v2i32(<2 x i32> %num, <2 x i32> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -183,15 +183,15 @@ define <2 x i32> @v_udiv_v2i32(<2 x i32> %num, <2 x i32> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -275,10 +275,10 @@ define i32 @v_udiv_i32_pow2_shl_denom(i32 %x, i32 %y) {
 ; CHECK-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -318,15 +318,15 @@ define <2 x i32> @v_udiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) {
 ; GISEL-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -363,15 +363,15 @@ define <2 x i32> @v_udiv_v2i32_pow2_shl_denom(<2 x i32> %x, <2 x i32> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -401,10 +401,10 @@ define i32 @v_udiv_i32_24bit(i32 %num, i32 %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -427,10 +427,10 @@ define i32 @v_udiv_i32_24bit(i32 %num, i32 %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -473,15 +473,15 @@ define <2 x i32> @v_udiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
@@ -520,15 +520,15 @@ define <2 x i32> @v_udiv_v2i32_24bit(<2 x i32> %num, <2 x i32> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v9, vcc, 1, v5
 ; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v0, v6
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
-; CGP-NEXT:    v_sub_i32_e64 v6, s[4:5], v0, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v7, s[6:7], v1, v3
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v6, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v0, v2
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v5, v5, v9, s[6:7]
+; CGP-NEXT:    v_sub_i32_e32 v7, vcc, v1, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v6, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v6, vcc, 1, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[6:7]
 ; CGP-NEXT:    v_add_i32_e32 v7, vcc, 1, v5
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v4, v6, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
index 77737b356ff6e9c..f04997c3858ac5e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll
@@ -125,14 +125,14 @@ define i64 @v_udiv_i64(i64 %num, i64 %den) {
 ; CHECK-NEXT:    v_add_i32_e32 v8, vcc, 1, v10
 ; CHECK-NEXT:    v_addc_u32_e32 v12, vcc, 0, v11, vcc
 ; CHECK-NEXT:    v_add_i32_e32 v6, vcc, v6, v9
-; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, v4, v7
-; CHECK-NEXT:    v_subb_u32_e64 v7, s[4:5], v5, v6, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v5, s[4:5], v5, v6
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v2
-; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v3
-; CHECK-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CHECK-NEXT:    v_subb_u32_e32 v5, vcc, v5, v3, vcc
+; CHECK-NEXT:    v_sub_i32_e64 v4, s[4:5], v4, v7
+; CHECK-NEXT:    v_subb_u32_e64 v7, vcc, v5, v6, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v5, vcc, v5, v6
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v3
+; CHECK-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v5, vcc, v5, v3, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v7, v3
 ; CHECK-NEXT:    v_cndmask_b32_e32 v6, v9, v6, vcc
 ; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, v4, v2
@@ -166,10 +166,10 @@ define i64 @v_udiv_i64(i64 %num, i64 %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v0, v2
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v4, v1
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v1, v2
-; CHECK-NEXT:    v_cndmask_b32_e32 v1, v1, v3, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v1, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v1, v1, v3, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -295,14 +295,14 @@ define amdgpu_ps i64 @s_udiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v4, s2, v4
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v7, v4
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v4, v8
-; CHECK-NEXT:    v_sub_i32_e32 v6, vcc, s0, v6
-; CHECK-NEXT:    v_subb_u32_e64 v3, s[4:5], v3, v4, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v4, s[4:5], s1, v4
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[4:5], s2, v6
-; CHECK-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[4:5], s3, v3
-; CHECK-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CHECK-NEXT:    v_subb_u32_e32 v0, vcc, v4, v0, vcc
+; CHECK-NEXT:    v_sub_i32_e64 v6, s[4:5], s0, v6
+; CHECK-NEXT:    v_subb_u32_e64 v3, vcc, v3, v4, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, s1, v4
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s2, v6
+; CHECK-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
+; CHECK-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v0, vcc, v4, v0, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, s3, v3
 ; CHECK-NEXT:    v_cndmask_b32_e32 v3, v8, v7, vcc
 ; CHECK-NEXT:    v_subrev_i32_e32 v4, vcc, s2, v6
@@ -338,10 +338,10 @@ define amdgpu_ps i64 @s_udiv_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v0, s2
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, s0, v1
-; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s2, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CHECK-NEXT:    v_subrev_i32_e64 v2, s[0:1], s2, v1
-; CHECK-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CHECK-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[0:1]
+; CHECK-NEXT:    v_subrev_i32_e32 v2, vcc, s2, v1
+; CHECK-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s2, v1
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -398,17 +398,17 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v14, v18
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v11, v17
-; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v19, v20
-; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v19, v20
+; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
+; GISEL-NEXT:    v_add_i32_e64 v19, s[6:7], v19, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v8, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v15, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v20, v16
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v20, v16
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v8, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v16, v20
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v13, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v21, v10, v16
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v21, v10, v19
 ; GISEL-NEXT:    v_add_i32_e64 v20, s[8:9], v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v13, v19
@@ -421,14 +421,14 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[14:15], v19, v20
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v17
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[16:17], v18, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[8:9]
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[14:15]
-; GISEL-NEXT:    v_add_i32_e64 v21, s[6:7], v21, v22
-; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v21, vcc, v21, v22
+; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v22, vcc, v22, v23
 ; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v24, 0, 1, s[16:17]
@@ -479,17 +479,17 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mul_hi_u32 v9, v14, v9
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[10:11], v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
-; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v17, v15
+; GISEL-NEXT:    v_add_i32_e64 v15, s[12:13], v17, v15
 ; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v16, v12
 ; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[8:9], v19, v18
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v20, v19
+; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GISEL-NEXT:    v_add_i32_e64 v15, s[4:5], v15, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v15, v20
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
@@ -497,20 +497,20 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e64 v17, s[4:5], v18, v17
 ; GISEL-NEXT:    v_cndmask_b32_e64 v18, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v10, v12
-; GISEL-NEXT:    v_add_i32_e64 v11, s[4:5], v11, v17
-; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v15, v18
-; GISEL-NEXT:    v_add_i32_e64 v15, s[6:7], v16, v19
+; GISEL-NEXT:    v_add_i32_e64 v10, s[4:5], v10, v12
+; GISEL-NEXT:    v_add_i32_e64 v11, s[6:7], v11, v17
+; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v15, v18
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v16, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v16, v1, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v17, v0, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v10, v1, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v18, v3, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v2, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v3, v11
-; GISEL-NEXT:    v_add_i32_e64 v8, s[6:7], v8, v12
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v15
-; GISEL-NEXT:    v_addc_u32_e32 v8, vcc, v13, v8, vcc
-; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v8, v12
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v15
+; GISEL-NEXT:    v_addc_u32_e64 v8, vcc, v13, v8, s[4:5]
+; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[6:7]
 ; GISEL-NEXT:    v_mul_lo_u32 v12, v0, v8
 ; GISEL-NEXT:    v_mul_lo_u32 v13, v1, v8
 ; GISEL-NEXT:    v_mul_hi_u32 v14, v0, v8
@@ -551,50 +551,50 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v11, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v6, v9
 ; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v12, v13
-; GISEL-NEXT:    v_add_i32_e32 v13, vcc, 1, v8
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v0, v14
-; GISEL-NEXT:    v_add_i32_e64 v14, s[6:7], 1, v9
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[8:9], v2, v18
-; GISEL-NEXT:    v_add_i32_e64 v18, s[10:11], 1, v13
-; GISEL-NEXT:    v_add_i32_e64 v10, s[12:13], v15, v10
-; GISEL-NEXT:    v_add_i32_e64 v15, s[12:13], 1, v14
-; GISEL-NEXT:    v_add_i32_e64 v12, s[14:15], v21, v12
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[14:15], v0, v4
+; GISEL-NEXT:    v_add_i32_e64 v13, s[4:5], 1, v8
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[6:7], v0, v14
+; GISEL-NEXT:    v_add_i32_e64 v14, s[8:9], 1, v9
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[10:11], v2, v18
+; GISEL-NEXT:    v_add_i32_e64 v18, s[12:13], 1, v13
+; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v15, v10
+; GISEL-NEXT:    v_add_i32_e64 v15, s[14:15], 1, v14
+; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v21, v12
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v4
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[16:17], v2, v6
 ; GISEL-NEXT:    v_sub_i32_e64 v0, s[18:19], v0, v4
 ; GISEL-NEXT:    v_sub_i32_e64 v2, s[20:21], v2, v6
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v4, v10
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[22:23], v0, v4
-; GISEL-NEXT:    v_addc_u32_e32 v0, vcc, 0, v10, vcc
+; GISEL-NEXT:    v_addc_u32_e64 v0, s[4:5], 0, v10, s[4:5]
 ; GISEL-NEXT:    v_mul_lo_u32 v4, v6, v12
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v6
-; GISEL-NEXT:    v_addc_u32_e64 v2, s[6:7], 0, v12, s[6:7]
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[14:15]
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v16, v20
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v6
+; GISEL-NEXT:    v_addc_u32_e64 v2, s[8:9], 0, v12, s[8:9]
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, -1, s[16:17]
-; GISEL-NEXT:    v_add_i32_e64 v4, s[6:7], v19, v4
-; GISEL-NEXT:    v_addc_u32_e64 v19, s[6:7], 0, v0, s[10:11]
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v16, v17
-; GISEL-NEXT:    v_addc_u32_e64 v17, s[6:7], 0, v2, s[12:13]
-; GISEL-NEXT:    v_add_i32_e64 v4, s[6:7], v4, v11
-; GISEL-NEXT:    v_subb_u32_e64 v11, s[6:7], v1, v16, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v1, s[6:7], v1, v16
-; GISEL-NEXT:    v_subb_u32_e64 v16, s[6:7], v3, v4, s[8:9]
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[6:7], v3, v4
+; GISEL-NEXT:    v_add_i32_e32 v4, vcc, v19, v4
+; GISEL-NEXT:    v_addc_u32_e64 v19, vcc, 0, v0, s[12:13]
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v17
+; GISEL-NEXT:    v_addc_u32_e64 v17, vcc, 0, v2, s[14:15]
+; GISEL-NEXT:    v_add_i32_e32 v4, vcc, v4, v11
+; GISEL-NEXT:    v_subb_u32_e64 v11, vcc, v1, v16, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v16
+; GISEL-NEXT:    v_subb_u32_e64 v16, vcc, v3, v4, s[10:11]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v4
 ; GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[22:23]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v11, v5
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[10:11], v11, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
-; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v1, v5, s[4:5]
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v16, v7
-; GISEL-NEXT:    v_subb_u32_e64 v3, s[4:5], v3, v7, s[8:9]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v16, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, v16, v6, s[10:11]
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v11, v5
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], v11, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
+; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v1, v5, s[6:7]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v16, v7
+; GISEL-NEXT:    v_subb_u32_e64 v3, s[6:7], v3, v7, s[10:11]
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v16, v7
 ; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v6, v16, v6, s[8:9]
+; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[4:5]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v1, s[18:19]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v3, vcc, 0, v3, s[20:21]
-; GISEL-NEXT:    v_cndmask_b32_e64 v16, v16, v20, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v16, v16, v20, s[6:7]
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v7
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v1, v5
@@ -733,14 +733,14 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v12, vcc, 1, v14
 ; CGP-NEXT:    v_addc_u32_e32 v16, vcc, 0, v15, vcc
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, v2, v13
-; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v10, v3
-; CGP-NEXT:    v_subb_u32_e64 v10, s[4:5], v11, v2, vcc
-; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v11, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v5
-; CGP-NEXT:    v_cndmask_b32_e64 v13, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v2, vcc, v2, v5, vcc
+; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v10, v3
+; CGP-NEXT:    v_subb_u32_e64 v10, vcc, v11, v2, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v11, v2
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v10, v5
+; CGP-NEXT:    v_cndmask_b32_e64 v13, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v2, vcc, v2, v5, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v10, v5
 ; CGP-NEXT:    v_cndmask_b32_e32 v10, v13, v11, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v3, v4
@@ -775,10 +775,10 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v0, v4
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v10, v1
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v4
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v1, v4
-; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v1, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v4
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -900,14 +900,14 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_add_i32_e32 v10, vcc, 1, v12
 ; CGP-NEXT:    v_addc_u32_e32 v14, vcc, 0, v13, vcc
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v4, v11
-; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v8, v5
-; CGP-NEXT:    v_subb_u32_e64 v8, s[4:5], v9, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v9, v4
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v6
-; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v7
-; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v4, vcc, v4, v7, vcc
+; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v8, v5
+; CGP-NEXT:    v_subb_u32_e64 v8, vcc, v9, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v9, v4
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v6
+; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v7
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v4, v7, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v8, v7
 ; CGP-NEXT:    v_cndmask_b32_e32 v8, v11, v9, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v5, v6
@@ -941,10 +941,10 @@ define <2 x i64> @v_udiv_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v6
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v8, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v6
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v3, v6
-; CGP-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v6
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v3, v6
+; CGP-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v6
 ; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
@@ -1187,14 +1187,14 @@ define i64 @v_udiv_i64_pow2_shl_denom(i64 %x, i64 %y) {
 ; CHECK-NEXT:    v_add_i32_e32 v8, vcc, 1, v10
 ; CHECK-NEXT:    v_addc_u32_e32 v12, vcc, 0, v11, vcc
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, v2, v9
-; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v3, v7
-; CHECK-NEXT:    v_subb_u32_e64 v7, s[4:5], v4, v2, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v2, s[4:5], v4, v2
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v5
-; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v7, v6
-; CHECK-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CHECK-NEXT:    v_subb_u32_e32 v2, vcc, v2, v6, vcc
+; CHECK-NEXT:    v_sub_i32_e64 v3, s[4:5], v3, v7
+; CHECK-NEXT:    v_subb_u32_e64 v7, vcc, v4, v2, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v2, vcc, v4, v2
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v5
+; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v7, v6
+; CHECK-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v2, vcc, v2, v6, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v7, v6
 ; CHECK-NEXT:    v_cndmask_b32_e32 v4, v9, v4, vcc
 ; CHECK-NEXT:    v_sub_i32_e32 v3, vcc, v3, v5
@@ -1228,10 +1228,10 @@ define i64 @v_udiv_i64_pow2_shl_denom(i64 %x, i64 %y) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v0, v5
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v3, v1
-; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
-; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v2, s[4:5], v1, v5
-; CHECK-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v5
+; CHECK-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v2, vcc, v1, v5
+; CHECK-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CHECK-NEXT:    v_add_i32_e32 v2, vcc, 1, v0
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
 ; CHECK-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
@@ -1283,17 +1283,17 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v14, v18
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v11, v17
-; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v19, v20
-; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v19, v20
+; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
+; GISEL-NEXT:    v_add_i32_e64 v19, s[6:7], v19, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v6, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v15, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v20, v16
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v20, v16
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v6, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v16, v20
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v13, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v21, v10, v16
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v21, v10, v19
 ; GISEL-NEXT:    v_add_i32_e64 v20, s[8:9], v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v13, v19
@@ -1306,14 +1306,14 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[14:15], v19, v20
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v17
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[16:17], v18, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[8:9]
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[14:15]
-; GISEL-NEXT:    v_add_i32_e64 v21, s[6:7], v21, v22
-; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v21, vcc, v21, v22
+; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v22, vcc, v22, v23
 ; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v24, 0, 1, s[16:17]
@@ -1364,17 +1364,17 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_mul_hi_u32 v9, v14, v9
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[10:11], v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
-; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v17, v15
+; GISEL-NEXT:    v_add_i32_e64 v15, s[12:13], v17, v15
 ; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v16, v12
 ; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[8:9], v19, v18
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v20, v19
+; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GISEL-NEXT:    v_add_i32_e64 v15, s[4:5], v15, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v15, v20
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
@@ -1382,20 +1382,20 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e64 v17, s[4:5], v18, v17
 ; GISEL-NEXT:    v_cndmask_b32_e64 v18, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v10, v12
-; GISEL-NEXT:    v_add_i32_e64 v11, s[4:5], v11, v17
-; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v15, v18
-; GISEL-NEXT:    v_add_i32_e64 v15, s[6:7], v16, v19
+; GISEL-NEXT:    v_add_i32_e64 v10, s[4:5], v10, v12
+; GISEL-NEXT:    v_add_i32_e64 v11, s[6:7], v11, v17
+; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v15, v18
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v16, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v16, v1, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v17, v0, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v10, v1, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v18, v3, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v2, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v3, v11
-; GISEL-NEXT:    v_add_i32_e64 v6, s[6:7], v6, v12
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v15
-; GISEL-NEXT:    v_addc_u32_e32 v6, vcc, v13, v6, vcc
-; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v12
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v15
+; GISEL-NEXT:    v_addc_u32_e64 v6, vcc, v13, v6, s[4:5]
+; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[6:7]
 ; GISEL-NEXT:    v_mul_lo_u32 v12, v0, v6
 ; GISEL-NEXT:    v_mul_lo_u32 v13, v1, v6
 ; GISEL-NEXT:    v_mul_hi_u32 v14, v0, v6
@@ -1436,70 +1436,70 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v11, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v4, v9
 ; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v12, v13
-; GISEL-NEXT:    v_add_i32_e32 v13, vcc, 1, v6
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v0, v14
-; GISEL-NEXT:    v_add_i32_e64 v14, s[6:7], 1, v9
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[8:9], v2, v18
-; GISEL-NEXT:    v_add_i32_e64 v18, s[10:11], 1, v13
-; GISEL-NEXT:    v_add_i32_e64 v10, s[12:13], v15, v10
-; GISEL-NEXT:    v_add_i32_e64 v15, s[12:13], 1, v14
-; GISEL-NEXT:    v_add_i32_e64 v12, s[14:15], v21, v12
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[14:15], v0, v7
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[16:17], v2, v4
-; GISEL-NEXT:    v_sub_i32_e64 v0, s[18:19], v0, v7
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[20:21], v2, v4
+; GISEL-NEXT:    v_add_i32_e64 v13, s[4:5], 1, v6
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[6:7], v0, v14
+; GISEL-NEXT:    v_add_i32_e64 v14, s[8:9], 1, v9
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[10:11], v2, v18
+; GISEL-NEXT:    v_add_i32_e64 v18, s[12:13], 1, v13
+; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v15, v10
+; GISEL-NEXT:    v_add_i32_e64 v15, s[14:15], 1, v14
+; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v21, v12
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[16:17], v0, v7
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[18:19], v2, v4
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[20:21], v0, v7
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[22:23], v2, v4
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v7, v10
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[22:23], v0, v7
-; GISEL-NEXT:    v_addc_u32_e32 v0, vcc, 0, v10, vcc
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[24:25], v0, v7
+; GISEL-NEXT:    v_addc_u32_e64 v0, vcc, 0, v10, s[4:5]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v4
 ; GISEL-NEXT:    v_mul_lo_u32 v2, v4, v12
-; GISEL-NEXT:    v_add_i32_e64 v4, s[24:25], v16, v20
-; GISEL-NEXT:    v_addc_u32_e64 v7, s[6:7], 0, v12, s[6:7]
-; GISEL-NEXT:    v_add_i32_e64 v2, s[6:7], v19, v2
-; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[14:15]
-; GISEL-NEXT:    v_add_i32_e64 v4, s[6:7], v4, v17
-; GISEL-NEXT:    v_subb_u32_e64 v17, s[6:7], v1, v4, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v1, s[6:7], v1, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[16:17]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v17, v8
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[14:15], v17, v8
-; GISEL-NEXT:    v_addc_u32_e64 v17, s[10:11], 0, v0, s[10:11]
-; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v1, v8, s[4:5]
-; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[18:19]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v8
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[10:11], v1, v8
-; GISEL-NEXT:    v_addc_u32_e64 v1, s[12:13], 0, v7, s[12:13]
-; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[22:23]
-; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, -1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v4, vcc, v16, v20
+; GISEL-NEXT:    v_addc_u32_e64 v7, vcc, 0, v12, s[8:9]
+; GISEL-NEXT:    v_add_i32_e32 v2, vcc, v19, v2
+; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[16:17]
+; GISEL-NEXT:    v_add_i32_e32 v4, vcc, v4, v17
+; GISEL-NEXT:    v_subb_u32_e64 v17, vcc, v1, v4, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[18:19]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v17, v8
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[16:17], v17, v8
+; GISEL-NEXT:    v_addc_u32_e64 v17, vcc, 0, v0, s[12:13]
+; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v1, v8, s[6:7]
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v1, s[20:21]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v1, v8
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[12:13], v1, v8
+; GISEL-NEXT:    v_addc_u32_e64 v1, vcc, 0, v7, s[14:15]
+; GISEL-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[24:25]
+; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, -1, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v2, vcc, v2, v11
-; GISEL-NEXT:    v_subb_u32_e64 v11, vcc, v3, v2, s[8:9]
+; GISEL-NEXT:    v_subb_u32_e64 v11, vcc, v3, v2, s[10:11]
 ; GISEL-NEXT:    v_sub_i32_e32 v2, vcc, v3, v2
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v11, v5
-; GISEL-NEXT:    v_subb_u32_e64 v2, s[8:9], v2, v5, s[8:9]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], v11, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[6:7]
+; GISEL-NEXT:    v_subb_u32_e64 v2, s[4:5], v2, v5, s[10:11]
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v11, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[8:9]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
-; GISEL-NEXT:    v_subbrev_u32_e64 v2, vcc, 0, v2, s[20:21]
-; GISEL-NEXT:    v_cndmask_b32_e64 v3, v3, v16, s[14:15]
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, v11, v4, s[8:9]
+; GISEL-NEXT:    v_subbrev_u32_e64 v2, vcc, 0, v2, s[22:23]
+; GISEL-NEXT:    v_cndmask_b32_e64 v3, v3, v16, s[16:17]
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v11, v4, s[4:5]
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v2, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s[4:5]
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v2, v5
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
 ; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v3
-; GISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v2, v2, v8, s[10:11]
-; GISEL-NEXT:    v_cndmask_b32_e64 v3, v5, v19, s[6:7]
-; GISEL-NEXT:    v_cmp_ne_u32_e64 s[6:7], 0, v2
+; GISEL-NEXT:    v_cmp_ne_u32_e64 s[6:7], 0, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v2, v8, s[12:13]
+; GISEL-NEXT:    v_cndmask_b32_e64 v3, v5, v19, s[4:5]
+; GISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v2
 ; GISEL-NEXT:    v_cmp_ne_u32_e64 s[8:9], 0, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v2, v13, v18, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v13, v18, s[4:5]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v3, v14, v15, s[8:9]
-; GISEL-NEXT:    v_cndmask_b32_e64 v4, v0, v17, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v4, v0, v17, s[4:5]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v5, v7, v1, s[8:9]
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v6, v2, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v2, v9, v3, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v9, v3, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e32 v1, v10, v4, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v3, v12, v5, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v3, v12, v5, s[6:7]
 ; GISEL-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; CGP-LABEL: v_udiv_v2i64_pow2_shl_denom:
@@ -1620,14 +1620,14 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v11, vcc, 1, v13
 ; CGP-NEXT:    v_addc_u32_e32 v15, vcc, 0, v14, vcc
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v4, v12
-; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v8, v10
-; CGP-NEXT:    v_subb_u32_e64 v10, s[4:5], v9, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v9, v4
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v2
-; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v12, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v4, vcc, v4, v3, vcc
+; CGP-NEXT:    v_sub_i32_e64 v8, s[4:5], v8, v10
+; CGP-NEXT:    v_subb_u32_e64 v10, vcc, v9, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v9, v4
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v10, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v12, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v4, v3, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v10, v3
 ; CGP-NEXT:    v_cndmask_b32_e32 v9, v12, v9, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v8, v2
@@ -1664,10 +1664,10 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v0, v2
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v8, v1
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v1, v2
-; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v1, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v1, v1, v3, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, 1, v0
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
@@ -1789,14 +1789,14 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_add_i32_e32 v8, vcc, 1, v12
 ; CGP-NEXT:    v_addc_u32_e32 v14, vcc, 0, v13, vcc
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, v4, v11
-; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v5, v6
-; CGP-NEXT:    v_subb_u32_e64 v6, s[4:5], v7, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v7, v4
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v9
-; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v6, v10
-; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v4, vcc, v4, v10, vcc
+; CGP-NEXT:    v_sub_i32_e64 v5, s[4:5], v5, v6
+; CGP-NEXT:    v_subb_u32_e64 v6, vcc, v7, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v7, v4
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v9
+; CGP-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v10
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v4, v10, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v6, v10
 ; CGP-NEXT:    v_cndmask_b32_e32 v6, v11, v7, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v5, vcc, v5, v9
@@ -1830,10 +1830,10 @@ define <2 x i64> @v_udiv_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v2, v9
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v5, v3
-; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v9
-; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v3, v9
-; CGP-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v9
+; CGP-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v4, vcc, v3, v9
+; CGP-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[4:5]
 ; CGP-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v9
 ; CGP-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
@@ -1863,10 +1863,10 @@ define i64 @v_udiv_i64_24bit(i64 %num, i64 %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v3, v2, v1
 ; GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v2
 ; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v3
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v0, v1
-; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v0, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; GISEL-NEXT:    v_add_i32_e32 v3, vcc, 1, v2
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v2, v3, vcc
@@ -2106,22 +2106,22 @@ define <2 x i64> @v_udiv_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_addc_u32_e32 v21, vcc, 0, v19, vcc
 ; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v8, v13
 ; GISEL-NEXT:    v_add_i32_e32 v13, vcc, v14, v15
-; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v9
-; GISEL-NEXT:    v_subb_u32_e64 v9, s[4:5], 0, v8, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v8, s[4:5], 0, v8
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v14, 0, -1, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v2, v11
-; GISEL-NEXT:    v_subb_u32_e64 v11, s[6:7], 0, v13, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v13, s[6:7], 0, v13
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v2, v0
-; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], 0, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, -1, v14, s[6:7]
-; GISEL-NEXT:    v_subbrev_u32_e32 v8, vcc, 0, v8, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v3, v9
+; GISEL-NEXT:    v_subb_u32_e64 v9, vcc, 0, v8, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v8, vcc, 0, v8
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v14, 0, -1, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[6:7], v2, v11
+; GISEL-NEXT:    v_subb_u32_e64 v11, vcc, 0, v13, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v13, vcc, 0, v13
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v0
+; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v9
+; GISEL-NEXT:    v_cndmask_b32_e32 v9, -1, v14, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v8, vcc, 0, v8, s[4:5]
 ; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v11
 ; GISEL-NEXT:    v_cndmask_b32_e32 v11, -1, v15, vcc
-; GISEL-NEXT:    v_subbrev_u32_e64 v13, vcc, 0, v13, s[4:5]
+; GISEL-NEXT:    v_subbrev_u32_e64 v13, vcc, 0, v13, s[6:7]
 ; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v1
 ; GISEL-NEXT:    v_subbrev_u32_e32 v8, vcc, 0, v8, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v1
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
index fba8ef2948ade94..80199ce3df8672c 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll
@@ -24,17 +24,17 @@ define amdgpu_kernel void @udivrem_i32(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s7
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s6, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
 ; GFX8-NEXT:    flat_store_dword v[0:1], v2
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_mov_b32_e32 v1, s3
 ; GFX8-NEXT:    flat_store_dword v[0:1], v3
 ; GFX8-NEXT:    s_endpgm
@@ -206,16 +206,16 @@ define amdgpu_kernel void @udivrem_i64(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mov_b32_e32 v6, s9
 ; GFX8-NEXT:    v_mov_b32_e32 v5, s11
 ; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s11, v4, v[1:2]
-; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s8, v0
-; GFX8-NEXT:    v_subb_u32_e64 v6, s[0:1], v6, v1, vcc
-; GFX8-NEXT:    v_sub_u32_e64 v0, s[0:1], s9, v1
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v6
-; GFX8-NEXT:    v_cndmask_b32_e64 v1, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s10, v2
-; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_eq_u32_e64 s[0:1], s11, v6
-; GFX8-NEXT:    v_subb_u32_e32 v0, vcc, v0, v5, vcc
-; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v7, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e64 v2, s[0:1], s8, v0
+; GFX8-NEXT:    v_subb_u32_e64 v6, vcc, v6, v1, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e32 v0, vcc, s9, v1
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v6
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s10, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, s11, v6
+; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v7, vcc
+; GFX8-NEXT:    v_subb_u32_e64 v0, vcc, v0, v5, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v7, vcc, s10, v2
 ; GFX8-NEXT:    v_subbrev_u32_e64 v8, s[0:1], 0, v0, vcc
 ; GFX8-NEXT:    v_add_u32_e64 v9, s[0:1], 1, v4
@@ -552,26 +552,26 @@ define amdgpu_kernel void @udivrem_v2i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_mul_lo_u32 v4, v1, s11
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s8, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s10, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s10, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s10, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s10, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s10, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s10, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s10, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s10, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s9, v4
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s11, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s11, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s4
 ; GFX8-NEXT:    v_mov_b32_e32 v5, s5
 ; GFX8-NEXT:    flat_store_dwordx2 v[4:5], v[0:1]
@@ -700,6 +700,7 @@ define amdgpu_kernel void @udivrem_v4i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_cvt_f32_u32_e32 v6, s14
 ; GFX8-NEXT:    v_rcp_iflag_f32_e32 v0, v0
 ; GFX8-NEXT:    v_rcp_iflag_f32_e32 v1, v1
+; GFX8-NEXT:    s_sub_i32 s2, 0, s14
 ; GFX8-NEXT:    v_mul_f32_e32 v0, 0x4f7ffffe, v0
 ; GFX8-NEXT:    v_cvt_u32_f32_e32 v0, v0
 ; GFX8-NEXT:    v_mul_f32_e32 v1, 0x4f7ffffe, v1
@@ -717,67 +718,66 @@ define amdgpu_kernel void @udivrem_v4i32(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_mul_lo_u32 v5, v1, s13
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s8, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s12, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s12, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s12, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s12, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s12, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s12, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v4, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s12, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s12, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v4, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_rcp_iflag_f32_e32 v3, v6
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s9, v5
 ; GFX8-NEXT:    v_add_u32_e32 v5, vcc, 1, v1
 ; GFX8-NEXT:    v_mul_f32_e32 v3, 0x4f7ffffe, v3
 ; GFX8-NEXT:    v_cvt_u32_f32_e32 v3, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s13, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v5, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v5, s[0:1], s13, v2
-; GFX8-NEXT:    s_sub_i32 s0, 0, s14
-; GFX8-NEXT:    v_mul_lo_u32 v6, s0, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v5, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s13, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v5, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, s13, v2
+; GFX8-NEXT:    v_mul_lo_u32 v6, s2, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v5, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s13, v2
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s13, v2
 ; GFX8-NEXT:    v_mul_hi_u32 v6, v3, v6
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v5, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v5, s[0:1]
 ; GFX8-NEXT:    v_cvt_f32_u32_e32 v5, s15
-; GFX8-NEXT:    v_add_u32_e64 v3, s[0:1], v3, v6
+; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v3, v6
 ; GFX8-NEXT:    v_rcp_iflag_f32_e32 v6, v5
 ; GFX8-NEXT:    v_mul_hi_u32 v3, s10, v3
-; GFX8-NEXT:    v_subrev_u32_e64 v5, s[0:1], s13, v2
+; GFX8-NEXT:    v_subrev_u32_e32 v5, vcc, s13, v2
 ; GFX8-NEXT:    v_mul_f32_e32 v6, 0x4f7ffffe, v6
 ; GFX8-NEXT:    v_cvt_u32_f32_e32 v6, v6
+; GFX8-NEXT:    v_cndmask_b32_e64 v5, v2, v5, s[0:1]
 ; GFX8-NEXT:    s_sub_i32 s0, 0, s15
-; GFX8-NEXT:    v_cndmask_b32_e32 v5, v2, v5, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v2, v3, s14
 ; GFX8-NEXT:    v_mul_lo_u32 v7, s0, v6
 ; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 1, v3
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s10, v2
 ; GFX8-NEXT:    v_mul_hi_u32 v7, v6, v7
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s14, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v8, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v8, s[0:1], s14, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v8, v2, v8, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s14, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v8, vcc, s14, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v8, v2, v8, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v2, vcc, v6, v7
 ; GFX8-NEXT:    v_mul_hi_u32 v7, s11, v2
 ; GFX8-NEXT:    v_add_u32_e32 v2, vcc, 1, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s14, v8
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s14, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v3, v2, s[0:1]
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v7, s15
-; GFX8-NEXT:    v_subrev_u32_e64 v6, s[0:1], s14, v8
-; GFX8-NEXT:    v_cndmask_b32_e32 v6, v8, v6, vcc
+; GFX8-NEXT:    v_subrev_u32_e32 v6, vcc, s14, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v6, v8, v6, s[0:1]
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s11, v3
 ; GFX8-NEXT:    v_add_u32_e32 v8, vcc, 1, v7
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s15, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v7, v7, v8, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v8, s[0:1], s15, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v8, v3, v8, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s15, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, v7, v8, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v8, vcc, s15, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v8, v3, v8, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v7
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s15, v8
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v7, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v7, s[0:1], s15, v8
-; GFX8-NEXT:    v_cndmask_b32_e32 v7, v8, v7, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s15, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v7, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v7, vcc, s15, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, v8, v7, s[0:1]
 ; GFX8-NEXT:    v_mov_b32_e32 v9, s5
 ; GFX8-NEXT:    v_mov_b32_e32 v8, s4
 ; GFX8-NEXT:    flat_store_dwordx4 v[8:9], v[0:3]
@@ -1080,19 +1080,19 @@ define amdgpu_kernel void @udivrem_v2i64(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v4, v2
 ; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s12, v7, v[1:2]
 ; GFX8-NEXT:    v_mov_b32_e32 v3, s9
-; GFX8-NEXT:    v_sub_u32_e32 v8, vcc, s8, v0
-; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s13, v6, v[1:2]
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s13
-; GFX8-NEXT:    v_subb_u32_e64 v0, s[0:1], v3, v1, vcc
-; GFX8-NEXT:    v_sub_u32_e64 v1, s[0:1], s9, v1
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s13, v0
-; GFX8-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s12, v8
-; GFX8-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_eq_u32_e64 s[0:1], s13, v0
-; GFX8-NEXT:    v_cndmask_b32_e64 v9, v2, v3, s[0:1]
+; GFX8-NEXT:    v_mad_u64_u32 v[1:2], s[0:1], s13, v6, v[1:2]
+; GFX8-NEXT:    v_sub_u32_e64 v8, s[0:1], s8, v0
+; GFX8-NEXT:    v_subb_u32_e64 v0, vcc, v3, v1, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e32 v1, vcc, s9, v1
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s13, v0
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s12, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, s13, v0
+; GFX8-NEXT:    v_cndmask_b32_e32 v9, v2, v3, vcc
 ; GFX8-NEXT:    v_cvt_f32_u32_e32 v2, s15
-; GFX8-NEXT:    v_subb_u32_e32 v5, vcc, v1, v4, vcc
+; GFX8-NEXT:    v_subb_u32_e64 v5, vcc, v1, v4, s[0:1]
 ; GFX8-NEXT:    v_cvt_f32_u32_e32 v1, s14
 ; GFX8-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v2
 ; GFX8-NEXT:    v_subrev_u32_e32 v10, vcc, s12, v8
@@ -1151,40 +1151,40 @@ define amdgpu_kernel void @udivrem_v2i64(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_mov_b32_e32 v1, v4
 ; GFX8-NEXT:    v_mad_u64_u32 v[4:5], s[0:1], s2, v14, v[1:2]
 ; GFX8-NEXT:    v_cndmask_b32_e32 v12, v13, v18, vcc
-; GFX8-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v9
-; GFX8-NEXT:    v_mad_u64_u32 v[4:5], s[0:1], s3, v15, v[4:5]
-; GFX8-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, v16
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v6, v2, vcc
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v7, v12, vcc
-; GFX8-NEXT:    v_cndmask_b32_e64 v5, v10, v19, s[0:1]
+; GFX8-NEXT:    v_cmp_ne_u32_e64 s[0:1], 0, v9
+; GFX8-NEXT:    v_mad_u64_u32 v[4:5], s[2:3], s3, v15, v[4:5]
+; GFX8-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v16
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v6, v2, s[0:1]
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v7, v12, s[0:1]
+; GFX8-NEXT:    v_cndmask_b32_e32 v5, v10, v19, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v7, v14, v3
 ; GFX8-NEXT:    v_mul_lo_u32 v9, v15, v4
-; GFX8-NEXT:    v_cndmask_b32_e32 v5, v8, v5, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v5, v8, v5, s[0:1]
 ; GFX8-NEXT:    v_mul_hi_u32 v8, v15, v3
-; GFX8-NEXT:    v_cndmask_b32_e64 v6, v11, v20, s[0:1]
-; GFX8-NEXT:    v_add_u32_e64 v7, s[0:1], v7, v9
-; GFX8-NEXT:    v_cndmask_b32_e64 v9, 0, 1, s[0:1]
-; GFX8-NEXT:    v_add_u32_e64 v7, s[0:1], v7, v8
-; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, 1, s[0:1]
+; GFX8-NEXT:    v_cndmask_b32_e32 v6, v11, v20, vcc
+; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v7, v9
+; GFX8-NEXT:    v_cndmask_b32_e64 v9, 0, 1, vcc
+; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v7, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, 1, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v8, v14, v4
 ; GFX8-NEXT:    v_mul_hi_u32 v3, v14, v3
-; GFX8-NEXT:    v_add_u32_e64 v7, s[0:1], v9, v7
+; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v9, v7
 ; GFX8-NEXT:    v_mul_hi_u32 v9, v15, v4
-; GFX8-NEXT:    v_add_u32_e64 v3, s[0:1], v8, v3
-; GFX8-NEXT:    v_cndmask_b32_e64 v8, 0, 1, s[0:1]
-; GFX8-NEXT:    v_add_u32_e64 v3, s[0:1], v3, v9
-; GFX8-NEXT:    v_cndmask_b32_e64 v9, 0, 1, s[0:1]
-; GFX8-NEXT:    v_add_u32_e64 v8, s[0:1], v8, v9
+; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v8, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v8, 0, 1, vcc
+; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v3, v9
+; GFX8-NEXT:    v_cndmask_b32_e64 v9, 0, 1, vcc
+; GFX8-NEXT:    v_add_u32_e32 v8, vcc, v8, v9
 ; GFX8-NEXT:    v_mul_hi_u32 v4, v14, v4
-; GFX8-NEXT:    v_add_u32_e64 v3, s[0:1], v3, v7
-; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, 1, s[0:1]
-; GFX8-NEXT:    v_add_u32_e64 v7, s[0:1], v8, v7
-; GFX8-NEXT:    v_add_u32_e64 v4, s[0:1], v4, v7
-; GFX8-NEXT:    v_add_u32_e64 v3, s[0:1], v15, v3
-; GFX8-NEXT:    v_addc_u32_e64 v4, s[0:1], v14, v4, s[0:1]
+; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v3, v7
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, 1, vcc
+; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v8, v7
+; GFX8-NEXT:    v_add_u32_e32 v4, vcc, v4, v7
+; GFX8-NEXT:    v_add_u32_e32 v3, vcc, v15, v3
+; GFX8-NEXT:    v_addc_u32_e32 v4, vcc, v14, v4, vcc
 ; GFX8-NEXT:    v_mul_lo_u32 v7, s11, v3
 ; GFX8-NEXT:    v_mul_lo_u32 v8, s10, v4
-; GFX8-NEXT:    v_cndmask_b32_e32 v6, v0, v6, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v6, v0, v6, s[0:1]
 ; GFX8-NEXT:    v_mul_hi_u32 v0, s10, v3
 ; GFX8-NEXT:    v_mul_hi_u32 v3, s11, v3
 ; GFX8-NEXT:    v_add_u32_e32 v7, vcc, v7, v8
@@ -1210,16 +1210,16 @@ define amdgpu_kernel void @udivrem_v2i64(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_mov_b32_e32 v4, s11
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s15
 ; GFX8-NEXT:    v_mad_u64_u32 v[7:8], s[0:1], s15, v9, v[7:8]
-; GFX8-NEXT:    v_sub_u32_e32 v8, vcc, s10, v3
-; GFX8-NEXT:    v_subb_u32_e64 v11, s[0:1], v4, v7, vcc
-; GFX8-NEXT:    v_sub_u32_e64 v3, s[0:1], s11, v7
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s15, v11
-; GFX8-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s14, v8
-; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[0:1]
-; GFX8-NEXT:    v_cmp_eq_u32_e64 s[0:1], s15, v11
-; GFX8-NEXT:    v_subb_u32_e32 v3, vcc, v3, v0, vcc
-; GFX8-NEXT:    v_cndmask_b32_e64 v4, v4, v7, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e64 v8, s[0:1], s10, v3
+; GFX8-NEXT:    v_subb_u32_e64 v11, vcc, v4, v7, s[0:1]
+; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s11, v7
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s15, v11
+; GFX8-NEXT:    v_cndmask_b32_e64 v4, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s14, v8
+; GFX8-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GFX8-NEXT:    v_cmp_eq_u32_e32 vcc, s15, v11
+; GFX8-NEXT:    v_cndmask_b32_e32 v4, v4, v7, vcc
+; GFX8-NEXT:    v_subb_u32_e64 v3, vcc, v3, v0, s[0:1]
 ; GFX8-NEXT:    v_subrev_u32_e32 v7, vcc, s14, v8
 ; GFX8-NEXT:    v_subbrev_u32_e64 v12, s[0:1], 0, v3, vcc
 ; GFX8-NEXT:    v_add_u32_e64 v13, s[0:1], 1, v9
@@ -1807,17 +1807,17 @@ define amdgpu_kernel void @udiv_i8(ptr addrspace(1) %out0, ptr addrspace(1) %out
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s7
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
 ; GFX8-NEXT:    flat_store_byte v[0:1], v2
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_mov_b32_e32 v1, s3
 ; GFX8-NEXT:    flat_store_byte v[0:1], v3
 ; GFX8-NEXT:    s_endpgm
@@ -1927,29 +1927,29 @@ define amdgpu_kernel void @udivrem_v2i8(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mul_lo_u32 v2, v0, s2
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s3
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s8, v3
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
 ; GFX8-NEXT:    v_and_b32_e32 v1, 0xff, v1
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s3, v3
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s3, v3
 ; GFX8-NEXT:    v_lshlrev_b16_e32 v1, 8, v1
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_or_b32_sdwa v4, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:BYTE_0 src1_sel:DWORD
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s4
 ; GFX8-NEXT:    v_mov_b32_e32 v1, s5
@@ -2116,17 +2116,17 @@ define amdgpu_kernel void @udiv_i16(ptr addrspace(1) %out0, ptr addrspace(1) %ou
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s7
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
 ; GFX8-NEXT:    flat_store_short v[0:1], v2
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_mov_b32_e32 v1, s3
 ; GFX8-NEXT:    flat_store_short v[0:1], v3
 ; GFX8-NEXT:    s_endpgm
@@ -2236,28 +2236,28 @@ define amdgpu_kernel void @udivrem_v2i16(ptr addrspace(1) %out0, ptr addrspace(1
 ; GFX8-NEXT:    v_mul_lo_u32 v2, v0, s2
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
 ; GFX8-NEXT:    v_sub_u32_e32 v2, vcc, s0, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v3, s[0:1], s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v3, vcc, s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v3, vcc, 1, v0
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[0:1]
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v1, s3
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s2, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s2, v2
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s8, v3
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v1
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s3, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s3, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s3, v3
 ; GFX8-NEXT:    v_and_b32_e32 v1, 0xffff, v1
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_lshlrev_b32_e32 v1, 16, v1
 ; GFX8-NEXT:    v_or_b32_sdwa v4, v0, v1 dst_sel:DWORD dst_unused:UNUSED_PAD src0_sel:WORD_0 src1_sel:DWORD
 ; GFX8-NEXT:    v_and_b32_e32 v0, 0xffff, v3
@@ -2422,16 +2422,16 @@ define amdgpu_kernel void @udivrem_i3(ptr addrspace(1) %out0, ptr addrspace(1) %
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s7
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
 ; GFX8-NEXT:    v_and_b32_e32 v2, 7, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    flat_store_byte v[0:1], v2
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s2
 ; GFX8-NEXT:    v_and_b32_e32 v2, 7, v3
@@ -2540,16 +2540,16 @@ define amdgpu_kernel void @udivrem_i27(ptr addrspace(1) %out0, ptr addrspace(1)
 ; GFX8-NEXT:    v_mul_lo_u32 v3, v2, s7
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
 ; GFX8-NEXT:    v_sub_u32_e32 v3, vcc, s4, v3
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    v_add_u32_e32 v4, vcc, 1, v2
-; GFX8-NEXT:    v_cmp_le_u32_e32 vcc, s7, v3
-; GFX8-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
-; GFX8-NEXT:    v_subrev_u32_e64 v4, s[0:1], s7, v3
+; GFX8-NEXT:    v_cmp_le_u32_e64 s[0:1], s7, v3
+; GFX8-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[0:1]
+; GFX8-NEXT:    v_subrev_u32_e32 v4, vcc, s7, v3
 ; GFX8-NEXT:    v_and_b32_e32 v2, 0x7ffffff, v2
-; GFX8-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GFX8-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GFX8-NEXT:    flat_store_dword v[0:1], v2
 ; GFX8-NEXT:    v_mov_b32_e32 v0, s2
 ; GFX8-NEXT:    v_and_b32_e32 v2, 0x7ffffff, v3
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
index 097f6642cbc669b..c2b1bcfeb495922 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll
@@ -121,14 +121,14 @@ define i64 @v_urem_i64(i64 %num, i64 %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v2, v1
 ; CHECK-NEXT:    v_add_i32_e32 v1, vcc, v8, v1
 ; CHECK-NEXT:    v_add_i32_e32 v0, vcc, v1, v0
-; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v4, v7
-; CHECK-NEXT:    v_subb_u32_e64 v4, s[4:5], v5, v0, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v0, s[4:5], v5, v0
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v2
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v3
-; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CHECK-NEXT:    v_subb_u32_e32 v0, vcc, v0, v3, vcc
+; CHECK-NEXT:    v_sub_i32_e64 v1, s[4:5], v4, v7
+; CHECK-NEXT:    v_subb_u32_e64 v4, vcc, v5, v0, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v5, v0
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v3
+; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v0, vcc, v0, v3, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v3
 ; CHECK-NEXT:    v_cndmask_b32_e32 v5, v6, v5, vcc
 ; CHECK-NEXT:    v_sub_i32_e32 v6, vcc, v1, v2
@@ -290,14 +290,14 @@ define amdgpu_ps i64 @s_urem_i64(i64 inreg %num, i64 inreg %den) {
 ; CHECK-NEXT:    v_mul_lo_u32 v4, s2, v4
 ; CHECK-NEXT:    v_add_i32_e32 v4, vcc, v7, v4
 ; CHECK-NEXT:    v_add_i32_e32 v1, vcc, v4, v1
-; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, s0, v6
-; CHECK-NEXT:    v_subb_u32_e64 v3, s[4:5], v3, v1, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v1, s[4:5], s1, v1
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[4:5], s2, v4
-; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_le_u32_e64 s[4:5], s3, v3
-; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CHECK-NEXT:    v_subb_u32_e32 v0, vcc, v1, v0, vcc
+; CHECK-NEXT:    v_sub_i32_e64 v4, s[4:5], s0, v6
+; CHECK-NEXT:    v_subb_u32_e64 v3, vcc, v3, v1, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, s1, v1
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s2, v4
+; CHECK-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_le_u32_e32 vcc, s3, v3
+; CHECK-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v0, vcc, v1, v0, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, s3, v3
 ; CHECK-NEXT:    v_cndmask_b32_e32 v1, v6, v5, vcc
 ; CHECK-NEXT:    v_subrev_i32_e32 v3, vcc, s2, v4
@@ -392,17 +392,17 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v14, v18
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v11, v17
-; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v19, v20
-; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v19, v20
+; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
+; GISEL-NEXT:    v_add_i32_e64 v19, s[6:7], v19, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v8, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v15, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v20, v16
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v20, v16
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v8, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v16, v20
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v13, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v21, v10, v16
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v21, v10, v19
 ; GISEL-NEXT:    v_add_i32_e64 v20, s[8:9], v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v13, v19
@@ -415,14 +415,14 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[14:15], v19, v20
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v17
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[16:17], v18, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[8:9]
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[14:15]
-; GISEL-NEXT:    v_add_i32_e64 v21, s[6:7], v21, v22
-; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v21, vcc, v21, v22
+; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v22, vcc, v22, v23
 ; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v24, 0, 1, s[16:17]
@@ -473,17 +473,17 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mul_hi_u32 v9, v14, v9
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[10:11], v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
-; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v17, v15
+; GISEL-NEXT:    v_add_i32_e64 v15, s[12:13], v17, v15
 ; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v16, v12
 ; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[8:9], v19, v18
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v20, v19
+; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GISEL-NEXT:    v_add_i32_e64 v15, s[4:5], v15, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v15, v20
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
@@ -491,20 +491,20 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e64 v17, s[4:5], v18, v17
 ; GISEL-NEXT:    v_cndmask_b32_e64 v18, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v10, v12
-; GISEL-NEXT:    v_add_i32_e64 v11, s[4:5], v11, v17
-; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v15, v18
-; GISEL-NEXT:    v_add_i32_e64 v15, s[6:7], v16, v19
+; GISEL-NEXT:    v_add_i32_e64 v10, s[4:5], v10, v12
+; GISEL-NEXT:    v_add_i32_e64 v11, s[6:7], v11, v17
+; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v15, v18
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v16, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v16, v1, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v17, v0, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v10, v1, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v18, v3, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v2, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v3, v11
-; GISEL-NEXT:    v_add_i32_e64 v8, s[6:7], v8, v12
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v15
-; GISEL-NEXT:    v_addc_u32_e32 v8, vcc, v13, v8, vcc
-; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v8, v12
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v15
+; GISEL-NEXT:    v_addc_u32_e64 v8, vcc, v13, v8, s[4:5]
+; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[6:7]
 ; GISEL-NEXT:    v_mul_lo_u32 v12, v0, v8
 ; GISEL-NEXT:    v_mul_lo_u32 v13, v1, v8
 ; GISEL-NEXT:    v_mul_hi_u32 v14, v0, v8
@@ -545,50 +545,50 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v6, v11
 ; GISEL-NEXT:    v_add_i32_e32 v13, vcc, v14, v13
 ; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v12, v15
-; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v16
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v2, v18
-; GISEL-NEXT:    v_add_i32_e64 v8, s[6:7], v8, v13
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v12
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v0, v4
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v0, v16
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[6:7], v2, v18
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v8, v13
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v12
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v4
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v2, v6
 ; GISEL-NEXT:    v_sub_i32_e64 v12, s[10:11], v0, v4
 ; GISEL-NEXT:    v_sub_i32_e64 v13, s[12:13], v2, v6
 ; GISEL-NEXT:    v_mul_lo_u32 v8, v4, v8
 ; GISEL-NEXT:    v_mul_lo_u32 v9, v6, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v14, 0, -1, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v14, 0, -1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, -1, s[8:9]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v12, v4
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v13, v6
-; GISEL-NEXT:    v_sub_i32_e64 v4, s[14:15], v12, v4
-; GISEL-NEXT:    v_sub_i32_e64 v6, s[16:17], v13, v6
-; GISEL-NEXT:    v_add_i32_e64 v8, s[18:19], v17, v8
-; GISEL-NEXT:    v_add_i32_e64 v9, s[18:19], v19, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, -1, s[8:9]
-; GISEL-NEXT:    v_add_i32_e64 v8, s[6:7], v8, v10
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v11
-; GISEL-NEXT:    v_subb_u32_e64 v10, s[6:7], v1, v8, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v1, s[6:7], v1, v8
-; GISEL-NEXT:    v_subb_u32_e64 v8, s[6:7], v3, v9, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[6:7], v3, v9
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v10, v5
-; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v1, v5, vcc
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v8, v7
-; GISEL-NEXT:    v_subb_u32_e64 v3, s[4:5], v3, v7, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v10, v5
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v12, v4
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[14:15], v13, v6
+; GISEL-NEXT:    v_sub_i32_e64 v4, s[16:17], v12, v4
+; GISEL-NEXT:    v_sub_i32_e64 v6, s[18:19], v13, v6
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v17, v8
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v19, v9
+; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[8:9]
+; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, -1, s[14:15]
+; GISEL-NEXT:    v_add_i32_e32 v8, vcc, v8, v10
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v11
+; GISEL-NEXT:    v_subb_u32_e64 v10, vcc, v1, v8, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v8
+; GISEL-NEXT:    v_subb_u32_e64 v8, vcc, v3, v9, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v9
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v10, v5
+; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v1, v5, s[4:5]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v8, v7
+; GISEL-NEXT:    v_subb_u32_e64 v3, s[6:7], v3, v7, s[6:7]
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v10, v5
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], v8, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v18, vcc, 0, v1, s[10:11]
 ; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v1, v5, s[10:11]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v19, vcc, 0, v3, s[12:13]
 ; GISEL-NEXT:    v_subb_u32_e64 v3, vcc, v3, v7, s[12:13]
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, v9, v14, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, v9, v14, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v11, v11, v15, s[8:9]
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v18, v5
-; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[14:15]
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[16:17]
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v19, v7
-; GISEL-NEXT:    v_subbrev_u32_e64 v3, s[6:7], 0, v3, s[16:17]
+; GISEL-NEXT:    v_subbrev_u32_e64 v3, s[6:7], 0, v3, s[18:19]
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v18, v5
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], v19, v7
 ; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
@@ -721,14 +721,14 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v4, v1
 ; CGP-NEXT:    v_add_i32_e32 v1, vcc, v12, v1
 ; CGP-NEXT:    v_add_i32_e32 v0, vcc, v1, v0
-; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v10, v3
-; CGP-NEXT:    v_subb_u32_e64 v2, s[4:5], v11, v0, vcc
-; CGP-NEXT:    v_sub_i32_e64 v0, s[4:5], v11, v0
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v5
-; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v0, vcc, v0, v5, vcc
+; CGP-NEXT:    v_sub_i32_e64 v1, s[4:5], v10, v3
+; CGP-NEXT:    v_subb_u32_e64 v2, vcc, v11, v0, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v11, v0
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v3, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v5
+; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v0, vcc, v0, v5, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v2, v5
 ; CGP-NEXT:    v_cndmask_b32_e32 v3, v10, v3, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v10, vcc, v1, v4
@@ -885,14 +885,14 @@ define <2 x i64> @v_urem_v2i64(<2 x i64> %num, <2 x i64> %den) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v6, v3
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, v10, v3
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, v3, v2
-; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v8, v5
-; CGP-NEXT:    v_subb_u32_e64 v4, s[4:5], v9, v2, vcc
-; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v9, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v6
-; CGP-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v7
-; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v2, vcc, v2, v7, vcc
+; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v8, v5
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v9, v2, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v9, v2
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v6
+; CGP-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v7
+; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v2, vcc, v2, v7, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v7
 ; CGP-NEXT:    v_cndmask_b32_e32 v5, v8, v5, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v8, vcc, v3, v6
@@ -1068,15 +1068,14 @@ define i64 @v_urem_i64_oddk_denom(i64 %num) {
 ; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v1, v3
 ; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v2
 ; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v4
+; CHECK-NEXT:    v_cndmask_b32_e32 v3, -1, v3, vcc
 ; CHECK-NEXT:    v_sub_i32_e32 v5, vcc, v0, v2
-; CHECK-NEXT:    v_cmp_eq_u32_e64 s[6:7], 0, v4
-; CHECK-NEXT:    v_cndmask_b32_e64 v3, -1, v3, s[6:7]
 ; CHECK-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v5, v2
-; CHECK-NEXT:    v_cndmask_b32_e64 v2, 0, -1, s[4:5]
-; CHECK-NEXT:    s_mov_b64 s[4:5], vcc
+; CHECK-NEXT:    v_subbrev_u32_e32 v1, vcc, 0, v1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v5, v2
+; CHECK-NEXT:    v_cndmask_b32_e64 v2, 0, -1, vcc
 ; CHECK-NEXT:    v_subrev_i32_e32 v6, vcc, 0x12d8fb, v5
-; CHECK-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
 ; CHECK-NEXT:    v_cndmask_b32_e64 v2, -1, v2, s[4:5]
 ; CHECK-NEXT:    v_subbrev_u32_e32 v7, vcc, 0, v1, vcc
@@ -1295,33 +1294,32 @@ define <2 x i64> @v_urem_v2i64_oddk_denom(<2 x i64> %num) {
 ; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v5
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
 ; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v7
+; GISEL-NEXT:    v_cndmask_b32_e32 v6, -1, v6, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v1, s[4:5]
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v8
+; GISEL-NEXT:    v_cndmask_b32_e32 v5, -1, v5, vcc
 ; GISEL-NEXT:    v_sub_i32_e32 v9, vcc, v2, v4
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], 0, v7
-; GISEL-NEXT:    v_cndmask_b32_e64 v6, -1, v6, s[8:9]
-; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v8
-; GISEL-NEXT:    v_cndmask_b32_e64 v5, -1, v5, s[4:5]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v3, s[4:5], 0, v3, s[6:7]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; GISEL-NEXT:    s_mov_b64 s[4:5], vcc
-; GISEL-NEXT:    v_subrev_i32_e32 v11, vcc, 0x12d8fb, v9
-; GISEL-NEXT:    v_sub_i32_e64 v12, s[6:7], v0, v4
-; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[6:7], 0, v1, s[6:7]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v12, v4
-; GISEL-NEXT:    v_cndmask_b32_e64 v13, 0, -1, s[6:7]
-; GISEL-NEXT:    v_subbrev_u32_e64 v3, s[4:5], 0, v3, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v10, s[4:5], v0, v4
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[4:5]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
+; GISEL-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v9, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v12, 0, -1, vcc
+; GISEL-NEXT:    v_subrev_i32_e32 v13, vcc, 0x12d8fb, v9
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v13, -1, v13, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v4, s[4:5], v12, v4
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, -1, v11, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e64 v4, s[4:5], v10, v4
 ; GISEL-NEXT:    v_subbrev_u32_e64 v14, s[4:5], 0, v1, s[4:5]
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v3
-; GISEL-NEXT:    v_cndmask_b32_e64 v10, -1, v10, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v12, -1, v12, s[4:5]
 ; GISEL-NEXT:    v_subbrev_u32_e32 v15, vcc, 0, v3, vcc
-; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v13
-; GISEL-NEXT:    v_cndmask_b32_e32 v4, v12, v4, vcc
-; GISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v10
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, v9, v11, s[4:5]
+; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v11
+; GISEL-NEXT:    v_cndmask_b32_e32 v4, v10, v4, vcc
+; GISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v12
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, v9, v13, s[4:5]
 ; GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v14, vcc
 ; GISEL-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v6
 ; GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v4, vcc
@@ -1530,33 +1528,32 @@ define <2 x i64> @v_urem_v2i64_oddk_denom(<2 x i64> %num) {
 ; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v3, v6
 ; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v4
 ; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v7
+; CGP-NEXT:    v_cndmask_b32_e32 v5, -1, v5, vcc
+; CGP-NEXT:    v_subbrev_u32_e64 v1, vcc, 0, v1, s[4:5]
+; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v8
+; CGP-NEXT:    v_cndmask_b32_e32 v6, -1, v6, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v9, vcc, v2, v4
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[8:9], 0, v7
-; CGP-NEXT:    v_cndmask_b32_e64 v5, -1, v5, s[8:9]
-; CGP-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[4:5]
-; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v8
-; CGP-NEXT:    v_cndmask_b32_e64 v6, -1, v6, s[4:5]
 ; CGP-NEXT:    v_subbrev_u32_e64 v3, s[4:5], 0, v3, s[6:7]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v9, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v10, 0, -1, s[4:5]
-; CGP-NEXT:    s_mov_b64 s[4:5], vcc
-; CGP-NEXT:    v_subrev_i32_e32 v11, vcc, 0x12d8fb, v9
-; CGP-NEXT:    v_sub_i32_e64 v12, s[6:7], v0, v4
-; CGP-NEXT:    v_subbrev_u32_e64 v1, s[6:7], 0, v1, s[6:7]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[6:7], v12, v4
-; CGP-NEXT:    v_cndmask_b32_e64 v13, 0, -1, s[6:7]
-; CGP-NEXT:    v_subbrev_u32_e64 v3, s[4:5], 0, v3, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v10, s[4:5], v0, v4
+; CGP-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[4:5]
+; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v10, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
+; CGP-NEXT:    v_subbrev_u32_e32 v3, vcc, 0, v3, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v9, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v12, 0, -1, vcc
+; CGP-NEXT:    v_subrev_i32_e32 v13, vcc, 0x12d8fb, v9
 ; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v1
-; CGP-NEXT:    v_cndmask_b32_e64 v13, -1, v13, s[4:5]
-; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v12, v4
+; CGP-NEXT:    v_cndmask_b32_e64 v11, -1, v11, s[4:5]
+; CGP-NEXT:    v_sub_i32_e64 v4, s[4:5], v10, v4
 ; CGP-NEXT:    v_subbrev_u32_e64 v14, s[4:5], 0, v1, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e64 s[4:5], 0, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v10, -1, v10, s[4:5]
+; CGP-NEXT:    v_cndmask_b32_e64 v12, -1, v12, s[4:5]
 ; CGP-NEXT:    v_subbrev_u32_e32 v15, vcc, 0, v3, vcc
-; CGP-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v13
-; CGP-NEXT:    v_cndmask_b32_e32 v4, v12, v4, vcc
-; CGP-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v10
-; CGP-NEXT:    v_cndmask_b32_e64 v9, v9, v11, s[4:5]
+; CGP-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v11
+; CGP-NEXT:    v_cndmask_b32_e32 v4, v10, v4, vcc
+; CGP-NEXT:    v_cmp_ne_u32_e64 s[4:5], 0, v12
+; CGP-NEXT:    v_cndmask_b32_e64 v9, v9, v13, s[4:5]
 ; CGP-NEXT:    v_cndmask_b32_e32 v1, v1, v14, vcc
 ; CGP-NEXT:    v_cmp_ne_u32_e32 vcc, 0, v5
 ; CGP-NEXT:    v_cndmask_b32_e32 v0, v0, v4, vcc
@@ -1689,14 +1686,14 @@ define i64 @v_urem_i64_pow2_shl_denom(i64 %x, i64 %y) {
 ; CHECK-NEXT:    v_mul_lo_u32 v1, v5, v1
 ; CHECK-NEXT:    v_add_i32_e32 v1, vcc, v8, v1
 ; CHECK-NEXT:    v_add_i32_e32 v0, vcc, v1, v0
-; CHECK-NEXT:    v_sub_i32_e32 v1, vcc, v3, v7
-; CHECK-NEXT:    v_subb_u32_e64 v2, s[4:5], v4, v0, vcc
-; CHECK-NEXT:    v_sub_i32_e64 v0, s[4:5], v4, v0
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v5
-; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, s[4:5]
-; CHECK-NEXT:    v_cmp_ge_u32_e64 s[4:5], v2, v6
-; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, s[4:5]
-; CHECK-NEXT:    v_subb_u32_e32 v0, vcc, v0, v6, vcc
+; CHECK-NEXT:    v_sub_i32_e64 v1, s[4:5], v3, v7
+; CHECK-NEXT:    v_subb_u32_e64 v2, vcc, v4, v0, s[4:5]
+; CHECK-NEXT:    v_sub_i32_e32 v0, vcc, v4, v0
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v5
+; CHECK-NEXT:    v_cndmask_b32_e64 v3, 0, -1, vcc
+; CHECK-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v6
+; CHECK-NEXT:    v_cndmask_b32_e64 v4, 0, -1, vcc
+; CHECK-NEXT:    v_subb_u32_e64 v0, vcc, v0, v6, s[4:5]
 ; CHECK-NEXT:    v_cmp_eq_u32_e32 vcc, v2, v6
 ; CHECK-NEXT:    v_cndmask_b32_e32 v3, v4, v3, vcc
 ; CHECK-NEXT:    v_sub_i32_e32 v4, vcc, v1, v5
@@ -1786,17 +1783,17 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v14, v18
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v11, v17
-; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v19, v20
-; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v19, v20
+; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v18
+; GISEL-NEXT:    v_add_i32_e64 v19, s[6:7], v19, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v19, v6, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v15, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v20, v16
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v20, v16
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v6, v10
-; GISEL-NEXT:    v_add_i32_e64 v16, s[6:7], v16, v20
+; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
 ; GISEL-NEXT:    v_mul_lo_u32 v20, v13, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v21, v10, v16
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v21, v10, v19
 ; GISEL-NEXT:    v_add_i32_e64 v20, s[8:9], v20, v21
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v13, v19
@@ -1809,14 +1806,14 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[14:15], v19, v20
 ; GISEL-NEXT:    v_mul_hi_u32 v20, v11, v17
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[16:17], v18, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[8:9]
-; GISEL-NEXT:    v_add_i32_e64 v20, s[6:7], v20, v21
+; GISEL-NEXT:    v_add_i32_e32 v20, vcc, v20, v21
 ; GISEL-NEXT:    v_cndmask_b32_e64 v21, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[14:15]
-; GISEL-NEXT:    v_add_i32_e64 v21, s[6:7], v21, v22
-; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, vcc
-; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v21, vcc, v21, v22
+; GISEL-NEXT:    v_cndmask_b32_e64 v22, 0, 1, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_add_i32_e32 v22, vcc, v22, v23
 ; GISEL-NEXT:    v_cndmask_b32_e64 v23, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v24, 0, 1, s[16:17]
@@ -1867,17 +1864,17 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_mul_hi_u32 v9, v14, v9
 ; GISEL-NEXT:    v_add_i32_e64 v19, s[10:11], v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
-; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v17, v15
+; GISEL-NEXT:    v_add_i32_e64 v15, s[12:13], v17, v15
 ; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, 1, s[6:7]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v16, v12
 ; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, 1, s[10:11]
 ; GISEL-NEXT:    v_add_i32_e64 v18, s[8:9], v19, v18
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e64 v19, s[4:5], v20, v19
+; GISEL-NEXT:    v_add_i32_e32 v19, vcc, v20, v19
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[6:7]
-; GISEL-NEXT:    v_add_i32_e64 v15, s[4:5], v15, v20
-; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, vcc
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v15, v20
+; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[12:13]
 ; GISEL-NEXT:    v_add_i32_e32 v17, vcc, v17, v20
 ; GISEL-NEXT:    v_cndmask_b32_e64 v20, 0, 1, s[8:9]
 ; GISEL-NEXT:    v_add_i32_e32 v16, vcc, v16, v20
@@ -1885,20 +1882,20 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_add_i32_e64 v17, s[4:5], v18, v17
 ; GISEL-NEXT:    v_cndmask_b32_e64 v18, 0, 1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v19, 0, 1, s[4:5]
-; GISEL-NEXT:    v_add_i32_e32 v10, vcc, v10, v12
-; GISEL-NEXT:    v_add_i32_e64 v11, s[4:5], v11, v17
-; GISEL-NEXT:    v_add_i32_e64 v12, s[6:7], v15, v18
-; GISEL-NEXT:    v_add_i32_e64 v15, s[6:7], v16, v19
+; GISEL-NEXT:    v_add_i32_e64 v10, s[4:5], v10, v12
+; GISEL-NEXT:    v_add_i32_e64 v11, s[6:7], v11, v17
+; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v15, v18
+; GISEL-NEXT:    v_add_i32_e32 v15, vcc, v16, v19
 ; GISEL-NEXT:    v_mul_lo_u32 v16, v1, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v17, v0, v10
 ; GISEL-NEXT:    v_mul_hi_u32 v10, v1, v10
 ; GISEL-NEXT:    v_mul_lo_u32 v18, v3, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v19, v2, v11
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v3, v11
-; GISEL-NEXT:    v_add_i32_e64 v6, s[6:7], v6, v12
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v15
-; GISEL-NEXT:    v_addc_u32_e32 v6, vcc, v13, v6, vcc
-; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[4:5]
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v12
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v15
+; GISEL-NEXT:    v_addc_u32_e64 v6, vcc, v13, v6, s[4:5]
+; GISEL-NEXT:    v_addc_u32_e64 v9, vcc, v14, v9, s[6:7]
 ; GISEL-NEXT:    v_mul_lo_u32 v12, v0, v6
 ; GISEL-NEXT:    v_mul_lo_u32 v13, v1, v6
 ; GISEL-NEXT:    v_mul_hi_u32 v14, v0, v6
@@ -1939,50 +1936,50 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; GISEL-NEXT:    v_mul_hi_u32 v11, v4, v11
 ; GISEL-NEXT:    v_add_i32_e32 v13, vcc, v14, v13
 ; GISEL-NEXT:    v_add_i32_e32 v12, vcc, v12, v15
-; GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v16
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v2, v18
-; GISEL-NEXT:    v_add_i32_e64 v6, s[6:7], v6, v13
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v12
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v0, v7
+; GISEL-NEXT:    v_sub_i32_e64 v0, s[4:5], v0, v16
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[6:7], v2, v18
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v13
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v12
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v7
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v2, v4
 ; GISEL-NEXT:    v_sub_i32_e64 v12, s[10:11], v0, v7
 ; GISEL-NEXT:    v_sub_i32_e64 v13, s[12:13], v2, v4
 ; GISEL-NEXT:    v_mul_lo_u32 v6, v7, v6
 ; GISEL-NEXT:    v_mul_lo_u32 v9, v4, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v14, 0, -1, s[6:7]
+; GISEL-NEXT:    v_cndmask_b32_e64 v14, 0, -1, vcc
 ; GISEL-NEXT:    v_cndmask_b32_e64 v15, 0, -1, s[8:9]
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v12, v7
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v13, v4
-; GISEL-NEXT:    v_sub_i32_e64 v7, s[14:15], v12, v7
-; GISEL-NEXT:    v_sub_i32_e64 v4, s[16:17], v13, v4
-; GISEL-NEXT:    v_add_i32_e64 v6, s[18:19], v17, v6
-; GISEL-NEXT:    v_add_i32_e64 v9, s[18:19], v19, v9
-; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, -1, s[8:9]
-; GISEL-NEXT:    v_add_i32_e64 v6, s[6:7], v6, v10
-; GISEL-NEXT:    v_add_i32_e64 v9, s[6:7], v9, v11
-; GISEL-NEXT:    v_subb_u32_e64 v10, s[6:7], v1, v6, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v1, s[6:7], v1, v6
-; GISEL-NEXT:    v_subb_u32_e64 v6, s[6:7], v3, v9, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v3, s[6:7], v3, v9
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v10, v8
-; GISEL-NEXT:    v_subb_u32_e32 v1, vcc, v1, v8, vcc
-; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v6, v5
-; GISEL-NEXT:    v_subb_u32_e64 v3, s[4:5], v3, v5, s[4:5]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[4:5], v10, v8
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[8:9], v12, v7
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[14:15], v13, v4
+; GISEL-NEXT:    v_sub_i32_e64 v7, s[16:17], v12, v7
+; GISEL-NEXT:    v_sub_i32_e64 v4, s[18:19], v13, v4
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v17, v6
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v19, v9
+; GISEL-NEXT:    v_cndmask_b32_e64 v16, 0, -1, s[8:9]
+; GISEL-NEXT:    v_cndmask_b32_e64 v17, 0, -1, s[14:15]
+; GISEL-NEXT:    v_add_i32_e32 v6, vcc, v6, v10
+; GISEL-NEXT:    v_add_i32_e32 v9, vcc, v9, v11
+; GISEL-NEXT:    v_subb_u32_e64 v10, vcc, v1, v6, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v1, vcc, v1, v6
+; GISEL-NEXT:    v_subb_u32_e64 v6, vcc, v3, v9, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v9
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v10, v8
+; GISEL-NEXT:    v_subb_u32_e64 v1, s[4:5], v1, v8, s[4:5]
+; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v6, v5
+; GISEL-NEXT:    v_subb_u32_e64 v3, s[6:7], v3, v5, s[6:7]
+; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v10, v8
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], v6, v5
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cndmask_b32_e64 v11, 0, -1, s[4:5]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v18, vcc, 0, v1, s[10:11]
 ; GISEL-NEXT:    v_subb_u32_e64 v1, vcc, v1, v8, s[10:11]
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, v9, v14, s[4:5]
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, v9, v14, s[6:7]
 ; GISEL-NEXT:    v_subbrev_u32_e64 v14, vcc, 0, v3, s[12:13]
 ; GISEL-NEXT:    v_subb_u32_e64 v3, vcc, v3, v5, s[12:13]
 ; GISEL-NEXT:    v_cndmask_b32_e64 v11, v11, v15, s[8:9]
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v18, v8
-; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[14:15]
+; GISEL-NEXT:    v_subbrev_u32_e64 v1, s[4:5], 0, v1, s[16:17]
 ; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v14, v5
-; GISEL-NEXT:    v_subbrev_u32_e64 v3, s[6:7], 0, v3, s[16:17]
+; GISEL-NEXT:    v_subbrev_u32_e64 v3, s[6:7], 0, v3, s[18:19]
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], v18, v8
 ; GISEL-NEXT:    v_cmp_eq_u32_e64 s[8:9], v14, v5
 ; GISEL-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
@@ -2117,14 +2114,14 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_mul_lo_u32 v1, v2, v1
 ; CGP-NEXT:    v_add_i32_e32 v1, vcc, v11, v1
 ; CGP-NEXT:    v_add_i32_e32 v0, vcc, v1, v0
-; CGP-NEXT:    v_sub_i32_e32 v1, vcc, v8, v10
-; CGP-NEXT:    v_subb_u32_e64 v4, s[4:5], v9, v0, vcc
-; CGP-NEXT:    v_sub_i32_e64 v0, s[4:5], v9, v0
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v1, v2
-; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v3
-; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v0, vcc, v0, v3, vcc
+; CGP-NEXT:    v_sub_i32_e64 v1, s[4:5], v8, v10
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v9, v0, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v0, vcc, v9, v0
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v1, v2
+; CGP-NEXT:    v_cndmask_b32_e64 v8, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v3
+; CGP-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v0, vcc, v0, v3, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v3
 ; CGP-NEXT:    v_cndmask_b32_e32 v8, v9, v8, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v9, vcc, v1, v2
@@ -2283,14 +2280,14 @@ define <2 x i64> @v_urem_v2i64_pow2_shl_denom(<2 x i64> %x, <2 x i64> %y) {
 ; CGP-NEXT:    v_mul_lo_u32 v3, v9, v3
 ; CGP-NEXT:    v_add_i32_e32 v3, vcc, v8, v3
 ; CGP-NEXT:    v_add_i32_e32 v2, vcc, v3, v2
-; CGP-NEXT:    v_sub_i32_e32 v3, vcc, v5, v6
-; CGP-NEXT:    v_subb_u32_e64 v4, s[4:5], v7, v2, vcc
-; CGP-NEXT:    v_sub_i32_e64 v2, s[4:5], v7, v2
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v9
-; CGP-NEXT:    v_cndmask_b32_e64 v5, 0, -1, s[4:5]
-; CGP-NEXT:    v_cmp_ge_u32_e64 s[4:5], v4, v10
-; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, s[4:5]
-; CGP-NEXT:    v_subb_u32_e32 v2, vcc, v2, v10, vcc
+; CGP-NEXT:    v_sub_i32_e64 v3, s[4:5], v5, v6
+; CGP-NEXT:    v_subb_u32_e64 v4, vcc, v7, v2, s[4:5]
+; CGP-NEXT:    v_sub_i32_e32 v2, vcc, v7, v2
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v9
+; CGP-NEXT:    v_cndmask_b32_e64 v5, 0, -1, vcc
+; CGP-NEXT:    v_cmp_ge_u32_e32 vcc, v4, v10
+; CGP-NEXT:    v_cndmask_b32_e64 v6, 0, -1, vcc
+; CGP-NEXT:    v_subb_u32_e64 v2, vcc, v2, v10, s[4:5]
 ; CGP-NEXT:    v_cmp_eq_u32_e32 vcc, v4, v10
 ; CGP-NEXT:    v_cndmask_b32_e32 v5, v6, v5, vcc
 ; CGP-NEXT:    v_sub_i32_e32 v6, vcc, v3, v9
@@ -2593,22 +2590,22 @@ define <2 x i64> @v_urem_v2i64_24bit(<2 x i64> %num, <2 x i64> %den) {
 ; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v13, v5
 ; GISEL-NEXT:    v_add_i32_e32 v4, vcc, v4, v6
 ; GISEL-NEXT:    v_add_i32_e32 v5, vcc, v5, v7
-; GISEL-NEXT:    v_sub_i32_e32 v3, vcc, v3, v9
-; GISEL-NEXT:    v_subb_u32_e64 v6, s[4:5], 0, v4, vcc
-; GISEL-NEXT:    v_sub_i32_e64 v4, s[4:5], 0, v4
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v3, v1
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v2, s[4:5], v2, v11
-; GISEL-NEXT:    v_subb_u32_e64 v8, s[6:7], 0, v5, s[4:5]
-; GISEL-NEXT:    v_sub_i32_e64 v5, s[6:7], 0, v5
-; GISEL-NEXT:    v_cmp_ge_u32_e64 s[6:7], v2, v0
-; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, s[6:7]
-; GISEL-NEXT:    v_cmp_eq_u32_e64 s[6:7], 0, v6
-; GISEL-NEXT:    v_cndmask_b32_e64 v7, -1, v7, s[6:7]
-; GISEL-NEXT:    v_subbrev_u32_e32 v4, vcc, 0, v4, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v3, s[4:5], v3, v9
+; GISEL-NEXT:    v_subb_u32_e64 v6, vcc, 0, v4, s[4:5]
+; GISEL-NEXT:    v_sub_i32_e32 v4, vcc, 0, v4
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v3, v1
+; GISEL-NEXT:    v_cndmask_b32_e64 v7, 0, -1, vcc
+; GISEL-NEXT:    v_sub_i32_e64 v2, s[6:7], v2, v11
+; GISEL-NEXT:    v_subb_u32_e64 v8, vcc, 0, v5, s[6:7]
+; GISEL-NEXT:    v_sub_i32_e32 v5, vcc, 0, v5
+; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v2, v0
+; GISEL-NEXT:    v_cndmask_b32_e64 v9, 0, -1, vcc
+; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v6
+; GISEL-NEXT:    v_cndmask_b32_e32 v7, -1, v7, vcc
+; GISEL-NEXT:    v_subbrev_u32_e64 v4, vcc, 0, v4, s[4:5]
 ; GISEL-NEXT:    v_cmp_eq_u32_e32 vcc, 0, v8
 ; GISEL-NEXT:    v_cndmask_b32_e32 v9, -1, v9, vcc
-; GISEL-NEXT:    v_subbrev_u32_e64 v5, vcc, 0, v5, s[4:5]
+; GISEL-NEXT:    v_subbrev_u32_e64 v5, vcc, 0, v5, s[6:7]
 ; GISEL-NEXT:    v_sub_i32_e32 v10, vcc, v3, v1
 ; GISEL-NEXT:    v_subbrev_u32_e32 v4, vcc, 0, v4, vcc
 ; GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v10, v1
diff --git a/llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll b/llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll
index aa1d44c31606b8f..53c870e533f0ff4 100644
--- a/llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll
+++ b/llvm/test/CodeGen/AMDGPU/ds-combine-large-stride.ll
@@ -95,7 +95,7 @@ bb:
 ; GCN:     v_mov_b32_e32 [[BASE:v[0-9]+]], [[ARG]]
 
 ; VI-DAG: v_add_u32_e32 [[B1:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
-; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
+; VI-DAG: v_add_u32_e32 [[B2:v[0-9]+]], vcc, 0x400, [[BASE]]
 ; VI-DAG: v_add_u32_e32 [[B3:v[0-9]+]], vcc, {{s[0-9]+}}, [[BASE]]
 
 ; GFX9-DAG: v_add_u32_e32 [[B1:v[0-9]+]], 0x800, [[BASE]]
diff --git a/llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll b/llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll
index 794b10eea58b9bb..1d6826da395cecb 100644
--- a/llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll
+++ b/llvm/test/CodeGen/AMDGPU/fdiv_flags.f32.ll
@@ -342,19 +342,19 @@ define float @v_fdiv_recip_sqrt_f32(float %x) {
 ; CODEGEN-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -410,19 +410,19 @@ define float @v_fdiv_recip_sqrt_f32(float %x) {
 ; IR-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; IR-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -607,19 +607,19 @@ define float @v_fdiv_recip_sqrt_f32_arcp(float %x) {
 ; CODEGEN-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -647,19 +647,19 @@ define float @v_fdiv_recip_sqrt_f32_arcp(float %x) {
 ; IR-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; IR-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -831,19 +831,19 @@ define float @v_fdiv_recip_sqrt_f32_arcp_fdiv_only(float %x) {
 ; CODEGEN-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -871,19 +871,19 @@ define float @v_fdiv_recip_sqrt_f32_arcp_fdiv_only(float %x) {
 ; IR-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; IR-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1016,19 +1016,19 @@ define float @v_fdiv_recip_sqrt_f32_afn_fdiv_only(float %x) {
 ; CODEGEN-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1064,19 +1064,19 @@ define float @v_fdiv_recip_sqrt_f32_afn_fdiv_only(float %x) {
 ; IR-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; IR-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1191,19 +1191,19 @@ define float @v_fdiv_recip_sqrt_f32_arcp_afn_fdiv_only(float %x) {
 ; CODEGEN-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; CODEGEN-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; CODEGEN-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; CODEGEN-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; CODEGEN-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1221,19 +1221,19 @@ define float @v_fdiv_recip_sqrt_f32_arcp_afn_fdiv_only(float %x) {
 ; IR-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; IR-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1674,19 +1674,19 @@ define float @v_recip_sqrt_f32_ulp25_contract(float %x) {
 ; IR-IEEE-GISEL-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_sqrt_f32_e32 v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; IR-IEEE-GISEL-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; IR-IEEE-GISEL-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; IR-IEEE-GISEL-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; IR-IEEE-GISEL-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; IR-IEEE-GISEL-NEXT:    v_mov_b32_e32 v2, 0x260
 ; IR-IEEE-GISEL-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; IR-IEEE-GISEL-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll b/llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll
index 13e588dffaf5c18..22dc8415058e54c 100644
--- a/llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll
+++ b/llvm/test/CodeGen/AMDGPU/fsqrt.f32.ll
@@ -34,19 +34,19 @@ define float @v_sqrt_f32(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -129,19 +129,19 @@ define float @v_sqrt_f32_fneg(float %x) {
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e64 v2, -v0, v2
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 vcc, v1, -v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, -v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, -v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, -v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -227,19 +227,19 @@ define float @v_sqrt_f32_fabs(float %x) {
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e64 v2, |v0|, v2
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 vcc, v1, |v0|
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, |v0|, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, |v0|
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, |v0|, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -326,19 +326,19 @@ define float @v_sqrt_f32_fneg_fabs(float %x) {
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e64 v2, -|v0|, v2
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 vcc, v1, -|v0|
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, -|v0|, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, -|v0|
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, -|v0|, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -424,19 +424,19 @@ define float @v_sqrt_f32_ninf(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -518,19 +518,19 @@ define float @v_sqrt_f32_no_infs_attribute(float %x) #5 {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -612,19 +612,19 @@ define float @v_sqrt_f32_nnan(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -709,19 +709,19 @@ define amdgpu_ps i32 @s_sqrt_f32(float inreg %x) {
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, s0
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, s0, v2
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s0, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], s0, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v1, v2, s[0:1]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[0:1], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[0:1], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -813,19 +813,19 @@ define amdgpu_ps i32 @s_sqrt_f32_ninf(float inreg %x) {
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, s0
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, s0, v2
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s0, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], s0, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v1, v2, s[0:1]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[0:1], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[0:1], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -938,19 +938,19 @@ define float @v_sqrt_f32_nsz(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1032,19 +1032,19 @@ define float @v_sqrt_f32_nnan_ninf(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1126,19 +1126,19 @@ define float @v_sqrt_f32_nnan_ninf_nsz(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1349,37 +1349,37 @@ define <2 x float> @v_sqrt_v2f32(<2 x float> %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    s_mov_b32 s4, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, s4, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], s4, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v2, v0
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v3, 0xf800000
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], -1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, -1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v2, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v6, s[4:5], 1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v6, vcc, 1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v6, v2, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v6, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v5
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v6, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x4f800000, v1
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v3, v1
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v5, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v3, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v5, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v3, v1
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v4, 0x260
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[4:5], v0, v4
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v2, v0, s[4:5]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v3
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v4
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v2, v0, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v2, v3, v1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v6, s[4:5], 1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v6, vcc, 1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v6, v3, v1
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v3, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v6, s[4:5]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v6, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v3, 0x37800000, v2
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[4:5]
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v1, v4
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v2, v1, vcc
 ; GISEL-IEEE-NEXT:    s_setpc_b64 s[30:31]
@@ -1523,53 +1523,53 @@ define <3 x float> @v_sqrt_v3f32(<3 x float> %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    s_mov_b32 s4, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v3, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, s4, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v3, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], s4, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v3, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v3, v0
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v4, 0xf800000
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v5, s[4:5], -1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v5, vcc, -1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v6, -v5, v3, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v7, s[4:5], 1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v7, vcc, 1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v8, -v7, v3, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v6
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v5, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v8
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v7, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v3
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v6
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v5, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v8
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v7, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v3
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v5, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v6, 0x4f800000, v1
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v4, v1
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v6, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v4, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v6, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v6, v1
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v5, 0x260
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[4:5], v0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v3, v0, s[4:5]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v3, s[4:5], -1, v6
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v3, v0, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v3, vcc, -1, v6
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v3, v6, v1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v8, s[4:5], 1, v6
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v8, vcc, 1, v6
 ; GISEL-IEEE-NEXT:    v_fma_f32 v9, -v8, v6, v1
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v6, v3, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v9
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[4:5]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v6, v3, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v9
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v8, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v6, 0x37800000, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v6, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v6, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v6, 0x4f800000, v2
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v4, v2
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v6, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v4, v2
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v6, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v4, v2
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[4:5], v1, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v3, v1, s[4:5]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v3, s[4:5], -1, v4
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v1, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v3, v1, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v3, vcc, -1, v4
 ; GISEL-IEEE-NEXT:    v_fma_f32 v6, -v3, v4, v2
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v7, s[4:5], 1, v4
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v7, vcc, 1, v4
 ; GISEL-IEEE-NEXT:    v_fma_f32 v8, -v7, v4, v2
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v6
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v4, v3, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v8
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v7, s[4:5]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v6
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v4, v3, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v8
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v7, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[4:5]
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v2, v5
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
 ; GISEL-IEEE-NEXT:    s_setpc_b64 s[30:31]
@@ -1712,19 +1712,19 @@ define float @v_sqrt_f32_ulp05(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -1807,19 +1807,19 @@ define float @v_sqrt_f32_ulp1(float %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -2023,37 +2023,37 @@ define <2 x float> @v_sqrt_v2f32_ulp1(<2 x float> %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    s_mov_b32 s4, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, s4, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], s4, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v2, v0
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v3, 0xf800000
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], -1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, -1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v2, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v6, s[4:5], 1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v6, vcc, 1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v6, v2, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v6, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v5
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v4, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v6, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v4, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x4f800000, v1
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v3, v1
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v5, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v3, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v5, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v3, v1
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v4, 0x260
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[4:5], v0, v4
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v2, v0, s[4:5]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v3
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v4
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v2, v0, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v2, v3, v1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v6, s[4:5], 1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v6, vcc, 1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v6, v3, v1
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v3, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v6, s[4:5]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v6, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v3, 0x37800000, v2
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[4:5]
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v1, v4
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v2, v1, vcc
 ; GISEL-IEEE-NEXT:    s_setpc_b64 s[30:31]
@@ -2165,38 +2165,38 @@ define <2 x float> @v_sqrt_v2f32_ulp1_fabs(<2 x float> %x) {
 ; GISEL-IEEE-NEXT:    s_mov_b32 s4, 0xf800000
 ; GISEL-IEEE-NEXT:    s_mov_b32 s5, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e64 v2, |v0|, s5
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 vcc, s4, |v0|
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, |v0|, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], s4, |v0|
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, |v0|, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v2, v0
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v3, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v4, 0x4f800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e64 v4, |v1|, v4
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v5, s[4:5], -1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v5, vcc, -1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v6, -v5, v2, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v7, s[4:5], 1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v7, vcc, 1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v8, -v7, v2, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v6
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v8
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v7, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v6
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v5, vcc
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 vcc, v3, |v1|
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, |v1|, v4, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v8
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v7, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[4:5]
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v3, |v1|
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, |v1|, v4, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v3, v1
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v5, 0x260
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[4:5], v0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v2, v0, s[4:5]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v3
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v2, v0, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v4, -v2, v3, v1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v6, s[4:5], 1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v6, vcc, 1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v6, v3, v1
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v4
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v3, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v6, s[4:5]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v4
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v6, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v3, 0x37800000, v2
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[4:5]
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v1, v5
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v2, v1, vcc
 ; GISEL-IEEE-NEXT:    s_setpc_b64 s[30:31]
@@ -3120,19 +3120,19 @@ define float @v_sqrt_f32_ninf_known_never_zero(float nofpclass(zero) %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -3214,19 +3214,19 @@ define float @v_sqrt_f32_known_never_zero(float nofpclass(zero) %x) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -3308,19 +3308,19 @@ define float @v_sqrt_f32_known_never_zero_never_inf(float nofpclass(zero inf) %x
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -3402,19 +3402,19 @@ define float @v_sqrt_f32_known_never_zero_never_ninf(float nofpclass(zero ninf)
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -3496,19 +3496,19 @@ define float @v_sqrt_f32_known_never_zero_never_pinf(float nofpclass(zero pinf)
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[4:5], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[4:5], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
@@ -3779,19 +3779,19 @@ define float @v_elim_redun_check_ult_sqrt(float %in) {
 ; GISEL-IEEE-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x4f800000, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, v1, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v0, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[4:5], v1, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v0, v2, s[4:5]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v2, v1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v3, s[4:5], -1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v3, vcc, -1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v4, -v3, v2, v1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v5, s[4:5], 1, v2
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v5, vcc, 1, v2
 ; GISEL-IEEE-NEXT:    v_fma_f32 v6, -v5, v2, v1
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[4:5], 0, v4
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[4:5]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[4:5], 0, v6
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v5, s[4:5]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v3, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v4
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v3, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v6
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v2, v5, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v3, 0x37800000, v2
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v2, v3, s[4:5]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v3, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v1, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v2, v1, vcc
@@ -3946,36 +3946,36 @@ define amdgpu_kernel void @elim_redun_check_neg0(ptr addrspace(1) %out, float %i
 ;
 ; GISEL-IEEE-LABEL: elim_redun_check_neg0:
 ; GISEL-IEEE:       ; %bb.0: ; %entry
-; GISEL-IEEE-NEXT:    s_load_dword s2, s[0:1], 0xb
-; GISEL-IEEE-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x9
+; GISEL-IEEE-NEXT:    s_load_dword s4, s[0:1], 0xb
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v0, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0x4f800000
-; GISEL-IEEE-NEXT:    s_mov_b32 s6, -1
+; GISEL-IEEE-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x9
 ; GISEL-IEEE-NEXT:    s_waitcnt lgkmcnt(0)
-; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, s2
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v1, s2, v1
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s2, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, s4
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v1, s4, v1
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[2:3], s4, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v2, v1, s[2:3]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    s_mov_b32 s7, 0xf000
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[0:1], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[0:1], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[2:3]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
 ; GISEL-IEEE-NEXT:    v_bfrev_b32_e32 v1, 1
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x7fc00000
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s2, v1
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s4, v1
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; GISEL-IEEE-NEXT:    buffer_store_dword v0, off, s[4:7], 0
+; GISEL-IEEE-NEXT:    s_mov_b32 s2, -1
+; GISEL-IEEE-NEXT:    s_mov_b32 s3, 0xf000
+; GISEL-IEEE-NEXT:    buffer_store_dword v0, off, s[0:3], 0
 ; GISEL-IEEE-NEXT:    s_endpgm
 ;
 ; SDAG-DAZ-LABEL: elim_redun_check_neg0:
@@ -4080,35 +4080,35 @@ define amdgpu_kernel void @elim_redun_check_pos0(ptr addrspace(1) %out, float %i
 ;
 ; GISEL-IEEE-LABEL: elim_redun_check_pos0:
 ; GISEL-IEEE:       ; %bb.0: ; %entry
-; GISEL-IEEE-NEXT:    s_load_dword s2, s[0:1], 0xb
-; GISEL-IEEE-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x9
+; GISEL-IEEE-NEXT:    s_load_dword s4, s[0:1], 0xb
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v0, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0x4f800000
-; GISEL-IEEE-NEXT:    s_mov_b32 s6, -1
+; GISEL-IEEE-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x9
 ; GISEL-IEEE-NEXT:    s_waitcnt lgkmcnt(0)
-; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, s2
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v1, s2, v1
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s2, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, s4
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v1, s4, v1
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[2:3], s4, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v2, v1, s[2:3]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    s_mov_b32 s7, 0xf000
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[0:1], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[0:1], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[2:3]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0x7fc00000
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 vcc, s2, 0
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 vcc, s4, 0
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v1, vcc
-; GISEL-IEEE-NEXT:    buffer_store_dword v0, off, s[4:7], 0
+; GISEL-IEEE-NEXT:    s_mov_b32 s2, -1
+; GISEL-IEEE-NEXT:    s_mov_b32 s3, 0xf000
+; GISEL-IEEE-NEXT:    buffer_store_dword v0, off, s[0:3], 0
 ; GISEL-IEEE-NEXT:    s_endpgm
 ;
 ; SDAG-DAZ-LABEL: elim_redun_check_pos0:
@@ -4212,36 +4212,36 @@ define amdgpu_kernel void @elim_redun_check_ult(ptr addrspace(1) %out, float %in
 ;
 ; GISEL-IEEE-LABEL: elim_redun_check_ult:
 ; GISEL-IEEE:       ; %bb.0: ; %entry
-; GISEL-IEEE-NEXT:    s_load_dword s2, s[0:1], 0xb
-; GISEL-IEEE-NEXT:    s_load_dwordx2 s[4:5], s[0:1], 0x9
+; GISEL-IEEE-NEXT:    s_load_dword s4, s[0:1], 0xb
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v0, 0xf800000
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, 0x4f800000
-; GISEL-IEEE-NEXT:    s_mov_b32 s6, -1
+; GISEL-IEEE-NEXT:    s_load_dwordx2 s[0:1], s[0:1], 0x9
 ; GISEL-IEEE-NEXT:    s_waitcnt lgkmcnt(0)
-; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, s2
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v1, s2, v1
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s2, v0
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v2, v1, vcc
+; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, s4
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v1, s4, v1
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[2:3], s4, v0
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v2, v1, s[2:3]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v1, v0
-; GISEL-IEEE-NEXT:    s_mov_b32 s7, 0xf000
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v2, s[0:1], -1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v2, vcc, -1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v3, -v2, v1, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v4, s[0:1], 1, v1
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v4, vcc, 1, v1
 ; GISEL-IEEE-NEXT:    v_fma_f32 v5, -v4, v1, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v4, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v3
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v1, v1, v4, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, 0x37800000, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v1, v1, v2, s[2:3]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x260
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v2
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v1, v0, vcc
 ; GISEL-IEEE-NEXT:    v_bfrev_b32_e32 v1, 1
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v2, 0x7fc00000
-; GISEL-IEEE-NEXT:    v_cmp_nge_f32_e32 vcc, s2, v1
+; GISEL-IEEE-NEXT:    v_cmp_nge_f32_e32 vcc, s4, v1
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v0, v2, vcc
-; GISEL-IEEE-NEXT:    buffer_store_dword v0, off, s[4:7], 0
+; GISEL-IEEE-NEXT:    s_mov_b32 s2, -1
+; GISEL-IEEE-NEXT:    s_mov_b32 s3, 0xf000
+; GISEL-IEEE-NEXT:    buffer_store_dword v0, off, s[0:3], 0
 ; GISEL-IEEE-NEXT:    s_endpgm
 ;
 ; SDAG-DAZ-LABEL: elim_redun_check_ult:
@@ -4372,38 +4372,38 @@ define amdgpu_kernel void @elim_redun_check_v2(ptr addrspace(1) %out, <2 x float
 ; GISEL-IEEE-NEXT:    s_waitcnt lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, s6
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, s6, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, s0, v1
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[0:1], s0, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v1, v2, s[0:1]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v3, v2
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v0, s7, v0
 ; GISEL-IEEE-NEXT:    s_mov_b32 s6, -1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v5, s[0:1], -1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v5, vcc, -1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v6, -v5, v3, v2
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v7, s[0:1], 1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v7, vcc, 1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v8, -v7, v3, v2
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v6
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v5, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v8
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v7, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v3
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v6
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v5, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v8
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v7, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v3
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v5, s[0:1]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v6, s7
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s7, v4
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v6, v0, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], s7, v4
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v6, v0, s[0:1]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v4, v0
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v5, 0x260
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[0:1], v2, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v3, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v3, s[0:1], -1, v4
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v2, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v3, vcc, -1, v4
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v3, v4, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v8, s[0:1], 1, v4
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v8, vcc, 1, v4
 ; GISEL-IEEE-NEXT:    v_fma_f32 v9, -v8, v4, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v4, v3, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v9
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[0:1]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v4, v3, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v9
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v8, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v5
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v0, vcc
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v4, 0x7fc00000
@@ -4578,38 +4578,38 @@ define amdgpu_kernel void @elim_redun_check_v2_ult(ptr addrspace(1) %out, <2 x f
 ; GISEL-IEEE-NEXT:    s_waitcnt lgkmcnt(0)
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v1, s6
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v2, s6, v0
-; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e32 vcc, s0, v1
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v1, v2, vcc
+; GISEL-IEEE-NEXT:    v_cmp_gt_f32_e64 s[0:1], s0, v1
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v1, v2, s[0:1]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v3, v2
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v0, s7, v0
 ; GISEL-IEEE-NEXT:    s_mov_b32 s6, -1
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v5, s[0:1], -1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v5, vcc, -1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v6, -v5, v3, v2
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v7, s[0:1], 1, v3
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v7, vcc, 1, v3
 ; GISEL-IEEE-NEXT:    v_fma_f32 v8, -v7, v3, v2
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v6
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v5, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v8
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v7, s[0:1]
-; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v3
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v6
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v5, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v8
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v7, vcc
+; GISEL-IEEE-NEXT:    v_mul_f32_e32 v5, 0x37800000, v3
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v5, s[0:1]
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v6, s7
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, s7, v4
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v0, v6, v0, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], s7, v4
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v0, v6, v0, s[0:1]
 ; GISEL-IEEE-NEXT:    v_sqrt_f32_e32 v4, v0
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v5, 0x260
-; GISEL-IEEE-NEXT:    v_cmp_class_f32_e64 s[0:1], v2, v5
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v2, v3, v2, s[0:1]
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v3, s[0:1], -1, v4
+; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v2, v5
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v2, v3, v2, vcc
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v3, vcc, -1, v4
 ; GISEL-IEEE-NEXT:    v_fma_f32 v7, -v3, v4, v0
-; GISEL-IEEE-NEXT:    v_add_i32_e64 v8, s[0:1], 1, v4
+; GISEL-IEEE-NEXT:    v_add_i32_e32 v8, vcc, 1, v4
 ; GISEL-IEEE-NEXT:    v_fma_f32 v9, -v8, v4, v0
-; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e64 s[0:1], 0, v7
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v4, v3, s[0:1]
-; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e64 s[0:1], 0, v9
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v8, s[0:1]
+; GISEL-IEEE-NEXT:    v_cmp_ge_f32_e32 vcc, 0, v7
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v4, v3, vcc
+; GISEL-IEEE-NEXT:    v_cmp_lt_f32_e32 vcc, 0, v9
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v8, vcc
 ; GISEL-IEEE-NEXT:    v_mul_f32_e32 v4, 0x37800000, v3
-; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v4, vcc
+; GISEL-IEEE-NEXT:    v_cndmask_b32_e64 v3, v3, v4, s[0:1]
 ; GISEL-IEEE-NEXT:    v_cmp_class_f32_e32 vcc, v0, v5
 ; GISEL-IEEE-NEXT:    v_cndmask_b32_e32 v3, v3, v0, vcc
 ; GISEL-IEEE-NEXT:    v_mov_b32_e32 v4, 0x7fc00000
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll b/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll
index 671ead6127308dd..f58abc4f591b0ee 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll
+++ b/llvm/test/CodeGen/AMDGPU/llvm.is.fpclass.f16.ll
@@ -578,12 +578,12 @@ define i1 @negsubnormal_f16(half %x) nounwind {
 ; GFX7GLISEL-NEXT:    v_and_b32_e32 v1, 0x7fff, v0
 ; GFX7GLISEL-NEXT:    v_and_b32_e32 v0, 0xffff, v0
 ; GFX7GLISEL-NEXT:    v_and_b32_e32 v2, 0xffff, v1
-; GFX7GLISEL-NEXT:    v_cmp_ne_u32_e32 vcc, v0, v2
-; GFX7GLISEL-NEXT:    v_subrev_i32_e64 v0, s[4:5], 1, v1
+; GFX7GLISEL-NEXT:    v_cmp_ne_u32_e64 s[4:5], v0, v2
+; GFX7GLISEL-NEXT:    v_subrev_i32_e32 v0, vcc, 1, v1
 ; GFX7GLISEL-NEXT:    v_and_b32_e32 v0, 0xffff, v0
 ; GFX7GLISEL-NEXT:    v_mov_b32_e32 v1, 0x3ff
-; GFX7GLISEL-NEXT:    v_cmp_lt_u32_e64 s[4:5], v0, v1
-; GFX7GLISEL-NEXT:    s_and_b64 s[4:5], s[4:5], vcc
+; GFX7GLISEL-NEXT:    v_cmp_lt_u32_e32 vcc, v0, v1
+; GFX7GLISEL-NEXT:    s_and_b64 s[4:5], vcc, s[4:5]
 ; GFX7GLISEL-NEXT:    v_cndmask_b32_e64 v0, 0, 1, s[4:5]
 ; GFX7GLISEL-NEXT:    s_setpc_b64 s[30:31]
 ;
diff --git a/llvm/test/CodeGen/AMDGPU/med3-knownbits.ll b/llvm/test/CodeGen/AMDGPU/med3-knownbits.ll
index e64bc0dc374da67..e4c15a43eb60483 100644
--- a/llvm/test/CodeGen/AMDGPU/med3-knownbits.ll
+++ b/llvm/test/CodeGen/AMDGPU/med3-knownbits.ll
@@ -100,10 +100,10 @@ define i32 @v_known_signbits_smed3(i16 %a, i16 %b) {
 ; SI-GISEL-NEXT:    v_mul_lo_u32 v5, v3, v1
 ; SI-GISEL-NEXT:    v_add_i32_e32 v6, vcc, 1, v3
 ; SI-GISEL-NEXT:    v_sub_i32_e32 v0, vcc, v0, v5
-; SI-GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
-; SI-GISEL-NEXT:    v_cndmask_b32_e32 v3, v3, v6, vcc
-; SI-GISEL-NEXT:    v_sub_i32_e64 v5, s[4:5], v0, v1
-; SI-GISEL-NEXT:    v_cndmask_b32_e32 v0, v0, v5, vcc
+; SI-GISEL-NEXT:    v_cmp_ge_u32_e64 s[4:5], v0, v1
+; SI-GISEL-NEXT:    v_cndmask_b32_e64 v3, v3, v6, s[4:5]
+; SI-GISEL-NEXT:    v_sub_i32_e32 v5, vcc, v0, v1
+; SI-GISEL-NEXT:    v_cndmask_b32_e64 v0, v0, v5, s[4:5]
 ; SI-GISEL-NEXT:    v_add_i32_e32 v5, vcc, 1, v3
 ; SI-GISEL-NEXT:    v_cmp_ge_u32_e32 vcc, v0, v1
 ; SI-GISEL-NEXT:    v_cndmask_b32_e32 v0, v3, v5, vcc
diff --git a/llvm/test/CodeGen/AMDGPU/shrink-dead-sdst.mir b/llvm/test/CodeGen/AMDGPU/shrink-dead-sdst.mir
new file mode 100644
index 000000000000000..ba61a3ac64f640f
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/shrink-dead-sdst.mir
@@ -0,0 +1,12 @@
+# RUN: llc -mtriple=amdgcn-amd-amdpal -run-pass=si-shrink-instructions -verify-machineinstrs -o - %s | FileCheck -check-prefix=GCN %s
+
+
+# GCN-LABEL: name: shrink_dead_dsts
+# GCN: %0:vgpr_32 = V_SUBREV_CO_U32_e32 undef %3:sreg_32, undef %2:vgpr_32, implicit-def dead $vcc, implicit $exec
+---
+name:            shrink_dead_dsts
+body:             |
+  bb.0:
+    %2:vgpr_32, dead %3:sreg_64 = V_SUB_CO_U32_e64 undef %0:vgpr_32, undef %1:sreg_32, 0, implicit $exec
+
+...
