[llvm-branch-commits] [llvm] [AMDGPU] Enabled GCN trackers (amdgpu-use-amdgpu-trackers) by default. (PR #184400)
via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Tue Mar 3 10:00:43 PST 2026
llvmbot wrote:
<!--LLVM PR SUMMARY COMMENT-->
@llvm/pr-subscribers-backend-amdgpu
Author: Dhruva Chakrabarti (dhruvachak)
<details>
<summary>Changes</summary>
The LIT tests have been updated in one of the following ways:
(1) If -amdgpu-use-amdgpu-trackers was not present and the test was auto-generated, the test has been regenerated.
(2) If the option was not present and the test was not auto-generated, -amdgpu-use-amdgpu-trackers=0 has been added to preserve whatever the test was already checking.
(3) If the option was present in a test, its value has been updated to reflect the change in the default.
Currently, there are 4 tests in category (2). They are:
CodeGen/AMDGPU/
addrspacecast.ll
schedule-regpressure-limit.ll
schedule-regpressure-limit2.ll
sema-v-unsched-bundle.ll
There are 8 tests in category (3). They are:
CodeGen/AMDGPU/
schedule-amdgpu-tracker-physreg.ll
schedule-amdgpu-trackers.ll
materialize-frame-index-sgpr.ll
schedule-relaxed-occupancy.ll
schedule-regpressure-ilp-metric-spills.mir
pr51516.mir
high-RP-reschedule.mir
machine-scheduler-sink-trivial-remats.mir
The rest are in category (1).
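For anyone carrying downstream tests, the three categories above can be sketched as a small classifier. This is only an illustration, not the script used for this PR: the function names `classify` and `pin_old_default` are hypothetical, and the sketch assumes that auto-generated tests carry the usual "NOTE: Assertions have been autogenerated by" header written by the update_*_test_checks.py scripts and that the tool on the RUN line is spelled `llc`.

```python
import re

# Option name taken from the PR title; the "=0" spelling matches category (2).
OPTION = "-amdgpu-use-amdgpu-trackers"

# Assumption: header line left by the update_*_test_checks.py scripts.
AUTOGEN_MARKER = "NOTE: Assertions have been autogenerated by"


def classify(test_text: str) -> int:
    """Return the category (1, 2 or 3) from the description for one test file."""
    if OPTION in test_text:
        return 3  # option already spelled out: revisit its value
    if AUTOGEN_MARKER in test_text:
        return 1  # regenerate the CHECK lines
    return 2  # hand-written checks: pin the old default


def pin_old_default(run_line: str) -> str:
    """Add OPTION=0 right after the first `llc` on a RUN line (category 2)."""
    return re.sub(r"\bllc\b", f"llc {OPTION}=0", run_line, count=1)


if __name__ == "__main__":
    run = "; RUN: llc -mtriple=amdgcn -mcpu=gfx900 < %s | FileCheck %s"
    print(classify(run))
    print(pin_old_default(run))
```

A category (1) test would still need to be regenerated with the appropriate update script; the sketch only decides which bucket a file falls into and shows the RUN-line change for category (2).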
This PR is stacked on top of https://github.com/llvm/llvm-project/pull/184275.
Assisted-by: Cursor
---
Patch is 21.25 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/184400.diff
162 Files Affected:
- (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll (+105-105)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll (+52-52)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fdiv.f64.ll (+369-387)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshl.ll (+223-223)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/fshr.ll (+217-217)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement-stack-lower.ll (+80-80)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/insertelement.ll (+138-138)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/load-uniform-in-vgpr.ll (+26-27)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/mul.ll (+198-183)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdiv.i64.ll (+759-759)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/sdivrem.ll (+196-196)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/srem.i64.ll (+987-987)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/uaddsat.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udiv.i64.ll (+219-219)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/udivrem.ll (+243-243)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/urem.i64.ll (+220-220)
- (modified) llvm/test/CodeGen/AMDGPU/GlobalISel/usubsat.ll (+24-24)
- (modified) llvm/test/CodeGen/AMDGPU/a-v-flat-atomicrmw.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/a-v-global-atomicrmw.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/abs_i16.ll (+15-15)
- (modified) llvm/test/CodeGen/AMDGPU/add.ll (+32-32)
- (modified) llvm/test/CodeGen/AMDGPU/addrspacecast.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/agpr-copy-no-free-registers.ll (+138-58)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.1024bit.ll (+73457-73030)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.128bit.ll (+95-97)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.224bit.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.256bit.ll (+3794-3764)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.288bit.ll (+192-192)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.320bit.ll (+1299-1275)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.352bit.ll (+229-229)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.384bit.ll (+481-469)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.448bit.ll (+568-536)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.512bit.ll (+12941-12966)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.576bit.ll (+1593-1561)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.640bit.ll (+1690-1649)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.704bit.ll (+2710-2648)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.768bit.ll (+1904-1808)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.832bit.ll (+2318-2175)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.896bit.ll (+3632-3520)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.960bit.ll (+4379-4290)
- (modified) llvm/test/CodeGen/AMDGPU/amdgcn.bitcast.96bit.ll (+28-28)
- (modified) llvm/test/CodeGen/AMDGPU/av-split-dead-valno-crash.ll (+26-28)
- (modified) llvm/test/CodeGen/AMDGPU/bf16.ll (+9707-9179)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-contents-legalization.ll (+61-63)
- (modified) llvm/test/CodeGen/AMDGPU/buffer-fat-pointers-memcpy.ll (+36-36)
- (modified) llvm/test/CodeGen/AMDGPU/call-argument-types.ll (+706-417)
- (modified) llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir (+21-21)
- (modified) llvm/test/CodeGen/AMDGPU/div_i128.ll (+271-271)
- (modified) llvm/test/CodeGen/AMDGPU/div_v2i128.ll (+933-933)
- (modified) llvm/test/CodeGen/AMDGPU/extract-subvector.ll (+28-28)
- (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.bf16.ll (+72-73)
- (modified) llvm/test/CodeGen/AMDGPU/fcanonicalize.f16.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/fceil64.ll (+311-313)
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.bf16.ll (+155-176)
- (modified) llvm/test/CodeGen/AMDGPU/fcopysign.f16.ll (+82-105)
- (modified) llvm/test/CodeGen/AMDGPU/fmax_legacy.f16.ll (+36-36)
- (modified) llvm/test/CodeGen/AMDGPU/fmaximum.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/fmin_legacy.f16.ll (+36-36)
- (modified) llvm/test/CodeGen/AMDGPU/fminimum.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/fptoi.i128.ll (+62-62)
- (modified) llvm/test/CodeGen/AMDGPU/freeze.ll (+86-79)
- (modified) llvm/test/CodeGen/AMDGPU/fsqrt.f64.ll (+200-200)
- (modified) llvm/test/CodeGen/AMDGPU/function-args.ll (+43-43)
- (modified) llvm/test/CodeGen/AMDGPU/function-returns.ll (+229-229)
- (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-argument-types.ll (+59-63)
- (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll (+526-327)
- (modified) llvm/test/CodeGen/AMDGPU/half.ll (+233-231)
- (modified) llvm/test/CodeGen/AMDGPU/high-RP-reschedule.mir (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/identical-subrange-spill-infloop.ll (+56-89)
- (modified) llvm/test/CodeGen/AMDGPU/indirect-addressing-si.ll (+480-485)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_dynelt.ll (+67-68)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2bf16.ll (+140-140)
- (modified) llvm/test/CodeGen/AMDGPU/insert_vector_elt.v2i16.ll (+138-138)
- (modified) llvm/test/CodeGen/AMDGPU/insert_waitcnt_for_precise_memory.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/integer-mad-patterns.ll (+114-114)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir (+808-807)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.small.mir (+361-361)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2c.mir (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+10-12)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.tensor.load.store.ll (+91-180)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp.f64.ll (+1576-1559)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp10.f64.ll (+1564-1558)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.exp2.f64.ll (+1333-1327)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.fma.f16.ll (+6-6)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f16.ll (+188-188)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f32.ll (+147-147)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.maximum.f64.ll (+471-372)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f16.ll (+73-73)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f32.ll (+147-147)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.minimum.f64.ll (+471-372)
- (modified) llvm/test/CodeGen/AMDGPU/llvm.round.f64.ll (+39-40)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i1.ll (+2122-1778)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i16.ll (+1556-1566)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i32.ll (+472-466)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i64.ll (+48-47)
- (modified) llvm/test/CodeGen/AMDGPU/load-constant-i8.ll (+1171-1178)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i16.ll (+1655-1757)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i32.ll (+649-789)
- (modified) llvm/test/CodeGen/AMDGPU/load-global-i8.ll (+1592-1665)
- (modified) llvm/test/CodeGen/AMDGPU/load-local-i16.ll (+2691-2901)
- (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-rematerialization-scoring.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/maximumnum.bf16.ll (+2352-2152)
- (modified) llvm/test/CodeGen/AMDGPU/maximumnum.ll (+506-535)
- (modified) llvm/test/CodeGen/AMDGPU/memcpy-libcall.ll (+92-90)
- (modified) llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll (+1274-1271)
- (modified) llvm/test/CodeGen/AMDGPU/memset-param-combinations.ll (+7-11)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-cd-select.ll (+30-36)
- (modified) llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll (+20-20)
- (modified) llvm/test/CodeGen/AMDGPU/minimumnum.bf16.ll (+2386-2186)
- (modified) llvm/test/CodeGen/AMDGPU/minimumnum.ll (+506-535)
- (modified) llvm/test/CodeGen/AMDGPU/mul.ll (+39-39)
- (modified) llvm/test/CodeGen/AMDGPU/mul24-pass-ordering.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/packed-fp32.ll (+364-356)
- (modified) llvm/test/CodeGen/AMDGPU/pr51516.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/preserve-wwm-copy-dst-reg.ll (+16-16)
- (modified) llvm/test/CodeGen/AMDGPU/promote-constOffset-to-imm.ll (+104-103)
- (modified) llvm/test/CodeGen/AMDGPU/regpressure_printer.mir (+64-50)
- (modified) llvm/test/CodeGen/AMDGPU/rem_i128.ll (+175-175)
- (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll (+15-18)
- (modified) llvm/test/CodeGen/AMDGPU/rsq.f64.ll (+782-785)
- (modified) llvm/test/CodeGen/AMDGPU/sched-assert-dead-def-subreg-use-other-subreg.mir (+10-10)
- (modified) llvm/test/CodeGen/AMDGPU/sched-handleMoveUp-subreg-def-across-subreg-def.mir (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_copies.mir (+780-780)
- (modified) llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_cost.mir (+62-62)
- (modified) llvm/test/CodeGen/AMDGPU/sched_mfma_rewrite_diff_types.mir (+20-20)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-amdgpu-trackers.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-barrier.mir (+13-13)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-ilp-metric-spills.mir (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit.ll (+3-3)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-regpressure-limit2.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/schedule-relaxed-occupancy.ll (+4-4)
- (modified) llvm/test/CodeGen/AMDGPU/scratch-simple.ll (+452-438)
- (modified) llvm/test/CodeGen/AMDGPU/sdiv.ll (+210-210)
- (modified) llvm/test/CodeGen/AMDGPU/sdwa-peephole.ll (+7-7)
- (modified) llvm/test/CodeGen/AMDGPU/select.f16.ll (+476-512)
- (modified) llvm/test/CodeGen/AMDGPU/sema-v-unsched-bundle.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/shl.ll (+11-11)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v2i64.v8i64.ll (+459-459)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4i64.v4i64.ll (+119-129)
- (modified) llvm/test/CodeGen/AMDGPU/shufflevector.v4p0.v4p0.ll (+119-129)
- (modified) llvm/test/CodeGen/AMDGPU/spill-agpr.ll (+1-1)
- (modified) llvm/test/CodeGen/AMDGPU/sra.ll (+31-31)
- (modified) llvm/test/CodeGen/AMDGPU/srem.ll (+113-113)
- (modified) llvm/test/CodeGen/AMDGPU/srl.ll (+11-11)
- (modified) llvm/test/CodeGen/AMDGPU/ssubsat.ll (+96-96)
- (modified) llvm/test/CodeGen/AMDGPU/stack-realign.ll (+4-8)
- (modified) llvm/test/CodeGen/AMDGPU/uaddsat.ll (+2-2)
- (modified) llvm/test/CodeGen/AMDGPU/udiv.ll (+20-20)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-add.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-and.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-mul.ll (+332-318)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-or.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-smax.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-smin.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-umax.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-umin.ll (+82-82)
- (modified) llvm/test/CodeGen/AMDGPU/vector-reduce-xor.ll (+8-8)
- (modified) llvm/test/CodeGen/AMDGPU/vni8-across-blocks.ll (+36-35)
``````````diff
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index 6685df3de7d22..127acf1c5513b 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -76,7 +76,7 @@ static cl::opt<bool>
static cl::opt<bool> GCNTrackers(
"amdgpu-use-amdgpu-trackers", cl::Hidden,
cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
- cl::init(false));
+ cl::init(true));
static cl::opt<bool> TrackPhysRegInTrackers(
"amdgpu-trackers-physical-register-tracking", cl::Hidden,
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
index b754bf0071da8..c7375768a831e 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/add.vni16.ll
@@ -370,62 +370,62 @@ define void @addv_7i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addrs
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
; GFX8-NEXT: flat_load_ushort v17, v[6:7]
; GFX8-NEXT: flat_load_ushort v18, v[8:9]
-; GFX8-NEXT: flat_load_ushort v19, v[10:11]
-; GFX8-NEXT: flat_load_ushort v20, v[12:13]
-; GFX8-NEXT: flat_load_ushort v21, v[14:15]
-; GFX8-NEXT: flat_load_ushort v22, v[0:1]
+; GFX8-NEXT: flat_load_ushort v10, v[10:11]
+; GFX8-NEXT: flat_load_ushort v11, v[12:13]
+; GFX8-NEXT: flat_load_ushort v12, v[14:15]
+; GFX8-NEXT: flat_load_ushort v13, v[0:1]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v2
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
; GFX8-NEXT: v_add_u32_e32 v6, vcc, 4, v2
; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v8, vcc, 6, v2
+; GFX8-NEXT: flat_load_ushort v14, v[2:3]
+; GFX8-NEXT: flat_load_ushort v15, v[0:1]
+; GFX8-NEXT: flat_load_ushort v19, v[6:7]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 6, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: v_add_u32_e32 v6, vcc, 8, v2
+; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v3, vcc
+; GFX8-NEXT: v_add_u32_e32 v8, vcc, 10, v2
; GFX8-NEXT: v_addc_u32_e32 v9, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v10, vcc, 8, v2
-; GFX8-NEXT: v_addc_u32_e32 v11, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v12, vcc, 10, v2
-; GFX8-NEXT: v_addc_u32_e32 v13, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v14, vcc, 12, v2
-; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_ushort v2, v[2:3]
-; GFX8-NEXT: flat_load_ushort v3, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v2, vcc, 12, v2
+; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_ushort v20, v[0:1]
; GFX8-NEXT: flat_load_ushort v6, v[6:7]
; GFX8-NEXT: flat_load_ushort v7, v[8:9]
-; GFX8-NEXT: flat_load_ushort v8, v[10:11]
-; GFX8-NEXT: flat_load_ushort v9, v[12:13]
-; GFX8-NEXT: flat_load_ushort v10, v[14:15]
+; GFX8-NEXT: flat_load_ushort v2, v[2:3]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 2, v4
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v2, v16, v2
+; GFX8-NEXT: v_add_u16_e32 v3, v16, v14
; GFX8-NEXT: s_waitcnt vmcnt(5)
-; GFX8-NEXT: v_add_u16_e32 v3, v17, v3
-; GFX8-NEXT: flat_store_short v[4:5], v2
-; GFX8-NEXT: flat_store_short v[0:1], v3
+; GFX8-NEXT: v_add_u16_e32 v8, v17, v15
+; GFX8-NEXT: flat_store_short v[4:5], v3
+; GFX8-NEXT: flat_store_short v[0:1], v8
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 4, v4
; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v6, v18, v6
+; GFX8-NEXT: v_add_u16_e32 v9, v18, v19
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v6
+; GFX8-NEXT: flat_store_short v[0:1], v9
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 6, v4
-; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v7, v19, v7
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v7
+; GFX8-NEXT: s_waitcnt vmcnt(6)
+; GFX8-NEXT: v_add_u16_e32 v10, v10, v20
+; GFX8-NEXT: flat_store_short v[0:1], v10
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 8, v4
; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v8, v20, v8
+; GFX8-NEXT: v_add_u16_e32 v6, v11, v6
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v8
+; GFX8-NEXT: flat_store_short v[0:1], v6
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 10, v4
; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v9, v21, v9
+; GFX8-NEXT: v_add_u16_e32 v7, v12, v7
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v9
+; GFX8-NEXT: flat_store_short v[0:1], v7
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 12, v4
; GFX8-NEXT: s_waitcnt vmcnt(6)
-; GFX8-NEXT: v_add_u16_e32 v10, v22, v10
+; GFX8-NEXT: v_add_u16_e32 v2, v13, v2
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
-; GFX8-NEXT: flat_store_short v[0:1], v10
+; GFX8-NEXT: flat_store_short v[0:1], v2
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
@@ -532,29 +532,29 @@ define void @add_v9i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addrs
; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v0
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT: flat_load_ushort v14, v[0:1]
+; GFX8-NEXT: flat_load_ushort v16, v[0:1]
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v2
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
; GFX8-NEXT: flat_load_ushort v0, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v4
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v5, vcc
; GFX8-NEXT: s_waitcnt vmcnt(2)
; GFX8-NEXT: v_add_u16_e32 v1, v6, v10
; GFX8-NEXT: v_add_u16_sdwa v2, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
; GFX8-NEXT: v_add_u16_e32 v3, v7, v11
-; GFX8-NEXT: v_add_u16_sdwa v10, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v11, v8, v12
+; GFX8-NEXT: v_add_u16_sdwa v6, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v7, v8, v12
; GFX8-NEXT: v_add_u16_sdwa v8, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v12, v9, v13
+; GFX8-NEXT: v_add_u16_e32 v10, v9, v13
; GFX8-NEXT: v_add_u16_sdwa v9, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u32_e32 v6, vcc, 16, v4
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_add_u16_e32 v13, v14, v0
+; GFX8-NEXT: v_add_u16_e32 v11, v16, v0
; GFX8-NEXT: v_or_b32_e32 v0, v1, v2
-; GFX8-NEXT: v_or_b32_e32 v1, v3, v10
-; GFX8-NEXT: v_or_b32_e32 v2, v11, v8
-; GFX8-NEXT: v_or_b32_e32 v3, v12, v9
-; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v5, vcc
+; GFX8-NEXT: v_or_b32_e32 v1, v3, v6
+; GFX8-NEXT: v_or_b32_e32 v2, v7, v8
+; GFX8-NEXT: v_or_b32_e32 v3, v10, v9
; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT: flat_store_short v[6:7], v13
+; GFX8-NEXT: flat_store_short v[14:15], v11
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
@@ -685,55 +685,55 @@ define void @add_v11i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addr
; GFX8-LABEL: add_v11i16:
; GFX8: ; %bb.0:
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v0
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v1, vcc
+; GFX8-NEXT: v_add_u32_e32 v16, vcc, 18, v0
+; GFX8-NEXT: v_addc_u32_e32 v17, vcc, 0, v1, vcc
; GFX8-NEXT: flat_load_dwordx4 v[6:9], v[0:1]
-; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
-; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v2
-; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v16, vcc, 18, v2
-; GFX8-NEXT: v_addc_u32_e32 v17, vcc, 0, v3, vcc
-; GFX8-NEXT: v_add_u32_e32 v2, vcc, 20, v2
-; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
-; GFX8-NEXT: flat_load_ushort v14, v[14:15]
-; GFX8-NEXT: flat_load_ushort v15, v[16:17]
-; GFX8-NEXT: flat_load_ushort v16, v[2:3]
-; GFX8-NEXT: v_add_u32_e32 v2, vcc, 16, v0
-; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v1, vcc
-; GFX8-NEXT: s_waitcnt vmcnt(3)
-; GFX8-NEXT: v_add_u16_e32 v17, v6, v10
-; GFX8-NEXT: v_add_u16_sdwa v10, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u32_e32 v6, vcc, 18, v0
-; GFX8-NEXT: v_add_u16_e32 v18, v7, v11
-; GFX8-NEXT: v_add_u16_sdwa v11, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v1, vcc
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 20, v0
-; GFX8-NEXT: flat_load_ushort v2, v[2:3]
-; GFX8-NEXT: flat_load_ushort v3, v[6:7]
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT: flat_load_ushort v21, v[0:1]
-; GFX8-NEXT: v_add_u32_e32 v6, vcc, 16, v4
-; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v5, vcc
-; GFX8-NEXT: v_add_u16_e32 v19, v8, v12
+; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
+; GFX8-NEXT: flat_load_ushort v18, v[14:15]
+; GFX8-NEXT: flat_load_ushort v16, v[16:17]
+; GFX8-NEXT: flat_load_ushort v17, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 18, v2
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_ushort v19, v[0:1]
+; GFX8-NEXT: flat_load_ushort v20, v[14:15]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 20, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_ushort v0, v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v14, vcc, 16, v4
+; GFX8-NEXT: v_addc_u32_e32 v15, vcc, 0, v5, vcc
+; GFX8-NEXT: s_waitcnt vmcnt(6)
+; GFX8-NEXT: v_add_u16_e32 v1, v6, v10
+; GFX8-NEXT: v_add_u16_sdwa v2, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u32_e32 v6, vcc, 18, v4
+; GFX8-NEXT: v_add_u16_e32 v3, v7, v11
+; GFX8-NEXT: v_add_u16_sdwa v10, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v11, v8, v12
; GFX8-NEXT: v_add_u16_sdwa v12, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u32_e32 v8, vcc, 18, v4
-; GFX8-NEXT: v_add_u16_e32 v20, v9, v13
+; GFX8-NEXT: v_add_u16_e32 v21, v9, v13
; GFX8-NEXT: v_add_u16_sdwa v13, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_addc_u32_e32 v9, vcc, 0, v5, vcc
-; GFX8-NEXT: v_or_b32_e32 v0, v17, v10
-; GFX8-NEXT: v_or_b32_e32 v1, v18, v11
-; GFX8-NEXT: v_add_u32_e32 v10, vcc, 20, v4
-; GFX8-NEXT: v_addc_u32_e32 v11, vcc, 0, v5, vcc
+; GFX8-NEXT: v_addc_u32_e32 v7, vcc, 0, v5, vcc
+; GFX8-NEXT: v_add_u32_e32 v8, vcc, 20, v4
; GFX8-NEXT: s_waitcnt vmcnt(2)
-; GFX8-NEXT: v_add_u16_e32 v14, v2, v14
+; GFX8-NEXT: v_add_u16_e32 v18, v18, v19
; GFX8-NEXT: s_waitcnt vmcnt(1)
-; GFX8-NEXT: v_add_u16_e32 v15, v3, v15
-; GFX8-NEXT: v_or_b32_e32 v2, v19, v12
-; GFX8-NEXT: v_or_b32_e32 v3, v20, v13
+; GFX8-NEXT: v_add_u16_e32 v16, v16, v20
+; GFX8-NEXT: v_addc_u32_e32 v9, vcc, 0, v5, vcc
; GFX8-NEXT: s_waitcnt vmcnt(0)
-; GFX8-NEXT: v_add_u16_e32 v16, v21, v16
+; GFX8-NEXT: v_add_u16_e32 v17, v17, v0
+; GFX8-NEXT: v_or_b32_e32 v0, v1, v2
+; GFX8-NEXT: v_or_b32_e32 v1, v3, v10
+; GFX8-NEXT: v_or_b32_e32 v2, v11, v12
+; GFX8-NEXT: v_or_b32_e32 v3, v21, v13
; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT: flat_store_short v[6:7], v14
-; GFX8-NEXT: flat_store_short v[8:9], v15
-; GFX8-NEXT: flat_store_short v[10:11], v16
+; GFX8-NEXT: flat_store_short v[14:15], v18
+; GFX8-NEXT: flat_store_short v[6:7], v16
+; GFX8-NEXT: flat_store_short v[8:9], v17
; GFX8-NEXT: s_waitcnt vmcnt(0)
; GFX8-NEXT: s_setpc_b64 s[30:31]
;
@@ -825,34 +825,34 @@ define void @add_v12i16(ptr addrspace(1) %ptra, ptr addrspace(1) %ptrb, ptr addr
; GFX8-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX8-NEXT: flat_load_dwordx4 v[6:9], v[0:1]
; GFX8-NEXT: flat_load_dwordx4 v[10:13], v[2:3]
-; GFX8-NEXT: v_add_u32_e32 v2, vcc, 16, v2
-; GFX8-NEXT: v_addc_u32_e32 v3, vcc, 0, v3, vcc
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v0
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v1, vcc
-; GFX8-NEXT: flat_load_dwordx2 v[14:15], v[2:3]
-; GFX8-NEXT: s_waitcnt vmcnt(1)
-; GFX8-NEXT: v_add_u16_e32 v2, v6, v10
-; GFX8-NEXT: v_add_u16_sdwa v3, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v10, v7, v11
-; GFX8-NEXT: v_add_u16_sdwa v11, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: flat_load_dwordx2 v[6:7], v[0:1]
-; GFX8-NEXT: v_add_u16_e32 v16, v8, v12
-; GFX8-NEXT: v_add_u16_sdwa v8, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v12, v9, v13
+; GFX8-NEXT: flat_load_dwordx2 v[14:15], v[0:1]
+; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v2
+; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v3, vcc
+; GFX8-NEXT: flat_load_dwordx2 v[16:17], v[0:1]
+; GFX8-NEXT: s_waitcnt vmcnt(2)
+; GFX8-NEXT: v_add_u16_e32 v0, v6, v10
+; GFX8-NEXT: v_add_u16_sdwa v1, v6, v10 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v2, v7, v11
+; GFX8-NEXT: v_add_u16_sdwa v3, v7, v11 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v6, v8, v12
+; GFX8-NEXT: v_add_u16_sdwa v7, v8, v12 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v8, v9, v13
; GFX8-NEXT: v_add_u16_sdwa v9, v9, v13 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_or_b32_e32 v0, v2, v3
-; GFX8-NEXT: v_or_b32_e32 v1, v10, v11
-; GFX8-NEXT: v_or_b32_e32 v2, v16, v8
-; GFX8-NEXT: v_or_b32_e32 v3, v12, v9
+; GFX8-NEXT: v_or_b32_e32 v0, v0, v1
+; GFX8-NEXT: v_or_b32_e32 v1, v2, v3
+; GFX8-NEXT: v_or_b32_e32 v2, v6, v7
+; GFX8-NEXT: v_or_b32_e32 v3, v8, v9
+; GFX8-NEXT: s_waitcnt vmcnt(0)
+; GFX8-NEXT: v_add_u16_e32 v6, v14, v16
+; GFX8-NEXT: v_add_u16_sdwa v7, v14, v16 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_add_u16_e32 v8, v15, v17
+; GFX8-NEXT: v_add_u16_sdwa v9, v15, v17 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
; GFX8-NEXT: flat_store_dwordx4 v[4:5], v[0:3]
-; GFX8-NEXT: s_waitcnt vmcnt(1)
-; GFX8-NEXT: v_add_u16_e32 v8, v6, v14
-; GFX8-NEXT: v_add_u16_sdwa v6, v6, v14 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
-; GFX8-NEXT: v_add_u16_e32 v9, v7, v15
-; GFX8-NEXT: v_add_u16_sdwa v7, v7, v15 dst_sel:WORD_1 dst_unused:UNUSED_PAD src0_sel:WORD_1 src1_sel:WORD_1
+; GFX8-NEXT: v_or_b32_e32 v6, v6, v7
; GFX8-NEXT: v_add_u32_e32 v0, vcc, 16, v4
-; GFX8-NEXT: v_or_b32_e32 v6, v8, v6
-; GFX8-NEXT: v_or_b32_e32 v7, v9, v7
+; GFX8-NEXT: v_or_b32_e32 v7, v8, v9
; GFX8-NEXT: v_addc_u32_e32 v1, vcc, 0, v5, vcc
; GFX8-NEXT: flat_store_dwordx2 v[0:1], v[6:7]
; GFX8-NEXT: s_waitcnt vmcnt(0)
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
index 8183a4dec10ca..f773983ef0f01 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/combine-fma-add-fma-mul.ll
@@ -699,24 +699,24 @@ define <4 x double> @test_f64_add_mul(<4 x double> %a, <4 x double> %b, <4 x dou
; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-CONTRACT-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
; GFX9-CONTRACT-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:8
-; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
+; GFX9-CONTRACT-NEXT: buffer_load_dword v33, off, s[0:3], s32 offset:12
+; GFX9-CONTRACT-NEXT: buffer_load_dword v34, off, s[0:3], s32 offset:16
+; GFX9-CONTRACT-NEXT: buffer_load_dword v35, off, s[0:3], s32 offset:20
+; GFX9-CONTRACT-NEXT: buffer_load_dword v36, off, s[0:3], s32 offset:24
+; GFX9-CONTRACT-NEXT: buffer_load_dword v37, off, s[0:3], s32 offset:28
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(5)
; GFX9-CONTRACT-NEXT: v_fma_f64 v[16:17], v[16:17], v[24:25], v[31:32]
-; GFX9-CONTRACT-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:12
-; GFX9-CONTRACT-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:16
+; GFX9-CONTRACT-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-CONTRACT-NEXT: buffer_load_dword v38, off, s[0:3], s32 offset:32
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(5)
+; GFX9-CONTRACT-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[33:34]
+; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(3)
+; GFX9-CONTRACT-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[35:36]
; GFX9-CONTRACT-NEXT: v_fma_f64 v[0:1], v[0:1], v[8:9], v[16:17]
-; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
-; GFX9-CONTRACT-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[24:25]
-; GFX9-CONTRACT-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:20
-; GFX9-CONTRACT-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:24
; GFX9-CONTRACT-NEXT: v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
-; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
-; GFX9-CONTRACT-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-CONTRACT-NEXT: buffer_load_dword v31, off, s[0:3], s32
-; GFX9-CONTRACT-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:28
-; GFX9-CONTRACT-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:32
; GFX9-CONTRACT-NEXT: v_fma_f64 v[4:5], v[4:5], v[12:13], v[20:21]
; GFX9-CONTRACT-NEXT: s_waitcnt vmcnt(0)
-; GFX9-CONTRACT-NEXT: v_fma_f64 v[22:23], v[22:23], v[30:31], v[24:25]
+; GFX9-CONTRACT-NEXT: v_fma_f64 v[22:23], v[22:23], v[30:31], v[37:38]
; GFX9-CONTRACT-NEXT: v_fma_f64 v[6:7], v[6:7], v[14:15], v[22:23]
; GFX9-CONTRACT-NEXT: s_setpc_b64 s[30:31]
;
@@ -725,24 +725,24 @@ define <4 x double> @test_f64_add_mul(<4 x double> %a, <4 x double> %b, <4 x dou
; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
; GFX9-DENORM-NEXT: buffer_load_dword v31, off, s[0:3], s32 offset:4
; GFX9-DENORM-NEXT: buffer_load_dword v32, off, s[0:3], s32 offset:8
-; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0)
+; GFX9-DENORM-NEXT: buffer_load_dword v33, off, s[0:3], s32 offset:12
+; GFX9-DENORM-NEXT: buffer_load_dword v34, off, s[0:3], s32 offset:16
+; GFX9-DENORM-NEXT: buffer_load_dword v35, off, s[0:3], s32 offset:20
+; GFX9-DENORM-NEXT: buffer_load_dword v36, off, s[0:3], s32 offset:24
+; GFX9-DENORM-NEXT: buffer_load_dword v37, off, s[0:3], s32 offset:28
+; GFX9-DENORM-NEXT: s_waitcnt vmcnt(5)
; GFX9-DENORM-NEXT: v_fma_f64 v[16:17], v[16:17], v[24:25], v[31:32]
-; GFX9-DENORM-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:12
-; GFX9-DENORM-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:16
+; GFX9-DENORM-NEXT: buffer_load_dword v31, off, s[0:3], s32
+; GFX9-DENORM-NEXT: buffer_load_dword v38, off, s[0:3], s32 offset:32
+; GFX9-DENORM-NEXT: s_waitcnt vmcnt(5)
+; GFX9-DENORM-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[33:34]
+; GFX9-DENORM-NEXT: s_waitcnt vmcnt(3)
+; GFX9-DENORM-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[35:36]
; GFX9-DENORM-NEXT: v_fma_f64 v[0:1], v[0:1], v[8:9], v[16:17]
-; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0)
-; GFX9-DENORM-NEXT: v_fma_f64 v[18:19], v[18:19], v[26:27], v[24:25]
-; GFX9-DENORM-NEXT: buffer_load_dword v24, off, s[0:3], s32 offset:20
-; GFX9-DENORM-NEXT: buffer_load_dword v25, off, s[0:3], s32 offset:24
; GFX9-DENORM-NEXT: v_fma_f64 v[2:3], v[2:3], v[10:11], v[18:19]
-; GFX9-DENORM-NEXT: s_waitcnt vmcnt(0)
-; GFX9-DENORM-NEXT: v_fma_f64 v[20:21], v[20:21], v[28:29], v[24:25]
-; GFX9-DENORM-NEXT: buffer_load_dword v31, off, s[0:3], s32
-; GFX9-DENORM-NEXT: buffer_load_dword v24, off, s[0:3], s32 o...
[truncated]
``````````
</details>
https://github.com/llvm/llvm-project/pull/184400