[llvm] [MLIR][Affine] Fix crash in loop unswitching/hoistAffineIfOp (#130401) (PR #130644)

Shilei Tian via llvm-commits llvm-commits at lists.llvm.org
Mon Mar 10 10:42:42 PDT 2025


https://github.com/shiltian created https://github.com/llvm/llvm-project/pull/130644

[MLIR][Affine] Fix crash in loop unswitching/hoistAffineIfOp (#130401)

Fix an obvious crash caused by missing affine.parallel handling. Also,
fix a bug exposed in a helper method used by hoistAffineIfOp.

Fixes: https://github.com/llvm/llvm-project/issues/62323

[clang][bytecode] Implement __builtin_{memchr,strchr,char_memchr} (#130420)

LLVM has recently started to use `__builtin_memchr` at compile time, so
implement it. Still needs some work, but the basics are done.

[mlir] Add a check to bail out when reducing to a scalar (#129694)

Fixes issue #64075.
See this comment for a more detailed view:
https://github.com/llvm/llvm-project/issues/64075#issuecomment-2694112594

**Minimal crashing example:**
```
func.func @multi_reduction(%0: vector<4x2xf32>, %acc1: f32) -> f32 {
  %2 = vector.multi_reduction <add>, %0, %acc1 [0, 1] : vector<4x2xf32> to f32
  return %2 : f32
}
```

[X86] combineINSERT_SUBVECTOR - attempt to recursively shuffle combine if both base/sub-vectors are already shuffles (#130304)

[lldb] Remove progress report coalescing (#130329)

Remove support for coalescing progress reports in LLDB. This
functionality was motivated by Xcode, which wanted to listen for less
frequent, aggregated progress events at the cost of losing some detail.
See the original RFC [1] for more details. Since then, they've
reevaluated this trade-off and opted to listen for the regular, full
fidelity progress events and do any post processing on their end.

rdar://146425487

[gn build] Port a1b14dbc4740

[clang][bytecode][NFC] Check conditional op condition for ConstantExprs (#130425)

Same thing we now do in if statements. Check the condition of a
conditional operator for a statically known true/false value.

[RISCV] Added test for dag spill fix

[TableGen] Use Register in FastISel output. NFC

[clang][bytecode] Surround bcp condition with Start/EndSpeculation (#130427)

This is similar to what the current interpreter is doing - the
FoldConstant RAII object surrounds the entire HandleConditionalOperator
call, which means the condition and both the TrueExpr and FalseExpr.

[X86] combineConcatVectorOps - convert (V)SHUFPS concatenation to use combineConcatVectorOps recursion (#130426)

Only concatenate X86ISD::SHUFP nodes if at least one operand is
beneficial to concatenate - this helps prevent a lot of unnecessary AVX1
concatenations.

[SandboxVec] Add region-from-bbs helper pass (#130153)

RegionFromBBs is a helper Sandbox IR function pass that builds a region
for each BB. This helps test region passes in isolation without relying
on other parts of the vectorizer, which is especially useful for
stress-testing.

[llvm-profdata] Fix typo in llvm-profdata (#114675)

Signed-off-by: Peter Jung <admin at ptr1337.dev>

[llvm][NFC]Fix a few typos (#110844)

[gn build] Port 2fb1f0322403

[X86] Fix typo in X86ISD::SHUFP concatenation

[Clang] Fix typo 'dereferencable' to 'dereferenceable' (#116761)

This patch corrects the typo 'dereferencable' to 'dereferenceable' in
CGCall.cpp.
The typo is located within a comment inside the `void
CodeGenModule::ConstructAttributeList` function.

[NFC][YAML][IR] Output CfiFunction sorted (#130379)

As-is this is NFC, since `CfiFunction*` are internally `std::set<>`s.

We are changing internals of `CfiFunctionDefs` and
`CfiFunctionDecls` so they will be ordered by GUID.

Sorting by name is unnecessary but good for
readability and tests.

[libc++][NFC] Fixed bad link in 21.rst (#130428)

Re-land "[mlir][ODS] Add a generated builder that takes the Properties struct" (#130117) (#130373)

This reverts commit 32f543760c7f44c4c7d18bc00a3a1d8860c42bff.

Investigations showed that the unit test utilities were calling erase(),
causing a use-after-free. Fixed by rearranging checks in the test

Revert "[gold] Fix compilation (#130334)"

This reverts commit b0baa1d8bd68a2ce2f7c5f2b62333e410e9122a1.

Reverting a follow-up commit to ce9e1d3c15ed6290f1cb07b482939976fa8115cd since the original commit's test is flaky.

Revert "[clangd] fix warning by adding missing parens"

This reverts commit df79000896101acc9b8d7435e59f767b36c00ac8.

Reverting a follow-up commit to ce9e1d3c15ed6290f1cb07b482939976fa8115cd since the original commit's test is flaky.

Revert "Modify the localCache API to require an explicit commit on CachedFile… (#115331)"

This reverts commit ce9e1d3c15ed6290f1cb07b482939976fa8115cd.

The unittest added in this commit seems to be flaky, causing random failures on buildbots:
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13235
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13232
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13228
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13224
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13220
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13210
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13208
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13207
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13202
 - https://lab.llvm.org/buildbot/#/builders/46/builds/13196
and
 - https://lab.llvm.org/buildbot/#/builders/180/builds/14266
 - https://lab.llvm.org/buildbot/#/builders/180/builds/14254
 - https://lab.llvm.org/buildbot/#/builders/180/builds/14250
 - https://lab.llvm.org/buildbot/#/builders/180/builds/14245
 - https://lab.llvm.org/buildbot/#/builders/180/builds/14244
 - https://lab.llvm.org/buildbot/#/builders/180/builds/14226

[NFC][YAML] Switch `std::sort` to `llvm::sort` (#130448)

Follow up to #130379.

[gn build] Port 1d763f383380

[clangd] Add `BuiltinHeaders` config option (#129459)

This option, under `CompileFlags`, governs whether clangd uses its own
built-in headers (`Clangd` option value) or the built-in headers of the driver
in the file's compile command (`QueryDriver` option value, applicable to
cases where `--query-driver` is used to instruct clangd to ask the driver
for its system include paths).

The default value is `Clangd`, preserving clangd's current default behaviour.

Fixes clangd/clangd#2074

[AArch64] Use Register in AArch64FastISel.cpp. NFC

[NFC][IR] De-duplicate CFI related code (#130450)

[msan] Handle Arm NEON pairwise min/max instructions (#129824)

Change the handling of:
- llvm.aarch64.neon.fmaxp
- llvm.aarch64.neon.fminp
- llvm.aarch64.neon.fmaxnmp
- llvm.aarch64.neon.fminnmp
- llvm.aarch64.neon.smaxp
- llvm.aarch64.neon.sminp
- llvm.aarch64.neon.umaxp
- llvm.aarch64.neon.uminp
from the incorrect heuristic handler (maybeHandleSimpleNomemIntrinsic)
to handlePairwiseShadowOrIntrinsic.

Updates the tests from https://github.com/llvm/llvm-project/pull/129760

Adds a note that maybeHandleSimpleNomemIntrinsic may incorrectly match
horizontal/pairwise intrinsics.

[NFC][YAML] Replace iterators with simple getter (#130449)

To simplify #130382.

[libc++][NFC] Comment cleanup for `<type_traits>` (#130422)

- Aligns all version comments to line 65.
- Consistently uses `// since C++` instead of `// C++`, in lowercase as
used in most version comments.
- Consistently uses `class` to introduce type template parameters.
- Consistently uses `inline constexpr` for variable templates.
- Corrects the comment for `bool_constant` to `// since C++17` as it's a
C++17 feature.
- Changes the class-key of `result_of` to `struct`, which follows N4659
[depr.meta.types] and the actual usage in libc++.
- Adds missed `// since C++17` for `is_(nothrow_)invocable(_r)`.
- Moves the comments for `is_nothrow_convertible_v` to the part for
variable templates.
- Removes duplicated comments for `true_type` and `false_type`.

[msan] Apply handleVectorReduceIntrinsic to max/min vector instructions (#129819)

Changes the handling of:
- llvm.aarch64.neon.smaxv
- llvm.aarch64.neon.sminv
- llvm.aarch64.neon.umaxv
- llvm.aarch64.neon.uminv
- llvm.vector.reduce.smax
- llvm.vector.reduce.smin
- llvm.vector.reduce.umax
- llvm.vector.reduce.umin
- llvm.vector.reduce.fmax
- llvm.vector.reduce.fmin
from the default strict handling (visitInstruction) to
handleVectorReduceIntrinsic.

Also adds a parameter to handleVectorReduceIntrinsic to specify whether
the return type must match the elements of the vector.

Updates the tests from https://github.com/llvm/llvm-project/pull/129741,
https://github.com/llvm/llvm-project/pull/129810,
https://github.com/llvm/llvm-project/pull/129768

[libc] Added type-generic macros for fixed-point functions (#129371)

Adds the macros `absfx`, `countlsfx`, and `roundfx`.
ref:  #129111

[clang][bytecode][NFC] Bail out on non constant evaluated builtins (#130431)

If the ASTContext says so, don't bother trying to constant evaluate the
given builtin.

[ARM] Use Register in FastISel. NFC

[ARM] Remove unused argument. NFC

[AArch64] Remove unused DenseMap variable. NFC

[ARM] Change FastISel Address from a struct to a class. NFC

This allows us to use Register in the interface, but store an
unsigned internally in a union.

[clang][analyzer][NFC] Fix typos in comments (#130456)

[ExecutionEngine] Avoid repeated map lookups (NFC) (#130461)

[IPO] Avoid repeated hash lookups (NFC) (#130462)

[Scalar] Avoid repeated hash lookups (NFC) (#130463)

[Utils] Avoid repeated hash lookups (NFC) (#130464)

[llvm-jitlink] Avoid repeated hash lookups (NFC) (#130465)

[llvm-profgen] Avoid repeated hash lookups (NFC) (#130466)

[lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36 (#123576)

Instructions with relocation `R_LARCH_CALL36` may be relaxed as follows:
```
From:
   pcaddu18i $dest, %call36(foo)
     R_LARCH_CALL36, R_LARCH_RELAX
   jirl $r, $dest, 0
To:
   b/bl foo  # bl if r=$ra, b if r=$zero
     R_LARCH_B26
```

[InstCombine] Add handling for (or (zext x), (shl (zext (ashr x, bw/2-1))), bw/2) -> (sext x) fold (#130316)

Minor tweak to #129363, which handled all the cases where there was a sext of the original source value, but not the cases where the source is already half the size of the destination type.

Another regression noticed in #76524

[clang][bytecode] Fix getting pointer element type in __builtin_memcmp (#130485)

When such a pointer is heap allocated, the type we get is a pointer
type. Take the pointee type in that case.

[X86] combineConcatVectorOps - use all_of to check for matching PSHUFD/PSHUFLW/PSHUFHW shuffle mask.

Prep work before adding 512-bit support.

[clang-tidy] Fix invalid fixit from modernize-use-ranges for nullptr used with std::unique_ptr (#127162)

This PR fixes issue #124815 by correcting the handling of `nullptr` with
`std::unique_ptr` in the `modernize-use-ranges` check.

Updated the logic to suppress warnings for `nullptr` in `std::find`.

[X86] Combine bitcast(v1Ty insert_vector_elt(X, Y, 0)) to Y (#130475)

This only happens for v1i1, when we generate llvm.masked.load/store
intrinsics for APX cload/cstore.

https://godbolt.org/z/vjsrofsqx

[ValueTracking] Bail out on x86_fp80 when computing fpclass with knownbits (#130477)

In https://github.com/llvm/llvm-project/pull/97762, we assume that if the
minimum possible value of X is NaN, then X is NaN. But this doesn't hold
for the x86_fp80 format. If the knownbits of X are
`?'011111111111110'????????????????????????????????????????????????????????????????`,
the minimum possible value of X is NaN/unnormal. However, X can still be a
normal value.

Closes https://github.com/llvm/llvm-project/issues/130408.

Revert "[lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36 (#123576)"

This reverts commit 6fbe491e1776f6598790a844bf4e743de956b42d.
Broke check-lld, see the many bot comments on
https://github.com/llvm/llvm-project/pull/123576

[VPlan] Refactor VPlan creation, add transform introducing region (NFC). (#128419)

Create an empty VPlan first, then let the HCFG builder create a plain
CFG for the top-level loop (w/o a top-level region). The top-level
region is introduced by a separate VPlan-transform. This is instead of
creating the vector loop region before building the VPlan CFG for the
input loop.

This simplifies the HCFG builder (which should probably be renamed) and
moves along the roadmap ('buildLoop') outlined in [1].

As follow-up, I plan to also preserve the exit branches in the initial
VPlan out of the CFG builder, including connections to the exit blocks.

The conversion from plain CFG with potentially multiple exits to a
single entry/exit region will be done as VPlan transform in a follow-up.

This is needed to enable VPlan-based predication. Currently early exit
support relies on building the block-in masks on the original CFG,
because exiting branches and conditions aren't preserved in the VPlan.
So in order to switch to VPlan-based predication, we will have to
preserve them in the initial plain CFG, so the exit conditions are
available explicitly when we convert to single entry/exit regions.

Another follow-up is updating the outer loop handling to also introduce
VPRegionBlocks for nested loops as transform. Currently the existing
logic in the builder will take care of creating VPRegionBlocks for
nested loops, but not the top-level loop.

[1]
https://llvm.org/devmtg/2023-10/slides/techtalks/Hahn-VPlan-StatusUpdateAndRoadmap.pdf

PR: https://github.com/llvm/llvm-project/pull/128419

[gn build] Port fd267082ee6d

[NFC][Cloning] Make ClonedModule case more obvious in CollectDebugInfoForCloning (#129143)


Summary:
The code's behavior is unchanged, but it's more obvious right now.

Test Plan:
ninja check-llvm-unit check-llvm

[libc++] Protect more code against -Wdeprecated. (#130419)

This seems needed when updating the CI Docker image.

[libc++][CI] Update action runner base image. (#130433)

Updates to the latest release. The side effect of this change is
updating all compilers to the latest upstream version.

[HLSL] Disallow virtual inheritance and functions (#127346)

This PR disallows virtual inheritance and virtual functions in HLSL.

[NFC][Cloning] Simplify the flow in FindDebugInfoToIdentityMap (#129144)


Summary:
The new flow should make it clearer what happens in the Different and
Cloned module cases.

Test Plan:
ninja check-llvm-unit check-llvm

[Sanitizers][Darwin] Correct iterating of MachO load commands (#130161)

The condition to stop iterating so far was to look for a load command whose
cmd field == 0. The iteration would continue past the commands area and
would finally find lc->cmd == 0 if lucky, or crash with a bus error if out
of luck.

Correct this by limiting the number of iterations to the count
specified in the mach_header(_64) ncmds field.

rdar://143903403

---------

Co-authored-by: Mariusz Borsa <m_borsa at apple.com>

[AArch64] Improve vector funnel shift by constant costs. (#130044)

We now have better codegen, and can have better costs to match. The
generated code should now produce a shl+usra and can be seen in
testcases such as:
https://github.com/llvm/llvm-project/blob/7e5821bae80db3f3f0fe0d5f8ce62f79e548eed5/llvm/test/CodeGen/AArch64/fsh.ll#L3941.

[LV] Add outer loop test with different successor orders in inner latch.

[NFC][Cloning] Add a helper to collect debug info from instructions (#129145)


Summary:
Just moving around. This helper will be used for further refactoring.

Test Plan:
ninja check-llvm-unit check-llvm

Revert "[ARM] Change FastISel Address from a struct to a class. NFC"

This reverts commit d47bc6fd93f9f439a54fd7cf55cdcb2e2ca0cfcb.

I forgot to commit clang-format cleanup before I pushed this.

Recommit "[ARM] Change FastISel Address from a struct to a class. NFC"

With clang-format this time.

Original message:
This allows us to use Register in the interface, but store an
unsigned internally in a union.

[lldb] Add missing conversion to optional

[X86] Use Register in FastISel. NFC

Replace 'Reg == 0' with '!Reg'

[HLSL] select scalar overloads for vector conditions (#129396)

This PR adds scalar/vector overloads for vector conditions to the
`select` builtin, and updates the sema checking and codegen to allow
scalars to extend to vectors.

Fixes #126570

[ADT] Use `adl_begin`/`adl_end` in `hasSingleElement` (#130506)

This is to make sure that ADT helpers consistently use argument
dependent lookup when dealing with input ranges.

This was a part of #87936 but reverted due to buildbot failures. Now
that I have a threadripper system, I'm landing this piece-by-piece.

[gn build] Port e85e29c2992b

[alpha.webkit.UnretainedLambdaCapturesChecker] Add a WebKit checker for lambda capturing NS or CF types. (#128651)

Add a new WebKit checker for checking that lambda captures of CF types
use RetainPtr either when ARC is disabled or enabled, and those of NS
types use RetainPtr when ARC is disabled.

[Clang] use constant evaluation context for constexpr if conditions (#123667)

Fixes #123524

---

This PR addresses the issue of immediate function expressions not
properly evaluated in `constexpr` if conditions. Adding the
`ConstantEvaluated` context for expressions in `constexpr` if statements
ensures that these expressions are treated as manifestly
constant-evaluated and parsed correctly.

[Xtensa] Implement Xtensa MAC16 Option. (#130004)

[RISCV] Fix incorrect mask of shuffle vector in the test. (NFC) (#130244)

The shuffle vector mask should be <u, u, 4, 6, 8, 10, 12, 14>, not
<u, u, 4, 6, *6, 10, 12, 14>, for steps of 2.

Shuffle vector masks with an undef initial element have been
supported since https://github.com/llvm/llvm-project/pull/118509.

[clang] Fix typos in options text. (#130129)

[clang] Reject constexpr-unknown values as constant expressions more consistently (#129952)

Perform the check for constexpr-unknown values in the same place we
perform checks for other values which don't count as constant
expressions.

While I'm here, also fix a rejects-valid with a reference that doesn't
have an initializer. This diagnostic was also covering up some of the
bugs here.

The existing behavior with -fexperimental-new-constant-interpreter seems
to be correct, but the diagnostics are slightly different; it would be
helpful if someone could check on that as a followup.

Followup to #128409.

Fixes #129844. Fixes #129845.

[llvm-objdump][ELF]Fix crash when reading strings from .dynstr (#125679)

This change introduces a check for the strtab offset to prevent
llvm-objdump from crashing when processing malformed ELF files.
It provides a minimal reproducer for
https://github.com/llvm/llvm-project/issues/86612#issuecomment-2035694455.
Additionally, it modifies how llvm-objdump handles and outputs malformed
ELF files with invalid string offsets. (More info:
https://discourse.llvm.org/t/should-llvm-objdump-objdump-display-actual-corrupted-values-in-malformed-elf-files/84391)

Fixes: #86612

Co-authored-by: James Henderson <James.Henderson at sony.com>

[APFloat] Fix `IEEEFloat::addOrSubtractSignificand` and `IEEEFloat::normalize` (#98721)

Fixes #63895
Fixes #104984

Before this PR, `addOrSubtractSignificand` presumed that the loss came
from the side being subtracted, and didn't handle the case where lhs ==
rhs and there was loss. This can occur during FMA. This PR fixes the
situation by correctly determining where the loss came from and handling
it appropriately.

Additionally, `normalize` failed to adjust the exponent when the
significand is zero but `lost_fraction != lfExactlyZero`. This meant
that the test case from #63895 was rounded incorrectly as the loss
wasn't adjusted to account for the exponent being below the minimum
exponent. This PR fixes this by only skipping the exponent adjustment if
the significand is zero and there was no lost fraction.

(Note to reviewer: I don't have commit access)

[Clang][CodeGen] Fix demangler invariant comment assertion (#130522)

This patch makes the assertion (that is currently in a comment) that
validates that names mangled by clang can be demangled by LLVM actually
compile/work. There were some minor issues that needed to be fixed (like
starts_with not being available on std::string and needing to call
getDecl() on GD), and a logic issue that should be fixed in this patch.
After this, enabling the assertion within the compiler only requires
uncommenting it (plus adding the header file).

Reland [lld][LoongArch] Relax call36/tail36: R_LARCH_CALL36

Instructions with relocation `R_LARCH_CALL36` may be relaxed as follows:
```
From:
   pcaddu18i $dest, %call36(foo)
     R_LARCH_CALL36, R_LARCH_RELAX
   jirl $r, $dest, 0
To:
   b/bl foo  # bl if r=$ra, b if r=$zero
     R_LARCH_B26
```

This patch fixes the buildbot failures in the lld tests.
Changes: modify test files from `sym@plt` to `%plt(sym)`.

InstCombine: Fix a crash in `PointerReplacer` when constructing a new PHI (#130256)

When constructing a PHI node in `PointerReplacer::replace`, the incoming
operands are expected to have already been replaced and to be in the
replacement map. However, when one of the incoming operands is a load,
the search of the map is unsuccessful and a nullptr is returned from
`getReplacement`. The reason is that, when a load is replaced, all uses
of the load have already been replaced by the new load, so it is
useless to insert the original load into the map. Instead, we should
place the new load into the map to meet the expectation of the later map
search.

Fixes: SWDEV-516420

[AMDGPU] Add GFX12 S_ALLOC_VGPR instruction (#130018)

This patch only adds the instruction for disassembly support.

We have neither an intrinsic nor codegen support, and it is unclear
whether we actually want to ever have an intrinsic, given the fragile
semantics.

For now, it will be generated only by the backend in very specific
circumstances.

---------

Co-authored-by: Jannik Silvanus <jannik.silvanus at amd.com>

[RISCV] Remove Predicates from classes in RISCVInstrInfoXTHead.td. NFC

All instantiations of these classes also specify Predicates, making the
ones on the base class redundant. The Predicates on the instantiations
aren't always the same as the base class's, so those are still needed.

Also move the DecoderNamespace to the instantiations for consistency
with the Predicates.

[AArch64][CostModel] Alter sdiv/srem cost where the divisor is constant (#123552)

This patch revises the cost model for sdiv/srem and draws its inspiration from the udiv/urem patch #122236

The typical codegen for the different scenarios is documented as notes/comments in the code itself (there are too many scenarios to usefully enumerate in the patch description).

[AArch64] Avoid repeated hash lookups (NFC) (#130542)

[CodeGen] Avoid repeated hash lookups (NFC) (#130543)

[alpha.webkit.NoUnretainedMemberChecker] Add a new WebKit checker for unretained member variables and ivars. (#128641)

Add a new WebKit checker for member variables and instance variables of
NS and CF types. A member variable or instance variable to a CF type
should be RetainPtr regardless of whether ARC is enabled or disabled,
and that of a NS type should be RetainPtr when ARC is disabled.

[mlir] Apply ClangTidy finding (NFC)

loop variable is copied but only used as const reference; consider making it a const reference

[clang][NFC] Clean up Expr::EvaluateAsConstantExpr (#130498)

The Info.EnableNewConstInterp case is already handled above.

[Clang] Fix segmentation fault caused by `VarBypassDetector` stack overflow on deeply nested expressions (#124128)

This happens when using `-O2`.

Similarly to #111701
([test](https://github.com/bricknerb/llvm-project/blob/93e4a7386ec897e53d7330c6206d38759a858be2/clang/test/CodeGen/deeply-nested-expressions.cpp)),
not adding a test that reproduces the issue, since such a test is slow and
likely hard to maintain, as discussed here and in a [previous
discussion](https://github.com/llvm/llvm-project/pull/111701/files/1a63281b6c240352653fd2e4299755c1f32a76f4#r1795518779).
Test that was reverted here:
https://github.com/llvm/llvm-project/pull/124128/commits/d6b5576940d38aadb3293acbff680d1f5d22486c

[mlir][CAPI][python] bind CallSiteLoc, FileLineColRange, FusedLoc, NameLoc (#129351)

This PR extends the python bindings for CallSiteLoc, FileLineColRange,
FusedLoc, NameLoc with field accessors. It also adds the missing
`value.location` accessor.

I also did some "spring cleaning" here (`cast` -> `dyn_cast`) after
running into some of my own illegal casts.

[libunwind][RISCV] Make asm statement volatile (#130286)

Compiling with `-O3`, the `early-machinelicm` pass hoisted the asm
statement to a path that is executed unconditionally during stack
unwinding. On hardware without vector extension support, this resulted
in reading a nonexistent register.

[ADT/Support] Add includes to fix module build

Current Clang complains that 'size_t' / 'reference_wrapper' "must be
declared before it is used."

[Clang][AArch64] Add support for SHF_AARCH64_PURECODE ELF section flag (2/3) (#125688)

Add support for the new SHF_AARCH64_PURECODE ELF section flag:
https://github.com/ARM-software/abi-aa/pull/304

The general implementation follows the existing one for ARM targets.
Similarly to ARM targets, generating object files with the
`SHF_AARCH64_PURECODE` flag set is enabled by the
`-mexecute-only`/`-mpure-code` driver flag.

Related PRs:
* LLVM: https://github.com/llvm/llvm-project/pull/125687
* LLD: https://github.com/llvm/llvm-project/pull/125689

Revert "[clang] Implement instantiation context note for checking template parameters (#126088)"

This reverts commit a24523ac8dc07f3478311a5969184b922b520395.

This is causing significant compile-time regressions for C++ code, see:
https://github.com/llvm/llvm-project/pull/126088#issuecomment-2704874202

[X86] checkBitcastSrcVectorSize - early return when reach to MaxRecursionDepth. (#130226)

[readobj][Arm][AArch64] Refactor Build Attributes parsing under ELFAttributeParser and add support for AArch64 Build Attributes (#128727)

Refactor readobj to integrate AArch64 Build Attributes under
ELFAttributeParser. ELFAttributeParser now serves as a base class for:
- ELFCompactAttrParser, handling Arm-style attributes with a single
build attribute subsection.
- ELFExtendedAttrParser, handling AArch64-style attributes with multiple
build attribute subsections. This improves code organization and better
aligns with the attribute parsing model.

Add support for parsing AArch64 Build Attributes.

[MCA] Adding missing instructions in AArch64 Neoverse V1 tests (#128892)

Added missing instructions for LLVM Opcodes coverage. It will help to
maintain TableGen scheduling information of AArch64 Neoverse V1.

Follow-up of #126703.
This is a dispatch of new instructions of the big test:
V1-scheduling-info.s
I have created a new test for special instructions without scheduling
info in Software Optimization Guide: V1-misc-instructions.s

No more asm instruction comments to maintain.

[gn build] Port b1ebfac1859c

[X86] Add test case showing its not always beneficial to fold concat(palignr(),palignr()) -> palignr(concat(),concat())

[lldb] Add more ARM checks in TestLldbGdbServer.py (#130277)

When https://github.com/llvm/llvm-project/pull/130034 enabled RISC-V
here I noticed that these should run for ARM as well.

ARM only has 4 argument registers, which matches Arm's ABI for it:
https://github.com/ARM-software/abi-aa/blob/main/aapcs32/aapcs32.rst#core-registers

The ABI defines a link register LR, and I assume that's what becomes
'ra' in LLDB.

Tested on ARM and AArch64 Linux.

[lldb] Clean up UnwindAssemblyInstEmulation (#129030)

My main motivation was trying to understand how the function works and
whether the rows need to be (shared) pointers. I noticed that the
function essentially constructs two copies of the unwind plan in
parallel (the second being the saved_unwind_states).
If we delay the construction of the unwind plan to the end of the
function, then we never need two copies of a single row (we can just
move it into the final result), so we can just use them as value types.
This makes the overall logic of the function stand out better as it
avoids the laborious deep copies of the Row shared pointer.

I've also noticed that a large portion of the function is devoted to
recomputing certain properties of the unwind state (e.g. the
`m_fp_is_cfa` field). Instead of doing that, this patch just
saves/restores them together with the rest of the state.

[MLIR][py] Add PyThreadPool as wrapper around MlirLlvmThreadPool in MLIR python bindings (#130109)

In some projects, like JAX, `ir.Context` is used with multi-threading
disabled to avoid caching multiple thread pools:

https://github.com/jax-ml/jax/blob/623865fe9538100d877ba9d36f788d0f95a11ed2/jax/_src/interpreters/mlir.py#L606-L611

However, when the context has multithreading enabled, it also uses locks
on the StorageUniquers, and this can be helpful to avoid data races in
multi-threaded execution (for example with free-threaded CPython,
https://github.com/jax-ml/jax/issues/26272).
With this PR the user can enable multi-threading, which 1) enables the
additional locking and 2) sets a shared thread pool so that cached
contexts can share one global pool.

[X86] Add test case showing its not always beneficial to fold concat(pack(),pack()) -> pack(concat(),concat())

[mlir] Refactor ConvertVectorToLLVMPass options (#128219)

The `VectorTransformsOptions` on the `ConvertVectorToLLVMPass` is
currently represented as a struct, which makes it not serialisable. This
means a pass pipeline that contains this pass cannot be represented as
textual form, which breaks reproducer generation and options such as
`--dump-pass-pipeline`.

This PR expands the `VectorTransformsOptions` struct into the two
options that are actually used by the Pass' patterns:
`vector-contract-lowering` and `vector-transpose-lowering`. The other
options present in `VectorTransformsOptions` are not used by any patterns
in this pass.

Additionally, I have changed some interfaces to only take these specific
options over the full options struct as, again, the vector contract and
transpose lowering patterns only need one of their respective options.

Finally, I have added a simple lit test that just prints the pass
pipeline using `--dump-pass-pipeline` to ensure the options on this pass
remain serialisable.

Fixes #129046

[IR] Fix assertion error in User new/delete edge case (#129914)

Fixes #129900

If `operator delete` was called after an unsuccessful constructor call
following `operator new`, we ran into undefined behaviour.
This was discovered by our malfunction tests, which explicitly check for
such kinds of bugs, while preparing an upgrade to LLVM 20.

[MergeFunc] Check full IR and comdat keys in comdat.ll.

Spelling in lit.cfg.py

[X86] combineConcatVectorOps - convert X86ISD::PALIGNR concatenation to use combineConcatVectorOps recursion (#130572)

Only concatenate X86ISD::PALIGNR nodes if at least one operand is beneficial to concatenate

[flang] Move parser invocations into ParserActions (#130309)

FrontendActions.cpp is currently one of the biggest compilation units in
all of flang. Measuring its compilation gives the following metrics:

User time (seconds): 139.21
System time (seconds): 4.65
Maximum resident set size (kbytes): 5891440 (5.61 GB)

This commit separates out explicit invocations of the parser into a
separate compilation unit - ParserActions.cpp - through helper functions
in order to decrease the maximum compilation time and memory usage of a
single unit.
After the split, the measurements of FrontendActions.cpp are as follows:

User time (seconds): 70.08
System time (seconds): 3.16
Maximum resident set size (kbytes): 3961492 (3.7 GB)

While the ones for the newly created ParserActions.cpp as follows:

User time (seconds): 104.33
System time (seconds): 3.37
Maximum resident set size (kbytes): 4185600 (3.99 GB)

---------

Signed-off-by: Kajetan Puchalski <kajetan.puchalski at arm.com>

[TailDuplicator] Do not restrict the computed gotos (#114990)

Fixes #106846.

This is what I learned from GCC: GCC does not duplicate a BB that has
indirect jumps through a jump table. I believe GCC has provided a clear
explanation here:

> Duplicate the blocks containing computed gotos. This basically
unfactors computed gotos that were factored early on in the compilation
process to speed up edge based data flow. We used to not unfactor them
again, which can seriously pessimize code with many computed jumps in
the source code, such as interpreters.

Revert "[lldb][asan] Add temporary logging to ReportRetriever"

This reverts commit 39a4da20d88d797824f0e7be0f732ccaf0c7eee4.

We skipped the failing tests in `6cc8b0bef07f4270303bec0fc203f251a2fde262`.

[lldb] Remove an extraneous `printf` statement. (#130453)

This was missed in review but is showing up in lldb-dap output.

[Clang][AArch64] Fix typo with colon-separated syntax for system registers (#105608)

The range for Op0 was set to 1 instead of 3.

The description of e493f177eeee84a9c6000ca7c92499233490f1d1 visually
explains the encoding of implementation-defined system registers.


https://github.com/llvm/llvm-project/blob/796787d07c30cb9448e1f9ff3f3da06c2fc96ccd/llvm/lib/Target/AArch64/AArch64SystemOperands.td#L658-L674

Godbolt: https://godbolt.org/z/WK9PqPvGE

Co-authored-by: v01dxyz <v01dxyz at v01d.xyz>

[AMDGPU][NewPM] Port AMDGPUReserveWWMRegs to NPM (#123722)

[X86] Add test case showing its not always beneficial to fold concat(pshufb(),pshufb()) -> pshufb(concat(),concat())

[X86] Improve test coverage for concat(pmaddubsw(),pmaddubsw()) -> pmaddubsw(concat(),concat())

Ensure we have tests for both beneficial/non-beneficial concatenation cases

[AArch64][ELF Parser] Fix out-of-scope variable usage (#130576)

Return a reference to a persistent variable instead of a temporary copy.

[LLVM][SVE] Add isel for scalable vector bfloat copysign operations. (#130098)

[clang] NNS: don't print trailing scope resolution operator in diagnostics (#130529)

This clears up the printing of a NestedNameSpecifier so that a trailing
'::' is not printed, unless it refers into the global scope.

This fixes a bunch of diagnostics where the trailing :: was awkward.
This also prints the NNS quoted consistently.

There is a drive-by improvement to error recovery, where now we print
the actual type instead of `<dependent type>`.

This will clear up further uses of NNS printing in further patches.

AMDGPU: Move enqueued block handling into clang (#128519)

The previous implementation wasn't maintaining a faithful IR
representation of how this really works. The value returned by
createEnqueuedBlockKernel wasn't actually used as a function, and
hacked up later to be a pointer to the runtime handle global
variable. In reality, the enqueued block is a struct where the first
field is a pointer to the kernel descriptor, not the kernel itself. We
were also relying on passing around a reference to a global using a
string attribute containing its name. It's better to base this on a
proper IR symbol reference during final emission.

This now avoids using a function attribute on kernels and avoids using
the additional "runtime-handle" attribute to populate the final
metadata. Instead, associate the runtime handle reference to the
kernel with the !associated global metadata. We can then get a final,
correctly mangled name at the end.

I couldn't figure out how to get rename-with-external-symbol behavior
using a combination of comdats and aliases, so this leaves an IR pass to
externalize the runtime handles for codegen. If anything breaks, it's
most likely this, so avoiding the pass is left for a later step. Use a
special section name to enable this behavior. This also means it's
possible to declare enqueuable kernels in source without going through
the dedicated block syntax or other dedicated compiler support.

We could move towards initializing the runtime handle in the
compiler/linker. I have a working patch where the linker sets up the
first field of the handle, avoiding the need to export the block
kernel symbol for the runtime. We would need new relocations to get
the private and group sizes, but that would avoid the runtime's
special case handling that requires the device_enqueue_symbol metadata
field.

https://reviews.llvm.org/D141700

[RISCV][test] Add test case showing case where machine copy propagation leaves behind a no-op reg move

Pre-commit for #129889.

[AArch64][ELF Parser] Fix out-of-scope variable usage (#130594)

Return a reference to a persistent variable instead of a temporary copy.

[DAG] fold AVGFLOORS to AVGFLOORU for non-negative operand (#84746) (#129678)

Fold ISD::AVGFLOORS to ISD::AVGFLOORU for non-negative operands. Test coverage is modified for uhadd with zero extension.

Fixes #84746

Revert "[clang] Fix missing diagnostic of declaration use when accessing TypeDecls through typename access (#129681)"

This caused incorrect -Wunguarded-availability warnings. See comment on
the pull request.

> We were missing a call to DiagnoseUseOfDecl when performing typename
> access.
>
> This refactors the code so that TypeDecl lookups funnel through a helper
> which performs all the necessary checks, removing some related
> duplication on the way.
>
> Fixes #58547
>
> Differential Revision: https://reviews.llvm.org/D136533

This reverts commit 4c4fd6b03149348cf11af245ad2603d24144a9d5.

[X86] combineConcatVectorOps - convert X86ISD::HADD/SUB concatenation to use combineConcatVectorOps recursion (#130579)

Only concatenate X86ISD::HADD/SUB nodes if at least one operand is beneficial to concatenate

[X86][APX] Try to replace non-NF with NF instructions when optimizeCompareInstr (#130488)

https://godbolt.org/z/rWYdqnjjx

[clang] fix matching of nested template template parameters (#130447)

When checking the template template parameters of template template
parameters, the PartialOrdering context was not correctly propagated.

This also has a few drive-by fixes, such as checking the template
parameter lists of template template parameters, which was previously
missing and would have been its own bug, but we need to fix it in order
to prevent crashes in error recovery in a simple way.

Fixes #130362

[Clang] Force expressions with UO_Not to not be non-negative (#126846)

This PR addresses the bug of not throwing warnings for the following
code:
```c++
int test13(unsigned a, int *b) {
        return a > ~(95 != *b); // expected-warning {{comparison of integers of different signs}}
}
```

However, in the original issue, a comment mentioned that negation,
pre-increment, and pre-decrement operators are also handled incorrectly
in this case.

Fixes #18878

[flang][OpenMP] Implement HAS_DEVICE_ADDR clause (#128568)

The HAS_DEVICE_ADDR clause indicates that the object(s) listed exist at
an address that is a valid device address. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.

When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.

Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).

---------

Co-authored-by: Sergio Afonso <safonsof at amd.com>

[flang][OpenMP] Accept old FLUSH syntax in METADIRECTIVE (#130122)

Accommodate it in OmpDirectiveSpecification, which may become the
primary component of the actual FLUSH construct in the future.

[MachineCopyPropagation] Recognise and delete no-op moves produced after forwarded uses (#129889)

This change removes 189 static instances of no-op reg-reg moves (i.e.
where src == dest) across llvm-test-suite when compiled for RISC-V
rv64gc and with SPEC included.

[gn build] Port 0d2c55cb9682

[X86] combineConcatVectorOps - convert PSHUFB/PSADBW/VPMADDUBSW/VPMADDUBSW concatenation to use combineConcatVectorOps recursion (#130592)

Only concatenate nodes if at least one operand is beneficial to concatenate

[clang][test] Don't require specific alignment in test case (#130589)

https://github.com/llvm/llvm-project/pull/129952 /
42d49a77241df73a17cb442973702fc460e7fb90 added this test which is
failing on 32-bit ARM because the alignment chosen is 4, not 8, which
would make sense if this is a 32/64-bit difference.

https://lab.llvm.org/buildbot/#/builders/154/builds/13059
```
<stdin>:34:30: note: scanning from here
define dso_local void @_Z1fv(ptr dead_on_unwind noalias writable sret(%struct.B) align 4 %agg.result) #0 {
                             ^
<stdin>:38:2: note: possible intended match here
 %0 = load ptr, ptr @x, align 4
 ^
```
The other test does not check alignment, so I'm assuming that it is not
important here.

[SLP]Reduce number of alternate instruction, where possible

Previous version was reviewed here https://github.com/llvm/llvm-project/pull/123360
It is mostly the same, adjusted after graph-to-tree transformation

Patch tries to remove wide alternate operations.
Currently SLP vectorizer emits something like this:
```
%0 = add i32
%1 = sub i32
%2 = add i32
%3 = sub i32
%4 = add i32
%5 = sub i32
%6 = add i32
%7 = sub i32

transforms to

%v1 = add <8 x i32>
%v2 = sub <8 x i32>
%res = shuffle %v1, %v2, <0, 9, 2, 11, 4, 13, 6, 15>
```
i.e. half of the results are just unused. This leads to increased
register pressure and potentially doubles the number of operations.

Patch introduces SplitVectorize mode, where it splits the operations by
opcodes and produces instead something like this:
```
%v1 = add <4 x i32>
%v2 = sub <4 x i32>
%res = shuffle %v1, %v2, <0, 4, 1, 5, 2, 6, 3, 7>
```
This improves performance by reducing the number of ops. It also enables
some other improvements, like improved graph reordering.

-O3+LTO, AVX512
Metric: size..text
Program                                                                         size..text
                                                                                results     results0    diff
                       test-suite :: MultiSource/Benchmarks/Olden/tsp/tsp.test     2788.00     2820.00  1.1%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test   278168.00   280904.00  1.0%
               test-suite :: MultiSource/Benchmarks/FreeBench/pifft/pifft.test    82682.00    83258.00  0.7%
                    test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test   139344.00   139712.00  0.3%
     test-suite :: MultiSource/Benchmarks/FreeBench/fourinarow/fourinarow.test    27149.00    27197.00  0.2%
               test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test  1008188.00  1009948.00  0.2%
          test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test    39226.00    39290.00  0.2%
   test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test    39229.00    39293.00  0.2%
 test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test  2074533.00  2076549.00  0.1%
test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test  2074533.00  2076549.00  0.1%
             test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test   798440.00   798952.00  0.1%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test    44123.00    44139.00  0.0%
                       test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   318942.00   319038.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  1159880.00  1160152.00  0.0%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniAMR/miniAMR.test    73595.00    73611.00  0.0%
                test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  1146124.00  1146348.00  0.0%
       test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/CLAMR/CLAMR.test   203831.00   203847.00  0.0%
 test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test   207662.00   207678.00  0.0%
                test-suite :: External/SPEC/CFP2006/447.dealII/447.dealII.test   589851.00   589883.00  0.0%
      test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1398543.00  1398559.00  0.0%
     test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1398543.00  1398559.00  0.0%
        test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2050990.00  2051006.00  0.0%

      test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12559687.00 12559591.00 -0.0%
                     test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test  3074157.00  3074125.00 -0.0%
         test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  1092252.00  1092188.00 -0.0%
            test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test   779763.00   779715.00 -0.0%
         test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test   253517.00   253485.00 -0.0%
                  test-suite :: MultiSource/Applications/JM/lencod/lencod.test   848259.00   848035.00 -0.0%
     test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test    93064.00    93016.00 -0.1%
                  test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test   383747.00   383475.00 -0.1%
          test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   673051.00   662907.00 -1.5%
           test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   673051.00   662907.00 -1.5%

Olden/tsp - small variations
Prolangs-C/TimberWolfMC - small variations, some code not inlined
FreeBench/pifft - extra store <8 x double> vectorized, some other extra
vectorizations
CFP2006/433.milc - better vector code
FreeBench/fourinarow - better vector code
Benchmarks/tramp3d-v4 - extra vector code, small variations
mediabench/gsm/toast - small variations
MiBench/telecomm-gsm - small variations
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - better vector code, small variations
CINT2006/464.h264ref - some smaller code + changes similar to x264
DOE-ProxyApps-C/miniGMG - small variations
Benchmarks/Bullet - small variations
CFP2017rate/511.povray_r - small variations
DOE-ProxyApps-C/miniAMR - small variations
CFP2006/453.povray - small variations
DOE-ProxyApps-C++/CLAMR - small variations
MiBench/consumer-lame - small variations
CFP2006/447.dealII - small variations
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - small variations
CFP2017rate/510.parest_r - better vector code, small variations
CFP2017rate/526.blender_r - small variations
CINT2006/403.gcc - small variations
CINT2006/400.perlbench - small variations
CFP2017rate/508.namd_r - small variations
ASCI_Purple/SMG2000 - small variations
JM/lencod - extra store <16 x i32>, small variations
DOE-ProxyApps-C++/miniFE - small variations
JM/ldecod - extra vector code, small variations, less shuffles
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - the number of instructions increased, but
looks like they are more performant. E.g., for function
x264_pixel_satd_8x8, llvm-mca reports better throughput - 84 for the
current version and 59 for the new version.

-O3+LTO, mcpu=sifive-p470

Metric: size..text

                                                                               results    results0   diff
                                 test-suite :: External/SPEC/CINT2006/464.h264ref/464.h264ref.test  580768.00  581118.00   0.1%
                                        test-suite :: MultiSource/Applications/d/make_dparser.test   78854.00   78894.00   0.1%
                                      test-suite :: MultiSource/Applications/JM/lencod/lencod.test  633448.00  633750.00   0.0%
                                           test-suite :: MultiSource/Benchmarks/Bullet/bullet.test  277002.00  277080.00   0.0%
                             test-suite :: External/SPEC/CINT2006/400.perlbench/400.perlbench.test  931938.00  931960.00   0.0%
                                         test-suite :: External/SPEC/CINT2006/403.gcc/403.gcc.test 2512806.00 2512822.00   0.0%
                                test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7659880.00 7659876.00  -0.0%
                                 test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7659880.00 7659876.00  -0.0%
                            test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test 1602448.00 1602434.00  -0.0%
                          test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9496664.00 9496542.00  -0.0%
                     test-suite :: MultiSource/Benchmarks/MiBench/consumer-lame/consumer-lame.test  147424.00  147422.00  -0.0%
                    test-suite :: External/SPEC/CINT2017speed/600.perlbench_s/600.perlbench_s.test 1764608.00 1764578.00  -0.0%
                     test-suite :: External/SPEC/CINT2017rate/500.perlbench_r/500.perlbench_r.test 1764608.00 1764578.00  -0.0%
                                     test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test  841656.00  841632.00  -0.0%
                                    test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test  949026.00  948962.00  -0.0%
                            test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test  946348.00  946284.00  -0.0%
                                      test-suite :: MultiSource/Applications/JM/ldecod/ldecod.test  279794.00  279764.00  -0.0%
                       test-suite :: MultiSource/Benchmarks/mediabench/g721/g721encode/encode.test    4776.00    4772.00  -0.1%
                              test-suite :: MultiSource/Benchmarks/mediabench/gsm/toast/toast.test   25074.00   25028.00  -0.2%
                       test-suite :: MultiSource/Benchmarks/MiBench/telecomm-gsm/telecomm-gsm.test   25074.00   25028.00  -0.2%
                         test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C/miniGMG/miniGMG.test   29336.00   29184.00  -0.5%
                               test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test  535390.00  510124.00  -4.7%
                              test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test  535390.00  510124.00  -4.7%
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/ieee/GCC-C-execute-ieee-pr50310.test     886.00     608.00 -31.4%

CINT2006/464.h264ref - extra v16i32 reduction
d/make_dparser - better vector code
JM/lencod - extra v16i32 reduction
Benchmarks/Bullet - smaller vector code
CINT2006/400.perlbench - better vector code
CINT2006/403.gcc - small variations
CINT2017speed/602.gcc_s
CINT2017rate/502.gcc_r - small variations
CFP2017rate/510.parest_r - small variations
CFP2017rate/526.blender_r - small variations
MiBench/consumer-lame - small variations
CINT2017speed/600.perlbench_s
CINT2017rate/500.perlbench_r - small variations
Benchmarks/7zip - small variations
CFP2017rate/511.povray_r - small variations
JM/ldecod - extra vector code
mediabench/g721/g721encode - extra vector code
mediabench/gsm - extra vector code
MiBench/telecomm-gsm - extra vector code
DOE-ProxyApps-C/miniGMG - extra vector code
CINT2017rate/525.x264_r
CINT2017speed/625.x264_s - reduced number of wide operations and
shuffles, saving the registers, similar to X86, extra code in
pixel_hadamard_ac vectorized
ieee/GCC-C-execute-ieee-pr50310 - extra code vectorized

CINT2006/464.h264ref - extra vector code in find_sad_16x16
JM/lencod - extra vector code in find_sad_16x16
d/make_dparser - smaller vector code
Benchmarks/Bullet - small variations
CINT2006/400.perlbench - smaller vector code
CFP2017rate/526.blender_r - small variations, extra store <8 x float> in
the loop, extra store <8 x i8> in loop
CINT2017rate/500.perlbench_r
CINT2017speed/600.perlbench_s - small variations
MiBench/consumer-lame - small variations
JM/ldecod - extra vector code
mediabench/g721/g721encode - small variations

Reviewers: hiraditya

Reviewed By: hiraditya

Pull Request: https://github.com/llvm/llvm-project/pull/128907

Revert "[libc++] Don't try to wait on a thread that hasn't started in std::async (#125433)"

This reverts commit 11766a40972f5cc853e296231e5d90ca3c886cc1.

[ARM] Fix HW thread pointer functionality  (#130027)

- Separate check for hardware support of TLS register from check for support of Thumb2 encoding
- Base decision to auto enable TLS on both hardware support and Thumb2 encoding support
- Fix HW support check to correctly exclude M-Profile and include ARMV6K variants

Reference: https://reviews.llvm.org/D114116

[X86] combineConcatVectorOps - add missing VT/Subtarget checks for MOV*DUP concatenation folds.

[clang][SPIR-V] Use the SPIR-V backend by default (#129545)

The SPIR-V backend is now a supported backend, and we believe it is
ready to be used by default in Clang over the SPIR-V translator.

Some IR generated by Clang today, such as IR requiring SPIR-V target
address spaces, cannot be compiled by the translator for the reasons
given in this
[RFC](https://discourse.llvm.org/t/rfc-the-spir-v-backend-should-change-its-address-space-mappings/82640),
so we expect even more programs to work with the backend.

Enable it by default, but keep some of the code as it is still called by
the HIP toolchain directly.

---------

Signed-off-by: Sarnie, Nick <nick.sarnie at intel.com>

[OpenMP] Mark Failing OpenMP Tests as XFAIL on Windows (#129040)

This patch marks specific OpenMP runtime tests as XFAIL on Windows due
to failures reported in #129023

[LLD][COFF] Add /noexp for link.exe compatibility (#128814)

See #107346

[mlir] Fix bazel build after f3dcc0fe228f6a1a69147ead0a76ce0fe02d316d

[mlir][TOSA] Fix linalg lowering of depthwise conv2d (#130293)

Current lowering for tosa.depthwise_conv2d assumes that if both zero
points are zero then it's a floating-point operation, hardcoding the use
of an arith.addf in the lowered code. Fix the code to check the element
type to decide which add operation to use.

[OpenACC] Implement 'bind' ast/sema for 'routine' directive

The 'bind' clause allows the renaming of a function during code
generation.  There are a few rules about when this can/cannot happen,
and it takes either a string or identifier (previously mis-implemented
as an ID-expression) argument.

Note there are additional rules to this in the implicit-function routine
case, but that isn't implemented in this patch, as implicit-function
routine is not yet implemented either.

[clang][bytecode] Fix builtin_memcmp buffer sizes for pointers (#130570)

Don't use the pointer size, but the number of elements multiplied by the
element size.

[libc++][docs] Remove mis-added entry for P2513R4 (#130581)

P2513R4 neither touched library wording nor required library
implementations to change, so it was probably a mistake to list it in
libc++'s implementation status table.

[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute

FPSCR and FPEXC will be stored in FPStatusRegs, after GPRCS2 has been
saved.

- GPRCS1
- GPRCS2
- FPStatusRegs (new)
- DPRCS
- GPRCS3
- DPRCS2

FPSCR is present on all targets with a VFP, but the FPEXC register is
not present on Cortex-M devices, so a different number of bytes is
pushed onto the stack depending on the target, which would affect
alignment for subsequent saves.

DPRCS1 will sum up all bytes saved so far, and will emit extra
instructions to ensure that its alignment is correct. My assumption is
that if DPRCS1 is able to correct its own alignment, then all subsequent
saves will also be correctly aligned.

Avoid annotating the saving of FPSCR and FPEXC for functions marked
with the interrupt_save_fp attribute, even though this is done as part
of frame setup.  Since these are status registers, there really is no
viable way of annotating this. Since these aren't GPRs or DPRs, they
can't be used with .save or .vsave directives. Instead, just record
that the intermediate registers r4 and r5 are saved to the stack
again.

Co-authored-by: Jake Vossen <jake at vossen.dev>
Co-authored-by: Alan Phipps <a-phipps at ti.com>

Revert "[ARM][Thumb] Save FPSCR + FPEXC for save-vfp attribute"

This reverts commit 1f05703176d43a339b41a474f51c0e8b1a83c9bb.

[HLSL][Driver] Use temporary files correctly (#130436)

This updates the DXV and Metal Converter actions to properly use
temporary files created by the driver. I've abstracted away a check to
determine if an action is the last in the sequence because we may have
between 1 and 3 actions depending on the arguments and environment.

AMDGPU: Rename variable from undef to poison (#130460)

StructurizeCFG: Use poison instead of undef (#130459)

There are a surprising number of codegen changes from this.

Revert "Reland "[clang] Lower modf builtin using `llvm.modf` intrinsic" (#129885)"

This broke modff calls on 32-bit x86 Windows. See comment on the PR.

> This updates the existing modf[f|l] builtin to be lowered via the
> llvm.modf.* intrinsic (rather than directly to a library call).
>
> The legalization issues exposed by the original PR (#126750) should have
> been fixed in #128055 and #129264.

This reverts commit cd1d9a8fab05524a27ffdb251f6def37786b5cc1.

[libc] Add `-Wno-sign-conversion` & re-attempt `-Wconversion` (#129811)

Relates to
https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459

Forbid co_await and co_yield in invalid expr contexts (#130455)

Fix #78426 

C++26 introduced braced initializer lists as template arguments.
However, such contexts should be considered invalid for co_await and
co_yield. This commit explicitly rules out the possibility of using
these exprs in template arguments.

---------

Co-authored-by: cor3ntin <corentinjabot at gmail.com>

[X86] Add test cases showing its not always beneficial to fold concat(add/mul(),add/mul()) -> add/mul(concat(),concat())

[lldb] fix set SBLineEntryColumn (#130435)

Calling the public API `SBLineEntry::SetColumn()` sets the row instead
of the column.

This probably should be backported, as the bug has existed since version 3.4.

---------

Co-authored-by: Jonas Devlieghere <jonas at devlieghere.com>

[ADT] Use `adl_begin`/`adl_end` in `make_filter_range` (#130512)

This is to make sure that ADT helpers consistently use argument
dependent lookup when dealing with input ranges.

This was a part of https://github.com/llvm/llvm-project/pull/87936 but
reverted due to buildbot failures.

Also fix potential issue with double-move on the input range.

[Libc] Turn implicit to explicit conversion (#130615)

This fixes a build issue on the AMDGPU libc bot after
https://github.com/llvm/llvm-project/pull/126846 landed that introduced
a warning.

---------

Co-authored-by: Joseph Huber <huberjn at outlook.com>

[X86] combineConcatVectorOps - convert ADD/SUB/MUL concatenation to use combineConcatVectorOps recursion

Only concatenate ADD/SUB/MUL nodes if at least one operand is beneficial to concatenate

[mlir] Fix bazel build after f3dcc0f TD files

[mlir][SparseTensor][NFC] Migrate to OpAsmAttrInterface for ASM alias generation (#130483)

After the introduction of `OpAsmAttrInterface`, it is favorable to
migrate code using `OpAsmDialectInterface` for ASM alias generation,
which lives in `Dialect.cpp`, to use `OpAsmAttrInterface`, which lives
in `Attrs.td`. In this way, attribute behavior is placed near its
tablegen definition and people won't need to go through other files to
know what other (unexpected) hooks come into play.

[libc] Fix implicit conversion warnings. (#130635)

[flang][OpenMP] Parse cancel-directive-name as clause (#130146)

The cancellable construct names on CANCEL or CANCELLATION POINT
directives are actually clauses (with the same names as the
corresponding constructs).

Instead of parsing them into a custom structure, parse them as a clause,
which will make CANCEL/CANCELLATION POINT follow the same uniform scheme
as other constructs (<directive> [(<arguments>)] [clauses]).

[lldb-dap] Migrating terminated statistics to the event body. (#130454)

Per the DAP spec, the event 'body' field should contain any additional
data related to the event. I moved the lldb-dap 'statistics' extension
into the terminated event's body, like:

```
{
  "type": "event",
  "seq": 0,
  "event": "terminated",
  "body": {
    "$__lldb_statistics": {...}
  }
}
```

This allows us to more uniformly handle event messages.

[AMDGPU] Fix test failures when expensive checks are enabled

This PR fixes test failures introduced in #127353 when expensive checks
are enabled.

>From dd09b8ded93da7b4230ddb67b19145219d285269 Mon Sep 17 00:00:00 2001
From: Shilei Tian <i at tianshilei.me>
Date: Mon, 10 Mar 2025 13:41:33 -0400
Subject: [PATCH] [AMDGPU] Fix test failures when expensive checks are enabled

This PR fixes test failures introduced in #127353 when expensive checks
are enabled.
---
 .../materialize-frame-index-sgpr.gfx10.ll     | 1124 ++++++++++++++---
 .../AMDGPU/materialize-frame-index-sgpr.ll    |  314 +++--
 .../schedule-amdgpu-tracker-physreg-crash.ll  |    2 +-
 3 files changed, 1093 insertions(+), 347 deletions(-)

diff --git a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
index 4ca00f2daf97a..4b5a7c207055a 100644
--- a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
+++ b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.gfx10.ll
@@ -12,7 +12,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX10_1-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
 ; GFX10_1:       ; %bb.0:
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
+; GFX10_1-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_1-NEXT:    ;;#ASMSTART
@@ -20,16 +26,28 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 0x4040, v0
-; GFX10_1-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_1-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59, scc
+; GFX10_1-NEXT:    ; use s55, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
 ; GFX10_3:       ; %bb.0:
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
+; GFX10_3-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_3-NEXT:    ;;#ASMSTART
@@ -37,17 +55,27 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 0x4040, v0
-; GFX10_3-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_3-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59, scc
+; GFX10_3-NEXT:    ; use s55, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT:    scratch_store_b32 off, v1, s1 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX11-NEXT:    s_add_i32 s0, s32, 64
-; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX11-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX11-NEXT:    s_addc_u32 s0, s32, 0x4040
@@ -57,10 +85,16 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX11-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX11-NEXT:    s_bitset0_b32 s0, 0
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX11-NEXT:    s_mov_b32 s59, s0
+; GFX11-NEXT:    s_mov_b32 s55, s0
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59, scc
+; GFX11-NEXT:    ; use s55, scc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT:    scratch_load_b32 v1, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
@@ -70,7 +104,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v1, s32 offset:16388 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
+; GFX12-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX12-NEXT:    s_add_co_ci_u32 s0, s32, 0x4000
 ; GFX12-NEXT:    v_mov_b32_e32 v0, s32
 ; GFX12-NEXT:    s_wait_alu 0xfffe
@@ -80,34 +120,54 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX12-NEXT:    ; use alloca0 v0
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_mov_b32 s59, s0
+; GFX12-NEXT:    s_mov_b32 s55, s0
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59, scc
+; GFX12-NEXT:    ; use s55, scc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v1, off, s32 offset:16388 ; 4-byte Folded Reload
 ; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 64, v0
+; GFX8-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX8-NEXT:    ;;#ASMSTART
 ; GFX8-NEXT:    ; use alloca0 v0
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX8-NEXT:    s_movk_i32 s59, 0x4040
-; GFX8-NEXT:    v_add_u32_e32 v0, vcc, s59, v0
+; GFX8-NEXT:    s_movk_i32 s55, 0x4040
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, s55, v0
+; GFX8-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX8-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59, scc
+; GFX8-NEXT:    ; use s55, scc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
 ; GFX900-NEXT:    ;;#ASMSTART
@@ -115,34 +175,52 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc() #0 {
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX900-NEXT:    v_add_u32_e32 v0, 0x4040, v0
+; GFX900-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX900-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX900-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59, scc
+; GFX900-NEXT:    ; use s55, scc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT:    scratch_store_dword off, v1, s2 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_add_i32 s0, s32, 64
 ; GFX942-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX942-NEXT:    s_and_b64 s[0:1], 0, exec
 ; GFX942-NEXT:    s_addc_u32 s0, s32, 0x4040
 ; GFX942-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX942-NEXT:    s_bitset0_b32 s0, 0
+; GFX942-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s0
 ; GFX942-NEXT:    ;;#ASMSTART
 ; GFX942-NEXT:    ; use alloca0 v0
 ; GFX942-NEXT:    ;;#ASMEND
-; GFX942-NEXT:    s_mov_b32 s59, s0
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59, scc
+; GFX942-NEXT:    ; use s55, scc
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT:    scratch_load_dword v1, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
   %alloca1 = alloca i32, align 4, addrspace(5)
   call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
-  call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca1, i32 0)
+  call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca1, i32 0)
   ret void
 }
 
@@ -152,36 +230,65 @@ define void @scalar_mov_materializes_frame_index_dead_scc() #0 {
 ; GFX10_1-LABEL: scalar_mov_materializes_frame_index_dead_scc:
 ; GFX10_1:       ; %bb.0:
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
-; GFX10_1-NEXT:    s_lshr_b32 s59, s32, 5
-; GFX10_1-NEXT:    s_addk_i32 s59, 0x4040
+; GFX10_1-NEXT:    s_lshr_b32 s55, s32, 5
+; GFX10_1-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_1-NEXT:    ;;#ASMSTART
 ; GFX10_1-NEXT:    ; use alloca0 v0
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59
+; GFX10_1-NEXT:    ; use s55
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_1-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_dead_scc:
 ; GFX10_3:       ; %bb.0:
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
-; GFX10_3-NEXT:    s_lshr_b32 s59, s32, 5
-; GFX10_3-NEXT:    s_addk_i32 s59, 0x4040
+; GFX10_3-NEXT:    s_lshr_b32 s55, s32, 5
+; GFX10_3-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_3-NEXT:    ;;#ASMSTART
 ; GFX10_3-NEXT:    ; use alloca0 v0
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59
+; GFX10_3-NEXT:    ; use s55
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80880
+; GFX10_3-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_dead_scc:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT:    scratch_store_b32 off, v1, s1 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX11-NEXT:    s_add_i32 s0, s32, 64
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX11-NEXT:    v_mov_b32_e32 v0, s0
@@ -189,10 +296,16 @@ define void @scalar_mov_materializes_frame_index_dead_scc() #0 {
 ; GFX11-NEXT:    ;;#ASMSTART
 ; GFX11-NEXT:    ; use alloca0 v0
 ; GFX11-NEXT:    ;;#ASMEND
-; GFX11-NEXT:    s_mov_b32 s59, s0
+; GFX11-NEXT:    s_mov_b32 s55, s0
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59
+; GFX11-NEXT:    ; use s55
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4044
+; GFX11-NEXT:    scratch_load_b32 v1, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_dead_scc:
@@ -202,67 +315,110 @@ define void @scalar_mov_materializes_frame_index_dead_scc() #0 {
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v1, s32 offset:16388 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX12-NEXT:    s_add_co_i32 s0, s32, 0x4000
 ; GFX12-NEXT:    v_mov_b32_e32 v0, s32
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 s55, s0
 ; GFX12-NEXT:    ;;#ASMSTART
 ; GFX12-NEXT:    ; use alloca0 v0
 ; GFX12-NEXT:    ;;#ASMEND
-; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_mov_b32 s59, s0
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59
+; GFX12-NEXT:    ; use s55
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v1, off, s32 offset:16388 ; 4-byte Folded Reload
 ; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: scalar_mov_materializes_frame_index_dead_scc:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX8-NEXT:    s_lshr_b32 s55, s32, 6
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX8-NEXT:    s_lshr_b32 s59, s32, 6
+; GFX8-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 64, v0
 ; GFX8-NEXT:    ;;#ASMSTART
 ; GFX8-NEXT:    ; use alloca0 v0
 ; GFX8-NEXT:    ;;#ASMEND
-; GFX8-NEXT:    s_addk_i32 s59, 0x4040
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59
+; GFX8-NEXT:    ; use s55
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX8-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_dead_scc:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX900-NEXT:    s_lshr_b32 s55, s32, 6
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX900-NEXT:    s_lshr_b32 s59, s32, 6
+; GFX900-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
 ; GFX900-NEXT:    ;;#ASMSTART
 ; GFX900-NEXT:    ; use alloca0 v0
 ; GFX900-NEXT:    ;;#ASMEND
-; GFX900-NEXT:    s_addk_i32 s59, 0x4040
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59
+; GFX900-NEXT:    ; use s55
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101100
+; GFX900-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_dead_scc:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT:    scratch_store_dword off, v1, s2 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_add_i32 s0, s32, 64
 ; GFX942-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX942-NEXT:    s_add_i32 s0, s32, 0x4040
+; GFX942-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s0
 ; GFX942-NEXT:    ;;#ASMSTART
 ; GFX942-NEXT:    ; use alloca0 v0
 ; GFX942-NEXT:    ;;#ASMEND
-; GFX942-NEXT:    s_mov_b32 s59, s0
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59
+; GFX942-NEXT:    ; use s55
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4044
+; GFX942-NEXT:    scratch_load_dword v1, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
   %alloca1 = alloca i32, align 4, addrspace(5)
   call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
-  call void asm sideeffect "; use $0", "{s59}"(ptr addrspace(5) %alloca1)
+  call void asm sideeffect "; use $0", "{s55}"(ptr addrspace(5) %alloca1)
   ret void
 }
 
@@ -272,8 +428,14 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10_1-NEXT:    s_mov_b32 s5, s33
 ; GFX10_1-NEXT:    s_mov_b32 s33, s32
-; GFX10_1-NEXT:    s_add_i32 s32, s32, 0x81000
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s6, s33, 0x80880
+; GFX10_1-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s33
+; GFX10_1-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX10_1-NEXT:    s_add_i32 s32, s32, 0x81000
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_1-NEXT:    s_mov_b32 s32, s33
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
@@ -281,12 +443,19 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX10_1-NEXT:    ; use alloca0 v0
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s33
-; GFX10_1-NEXT:    s_mov_b32 s33, s5
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 0x4040, v0
-; GFX10_1-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_1-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59, scc
+; GFX10_1-NEXT:    ; use s55, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s6, s33, 0x80880
+; GFX10_1-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_mov_b32 s33, s5
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_fp:
@@ -294,8 +463,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10_3-NEXT:    s_mov_b32 s5, s33
 ; GFX10_3-NEXT:    s_mov_b32 s33, s32
-; GFX10_3-NEXT:    s_add_i32 s32, s32, 0x81000
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s6, s33, 0x80880
+; GFX10_3-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s33
+; GFX10_3-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX10_3-NEXT:    s_add_i32 s32, s32, 0x81000
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_3-NEXT:    s_mov_b32 s32, s33
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
@@ -303,12 +477,18 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX10_3-NEXT:    ; use alloca0 v0
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s33
-; GFX10_3-NEXT:    s_mov_b32 s33, s5
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 0x4040, v0
-; GFX10_3-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_3-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59, scc
+; GFX10_3-NEXT:    ; use s55, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s6, s33, 0x80880
+; GFX10_3-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_mov_b32 s33, s5
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_fp:
@@ -316,9 +496,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    s_mov_b32 s1, s33
 ; GFX11-NEXT:    s_mov_b32 s33, s32
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s2, s33, 0x4044
+; GFX11-NEXT:    scratch_store_b32 off, v1, s2 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX11-NEXT:    s_addk_i32 s32, 0x4080
 ; GFX11-NEXT:    s_add_i32 s0, s33, 64
-; GFX11-NEXT:    s_mov_b32 s32, s33
+; GFX11-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX11-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX11-NEXT:    s_addc_u32 s0, s33, 0x4040
@@ -327,11 +511,18 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX11-NEXT:    ;;#ASMEND
 ; GFX11-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX11-NEXT:    s_bitset0_b32 s0, 0
-; GFX11-NEXT:    s_mov_b32 s33, s1
-; GFX11-NEXT:    s_mov_b32 s59, s0
+; GFX11-NEXT:    s_mov_b32 s32, s33
+; GFX11-NEXT:    s_mov_b32 s55, s0
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59, scc
+; GFX11-NEXT:    ; use s55, scc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s2, s33, 0x4044
+; GFX11-NEXT:    scratch_load_b32 v1, off, s2 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_mov_b32 s33, s1
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_fp:
@@ -343,9 +534,13 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-NEXT:    s_mov_b32 s1, s33
 ; GFX12-NEXT:    s_mov_b32 s33, s32
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v1, s33 offset:16388 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX12-NEXT:    s_addk_co_i32 s32, 0x4040
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
-; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX12-NEXT:    s_add_co_ci_u32 s0, s33, 0x4000
 ; GFX12-NEXT:    v_mov_b32_e32 v0, s33
 ; GFX12-NEXT:    s_wait_alu 0xfffe
@@ -355,12 +550,18 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX12-NEXT:    ; use alloca0 v0
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_mov_b32 s59, s0
+; GFX12-NEXT:    s_mov_b32 s55, s0
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59, scc
+; GFX12-NEXT:    ; use s55, scc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    v_readlane_b32 s55, v1, 0
 ; GFX12-NEXT:    s_mov_b32 s32, s33
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v1, off, s33 offset:16388 ; 4-byte Folded Reload
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX12-NEXT:    s_mov_b32 s33, s1
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_wait_alu 0xfffe
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -369,22 +570,33 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_mov_b32 s6, s33
 ; GFX8-NEXT:    s_mov_b32 s33, s32
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s7, s33, 0x101100
+; GFX8-NEXT:    buffer_store_dword v1, off, s[0:3], s7 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s33
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 64, v0
+; GFX8-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX8-NEXT:    ;;#ASMSTART
 ; GFX8-NEXT:    ; use alloca0 v0
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s33
-; GFX8-NEXT:    s_movk_i32 s59, 0x4040
+; GFX8-NEXT:    s_movk_i32 s55, 0x4040
+; GFX8-NEXT:    v_add_u32_e32 v0, vcc, s55, v0
 ; GFX8-NEXT:    s_add_i32 s32, s32, 0x102000
-; GFX8-NEXT:    v_add_u32_e32 v0, vcc, s59, v0
+; GFX8-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX8-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59, scc
+; GFX8-NEXT:    ; use s55, scc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v1, 0
 ; GFX8-NEXT:    s_mov_b32 s32, s33
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s7, s33, 0x101100
+; GFX8-NEXT:    buffer_load_dword v1, off, s[0:3], s7 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX8-NEXT:    s_mov_b32 s33, s6
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_fp:
@@ -392,21 +604,32 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    s_mov_b32 s6, s33
 ; GFX900-NEXT:    s_mov_b32 s33, s32
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s7, s33, 0x101100
+; GFX900-NEXT:    buffer_store_dword v1, off, s[0:3], s7 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s33
 ; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
 ; GFX900-NEXT:    ;;#ASMSTART
 ; GFX900-NEXT:    ; use alloca0 v0
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s33
-; GFX900-NEXT:    s_add_i32 s32, s32, 0x102000
 ; GFX900-NEXT:    v_add_u32_e32 v0, 0x4040, v0
+; GFX900-NEXT:    s_add_i32 s32, s32, 0x102000
+; GFX900-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX900-NEXT:    v_readfirstlane_b32 s55, v0
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX900-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59, scc
+; GFX900-NEXT:    ; use s55, scc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v1, 0
 ; GFX900-NEXT:    s_mov_b32 s32, s33
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s7, s33, 0x101100
+; GFX900-NEXT:    buffer_load_dword v1, off, s[0:3], s7 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX900-NEXT:    s_mov_b32 s33, s6
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_fp:
@@ -414,6 +637,10 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    s_mov_b32 s2, s33
 ; GFX942-NEXT:    s_mov_b32 s33, s32
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s3, s33, 0x4044
+; GFX942-NEXT:    scratch_store_dword off, v1, s3 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_addk_i32 s32, 0x4080
 ; GFX942-NEXT:    s_add_i32 s0, s33, 64
 ; GFX942-NEXT:    v_mov_b32_e32 v0, s0
@@ -421,20 +648,27 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_fp() #1 {
 ; GFX942-NEXT:    s_addc_u32 s0, s33, 0x4040
 ; GFX942-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX942-NEXT:    s_bitset0_b32 s0, 0
+; GFX942-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s0
 ; GFX942-NEXT:    ;;#ASMSTART
 ; GFX942-NEXT:    ; use alloca0 v0
 ; GFX942-NEXT:    ;;#ASMEND
-; GFX942-NEXT:    s_mov_b32 s59, s0
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59, scc
+; GFX942-NEXT:    ; use s55, scc
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v1, 0
 ; GFX942-NEXT:    s_mov_b32 s32, s33
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s3, s33, 0x4044
+; GFX942-NEXT:    scratch_load_dword v1, off, s3 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_mov_b32 s33, s2
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
   %alloca1 = alloca i32, align 4, addrspace(5)
   call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
-  call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca1, i32 0)
+  call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca1, i32 0)
   ret void
 }
 
@@ -442,39 +676,75 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset()
 ; GFX10_1-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
 ; GFX10_1:       ; %bb.0:
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_1-NEXT:    buffer_store_dword v0, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    v_lshrrev_b32_e64 v1, 5, s32
+; GFX10_1-NEXT:    v_writelane_b32 v0, s55, 0
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
-; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
-; GFX10_1-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_1-NEXT:    v_add_nc_u32_e32 v1, 64, v1
+; GFX10_1-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59, scc
+; GFX10_1-NEXT:    ; use s55, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_1-NEXT:    buffer_load_dword v0, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
 ; GFX10_3:       ; %bb.0:
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_3-NEXT:    buffer_store_dword v0, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    v_lshrrev_b32_e64 v1, 5, s32
+; GFX10_3-NEXT:    v_writelane_b32 v0, s55, 0
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
-; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
-; GFX10_3-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_3-NEXT:    v_add_nc_u32_e32 v1, 64, v1
+; GFX10_3-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59, scc
+; GFX10_3-NEXT:    ; use s55, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_3-NEXT:    buffer_load_dword v0, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4040
+; GFX11-NEXT:    scratch_store_b32 off, v0, s1 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_2) | instid1(SALU_CYCLE_1)
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
+; GFX11-NEXT:    v_writelane_b32 v0, s55, 0
 ; GFX11-NEXT:    s_addc_u32 s0, s32, 64
-; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
 ; GFX11-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX11-NEXT:    s_bitset0_b32 s0, 0
-; GFX11-NEXT:    s_mov_b32 s59, s0
+; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT:    s_mov_b32 s55, s0
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59, scc
+; GFX11-NEXT:    ; use s55, scc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4040
+; GFX11-NEXT:    scratch_load_b32 v0, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
@@ -484,53 +754,97 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset()
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v0, s32 offset:16384 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX12-NEXT:    s_mov_b32 s55, s32
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
-; GFX12-NEXT:    s_mov_b32 s59, s32
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59, scc
+; GFX12-NEXT:    ; use s55, scc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v0, off, s32 offset:16384 ; 4-byte Folded Reload
 ; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX8-NEXT:    s_mov_b32 s59, 64
-; GFX8-NEXT:    v_add_u32_e32 v0, vcc, s59, v0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX8-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX8-NEXT:    v_lshrrev_b32_e64 v1, 6, s32
+; GFX8-NEXT:    s_mov_b32 s55, 64
+; GFX8-NEXT:    v_add_u32_e32 v1, vcc, s55, v1
+; GFX8-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX8-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59, scc
+; GFX8-NEXT:    ; use s55, scc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX8-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX900-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    v_lshrrev_b32_e64 v1, 6, s32
+; GFX900-NEXT:    v_add_u32_e32 v1, 64, v1
+; GFX900-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX900-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX900-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59, scc
+; GFX900-NEXT:    ; use s55, scc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX900-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4040
+; GFX942-NEXT:    scratch_store_dword off, v0, s2 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_and_b64 s[0:1], 0, exec
 ; GFX942-NEXT:    s_addc_u32 s0, s32, 64
 ; GFX942-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX942-NEXT:    s_bitset0_b32 s0, 0
-; GFX942-NEXT:    s_mov_b32 s59, s0
+; GFX942-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s0
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59, scc
+; GFX942-NEXT:    ; use s55, scc
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4040
+; GFX942-NEXT:    scratch_load_dword v0, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
-  call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca0, i32 0)
+  call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca0, i32 0)
   ret void
 }
 
@@ -538,32 +852,67 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset() #0
 ; GFX10_1-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
 ; GFX10_1:       ; %bb.0:
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10_1-NEXT:    s_lshr_b32 s59, s32, 5
-; GFX10_1-NEXT:    s_add_i32 s59, s59, 64
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_1-NEXT:    buffer_store_dword v0, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX10_1-NEXT:    s_lshr_b32 s55, s32, 5
+; GFX10_1-NEXT:    s_add_i32 s55, s55, 64
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59
+; GFX10_1-NEXT:    ; use s55
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_1-NEXT:    buffer_load_dword v0, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
 ; GFX10_3:       ; %bb.0:
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX10_3-NEXT:    s_lshr_b32 s59, s32, 5
-; GFX10_3-NEXT:    s_add_i32 s59, s59, 64
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_3-NEXT:    buffer_store_dword v0, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX10_3-NEXT:    s_lshr_b32 s55, s32, 5
+; GFX10_3-NEXT:    s_add_i32 s55, s55, 64
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59
+; GFX10_3-NEXT:    ; use s55
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x80800
+; GFX10_3-NEXT:    buffer_load_dword v0, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4040
+; GFX11-NEXT:    scratch_store_b32 off, v0, s1 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    v_writelane_b32 v0, s55, 0
 ; GFX11-NEXT:    s_add_i32 s0, s32, 64
-; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
-; GFX11-NEXT:    s_mov_b32 s59, s0
+; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(VALU_DEP_1)
+; GFX11-NEXT:    s_mov_b32 s55, s0
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59
+; GFX11-NEXT:    ; use s55
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x4040
+; GFX11-NEXT:    scratch_load_b32 v0, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
@@ -573,44 +922,88 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset() #0
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_mov_b32 s59, s32
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v0, s32 offset:16384 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX12-NEXT:    s_mov_b32 s55, s32
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59
+; GFX12-NEXT:    ; use s55
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
+; GFX12-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v0, off, s32 offset:16384 ; 4-byte Folded Reload
 ; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX8-NEXT:    s_lshr_b32 s59, s32, 6
-; GFX8-NEXT:    s_add_i32 s59, s59, 64
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX8-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX8-NEXT:    s_lshr_b32 s55, s32, 6
+; GFX8-NEXT:    s_add_i32 s55, s55, 64
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59
+; GFX8-NEXT:    ; use s55
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX8-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX900-NEXT:    s_lshr_b32 s59, s32, 6
-; GFX900-NEXT:    s_add_i32 s59, s59, 64
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX900-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX900-NEXT:    s_lshr_b32 s55, s32, 6
+; GFX900-NEXT:    s_add_i32 s55, s55, 64
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59
+; GFX900-NEXT:    ; use s55
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x101000
+; GFX900-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4040
+; GFX942-NEXT:    scratch_store_dword off, v0, s2 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_add_i32 s0, s32, 64
-; GFX942-NEXT:    s_mov_b32 s59, s0
+; GFX942-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s0
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59
+; GFX942-NEXT:    ; use s55
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x4040
+; GFX942-NEXT:    scratch_load_dword v0, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
-  call void asm sideeffect "; use $0", "{s59}"(ptr addrspace(5) %alloca0)
+  call void asm sideeffect "; use $0", "{s55}"(ptr addrspace(5) %alloca0)
   ret void
 }
 
@@ -620,16 +1013,29 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10_1-NEXT:    s_mov_b32 s5, s33
 ; GFX10_1-NEXT:    s_mov_b32 s33, s32
-; GFX10_1-NEXT:    s_add_i32 s32, s32, 0x80800
-; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s33
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_1-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    v_lshrrev_b32_e64 v1, 5, s33
+; GFX10_1-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX10_1-NEXT:    s_add_i32 s32, s32, 0x81000
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_1-NEXT:    s_mov_b32 s32, s33
-; GFX10_1-NEXT:    s_mov_b32 s33, s5
-; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
-; GFX10_1-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_1-NEXT:    v_add_nc_u32_e32 v1, 64, v1
+; GFX10_1-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59, scc
+; GFX10_1-NEXT:    ; use s55, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_1-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_mov_b32 s33, s5
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp:
@@ -637,16 +1043,27 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10_3-NEXT:    s_mov_b32 s5, s33
 ; GFX10_3-NEXT:    s_mov_b32 s33, s32
-; GFX10_3-NEXT:    s_add_i32 s32, s32, 0x80800
-; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s33
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_3-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    v_lshrrev_b32_e64 v1, 5, s33
+; GFX10_3-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX10_3-NEXT:    s_add_i32 s32, s32, 0x81000
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_3-NEXT:    s_mov_b32 s32, s33
-; GFX10_3-NEXT:    s_mov_b32 s33, s5
-; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
-; GFX10_3-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX10_3-NEXT:    v_add_nc_u32_e32 v1, 64, v1
+; GFX10_3-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59, scc
+; GFX10_3-NEXT:    ; use s55, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_3-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_mov_b32 s33, s5
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp:
@@ -654,17 +1071,29 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    s_mov_b32 s1, s33
 ; GFX11-NEXT:    s_mov_b32 s33, s32
-; GFX11-NEXT:    s_addk_i32 s32, 0x4040
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s2, s33, 0x4040
+; GFX11-NEXT:    scratch_store_b32 off, v0, s2 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_addk_i32 s32, 0x4080
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
+; GFX11-NEXT:    v_writelane_b32 v0, s55, 0
 ; GFX11-NEXT:    s_addc_u32 s0, s33, 64
 ; GFX11-NEXT:    s_mov_b32 s32, s33
 ; GFX11-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX11-NEXT:    s_bitset0_b32 s0, 0
-; GFX11-NEXT:    s_mov_b32 s33, s1
-; GFX11-NEXT:    s_mov_b32 s59, s0
+; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
+; GFX11-NEXT:    s_mov_b32 s55, s0
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59, scc
+; GFX11-NEXT:    ; use s55, scc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s2, s33, 0x4040
+; GFX11-NEXT:    scratch_load_b32 v0, off, s2 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_mov_b32 s33, s1
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp:
@@ -676,15 +1105,25 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-NEXT:    s_mov_b32 s1, s33
 ; GFX12-NEXT:    s_mov_b32 s33, s32
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v0, s33 offset:16384 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    v_writelane_b32 v0, s55, 0
 ; GFX12-NEXT:    s_addk_co_i32 s32, 0x4040
+; GFX12-NEXT:    s_mov_b32 s55, s33
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
-; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_mov_b32 s59, s33
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59, scc
+; GFX12-NEXT:    ; use s55, scc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX12-NEXT:    s_mov_b32 s32, s33
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v0, off, s33 offset:16384 ; 4-byte Folded Reload
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX12-NEXT:    s_mov_b32 s33, s1
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_wait_alu 0xfffe
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -693,17 +1132,28 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_mov_b32 s6, s33
 ; GFX8-NEXT:    s_mov_b32 s33, s32
-; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s33
-; GFX8-NEXT:    s_mov_b32 s59, 64
-; GFX8-NEXT:    s_add_i32 s32, s32, 0x101000
-; GFX8-NEXT:    v_add_u32_e32 v0, vcc, s59, v0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s7, s33, 0x101000
+; GFX8-NEXT:    buffer_store_dword v0, off, s[0:3], s7 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX8-NEXT:    v_lshrrev_b32_e64 v1, 6, s33
+; GFX8-NEXT:    s_mov_b32 s55, 64
+; GFX8-NEXT:    v_add_u32_e32 v1, vcc, s55, v1
+; GFX8-NEXT:    s_add_i32 s32, s32, 0x102000
+; GFX8-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX8-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59, scc
+; GFX8-NEXT:    ; use s55, scc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX8-NEXT:    s_mov_b32 s32, s33
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s7, s33, 0x101000
+; GFX8-NEXT:    buffer_load_dword v0, off, s[0:3], s7 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX8-NEXT:    s_mov_b32 s33, s6
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp:
@@ -711,16 +1161,27 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    s_mov_b32 s6, s33
 ; GFX900-NEXT:    s_mov_b32 s33, s32
-; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s33
-; GFX900-NEXT:    s_add_i32 s32, s32, 0x101000
-; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s7, s33, 0x101000
+; GFX900-NEXT:    buffer_store_dword v0, off, s[0:3], s7 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    v_lshrrev_b32_e64 v1, 6, s33
+; GFX900-NEXT:    v_add_u32_e32 v1, 64, v1
+; GFX900-NEXT:    s_add_i32 s32, s32, 0x102000
+; GFX900-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX900-NEXT:    v_readfirstlane_b32 s55, v1
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX900-NEXT:    v_readfirstlane_b32 s59, v0
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59, scc
+; GFX900-NEXT:    ; use s55, scc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX900-NEXT:    s_mov_b32 s32, s33
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s7, s33, 0x101000
+; GFX900-NEXT:    buffer_load_dword v0, off, s[0:3], s7 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX900-NEXT:    s_mov_b32 s33, s6
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp:
@@ -728,20 +1189,31 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc_small_offset_fp
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    s_mov_b32 s2, s33
 ; GFX942-NEXT:    s_mov_b32 s33, s32
-; GFX942-NEXT:    s_addk_i32 s32, 0x4040
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s3, s33, 0x4040
+; GFX942-NEXT:    scratch_store_dword off, v0, s3 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_addk_i32 s32, 0x4080
 ; GFX942-NEXT:    s_and_b64 s[0:1], 0, exec
 ; GFX942-NEXT:    s_addc_u32 s0, s33, 64
 ; GFX942-NEXT:    s_bitcmp1_b32 s0, 0
 ; GFX942-NEXT:    s_bitset0_b32 s0, 0
-; GFX942-NEXT:    s_mov_b32 s59, s0
+; GFX942-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s0
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59, scc
+; GFX942-NEXT:    ; use s55, scc
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX942-NEXT:    s_mov_b32 s32, s33
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s3, s33, 0x4040
+; GFX942-NEXT:    scratch_load_dword v0, off, s3 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
 ; GFX942-NEXT:    s_mov_b32 s33, s2
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
-  call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca0, i32 0)
+  call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca0, i32 0)
   ret void
 }
 
@@ -751,14 +1223,27 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10_1-NEXT:    s_mov_b32 s4, s33
 ; GFX10_1-NEXT:    s_mov_b32 s33, s32
-; GFX10_1-NEXT:    s_add_i32 s32, s32, 0x80800
-; GFX10_1-NEXT:    s_lshr_b32 s59, s33, 5
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s5, -1
+; GFX10_1-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_1-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s5
+; GFX10_1-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX10_1-NEXT:    s_add_i32 s32, s32, 0x81000
+; GFX10_1-NEXT:    s_lshr_b32 s55, s33, 5
 ; GFX10_1-NEXT:    s_mov_b32 s32, s33
-; GFX10_1-NEXT:    s_add_i32 s59, s59, 64
+; GFX10_1-NEXT:    s_add_i32 s55, s55, 64
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59
+; GFX10_1-NEXT:    ; use s55
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s5, -1
+; GFX10_1-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_1-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s5
 ; GFX10_1-NEXT:    s_mov_b32 s33, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset_fp:
@@ -766,14 +1251,25 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10_3-NEXT:    s_mov_b32 s4, s33
 ; GFX10_3-NEXT:    s_mov_b32 s33, s32
-; GFX10_3-NEXT:    s_add_i32 s32, s32, 0x80800
-; GFX10_3-NEXT:    s_lshr_b32 s59, s33, 5
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s5, -1
+; GFX10_3-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_3-NEXT:    buffer_store_dword v0, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s5
+; GFX10_3-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX10_3-NEXT:    s_add_i32 s32, s32, 0x81000
+; GFX10_3-NEXT:    s_lshr_b32 s55, s33, 5
 ; GFX10_3-NEXT:    s_mov_b32 s32, s33
-; GFX10_3-NEXT:    s_add_i32 s59, s59, 64
+; GFX10_3-NEXT:    s_add_i32 s55, s55, 64
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59
+; GFX10_3-NEXT:    ; use s55
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s5, -1
+; GFX10_3-NEXT:    s_add_i32 s6, s33, 0x80800
+; GFX10_3-NEXT:    buffer_load_dword v0, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s5
 ; GFX10_3-NEXT:    s_mov_b32 s33, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset_fp:
@@ -781,14 +1277,25 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX11-NEXT:    s_mov_b32 s0, s33
 ; GFX11-NEXT:    s_mov_b32 s33, s32
-; GFX11-NEXT:    s_addk_i32 s32, 0x4040
+; GFX11-NEXT:    s_xor_saveexec_b32 s1, -1
+; GFX11-NEXT:    s_add_i32 s2, s33, 0x4040
+; GFX11-NEXT:    scratch_store_b32 off, v0, s2 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s1
+; GFX11-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX11-NEXT:    s_addk_i32 s32, 0x4080
 ; GFX11-NEXT:    s_add_i32 s1, s33, 64
 ; GFX11-NEXT:    s_mov_b32 s32, s33
-; GFX11-NEXT:    s_mov_b32 s59, s1
+; GFX11-NEXT:    s_mov_b32 s55, s1
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59
+; GFX11-NEXT:    ; use s55
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s1, -1
+; GFX11-NEXT:    s_add_i32 s2, s33, 0x4040
+; GFX11-NEXT:    scratch_load_b32 v0, off, s2 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s1
 ; GFX11-NEXT:    s_mov_b32 s33, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset_fp:
@@ -800,14 +1307,24 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
 ; GFX12-NEXT:    s_mov_b32 s0, s33
 ; GFX12-NEXT:    s_mov_b32 s33, s32
-; GFX12-NEXT:    s_addk_co_i32 s32, 0x4040
+; GFX12-NEXT:    s_xor_saveexec_b32 s1, -1
+; GFX12-NEXT:    scratch_store_b32 off, v0, s33 offset:16384 ; 4-byte Folded Spill
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_mov_b32 s59, s33
+; GFX12-NEXT:    s_mov_b32 exec_lo, s1
+; GFX12-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX12-NEXT:    s_addk_co_i32 s32, 0x4040
+; GFX12-NEXT:    s_mov_b32 s55, s33
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59
+; GFX12-NEXT:    ; use s55
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    s_mov_b32 s32, s33
+; GFX12-NEXT:    v_readlane_b32 s55, v0, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s1, -1
+; GFX12-NEXT:    scratch_load_b32 v0, off, s33 offset:16384 ; 4-byte Folded Reload
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s1
 ; GFX12-NEXT:    s_mov_b32 s33, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_wait_alu 0xfffe
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
@@ -816,14 +1333,25 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_mov_b32 s4, s33
 ; GFX8-NEXT:    s_mov_b32 s33, s32
-; GFX8-NEXT:    s_add_i32 s32, s32, 0x101000
-; GFX8-NEXT:    s_lshr_b32 s59, s33, 6
-; GFX8-NEXT:    s_add_i32 s59, s59, 64
+; GFX8-NEXT:    s_xor_saveexec_b64 s[6:7], -1
+; GFX8-NEXT:    s_add_i32 s5, s33, 0x101000
+; GFX8-NEXT:    buffer_store_dword v0, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[6:7]
+; GFX8-NEXT:    s_add_i32 s32, s32, 0x102000
+; GFX8-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX8-NEXT:    s_lshr_b32 s55, s33, 6
+; GFX8-NEXT:    s_add_i32 s55, s55, 64
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59
+; GFX8-NEXT:    ; use s55
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX8-NEXT:    s_mov_b32 s32, s33
+; GFX8-NEXT:    s_xor_saveexec_b64 s[6:7], -1
+; GFX8-NEXT:    s_add_i32 s5, s33, 0x101000
+; GFX8-NEXT:    buffer_load_dword v0, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[6:7]
 ; GFX8-NEXT:    s_mov_b32 s33, s4
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset_fp:
@@ -831,14 +1359,25 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    s_mov_b32 s4, s33
 ; GFX900-NEXT:    s_mov_b32 s33, s32
-; GFX900-NEXT:    s_add_i32 s32, s32, 0x101000
-; GFX900-NEXT:    s_lshr_b32 s59, s33, 6
-; GFX900-NEXT:    s_add_i32 s59, s59, 64
+; GFX900-NEXT:    s_xor_saveexec_b64 s[6:7], -1
+; GFX900-NEXT:    s_add_i32 s5, s33, 0x101000
+; GFX900-NEXT:    buffer_store_dword v0, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[6:7]
+; GFX900-NEXT:    s_add_i32 s32, s32, 0x102000
+; GFX900-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX900-NEXT:    s_lshr_b32 s55, s33, 6
+; GFX900-NEXT:    s_add_i32 s55, s55, 64
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59
+; GFX900-NEXT:    ; use s55
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX900-NEXT:    s_mov_b32 s32, s33
+; GFX900-NEXT:    s_xor_saveexec_b64 s[6:7], -1
+; GFX900-NEXT:    s_add_i32 s5, s33, 0x101000
+; GFX900-NEXT:    buffer_load_dword v0, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[6:7]
 ; GFX900-NEXT:    s_mov_b32 s33, s4
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_available_scc_small_offset_fp:
@@ -846,17 +1385,28 @@ define void @scalar_mov_materializes_frame_index_available_scc_small_offset_fp()
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX942-NEXT:    s_mov_b32 s0, s33
 ; GFX942-NEXT:    s_mov_b32 s33, s32
-; GFX942-NEXT:    s_addk_i32 s32, 0x4040
+; GFX942-NEXT:    s_xor_saveexec_b64 s[2:3], -1
+; GFX942-NEXT:    s_add_i32 s1, s33, 0x4040
+; GFX942-NEXT:    scratch_store_dword off, v0, s1 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[2:3]
+; GFX942-NEXT:    s_addk_i32 s32, 0x4080
 ; GFX942-NEXT:    s_add_i32 s1, s33, 64
-; GFX942-NEXT:    s_mov_b32 s59, s1
+; GFX942-NEXT:    v_writelane_b32 v0, s55, 0
+; GFX942-NEXT:    s_mov_b32 s55, s1
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59
+; GFX942-NEXT:    ; use s55
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v0, 0
 ; GFX942-NEXT:    s_mov_b32 s32, s33
+; GFX942-NEXT:    s_xor_saveexec_b64 s[2:3], -1
+; GFX942-NEXT:    s_add_i32 s1, s33, 0x4040
+; GFX942-NEXT:    scratch_load_dword v0, off, s1 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[2:3]
 ; GFX942-NEXT:    s_mov_b32 s33, s0
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
-  call void asm sideeffect "; use $0", "{s59}"(ptr addrspace(5) %alloca0)
+  call void asm sideeffect "; use $0", "{s55}"(ptr addrspace(5) %alloca0)
   ret void
 }
 
@@ -864,48 +1414,83 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset(
 ; GFX10_1-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
 ; GFX10_1:       ; %bb.0:
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_1-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_1-NEXT:    s_lshr_b32 s4, s32, 5
-; GFX10_1-NEXT:    s_add_i32 s59, s4, 0x442c
+; GFX10_1-NEXT:    s_add_i32 s55, s4, 0x442c
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_1-NEXT:    ;;#ASMSTART
 ; GFX10_1-NEXT:    ; use alloca0 v0
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59, scc
+; GFX10_1-NEXT:    ; use s55, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_1-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
 ; GFX10_3:       ; %bb.0:
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_3-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_3-NEXT:    s_lshr_b32 s4, s32, 5
-; GFX10_3-NEXT:    s_add_i32 s59, s4, 0x442c
+; GFX10_3-NEXT:    s_add_i32 s55, s4, 0x442c
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_3-NEXT:    ;;#ASMSTART
 ; GFX10_3-NEXT:    ; use alloca0 v0
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59, scc
+; GFX10_3-NEXT:    ; use s55, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_3-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x8040
+; GFX11-NEXT:    scratch_store_b32 off, v1, s1 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX11-NEXT:    s_add_i32 s0, s32, 64
-; GFX11-NEXT:    s_add_i32 s59, s32, 0x442c
+; GFX11-NEXT:    s_add_i32 s55, s32, 0x442c
 ; GFX11-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX11-NEXT:    ;;#ASMSTART
 ; GFX11-NEXT:    ; use alloca0 v0
 ; GFX11-NEXT:    ;;#ASMEND
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59, scc
+; GFX11-NEXT:    ; use s55, scc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x8040
+; GFX11-NEXT:    scratch_load_b32 v1, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
@@ -915,23 +1500,38 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset(
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
-; GFX12-NEXT:    s_add_co_i32 s59, s32, 0x43ec
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_store_b32 off, v1, s32 offset:32768 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX12-NEXT:    s_add_co_i32 s55, s32, 0x43ec
 ; GFX12-NEXT:    v_mov_b32_e32 v0, s32
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX12-NEXT:    ;;#ASMSTART
 ; GFX12-NEXT:    ; use alloca0 v0
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59, scc
+; GFX12-NEXT:    ; use s55, scc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v1, off, s32 offset:32768 ; 4-byte Folded Reload
 ; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX8-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX8-NEXT:    s_lshr_b32 s4, s32, 6
-; GFX8-NEXT:    s_add_i32 s59, s4, 0x442c
+; GFX8-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX8-NEXT:    s_add_i32 s55, s4, 0x442c
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 64, v0
 ; GFX8-NEXT:    ;;#ASMSTART
@@ -939,15 +1539,26 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset(
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59, scc
+; GFX8-NEXT:    ; use s55, scc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX8-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX900-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX900-NEXT:    s_lshr_b32 s4, s32, 6
-; GFX900-NEXT:    s_add_i32 s59, s4, 0x442c
+; GFX900-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX900-NEXT:    s_add_i32 s55, s4, 0x442c
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
 ; GFX900-NEXT:    ;;#ASMSTART
@@ -955,14 +1566,25 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset(
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59, scc
+; GFX900-NEXT:    ; use s55, scc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX900-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-; GFX942-NEXT:    s_add_i32 s59, s32, 0x442c
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x8040
+; GFX942-NEXT:    scratch_store_dword off, v1, s2 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX942-NEXT:    s_add_i32 s55, s32, 0x442c
 ; GFX942-NEXT:    s_add_i32 s0, s32, 64
 ; GFX942-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX942-NEXT:    ;;#ASMSTART
@@ -970,14 +1592,20 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_immoffset(
 ; GFX942-NEXT:    ;;#ASMEND
 ; GFX942-NEXT:    s_and_b64 s[0:1], 0, exec
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59, scc
+; GFX942-NEXT:    ; use s55, scc
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x8040
+; GFX942-NEXT:    scratch_load_dword v1, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
   %alloca1 = alloca [4096 x i32], align 4, addrspace(5)
   %alloca1.offset = getelementptr [4096 x i32], ptr addrspace(5) %alloca1, i32 0, i32 251
   call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
-  call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca1.offset, i32 0)
+  call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca1.offset, i32 0)
   ret void
 }
 
@@ -985,54 +1613,89 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offse
 ; GFX10_1-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
 ; GFX10_1:       ; %bb.0:
 ; GFX10_1-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_1-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_1-NEXT:    s_lshl_b32 s4, s16, 2
-; GFX10_1-NEXT:    s_lshr_b32 s59, s32, 5
-; GFX10_1-NEXT:    s_add_i32 s59, s59, s4
+; GFX10_1-NEXT:    s_lshr_b32 s55, s32, 5
+; GFX10_1-NEXT:    s_add_i32 s55, s55, s4
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
-; GFX10_1-NEXT:    s_addk_i32 s59, 0x4040
+; GFX10_1-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX10_1-NEXT:    ;;#ASMSTART
 ; GFX10_1-NEXT:    ; use alloca0 v0
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s59, scc
+; GFX10_1-NEXT:    ; use s55, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_1-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_1-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_1-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_1-NEXT:    s_waitcnt_depctr 0xffe3
+; GFX10_1-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_1-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_1-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX10_3-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
 ; GFX10_3:       ; %bb.0:
 ; GFX10_3-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_3-NEXT:    buffer_store_dword v1, off, s[0:3], s5 ; 4-byte Folded Spill
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_3-NEXT:    s_lshl_b32 s4, s16, 2
-; GFX10_3-NEXT:    s_lshr_b32 s59, s32, 5
-; GFX10_3-NEXT:    s_add_i32 s59, s59, s4
+; GFX10_3-NEXT:    s_lshr_b32 s55, s32, 5
+; GFX10_3-NEXT:    s_add_i32 s55, s55, s4
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
-; GFX10_3-NEXT:    s_addk_i32 s59, 0x4040
+; GFX10_3-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX10_3-NEXT:    ;;#ASMSTART
 ; GFX10_3-NEXT:    ; use alloca0 v0
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s59, scc
+; GFX10_3-NEXT:    ; use s55, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX10_3-NEXT:    s_xor_saveexec_b32 s4, -1
+; GFX10_3-NEXT:    s_add_i32 s5, s32, 0x100800
+; GFX10_3-NEXT:    buffer_load_dword v1, off, s[0:3], s5 ; 4-byte Folded Reload
+; GFX10_3-NEXT:    s_mov_b32 exec_lo, s4
+; GFX10_3-NEXT:    s_waitcnt vmcnt(0)
 ; GFX10_3-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX11-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
 ; GFX11:       ; %bb.0:
 ; GFX11-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX11-NEXT:    s_xor_saveexec_b32 s1, -1
+; GFX11-NEXT:    s_add_i32 s2, s32, 0x8040
+; GFX11-NEXT:    scratch_store_b32 off, v1, s2 ; 4-byte Folded Spill
+; GFX11-NEXT:    s_mov_b32 exec_lo, s1
 ; GFX11-NEXT:    s_add_i32 s1, s32, 64
+; GFX11-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX11-NEXT:    s_lshl_b32 s0, s0, 2
 ; GFX11-NEXT:    v_mov_b32_e32 v0, s1
-; GFX11-NEXT:    s_add_i32 s59, s32, s0
+; GFX11-NEXT:    s_add_i32 s55, s32, s0
 ; GFX11-NEXT:    ;;#ASMSTART
 ; GFX11-NEXT:    ; use alloca0 v0
 ; GFX11-NEXT:    ;;#ASMEND
-; GFX11-NEXT:    s_addk_i32 s59, 0x4040
+; GFX11-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s59, scc
+; GFX11-NEXT:    ; use s55, scc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX11-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX11-NEXT:    s_add_i32 s1, s32, 0x8040
+; GFX11-NEXT:    scratch_load_b32 v1, off, s1 ; 4-byte Folded Reload
+; GFX11-NEXT:    s_mov_b32 exec_lo, s0
+; GFX11-NEXT:    s_waitcnt vmcnt(0)
 ; GFX11-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX12-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
@@ -1042,29 +1705,44 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offse
 ; GFX12-NEXT:    s_wait_samplecnt 0x0
 ; GFX12-NEXT:    s_wait_bvhcnt 0x0
 ; GFX12-NEXT:    s_wait_kmcnt 0x0
+; GFX12-NEXT:    s_xor_saveexec_b32 s1, -1
+; GFX12-NEXT:    scratch_store_b32 off, v1, s32 offset:32768 ; 4-byte Folded Spill
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s1
+; GFX12-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX12-NEXT:    s_lshl_b32 s0, s0, 2
 ; GFX12-NEXT:    v_mov_b32_e32 v0, s32
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_add_co_i32 s59, s32, s0
+; GFX12-NEXT:    s_add_co_i32 s55, s32, s0
 ; GFX12-NEXT:    ;;#ASMSTART
 ; GFX12-NEXT:    ; use alloca0 v0
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_addk_co_i32 s59, 0x4000
+; GFX12-NEXT:    s_addk_co_i32 s55, 0x4000
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s59, scc
+; GFX12-NEXT:    ; use s55, scc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX12-NEXT:    s_xor_saveexec_b32 s0, -1
+; GFX12-NEXT:    scratch_load_b32 v1, off, s32 offset:32768 ; 4-byte Folded Reload
 ; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 exec_lo, s0
+; GFX12-NEXT:    s_wait_loadcnt 0x0
 ; GFX12-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX8-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
 ; GFX8:       ; %bb.0:
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX8-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX8-NEXT:    s_lshl_b32 s4, s16, 2
-; GFX8-NEXT:    s_lshr_b32 s59, s32, 6
-; GFX8-NEXT:    s_add_i32 s59, s59, s4
-; GFX8-NEXT:    s_addk_i32 s59, 0x4040
+; GFX8-NEXT:    s_lshr_b32 s55, s32, 6
+; GFX8-NEXT:    s_add_i32 s55, s55, s4
+; GFX8-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 64, v0
 ; GFX8-NEXT:    ;;#ASMSTART
@@ -1072,17 +1750,28 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offse
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s59, scc
+; GFX8-NEXT:    ; use s55, scc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX8-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX900-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
 ; GFX900:       ; %bb.0:
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX900-NEXT:    buffer_store_dword v1, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    v_writelane_b32 v1, s55, 0
 ; GFX900-NEXT:    s_lshl_b32 s4, s16, 2
-; GFX900-NEXT:    s_lshr_b32 s59, s32, 6
-; GFX900-NEXT:    s_add_i32 s59, s59, s4
-; GFX900-NEXT:    s_addk_i32 s59, 0x4040
+; GFX900-NEXT:    s_lshr_b32 s55, s32, 6
+; GFX900-NEXT:    s_add_i32 s55, s55, s4
+; GFX900-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
 ; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
 ; GFX900-NEXT:    ;;#ASMSTART
@@ -1090,16 +1779,27 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offse
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s59, scc
+; GFX900-NEXT:    ; use s55, scc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX900-NEXT:    buffer_load_dword v1, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
+; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    s_setpc_b64 s[30:31]
 ;
 ; GFX942-LABEL: scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offset:
 ; GFX942:       ; %bb.0:
 ; GFX942-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
+; GFX942-NEXT:    s_xor_saveexec_b64 s[2:3], -1
+; GFX942-NEXT:    s_add_i32 s1, s32, 0x8040
+; GFX942-NEXT:    scratch_store_dword off, v1, s1 ; 4-byte Folded Spill
+; GFX942-NEXT:    s_mov_b64 exec, s[2:3]
 ; GFX942-NEXT:    s_lshl_b32 s0, s0, 2
-; GFX942-NEXT:    s_add_i32 s59, s32, s0
-; GFX942-NEXT:    s_addk_i32 s59, 0x4040
+; GFX942-NEXT:    v_writelane_b32 v1, s55, 0
+; GFX942-NEXT:    s_add_i32 s55, s32, s0
+; GFX942-NEXT:    s_addk_i32 s55, 0x4040
 ; GFX942-NEXT:    s_add_i32 s0, s32, 64
 ; GFX942-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX942-NEXT:    ;;#ASMSTART
@@ -1107,14 +1807,20 @@ define void @scalar_mov_materializes_frame_index_unavailable_scc__gep_sgpr_offse
 ; GFX942-NEXT:    ;;#ASMEND
 ; GFX942-NEXT:    s_and_b64 s[0:1], 0, exec
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s59, scc
+; GFX942-NEXT:    ; use s55, scc
 ; GFX942-NEXT:    ;;#ASMEND
+; GFX942-NEXT:    v_readlane_b32 s55, v1, 0
+; GFX942-NEXT:    s_xor_saveexec_b64 s[0:1], -1
+; GFX942-NEXT:    s_add_i32 s2, s32, 0x8040
+; GFX942-NEXT:    scratch_load_dword v1, off, s2 ; 4-byte Folded Reload
+; GFX942-NEXT:    s_mov_b64 exec, s[0:1]
+; GFX942-NEXT:    s_waitcnt vmcnt(0)
 ; GFX942-NEXT:    s_setpc_b64 s[30:31]
   %alloca0 = alloca [4096 x i32], align 64, addrspace(5)
   %alloca1 = alloca [4096 x i32], align 4, addrspace(5)
   %alloca1.offset = getelementptr [4096 x i32], ptr addrspace(5) %alloca1, i32 0, i32 %soffset
   call void asm sideeffect "; use alloca0 $0", "v"(ptr addrspace(5) %alloca0)
-  call void asm sideeffect "; use $0, $1", "{s59},{scc}"(ptr addrspace(5) %alloca1.offset, i32 0)
+  call void asm sideeffect "; use $0, $1", "{s55},{scc}"(ptr addrspace(5) %alloca1.offset, i32 0)
   ret void
 }
 
diff --git a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
index e8dacc93a8f3c..17581bcb61e99 100644
--- a/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
+++ b/llvm/test/CodeGen/AMDGPU/materialize-frame-index-sgpr.ll
@@ -67,11 +67,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX7-NEXT:    v_mov_b32_e32 v0, 0x4040
 ; GFX7-NEXT:    v_mad_u32_u24 v0, v0, 64, s32
 ; GFX7-NEXT:    v_lshrrev_b32_e32 v0, 6, v0
-; GFX7-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX7-NEXT:    v_readfirstlane_b32 s54, v0
 ; GFX7-NEXT:    buffer_load_dword v0, off, s[0:3], s32
 ; GFX7-NEXT:    s_waitcnt vmcnt(0)
 ; GFX7-NEXT:    ;;#ASMSTART
-; GFX7-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX7-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX7-NEXT:    ;;#ASMEND
 ; GFX7-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX7-NEXT:    v_readlane_b32 s54, v23, 15
@@ -133,12 +133,13 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX8-NEXT:    buffer_store_dword v0, off, s[0:3], s32
 ; GFX8-NEXT:    v_mov_b32_e32 v0, 0x4040
 ; GFX8-NEXT:    v_mad_u32_u24 v0, v0, 64, s32
+; GFX8-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX8-NEXT:    v_lshrrev_b32_e32 v0, 6, v0
-; GFX8-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX8-NEXT:    v_readfirstlane_b32 s54, v0
 ; GFX8-NEXT:    buffer_load_dword v0, off, s[0:3], s32
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX8-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX8-NEXT:    v_readlane_b32 s54, v23, 15
@@ -199,12 +200,13 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    buffer_store_dword v0, off, s[0:3], s32
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
+; GFX900-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX900-NEXT:    v_add_u32_e32 v0, 0x4040, v0
-; GFX900-NEXT:    v_readfirstlane_b32 s59, v0
+; GFX900-NEXT:    v_readfirstlane_b32 s54, v0
 ; GFX900-NEXT:    buffer_load_dword v0, off, s[0:3], s32
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX900-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX900-NEXT:    v_readlane_b32 s54, v23, 15
@@ -263,12 +265,13 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX942-NEXT:    ;;#ASMSTART
 ; GFX942-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc
 ; GFX942-NEXT:    ;;#ASMEND
-; GFX942-NEXT:    s_addc_u32 s60, s32, 0x4040
-; GFX942-NEXT:    s_bitcmp1_b32 s60, 0
-; GFX942-NEXT:    s_bitset0_b32 s60, 0
-; GFX942-NEXT:    s_mov_b32 s59, s60
+; GFX942-NEXT:    s_addc_u32 s59, s32, 0x4040
+; GFX942-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX942-NEXT:    s_bitcmp1_b32 s59, 0
+; GFX942-NEXT:    s_bitset0_b32 s59, 0
+; GFX942-NEXT:    s_mov_b32 s54, s59
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX942-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX942-NEXT:    ;;#ASMEND
 ; GFX942-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX942-NEXT:    v_readlane_b32 s54, v23, 15
@@ -329,10 +332,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX10_1-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v24, 5, s32
+; GFX10_1-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v24, 0x4040, v24
-; GFX10_1-NEXT:    v_readfirstlane_b32 s59, v24
+; GFX10_1-NEXT:    v_readfirstlane_b32 s54, v24
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX10_1-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX10_1-NEXT:    v_readlane_b32 s54, v23, 15
@@ -393,10 +397,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX10_3-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v24, 5, s32
+; GFX10_3-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v24, 0x4040, v24
-; GFX10_3-NEXT:    v_readfirstlane_b32 s59, v24
+; GFX10_3-NEXT:    v_readfirstlane_b32 s54, v24
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX10_3-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX10_3-NEXT:    v_readlane_b32 s54, v23, 15
@@ -456,13 +461,14 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX11-NEXT:    ;;#ASMSTART
 ; GFX11-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc
 ; GFX11-NEXT:    ;;#ASMEND
-; GFX11-NEXT:    s_addc_u32 s60, s32, 0x4040
+; GFX11-NEXT:    s_addc_u32 s59, s32, 0x4040
+; GFX11-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
-; GFX11-NEXT:    s_bitcmp1_b32 s60, 0
-; GFX11-NEXT:    s_bitset0_b32 s60, 0
-; GFX11-NEXT:    s_mov_b32 s59, s60
+; GFX11-NEXT:    s_bitcmp1_b32 s59, 0
+; GFX11-NEXT:    s_bitset0_b32 s59, 0
+; GFX11-NEXT:    s_mov_b32 s54, s59
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX11-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX11-NEXT:    ;;#ASMEND
 ; GFX11-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX11-NEXT:    v_readlane_b32 s54, v23, 15
@@ -524,14 +530,15 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 ; GFX12-NEXT:    ;;#ASMSTART
 ; GFX12-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc
 ; GFX12-NEXT:    ;;#ASMEND
-; GFX12-NEXT:    s_add_co_ci_u32 s60, s32, 0x4000
+; GFX12-NEXT:    s_add_co_ci_u32 s59, s32, 0x4000
+; GFX12-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_bitcmp1_b32 s60, 0
-; GFX12-NEXT:    s_bitset0_b32 s60, 0
+; GFX12-NEXT:    s_bitcmp1_b32 s59, 0
+; GFX12-NEXT:    s_bitset0_b32 s59, 0
 ; GFX12-NEXT:    s_wait_alu 0xfffe
-; GFX12-NEXT:    s_mov_b32 s59, s60
+; GFX12-NEXT:    s_mov_b32 s54, s59
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s59, scc
+; GFX12-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:22], vcc, s54, scc
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX12-NEXT:    v_readlane_b32 s54, v23, 15
@@ -579,7 +586,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 
   ; scc is unavailable since it is live in
   call void asm sideeffect "; use $0, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10",
-                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{s58},{v[0:15]},{v[16:22]},{vcc},{s59},{scc}"(
+                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{s58},{v[0:15]},{v[16:22]},{vcc},{s54},{scc}"(
     <16 x i32> %s0,
     <16 x i32> %s1,
     <16 x i32> %s2,
@@ -629,9 +636,9 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX7-NEXT:    ;;#ASMEND
 ; GFX7-NEXT:    v_mad_u32_u24 v22, 16, 64, s32
 ; GFX7-NEXT:    v_lshrrev_b32_e32 v22, 6, v22
-; GFX7-NEXT:    v_readfirstlane_b32 s59, v22
+; GFX7-NEXT:    v_readfirstlane_b32 s54, v22
 ; GFX7-NEXT:    ;;#ASMSTART
-; GFX7-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX7-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX7-NEXT:    ;;#ASMEND
 ; GFX7-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX7-NEXT:    v_readlane_b32 s54, v21, 15
@@ -686,10 +693,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX8-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    v_mad_u32_u24 v22, 16, 64, s32
+; GFX8-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX8-NEXT:    v_lshrrev_b32_e32 v22, 6, v22
-; GFX8-NEXT:    v_readfirstlane_b32 s59, v22
+; GFX8-NEXT:    v_readfirstlane_b32 s54, v22
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX8-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX8-NEXT:    v_readlane_b32 s54, v21, 15
@@ -744,10 +752,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX900-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v22, 6, s32
+; GFX900-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX900-NEXT:    v_add_u32_e32 v22, 16, v22
-; GFX900-NEXT:    v_readfirstlane_b32 s59, v22
+; GFX900-NEXT:    v_readfirstlane_b32 s54, v22
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX900-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX900-NEXT:    v_readlane_b32 s54, v21, 15
@@ -801,12 +810,13 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX942-NEXT:    ;;#ASMSTART
 ; GFX942-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX942-NEXT:    ;;#ASMEND
-; GFX942-NEXT:    s_addc_u32 s60, s32, 16
-; GFX942-NEXT:    s_bitcmp1_b32 s60, 0
-; GFX942-NEXT:    s_bitset0_b32 s60, 0
-; GFX942-NEXT:    s_mov_b32 s59, s60
+; GFX942-NEXT:    s_addc_u32 s59, s32, 16
+; GFX942-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX942-NEXT:    s_bitcmp1_b32 s59, 0
+; GFX942-NEXT:    s_bitset0_b32 s59, 0
+; GFX942-NEXT:    s_mov_b32 s54, s59
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX942-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX942-NEXT:    ;;#ASMEND
 ; GFX942-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX942-NEXT:    v_readlane_b32 s54, v21, 15
@@ -862,10 +872,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX10_1-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v22, 5, s32
+; GFX10_1-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v22, 16, v22
-; GFX10_1-NEXT:    v_readfirstlane_b32 s59, v22
+; GFX10_1-NEXT:    v_readfirstlane_b32 s54, v22
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX10_1-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX10_1-NEXT:    v_readlane_b32 s54, v21, 15
@@ -921,10 +932,11 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX10_3-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v22, 5, s32
+; GFX10_3-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v22, 16, v22
-; GFX10_3-NEXT:    v_readfirstlane_b32 s59, v22
+; GFX10_3-NEXT:    v_readfirstlane_b32 s54, v22
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX10_3-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX10_3-NEXT:    v_readlane_b32 s54, v21, 15
@@ -978,13 +990,14 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX11-NEXT:    ;;#ASMSTART
 ; GFX11-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX11-NEXT:    ;;#ASMEND
-; GFX11-NEXT:    s_addc_u32 s60, s32, 16
+; GFX11-NEXT:    s_addc_u32 s59, s32, 16
+; GFX11-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX11-NEXT:    s_delay_alu instid0(SALU_CYCLE_1) | instskip(SKIP_1) | instid1(SALU_CYCLE_1)
-; GFX11-NEXT:    s_bitcmp1_b32 s60, 0
-; GFX11-NEXT:    s_bitset0_b32 s60, 0
-; GFX11-NEXT:    s_mov_b32 s59, s60
+; GFX11-NEXT:    s_bitcmp1_b32 s59, 0
+; GFX11-NEXT:    s_bitset0_b32 s59, 0
+; GFX11-NEXT:    s_mov_b32 s54, s59
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX11-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX11-NEXT:    ;;#ASMEND
 ; GFX11-NEXT:    v_readlane_b32 s55, v21, 16
 ; GFX11-NEXT:    v_readlane_b32 s54, v21, 15
@@ -1042,9 +1055,10 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 ; GFX12-NEXT:    ;;#ASMSTART
 ; GFX12-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc
 ; GFX12-NEXT:    ;;#ASMEND
-; GFX12-NEXT:    s_mov_b32 s59, s32
+; GFX12-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX12-NEXT:    s_mov_b32 s54, s32
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s59, scc
+; GFX12-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], s58, v[0:15], v[16:20], vcc, s54, scc
 ; GFX12-NEXT:    ;;#ASMEND
 ; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX12-NEXT:    v_readlane_b32 s55, v21, 16
@@ -1091,7 +1105,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs__lowe
 
   ; scc is unavailable since it is live in
   call void asm sideeffect "; use $0, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10",
-                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{s58},{v[0:15]},{v[16:20]},{vcc},{s59},{scc}"(
+                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{s58},{v[0:15]},{v[16:20]},{vcc},{s54},{scc}"(
     <16 x i32> %s0,
     <16 x i32> %s1,
     <16 x i32> %s2,
@@ -1151,9 +1165,9 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX7-NEXT:    ;;#ASMSTART
 ; GFX7-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX7-NEXT:    ;;#ASMEND
-; GFX7-NEXT:    v_readlane_b32 s59, v22, 0
+; GFX7-NEXT:    v_readlane_b32 s54, v22, 0
 ; GFX7-NEXT:    ;;#ASMSTART
-; GFX7-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX7-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX7-NEXT:    ;;#ASMEND
 ; GFX7-NEXT:    v_readlane_b32 s55, v23, 16
 ; GFX7-NEXT:    v_readlane_b32 s54, v23, 15
@@ -1188,58 +1202,66 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX8-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
 ; GFX8-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX8-NEXT:    buffer_store_dword v23, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x201100
 ; GFX8-NEXT:    buffer_store_dword v22, off, s[0:3], s6 ; 4-byte Folded Spill
 ; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
-; GFX8-NEXT:    v_writelane_b32 v22, s30, 0
-; GFX8-NEXT:    v_writelane_b32 v22, s31, 1
-; GFX8-NEXT:    v_writelane_b32 v22, s33, 2
-; GFX8-NEXT:    v_writelane_b32 v22, s34, 3
-; GFX8-NEXT:    v_writelane_b32 v22, s35, 4
-; GFX8-NEXT:    v_writelane_b32 v22, s36, 5
-; GFX8-NEXT:    v_writelane_b32 v22, s37, 6
-; GFX8-NEXT:    v_writelane_b32 v22, s38, 7
-; GFX8-NEXT:    v_writelane_b32 v22, s39, 8
-; GFX8-NEXT:    v_writelane_b32 v22, s48, 9
-; GFX8-NEXT:    v_writelane_b32 v22, s49, 10
-; GFX8-NEXT:    v_writelane_b32 v22, s50, 11
-; GFX8-NEXT:    v_writelane_b32 v22, s51, 12
-; GFX8-NEXT:    v_writelane_b32 v22, s52, 13
-; GFX8-NEXT:    s_lshr_b32 s4, s32, 6
-; GFX8-NEXT:    v_writelane_b32 v22, s53, 14
+; GFX8-NEXT:    v_writelane_b32 v23, s30, 0
+; GFX8-NEXT:    v_writelane_b32 v23, s31, 1
+; GFX8-NEXT:    v_writelane_b32 v23, s33, 2
+; GFX8-NEXT:    v_writelane_b32 v23, s34, 3
+; GFX8-NEXT:    v_writelane_b32 v23, s35, 4
+; GFX8-NEXT:    v_writelane_b32 v23, s36, 5
+; GFX8-NEXT:    v_writelane_b32 v23, s37, 6
+; GFX8-NEXT:    v_writelane_b32 v23, s38, 7
+; GFX8-NEXT:    v_writelane_b32 v23, s39, 8
+; GFX8-NEXT:    v_writelane_b32 v23, s48, 9
+; GFX8-NEXT:    v_writelane_b32 v23, s49, 10
+; GFX8-NEXT:    v_writelane_b32 v23, s50, 11
+; GFX8-NEXT:    v_writelane_b32 v23, s51, 12
+; GFX8-NEXT:    v_writelane_b32 v23, s52, 13
+; GFX8-NEXT:    s_lshr_b32 s5, s32, 6
+; GFX8-NEXT:    v_writelane_b32 v23, s53, 14
 ; GFX8-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX8-NEXT:    s_add_i32 s59, s4, 0x4240
-; GFX8-NEXT:    v_writelane_b32 v22, s54, 15
+; GFX8-NEXT:    s_add_i32 s4, s5, 0x4240
+; GFX8-NEXT:    ; implicit-def: $vgpr22 : SGPR spill to VGPR lane
+; GFX8-NEXT:    v_writelane_b32 v23, s54, 15
 ; GFX8-NEXT:    v_add_u32_e32 v0, vcc, 64, v0
+; GFX8-NEXT:    v_writelane_b32 v22, s4, 0
 ; GFX8-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX8-NEXT:    v_writelane_b32 v22, s55, 16
+; GFX8-NEXT:    v_writelane_b32 v23, s55, 16
 ; GFX8-NEXT:    ;;#ASMSTART
 ; GFX8-NEXT:    ; use alloca0 v0
 ; GFX8-NEXT:    ;;#ASMEND
 ; GFX8-NEXT:    ;;#ASMSTART
 ; GFX8-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX8-NEXT:    ;;#ASMEND
+; GFX8-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX8-NEXT:    v_readlane_b32 s54, v22, 0
 ; GFX8-NEXT:    ;;#ASMSTART
-; GFX8-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX8-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX8-NEXT:    ;;#ASMEND
-; GFX8-NEXT:    v_readlane_b32 s55, v22, 16
-; GFX8-NEXT:    v_readlane_b32 s54, v22, 15
-; GFX8-NEXT:    v_readlane_b32 s53, v22, 14
-; GFX8-NEXT:    v_readlane_b32 s52, v22, 13
-; GFX8-NEXT:    v_readlane_b32 s51, v22, 12
-; GFX8-NEXT:    v_readlane_b32 s50, v22, 11
-; GFX8-NEXT:    v_readlane_b32 s49, v22, 10
-; GFX8-NEXT:    v_readlane_b32 s48, v22, 9
-; GFX8-NEXT:    v_readlane_b32 s39, v22, 8
-; GFX8-NEXT:    v_readlane_b32 s38, v22, 7
-; GFX8-NEXT:    v_readlane_b32 s37, v22, 6
-; GFX8-NEXT:    v_readlane_b32 s36, v22, 5
-; GFX8-NEXT:    v_readlane_b32 s35, v22, 4
-; GFX8-NEXT:    v_readlane_b32 s34, v22, 3
-; GFX8-NEXT:    v_readlane_b32 s33, v22, 2
-; GFX8-NEXT:    v_readlane_b32 s31, v22, 1
-; GFX8-NEXT:    v_readlane_b32 s30, v22, 0
+; GFX8-NEXT:    v_readlane_b32 s55, v23, 16
+; GFX8-NEXT:    v_readlane_b32 s54, v23, 15
+; GFX8-NEXT:    v_readlane_b32 s53, v23, 14
+; GFX8-NEXT:    v_readlane_b32 s52, v23, 13
+; GFX8-NEXT:    v_readlane_b32 s51, v23, 12
+; GFX8-NEXT:    v_readlane_b32 s50, v23, 11
+; GFX8-NEXT:    v_readlane_b32 s49, v23, 10
+; GFX8-NEXT:    v_readlane_b32 s48, v23, 9
+; GFX8-NEXT:    v_readlane_b32 s39, v23, 8
+; GFX8-NEXT:    v_readlane_b32 s38, v23, 7
+; GFX8-NEXT:    v_readlane_b32 s37, v23, 6
+; GFX8-NEXT:    v_readlane_b32 s36, v23, 5
+; GFX8-NEXT:    v_readlane_b32 s35, v23, 4
+; GFX8-NEXT:    v_readlane_b32 s34, v23, 3
+; GFX8-NEXT:    v_readlane_b32 s33, v23, 2
+; GFX8-NEXT:    v_readlane_b32 s31, v23, 1
+; GFX8-NEXT:    v_readlane_b32 s30, v23, 0
 ; GFX8-NEXT:    s_xor_saveexec_b64 s[4:5], -1
 ; GFX8-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX8-NEXT:    buffer_load_dword v23, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX8-NEXT:    s_add_i32 s6, s32, 0x201100
 ; GFX8-NEXT:    buffer_load_dword v22, off, s[0:3], s6 ; 4-byte Folded Reload
 ; GFX8-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX8-NEXT:    s_waitcnt vmcnt(0)
@@ -1250,58 +1272,66 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX900-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
 ; GFX900-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX900-NEXT:    buffer_store_dword v23, off, s[0:3], s6 ; 4-byte Folded Spill
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x201100
 ; GFX900-NEXT:    buffer_store_dword v22, off, s[0:3], s6 ; 4-byte Folded Spill
 ; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
-; GFX900-NEXT:    v_writelane_b32 v22, s30, 0
-; GFX900-NEXT:    v_writelane_b32 v22, s31, 1
-; GFX900-NEXT:    v_writelane_b32 v22, s33, 2
-; GFX900-NEXT:    v_writelane_b32 v22, s34, 3
-; GFX900-NEXT:    v_writelane_b32 v22, s35, 4
-; GFX900-NEXT:    v_writelane_b32 v22, s36, 5
-; GFX900-NEXT:    v_writelane_b32 v22, s37, 6
-; GFX900-NEXT:    v_writelane_b32 v22, s38, 7
-; GFX900-NEXT:    v_writelane_b32 v22, s39, 8
-; GFX900-NEXT:    v_writelane_b32 v22, s48, 9
-; GFX900-NEXT:    v_writelane_b32 v22, s49, 10
-; GFX900-NEXT:    v_writelane_b32 v22, s50, 11
-; GFX900-NEXT:    v_writelane_b32 v22, s51, 12
-; GFX900-NEXT:    v_writelane_b32 v22, s52, 13
-; GFX900-NEXT:    s_lshr_b32 s4, s32, 6
-; GFX900-NEXT:    v_writelane_b32 v22, s53, 14
+; GFX900-NEXT:    v_writelane_b32 v23, s30, 0
+; GFX900-NEXT:    v_writelane_b32 v23, s31, 1
+; GFX900-NEXT:    v_writelane_b32 v23, s33, 2
+; GFX900-NEXT:    v_writelane_b32 v23, s34, 3
+; GFX900-NEXT:    v_writelane_b32 v23, s35, 4
+; GFX900-NEXT:    v_writelane_b32 v23, s36, 5
+; GFX900-NEXT:    v_writelane_b32 v23, s37, 6
+; GFX900-NEXT:    v_writelane_b32 v23, s38, 7
+; GFX900-NEXT:    v_writelane_b32 v23, s39, 8
+; GFX900-NEXT:    v_writelane_b32 v23, s48, 9
+; GFX900-NEXT:    v_writelane_b32 v23, s49, 10
+; GFX900-NEXT:    v_writelane_b32 v23, s50, 11
+; GFX900-NEXT:    v_writelane_b32 v23, s51, 12
+; GFX900-NEXT:    v_writelane_b32 v23, s52, 13
+; GFX900-NEXT:    s_lshr_b32 s5, s32, 6
+; GFX900-NEXT:    v_writelane_b32 v23, s53, 14
 ; GFX900-NEXT:    v_lshrrev_b32_e64 v0, 6, s32
-; GFX900-NEXT:    s_add_i32 s59, s4, 0x4240
-; GFX900-NEXT:    v_writelane_b32 v22, s54, 15
+; GFX900-NEXT:    s_add_i32 s4, s5, 0x4240
+; GFX900-NEXT:    ; implicit-def: $vgpr22 : SGPR spill to VGPR lane
+; GFX900-NEXT:    v_writelane_b32 v23, s54, 15
 ; GFX900-NEXT:    v_add_u32_e32 v0, 64, v0
+; GFX900-NEXT:    v_writelane_b32 v22, s4, 0
 ; GFX900-NEXT:    s_and_b64 s[4:5], 0, exec
-; GFX900-NEXT:    v_writelane_b32 v22, s55, 16
+; GFX900-NEXT:    v_writelane_b32 v23, s55, 16
 ; GFX900-NEXT:    ;;#ASMSTART
 ; GFX900-NEXT:    ; use alloca0 v0
 ; GFX900-NEXT:    ;;#ASMEND
 ; GFX900-NEXT:    ;;#ASMSTART
 ; GFX900-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX900-NEXT:    ;;#ASMEND
+; GFX900-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX900-NEXT:    v_readlane_b32 s54, v22, 0
 ; GFX900-NEXT:    ;;#ASMSTART
-; GFX900-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX900-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX900-NEXT:    ;;#ASMEND
-; GFX900-NEXT:    v_readlane_b32 s55, v22, 16
-; GFX900-NEXT:    v_readlane_b32 s54, v22, 15
-; GFX900-NEXT:    v_readlane_b32 s53, v22, 14
-; GFX900-NEXT:    v_readlane_b32 s52, v22, 13
-; GFX900-NEXT:    v_readlane_b32 s51, v22, 12
-; GFX900-NEXT:    v_readlane_b32 s50, v22, 11
-; GFX900-NEXT:    v_readlane_b32 s49, v22, 10
-; GFX900-NEXT:    v_readlane_b32 s48, v22, 9
-; GFX900-NEXT:    v_readlane_b32 s39, v22, 8
-; GFX900-NEXT:    v_readlane_b32 s38, v22, 7
-; GFX900-NEXT:    v_readlane_b32 s37, v22, 6
-; GFX900-NEXT:    v_readlane_b32 s36, v22, 5
-; GFX900-NEXT:    v_readlane_b32 s35, v22, 4
-; GFX900-NEXT:    v_readlane_b32 s34, v22, 3
-; GFX900-NEXT:    v_readlane_b32 s33, v22, 2
-; GFX900-NEXT:    v_readlane_b32 s31, v22, 1
-; GFX900-NEXT:    v_readlane_b32 s30, v22, 0
+; GFX900-NEXT:    v_readlane_b32 s55, v23, 16
+; GFX900-NEXT:    v_readlane_b32 s54, v23, 15
+; GFX900-NEXT:    v_readlane_b32 s53, v23, 14
+; GFX900-NEXT:    v_readlane_b32 s52, v23, 13
+; GFX900-NEXT:    v_readlane_b32 s51, v23, 12
+; GFX900-NEXT:    v_readlane_b32 s50, v23, 11
+; GFX900-NEXT:    v_readlane_b32 s49, v23, 10
+; GFX900-NEXT:    v_readlane_b32 s48, v23, 9
+; GFX900-NEXT:    v_readlane_b32 s39, v23, 8
+; GFX900-NEXT:    v_readlane_b32 s38, v23, 7
+; GFX900-NEXT:    v_readlane_b32 s37, v23, 6
+; GFX900-NEXT:    v_readlane_b32 s36, v23, 5
+; GFX900-NEXT:    v_readlane_b32 s35, v23, 4
+; GFX900-NEXT:    v_readlane_b32 s34, v23, 3
+; GFX900-NEXT:    v_readlane_b32 s33, v23, 2
+; GFX900-NEXT:    v_readlane_b32 s31, v23, 1
+; GFX900-NEXT:    v_readlane_b32 s30, v23, 0
 ; GFX900-NEXT:    s_xor_saveexec_b64 s[4:5], -1
 ; GFX900-NEXT:    s_add_i32 s6, s32, 0x201000
+; GFX900-NEXT:    buffer_load_dword v23, off, s[0:3], s6 ; 4-byte Folded Reload
+; GFX900-NEXT:    s_add_i32 s6, s32, 0x201100
 ; GFX900-NEXT:    buffer_load_dword v22, off, s[0:3], s6 ; 4-byte Folded Reload
 ; GFX900-NEXT:    s_mov_b64 exec, s[4:5]
 ; GFX900-NEXT:    s_waitcnt vmcnt(0)
@@ -1339,10 +1369,12 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX942-NEXT:    ;;#ASMSTART
 ; GFX942-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX942-NEXT:    ;;#ASMEND
-; GFX942-NEXT:    s_add_i32 s59, s32, 0x4240
+; GFX942-NEXT:    s_add_i32 s58, s32, 0x4240
+; GFX942-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
 ; GFX942-NEXT:    s_and_b64 s[60:61], 0, exec
+; GFX942-NEXT:    s_mov_b32 s54, s58
 ; GFX942-NEXT:    ;;#ASMSTART
-; GFX942-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX942-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX942-NEXT:    ;;#ASMEND
 ; GFX942-NEXT:    v_readlane_b32 s55, v22, 16
 ; GFX942-NEXT:    v_readlane_b32 s54, v22, 15
@@ -1379,7 +1411,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX10_1-NEXT:    v_writelane_b32 v22, s30, 0
 ; GFX10_1-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_1-NEXT:    s_lshr_b32 s4, s32, 5
-; GFX10_1-NEXT:    s_add_i32 s59, s4, 0x4240
+; GFX10_1-NEXT:    s_add_i32 s58, s4, 0x4240
 ; GFX10_1-NEXT:    v_writelane_b32 v22, s31, 1
 ; GFX10_1-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_1-NEXT:    s_and_b32 s4, 0, exec_lo
@@ -1404,8 +1436,10 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX10_1-NEXT:    ;;#ASMSTART
 ; GFX10_1-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX10_1-NEXT:    ;;#ASMEND
+; GFX10_1-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX10_1-NEXT:    s_mov_b32 s54, s58
 ; GFX10_1-NEXT:    ;;#ASMSTART
-; GFX10_1-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX10_1-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX10_1-NEXT:    ;;#ASMEND
 ; GFX10_1-NEXT:    v_readlane_b32 s55, v22, 16
 ; GFX10_1-NEXT:    v_readlane_b32 s54, v22, 15
@@ -1442,7 +1476,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX10_3-NEXT:    v_writelane_b32 v22, s30, 0
 ; GFX10_3-NEXT:    v_lshrrev_b32_e64 v0, 5, s32
 ; GFX10_3-NEXT:    s_lshr_b32 s4, s32, 5
-; GFX10_3-NEXT:    s_add_i32 s59, s4, 0x4240
+; GFX10_3-NEXT:    s_add_i32 s58, s4, 0x4240
 ; GFX10_3-NEXT:    v_writelane_b32 v22, s31, 1
 ; GFX10_3-NEXT:    v_add_nc_u32_e32 v0, 64, v0
 ; GFX10_3-NEXT:    s_and_b32 s4, 0, exec_lo
@@ -1467,8 +1501,10 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX10_3-NEXT:    ;;#ASMSTART
 ; GFX10_3-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX10_3-NEXT:    ;;#ASMEND
+; GFX10_3-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX10_3-NEXT:    s_mov_b32 s54, s58
 ; GFX10_3-NEXT:    ;;#ASMSTART
-; GFX10_3-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX10_3-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX10_3-NEXT:    ;;#ASMEND
 ; GFX10_3-NEXT:    v_readlane_b32 s55, v22, 16
 ; GFX10_3-NEXT:    v_readlane_b32 s54, v22, 15
@@ -1503,7 +1539,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX11-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX11-NEXT:    v_writelane_b32 v22, s30, 0
 ; GFX11-NEXT:    s_add_i32 s0, s32, 64
-; GFX11-NEXT:    s_add_i32 s59, s32, 0x4240
+; GFX11-NEXT:    s_add_i32 s58, s32, 0x4240
 ; GFX11-NEXT:    v_mov_b32_e32 v0, s0
 ; GFX11-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX11-NEXT:    v_writelane_b32 v22, s31, 1
@@ -1528,8 +1564,10 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX11-NEXT:    ;;#ASMSTART
 ; GFX11-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX11-NEXT:    ;;#ASMEND
+; GFX11-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX11-NEXT:    s_mov_b32 s54, s58
 ; GFX11-NEXT:    ;;#ASMSTART
-; GFX11-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX11-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX11-NEXT:    ;;#ASMEND
 ; GFX11-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX11-NEXT:    v_readlane_b32 s55, v22, 16
@@ -1568,7 +1606,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX12-NEXT:    s_wait_alu 0xfffe
 ; GFX12-NEXT:    s_mov_b32 exec_lo, s0
 ; GFX12-NEXT:    v_writelane_b32 v22, s30, 0
-; GFX12-NEXT:    s_add_co_i32 s59, s32, 0x4200
+; GFX12-NEXT:    s_add_co_i32 s58, s32, 0x4200
 ; GFX12-NEXT:    v_mov_b32_e32 v0, s32
 ; GFX12-NEXT:    s_and_b32 s0, 0, exec_lo
 ; GFX12-NEXT:    ;;#ASMSTART
@@ -1593,10 +1631,12 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 ; GFX12-NEXT:    ;;#ASMSTART
 ; GFX12-NEXT:    ; def s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc
 ; GFX12-NEXT:    ;;#ASMEND
+; GFX12-NEXT:    ; kill: def $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 killed $sgpr48_sgpr49_sgpr50_sgpr51_sgpr52_sgpr53_sgpr54_sgpr55 def $sgpr54
+; GFX12-NEXT:    s_wait_alu 0xfffe
+; GFX12-NEXT:    s_mov_b32 s54, s58
 ; GFX12-NEXT:    ;;#ASMSTART
-; GFX12-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s59, scc
+; GFX12-NEXT:    ; use s[0:15], s[16:31], s[32:47], s[48:55], s[56:57], v[0:15], v[16:21], vcc, s54, scc
 ; GFX12-NEXT:    ;;#ASMEND
-; GFX12-NEXT:    s_delay_alu instid0(VALU_DEP_1)
 ; GFX12-NEXT:    v_readlane_b32 s55, v22, 16
 ; GFX12-NEXT:    v_readlane_b32 s54, v22, 15
 ; GFX12-NEXT:    v_readlane_b32 s53, v22, 14
@@ -1644,7 +1684,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs_gep_i
 
   ; scc is unavailable since it is live in
   call void asm sideeffect "; use $0, $1, $2, $3, $4, $5, $6, $7, $8, $9",
-                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{v[0:15]},{v[16:21]},{vcc},{s59},{scc}"(
+                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{v[0:15]},{v[16:21]},{vcc},{s54},{scc}"(
     <16 x i32> %s0,
     <16 x i32> %s1,
     <16 x i32> %s2,
diff --git a/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll b/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll
index 79187f51af0d2..f70cd6816a966 100644
--- a/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll
+++ b/llvm/test/CodeGen/AMDGPU/schedule-amdgpu-tracker-physreg-crash.ll
@@ -44,7 +44,7 @@ define void @scalar_mov_materializes_frame_index_no_live_scc_no_live_sgprs() #0
 
   ; scc is unavailable since it is live in
   call void asm sideeffect "; use $0, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10",
-                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{s58},{v[0:15]},{v[16:22]},{vcc},{s59},{scc}"(
+                           "{s[0:15]},{s[16:31]},{s[32:47]},{s[48:55]},{s[56:57]},{s58},{v[0:15]},{v[16:22]},{vcc},{s64},{scc}"(
     <16 x i32> %s0,
     <16 x i32> %s1,
     <16 x i32> %s2,